Ingestion workflow

An ingestion workflow for processing and storing data.

Attributes:

Name	Type	Description
`transformers`	`list[TransformerComponent]`	A list of transformer components applied to the input documents.
`doc_strategy`	`DocStrategy`	The strategy used for handling document duplicates. Defaults to `DocStrategy.DUPLICATE_ONLY`.
`post_transformer`	`bool`	Whether document de-duplication is applied after the transformation step rather than before it. Defaults to `False`.
`readers`	`BaseLoader`	List of loaders for loading or fetching documents.
`vector_store`	`BaseVectorStore`	Vector store used to save the processed documents.
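
The effect of `post_transformer` can be illustrated with a minimal, library-independent sketch. It is not beekeeper's implementation: `Doc`, `drop_duplicates`, `run_pipeline`, and `lowercase` are hypothetical names, and identifying duplicates by a hash of the document text is an assumption made only for this example.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Doc:
    """Hypothetical stand-in for a document; not beekeeper's Document class."""
    text: str


def drop_duplicates(docs: list[Doc]) -> list[Doc]:
    """Keep the first occurrence of each distinct text, preserving order."""
    seen: set[str] = set()
    unique: list[Doc] = []
    for doc in docs:
        # Assumption: two documents are duplicates when their text hashes match.
        digest = hashlib.sha256(doc.text.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique


def run_pipeline(
    docs: list[Doc],
    transformers: list[Callable[[list[Doc]], list[Doc]]],
    post_transformer: bool = False,
) -> list[Doc]:
    """Apply transformers in order, de-duplicating before or after them."""
    if not post_transformer:
        docs = drop_duplicates(docs)
    for transform in transformers:
        docs = transform(docs)
    if post_transformer:
        docs = drop_duplicates(docs)
    return docs


def lowercase(docs: list[Doc]) -> list[Doc]:
    """Toy transformer that normalizes case."""
    return [Doc(d.text.lower()) for d in docs]


docs = [Doc("Hello"), Doc("hello"), Doc("Hello")]
before = run_pipeline(docs, [lowercase], post_transformer=False)
after = run_pipeline(docs, [lowercase], post_transformer=True)
print(len(before), len(after))
```

In this sketch, de-duplicating first leaves two documents that only become identical once the transformer has run, while de-duplicating afterwards collapses them into one. Whether the later check is worth the extra transformation work depends on how much the transformers normalize their input.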

Example

from beekeeper.core.workflows import IngestionWorkflow
from beekeeper.core.text_chunkers import TokenTextChunker
from beekeeper.embeddings.huggingface import HuggingFaceEmbedding

ingestion_workflow = IngestionWorkflow(
    transformers=[
        TokenTextChunker(),
        HuggingFaceEmbedding(model_name="intfloat/multilingual-e5-small"),
    ]
)

run

run(documents: list[Document] = []) -> list[Document]

Run an ingestion workflow.

Parameters:

Name	Type	Description	Default
`documents`	`list[Document]`	List of documents to be transformed.	`[]`

Example

# `documents` is a list[Document] prepared beforehand
ingestion_workflow.run(documents=documents)