
Ingestion workflow

An ingestion workflow for processing and storing data.

Attributes:

transformers (list[TransformerComponent]): A list of transformer components applied to the input documents.

doc_strategy (DocStrategy): The strategy used for handling document duplicates. Defaults to DocStrategy.DUPLICATE_ONLY.

post_transformer (bool): Whether document de-duplication should be applied after the transformation step. Defaults to False.

readers (BaseLoader): List of loaders for loading or fetching documents.

vector_store (BaseVectorStore): Vector store for saving processed documents.
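The effect of `post_transformer` is easiest to see with a transformer that splits documents, because splitting can turn distinct documents into overlapping chunks. The following is a minimal, self-contained sketch, not beekeeper code. `Document`, `drop_duplicates`, `split_sentences` and `ingest` are hypothetical stand-ins. It also rests on two assumptions: duplicate handling keeps only the first document with a given content hash, and de-duplication runs before the transformers when `post_transformer` is False.

```python
from dataclasses import dataclass
from hashlib import sha256
from typing import Callable, List


@dataclass
class Document:
    """Hypothetical stand-in for beekeeper's Document; only the text matters here."""
    text: str


# Assumption for this sketch: a transformer is any function list[Document] -> list[Document].
Transformer = Callable[[List[Document]], List[Document]]


def drop_duplicates(docs: List[Document]) -> List[Document]:
    """Keep the first document for each distinct content hash (assumed semantics)."""
    seen: set = set()
    unique: List[Document] = []
    for doc in docs:
        digest = sha256(doc.text.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique


def split_sentences(docs: List[Document]) -> List[Document]:
    """Hypothetical toy chunker: one output document per '.'-terminated sentence."""
    return [
        Document(part.strip() + ".")
        for doc in docs
        for part in doc.text.split(".")
        if part.strip()
    ]


def ingest(
    docs: List[Document],
    transformers: List[Transformer],
    post_transformer: bool = False,
) -> List[Document]:
    """Apply transformers in order, de-duplicating either before or after them."""
    if not post_transformer:
        docs = drop_duplicates(docs)  # assumed default: de-duplicate the raw inputs
    for transform in transformers:
        docs = transform(docs)
    if post_transformer:
        docs = drop_duplicates(docs)  # de-duplicate the transformed outputs instead
    return docs


docs = [Document("Hello world. Bye."), Document("Hello world. Hi.")]
before = ingest(docs, [split_sentences], post_transformer=False)
after = ingest(docs, [split_sentences], post_transformer=True)
print(len(before), len(after))  # prints "4 3"
```

The two source documents are not identical, so de-duplicating them before chunking keeps both. The shared sentence "Hello world." then appears twice among the chunks. De-duplicating after the transformation step removes that repeated chunk.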

Example
from beekeeper.core.workflows import IngestionWorkflow
from beekeeper.core.text_chunkers import TokenTextChunker
from beekeeper.embeddings.huggingface import HuggingFaceEmbedding

ingestion_workflow = IngestionWorkflow(
    transformers=[
        TokenTextChunker(),
        HuggingFaceEmbedding(model_name="intfloat/multilingual-e5-small"),
    ]
)

run

run(documents: list[Document] = []) -> list[Document]

Run the ingestion workflow on the given documents.

Parameters:

documents (list[Document]): The documents to be transformed. Defaults to [].
Example
# `documents` is assumed to be an existing list[Document]
processed_documents = ingestion_workflow.run(documents=documents)
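Conceptually, `run` passes the documents through each transformer in the order given. In the earlier example, that order is `TokenTextChunker` first and then `HuggingFaceEmbedding`. The minimal sketch below models only that sequencing and does not use beekeeper. `SketchIngestionWorkflow`, `WordChunker` and `LengthEmbedder` are hypothetical, and the sketch assumes each transformer can be called as a function from a list of documents to a list of documents. That interface may differ from beekeeper's actual `TransformerComponent`. The sketch also ignores readers, `doc_strategy` and the vector store.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Document:
    """Hypothetical stand-in for beekeeper's Document."""
    text: str
    embedding: List[float] = field(default_factory=list)


class WordChunker:
    """Hypothetical toy chunker: splits each document into chunks of `size` words."""

    def __init__(self, size: int = 3) -> None:
        self.size = size

    def __call__(self, docs: List[Document]) -> List[Document]:
        chunks: List[Document] = []
        for doc in docs:
            words = doc.text.split()
            for start in range(0, len(words), self.size):
                chunks.append(Document(" ".join(words[start:start + self.size])))
        return chunks


class LengthEmbedder:
    """Hypothetical toy embedder: a 2-d 'vector' of word count and character count."""

    def __call__(self, docs: List[Document]) -> List[Document]:
        for doc in docs:
            doc.embedding = [float(len(doc.text.split())), float(len(doc.text))]
        return docs


class SketchIngestionWorkflow:
    """Hypothetical sketch of run(): transformers are applied in list order."""

    def __init__(self, transformers: list) -> None:
        self.transformers = list(transformers)

    def run(self, documents: Optional[List[Document]] = None) -> List[Document]:
        # None instead of a [] default avoids sharing one mutable list across calls.
        docs = list(documents or [])
        for transformer in self.transformers:
            docs = transformer(docs)  # output of one step feeds the next
        return docs


workflow = SketchIngestionWorkflow(transformers=[WordChunker(size=3), LengthEmbedder()])
result = workflow.run([Document("one two three four five")])
for doc in result:
    print(doc.text, doc.embedding)
```

Chunking before embedding means each chunk receives its own vector: "one two three" gets `[3.0, 13.0]` and "four five" gets `[2.0, 9.0]`. Reversing the list would embed the whole document once, and the chunks produced afterwards would carry no embedding.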