Ingestion Flow

class IngestionFlow

An ingestion flow for processing and storing data.

Parameters:
  • transformers (List[TransformerComponent]) – A list of transformer components applied to the input documents.

  • doc_strategy (DocStrategy) – The strategy used for handling document duplicates. Defaults to DocStrategy.DUPLICATE_ONLY.

  • post_transformer (bool) – Whether document de-duplication is applied after the transformation step. Defaults to False.

  • readers (List[BaseReader], optional) – List of readers for loading or fetching documents.

  • vector_store (BaseVectorStore, optional) – Vector store for saving processed documents.

Example

from beekeeper.core.flows import IngestionFlow
from beekeeper.core.text_chunkers import TokenTextChunker
from beekeeper.embeddings.huggingface import HuggingFaceEmbedding

ingestion_flow = IngestionFlow(
    transformers=[
        TokenTextChunker(),
        HuggingFaceEmbedding(model_name="intfloat/multilingual-e5-small"),
    ]
)

run(documents=None)

Run an ingestion flow.

Parameters:
  • documents (List[Document] | None) – Documents to be transformed. Defaults to None.

Example

# `documents` is a List[Document] prepared beforehand.
ingestion_flow.run(documents=documents)
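
The effect of post_transformer can be illustrated with a small, library-independent sketch. Everything in it is hypothetical: ToyDocument, drop_duplicates, and run_toy_flow are stand-ins written for this illustration, not part of beekeeper, and it assumes that with post_transformer=False duplicates are removed before the transformers run. It only shows why the position of the de-duplication step can change the result.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class ToyDocument:
    """Hypothetical stand-in for a document; not the beekeeper Document class."""
    text: str


# A transformer here is any callable mapping a list of documents to a new list.
Transformer = Callable[[List[ToyDocument]], List[ToyDocument]]


def drop_duplicates(docs: List[ToyDocument]) -> List[ToyDocument]:
    """Keep the first occurrence of each distinct text, preserving order."""
    seen = set()
    unique = []
    for doc in docs:
        if doc.text not in seen:
            seen.add(doc.text)
            unique.append(doc)
    return unique


def run_toy_flow(
    documents: List[ToyDocument],
    transformers: List[Transformer],
    post_transformer: bool = False,
) -> List[ToyDocument]:
    """Apply transformers in order, de-duplicating before or after them."""
    docs = list(documents)
    if not post_transformer:
        # Assumption: de-duplication happens on the raw input in this mode.
        docs = drop_duplicates(docs)
    for transform in transformers:
        docs = transform(docs)
    if post_transformer:
        docs = drop_duplicates(docs)
    return docs


def lowercase(docs: List[ToyDocument]) -> List[ToyDocument]:
    """Toy transformer that normalizes text to lowercase."""
    return [ToyDocument(doc.text.lower()) for doc in docs]


raw = [ToyDocument("Hello"), ToyDocument("hello"), ToyDocument("Hello")]
deduped_first = run_toy_flow(raw, [lowercase], post_transformer=False)
deduped_last = run_toy_flow(raw, [lowercase], post_transformer=True)
print(len(deduped_first))  # 2: "Hello" and "hello" differ before lowercasing
print(len(deduped_last))   # 1: all three texts are identical after lowercasing
```

In this toy model, de-duplicating after transformation catches documents that only become identical once they have been normalized or chunked.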

Enums

DocStrategy

  • DocStrategy.DUPLICATE_ONLY – Inserts only new, unique documents. Skips documents that are duplicated within the input batch or already present in the vector store.

  • DocStrategy.DUPLICATE_AND_DELETE – Deletes all existing documents in the vector store and replaces them with the new batch after removing duplicates.

  • DocStrategy.DEDUPLICATE_OFF – Inserts all input documents as-is, regardless of duplicates or existing content.
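
The three strategies can be compared side by side with a minimal sketch. It is an assumption-laden model, not the library's implementation: ToyStrategy mirrors the DocStrategy member names, the vector store is modeled as a plain list of strings, and duplicates are detected by a SHA-256 hash of the text, which may differ from how beekeeper identifies duplicates.

```python
import hashlib
from enum import Enum
from typing import List


class ToyStrategy(Enum):
    """Hypothetical mirror of the DocStrategy members, for illustration only."""
    DUPLICATE_ONLY = "duplicate_only"
    DUPLICATE_AND_DELETE = "duplicate_and_delete"
    DEDUPLICATE_OFF = "deduplicate_off"


def content_hash(text: str) -> str:
    """Assumed duplicate key: SHA-256 of the document text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def apply_strategy(store: List[str], batch: List[str], strategy: ToyStrategy) -> List[str]:
    """Return the store contents after ingesting `batch` under `strategy`."""
    if strategy is ToyStrategy.DEDUPLICATE_OFF:
        # Insert everything as-is, duplicates included.
        return store + batch

    # Remove duplicates inside the batch, keeping first occurrences.
    seen = set()
    unique_batch = []
    for text in batch:
        key = content_hash(text)
        if key not in seen:
            seen.add(key)
            unique_batch.append(text)

    if strategy is ToyStrategy.DUPLICATE_AND_DELETE:
        # Discard existing content and keep only the de-duplicated batch.
        return unique_batch

    # DUPLICATE_ONLY: also skip documents already present in the store.
    existing = {content_hash(text) for text in store}
    return store + [t for t in unique_batch if content_hash(t) not in existing]


store = ["a", "b"]
batch = ["b", "c", "c"]
print(apply_strategy(store, batch, ToyStrategy.DUPLICATE_ONLY))        # ['a', 'b', 'c']
print(apply_strategy(store, batch, ToyStrategy.DUPLICATE_AND_DELETE))  # ['b', 'c']
print(apply_strategy(store, batch, ToyStrategy.DEDUPLICATE_OFF))       # ['a', 'b', 'b', 'c', 'c']
```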