Ingestion Flow
- class IngestionFlow
An ingestion flow for processing and storing data.
- Parameters:
transformers (List[TransformerComponent]) – A list of transformer components applied to the input documents.
doc_strategy (DocStrategy) – The strategy used for handling document duplicates. Defaults to DocStrategy.DUPLICATE_ONLY.
post_transformer (bool) – Whether document de-duplication is applied after the transformation step. Defaults to False.
readers (BaseReader, optional) – List of readers for loading or fetching documents.
vector_store (BaseVectorStore, optional) – Vector store for saving processed documents.
Example
    from beekeeper.core.flows import IngestionFlow
    from beekeeper.core.text_chunkers import TokenTextChunker
    from beekeeper.embeddings.huggingface import HuggingFaceEmbedding

    ingestion_flow = IngestionFlow(
        transformers=[
            TokenTextChunker(),
            HuggingFaceEmbedding(model_name="intfloat/multilingual-e5-small"),
        ]
    )
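The effect of post_transformer can be pictured with a small, library-independent sketch. It is not the IngestionFlow implementation: split_words, dedupe, and run are hypothetical helpers, and exact-string matching is an assumed stand-in for however the library actually detects duplicates. It only shows how the position of the de-duplication step relative to the transformers can change the result.

```python
def split_words(docs, size=2):
    # Hypothetical transformer: split each document into chunks of `size` words.
    chunks = []
    for doc in docs:
        words = doc.split()
        for i in range(0, len(words), size):
            chunks.append(" ".join(words[i:i + size]))
    return chunks


def dedupe(items):
    # Drop exact repeats while preserving first-seen order.
    return list(dict.fromkeys(items))


def run(docs, post_transformer=False):
    # Assumed ordering: de-duplicate the raw documents first by default,
    # or de-duplicate the transformed chunks when post_transformer is True.
    if post_transformer:
        return dedupe(split_words(docs))
    return split_words(dedupe(docs))


docs = ["red fox red fox", "red fox red fox", "red fox blue jay"]
print(run(docs, post_transformer=False))  # ['red fox', 'red fox', 'red fox', 'blue jay']
print(run(docs, post_transformer=True))   # ['red fox', 'blue jay']
```

In this toy setup, de-duplicating before transformation removes only the repeated input document, while de-duplicating afterwards also collapses identical chunks produced from different documents.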
Enums
DocStrategy
| Name | Description |
|---|---|
| DocStrategy.DUPLICATE_ONLY | Inserts only new, unique documents. Skips duplicates from the input batch and existing data in the vector store. |
| DocStrategy.DUPLICATE_AND_DELETE | Deletes all existing documents in the vector store and replaces them with the new batch after removing duplicates. |
| DocStrategy.DEDUPLICATE_OFF | Inserts all input documents as-is, regardless of duplicates or existing content. |
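The three strategies described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's code: Strategy is a hypothetical stand-in for DocStrategy, a plain list stands in for the vector store, and duplicates are assumed to be detected by a SHA-256 hash of the document text.

```python
import hashlib
from enum import Enum


class Strategy(Enum):
    # Hypothetical stand-in mirroring the three DocStrategy members.
    DUPLICATE_ONLY = "duplicate_only"
    DUPLICATE_AND_DELETE = "duplicate_and_delete"
    DEDUPLICATE_OFF = "deduplicate_off"


def content_hash(text):
    # Assumed duplicate key: hash of the raw document text.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def unique_in_batch(batch):
    # Keep the first occurrence of each document in the input batch.
    seen, unique = set(), []
    for text in batch:
        key = content_hash(text)
        if key not in seen:
            seen.add(key)
            unique.append(text)
    return unique


def ingest(store, batch, strategy):
    # Return the store contents after ingesting `batch` under `strategy`.
    if strategy is Strategy.DEDUPLICATE_OFF:
        return store + batch              # insert everything as-is
    fresh = unique_in_batch(batch)
    if strategy is Strategy.DUPLICATE_AND_DELETE:
        return fresh                      # drop existing data, keep the de-duplicated batch
    existing = {content_hash(text) for text in store}
    return store + [t for t in fresh if content_hash(t) not in existing]


store = ["a", "b"]
batch = ["b", "c", "c"]
print(ingest(store, batch, Strategy.DUPLICATE_ONLY))        # ['a', 'b', 'c']
print(ingest(store, batch, Strategy.DUPLICATE_AND_DELETE))  # ['b', 'c']
print(ingest(store, batch, Strategy.DEDUPLICATE_OFF))       # ['a', 'b', 'b', 'c', 'c']
```

The sketch returns a new list instead of mutating the store, which keeps the three branches easy to compare side by side; a real vector store would perform inserts and deletes in place.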