Ingestion workflow
An ingestion workflow for processing and storing data.
Attributes:
| Name | Type | Description |
|---|---|---|
| transformers | list[TransformerComponent] | A list of transformer components applied to the input documents. |
| doc_strategy | DocStrategy | The strategy used for handling document duplicates. Defaults to |
| post_transformer | bool | Whether document de-duplication should be applied after the transformation step. Defaults to |
| readers | BaseLoader | List of loaders for loading or fetching documents. |
| vector_store | BaseVectorStore | Vector store for saving processed documents. |
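
To make the post_transformer flag concrete, here is a minimal, self-contained sketch that contrasts de-duplicating before the transformation step with de-duplicating after it. It does not use beekeeper: `Doc`, `dedupe`, `lowercase`, and `run_pipeline` are hypothetical stand-ins, and hashing the document text is only an assumed duplicate criterion, not necessarily how DocStrategy decides.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Doc:
    """Hypothetical stand-in for a document (not the beekeeper Document class)."""
    text: str


def dedupe(docs: list[Doc]) -> list[Doc]:
    """Keep the first document for each distinct text hash (assumed criterion)."""
    seen: set[str] = set()
    unique: list[Doc] = []
    for doc in docs:
        digest = hashlib.sha256(doc.text.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique


def lowercase(docs: list[Doc]) -> list[Doc]:
    """Toy transformer: normalizing case can turn distinct texts into duplicates."""
    return [Doc(doc.text.lower()) for doc in docs]


def run_pipeline(
    docs: list[Doc],
    transformers: list[Callable[[list[Doc]], list[Doc]]],
    post_transformer: bool = False,
) -> list[Doc]:
    """Apply transformers in order, de-duplicating before or after them."""
    if not post_transformer:
        docs = dedupe(docs)
    for transform in transformers:
        docs = transform(docs)
    if post_transformer:
        docs = dedupe(docs)
    return docs


docs = [Doc("Hello"), Doc("hello"), Doc("Hello")]
before = run_pipeline(docs, [lowercase], post_transformer=False)
after = run_pipeline(docs, [lowercase], post_transformer=True)
```

In this toy setup, de-duplicating first leaves duplicates that only appear once the transformer has normalized the text, while de-duplicating afterwards removes them. Which order is preferable in a real workflow depends on the transformers involved.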
Example

    from beekeeper.core.workflows import IngestionWorkflow
    from beekeeper.core.text_chunkers import TokenTextChunker
    from beekeeper.embeddings.huggingface import HuggingFaceEmbedding

    ingestion_workflow = IngestionWorkflow(
        transformers=[
            TokenTextChunker(),
            HuggingFaceEmbedding(model_name="intfloat/multilingual-e5-small"),
        ]
    )
run
run(documents: list[Document] = []) -> list[Document]
Run an ingestion workflow.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| documents | list[Document] | Set of documents to be transformed. | [] |
Example

    # documents is a list[Document] prepared beforehand
    processed_documents = ingestion_workflow.run(documents=documents)
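
As a rough illustration of how the attributes above could fit together in a single run, the following self-contained sketch loads documents from readers, applies the transformers in order, and saves the result to a store. It does not use beekeeper and is not its implementation: `Doc`, `StaticReader`, `SplitOnPeriod`, `InMemoryStore`, and the method names `load` and `add` are all hypothetical, chosen only for this example.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a document (not the beekeeper Document class)."""
    text: str


class StaticReader:
    """Hypothetical reader that 'loads' documents from an in-memory list of strings."""

    def __init__(self, texts: list[str]) -> None:
        self.texts = texts

    def load(self) -> list[Doc]:
        return [Doc(text) for text in self.texts]


class SplitOnPeriod:
    """Toy transformer: splits each document into one document per sentence."""

    def __call__(self, docs: list[Doc]) -> list[Doc]:
        pieces: list[Doc] = []
        for doc in docs:
            for part in doc.text.split("."):
                part = part.strip()
                if part:  # skip empty fragments
                    pieces.append(Doc(part))
        return pieces


class InMemoryStore:
    """Hypothetical store that keeps processed documents in a list."""

    def __init__(self) -> None:
        self.saved: list[Doc] = []

    def add(self, docs: list[Doc]) -> None:
        self.saved.extend(docs)


def run(documents=None, readers=(), transformers=(), vector_store=None) -> list[Doc]:
    """Collect input documents, transform them in order, then store the result."""
    docs = list(documents or [])
    for reader in readers:
        docs.extend(reader.load())
    for transform in transformers:
        docs = transform(docs)
    if vector_store is not None:
        vector_store.add(docs)
    return docs


store = InMemoryStore()
result = run(
    documents=[Doc("a. b")],
    readers=[StaticReader(["c. d"])],
    transformers=[SplitOnPeriod()],
    vector_store=store,
)
```

The sketch uses `None` rather than `[]` as the default for `documents`, a common Python idiom that avoids sharing one list object across calls. That is a choice made for this example and says nothing about the library's internals.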