SentenceChunker #
Bases: BaseTextChunker
Splits input text into smaller chunks, which is particularly useful for processing large documents or texts. It tries to keep sentences and paragraphs together.
Attributes:

| Name | Type | Description |
|---|---|---|
| `chunk_size` | `int` | Size of each chunk. |
| `chunk_overlap` | `int` | Amount of overlap between consecutive chunks. |
| `separator` | `str` | Separator used for splitting text. |
Example

```python
from beekeeper.core.text_chunker import SentenceChunker

text_chunker = SentenceChunker()
```
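The attributes above suggest a greedy sentence-packing strategy: sentences are accumulated into a chunk until `chunk_size` is reached, with `chunk_overlap` characters carried over between chunks. A minimal pure-Python sketch of that idea (not the library's actual implementation; the default values shown are assumptions):

```python
def sentence_chunk(
    text: str,
    chunk_size: int = 100,
    chunk_overlap: int = 20,
    separator: str = ". ",
) -> list[str]:
    """Greedily pack sentences into chunks of at most chunk_size characters.

    Hypothetical logic for illustration only; the real SentenceChunker
    may split and overlap differently.
    """
    sentences = [s.strip() for s in text.split(separator) if s.strip()]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = (current + separator + sentence) if current else sentence
        if len(candidate) <= chunk_size or not current:
            # Sentence still fits (or is the first one); keep accumulating.
            current = candidate
        else:
            chunks.append(current)
            # Carry the tail of the previous chunk over as overlap.
            if chunk_overlap:
                current = current[-chunk_overlap:] + separator + sentence
            else:
                current = sentence
    if current:
        chunks.append(current)
    return chunks
```

The sketch splits on `separator`, so a sentence longer than `chunk_size` becomes its own chunk rather than being cut mid-sentence, which matches the "keep sentences together" behavior described above.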
chunk_text #

```python
chunk_text(text: str) -> list[str]
```
Split a single string of text into smaller chunks.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `text` | `str` | Input text to split. | *required* |
Returns:

| Type | Description |
|---|---|
| `list[str]` | List of text chunks. |
Example

```python
chunks = text_chunker.chunk_text(
    "Beekeeper is a data framework to load any data in one line of code and connect with AI applications."
)
```
chunk_documents #

```python
chunk_documents(documents: list[Document]) -> list[Document]
```
Split a list of documents into smaller document chunks.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `documents` | `list[Document]` | List of `Document` objects to split. | *required* |
Returns:

| Type | Description |
|---|---|
| `list[Document]` | List of chunked `Document` objects. |
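`chunk_documents` applies the same splitting to every document in a list, producing one output document per chunk. A self-contained sketch of that shape, using a hypothetical minimal `Document` stand-in (the real beekeeper `Document` class and the chunker's internals may differ):

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical stand-in for beekeeper's Document type."""
    text: str
    metadata: dict = field(default_factory=dict)


def chunk_documents(documents: list[Document], chunk_size: int = 100) -> list[Document]:
    """Split each document's text into fixed-size chunks, copying metadata.

    Illustrative only: a real sentence chunker would split on sentence
    boundaries rather than at fixed character offsets.
    """
    out: list[Document] = []
    for doc in documents:
        for start in range(0, len(doc.text), chunk_size):
            out.append(
                Document(
                    text=doc.text[start:start + chunk_size],
                    metadata=dict(doc.metadata),  # each chunk keeps its source metadata
                )
            )
    return out
```

Copying metadata onto every chunk keeps provenance (e.g. the source file) attached when chunks are later handed to downstream AI applications.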