SentenceChunker #
Bases: BaseTextChunker
Splits input text into smaller chunks, which is particularly useful for processing large documents or texts. It tries to keep sentences and paragraphs together.
Attributes:

| Name | Type | Description |
|---|---|---|
| `chunk_size` | `int` | Size of each chunk. |
| `chunk_overlap` | `int` | Amount of overlap between consecutive chunks. |
| `separator` | `str` | Separator used for splitting text. |
Example

```python
from beekeeper.core.text_chunker import SentenceChunker

text_chunker = SentenceChunker()
```
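The attributes above suggest a greedy sentence-packing strategy: sentences are accumulated into a chunk until `chunk_size` is reached, with `chunk_overlap` characters carried over between chunks. A minimal pure-Python sketch of that idea (not the library's actual implementation; the default values shown are assumptions):

```python
def sentence_chunk(
    text: str,
    chunk_size: int = 100,
    chunk_overlap: int = 20,
    separator: str = ". ",
) -> list[str]:
    """Greedily pack sentences into chunks of at most chunk_size characters.

    Hypothetical logic for illustration only; the real SentenceChunker
    may split and overlap differently.
    """
    sentences = [s.strip() for s in text.split(separator) if s.strip()]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = (current + separator + sentence) if current else sentence
        if len(candidate) <= chunk_size or not current:
            # Sentence still fits (or is the first one); keep accumulating.
            current = candidate
        else:
            chunks.append(current)
            # Carry the tail of the previous chunk over as overlap.
            if chunk_overlap:
                current = current[-chunk_overlap:] + separator + sentence
            else:
                current = sentence
    if current:
        chunks.append(current)
    return chunks
```

The sketch splits on `separator`, so a sentence longer than `chunk_size` becomes its own chunk rather than being cut mid-sentence, which matches the "keep sentences together" behavior described above.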
chunk_text #

```python
chunk_text(text: str) -> list[str]
```
Split a single string of text into smaller chunks.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `text` | `str` | Input text to split. | *required* |
Returns:

| Type | Description |
|---|---|
| `list[str]` | List of text chunks. |
Example

```python
chunks = text_chunker.chunk_text(
    "Beekeeper is a data framework to load any data in one line of code and connect with AI applications."
)
```
chunk_documents #

```python
chunk_documents(documents: list[Document]) -> list[Document]
```
Split a list of documents into smaller document chunks.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `documents` | `list[Document]` | List of `Document` objects to split. | *required* |
Returns:

| Type | Description |
|---|---|
| `list[Document]` | List of chunked `Document` objects. |
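`chunk_documents` applies the same splitting to every document in a list, producing one output document per chunk. A self-contained sketch of that shape, using a hypothetical minimal `Document` stand-in (the real beekeeper `Document` class and the chunker's internals may differ):

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical stand-in for beekeeper's Document type."""
    text: str
    metadata: dict = field(default_factory=dict)


def chunk_documents(documents: list[Document], chunk_size: int = 100) -> list[Document]:
    """Split each document's text into fixed-size chunks, copying metadata.

    Illustrative only: a real sentence chunker would split on sentence
    boundaries rather than at fixed character offsets.
    """
    out: list[Document] = []
    for doc in documents:
        for start in range(0, len(doc.text), chunk_size):
            out.append(
                Document(
                    text=doc.text[start:start + chunk_size],
                    metadata=dict(doc.metadata),  # each chunk keeps its source metadata
                )
            )
    return out
```

Copying metadata onto every chunk keeps provenance (e.g. the source file) attached when chunks are later handed to downstream AI applications.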