TokenTextChunker

Bases: BaseTextChunker

This is the simplest splitting method: it splits input text into smaller chunks based on word tokens.

Attributes:

    chunk_size (int): Size of each chunk. Default is 512.
    chunk_overlap (int): Amount of overlap between chunks. Default is 256.
    separator (str): Separator used to split the text into words. Default is "\n\n".

Example
from beekeeper.core.text_chunker import TokenTextChunker

text_chunker = TokenTextChunker()
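The interaction of these attributes can be sketched roughly as follows. This is an illustrative re-implementation of token splitting with overlap, not beekeeper's actual code; `simple_token_chunk` is a hypothetical helper.

```python
def simple_token_chunk(
    text: str,
    chunk_size: int = 512,
    chunk_overlap: int = 256,
    separator: str = "\n\n",
) -> list[str]:
    """Split text into word tokens and regroup them into overlapping chunks."""
    # Split on the separator first, then on whitespace, to get word tokens.
    tokens = [tok for part in text.split(separator) for tok in part.split()]
    # The window advances by chunk_size - chunk_overlap tokens each step
    # (overlap must be smaller than chunk_size for the step to be positive).
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break
    return chunks
```

With the defaults, consecutive chunks share 256 of their 512 tokens, so no token sequence is lost at a chunk boundary.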

chunk_text

chunk_text(text: str) -> list[str]

Split a single string of text into smaller chunks.

Parameters:

    text (str): Input text to split. Required.

Returns:

    list[str]: List of text chunks.

Example
chunks = text_chunker.chunk_text(
    "Beekeeper is a data framework to load any data in one line of code and connect with AI applications."
)

chunk_documents

chunk_documents(documents: list[Document]) -> list[Document]

Split a list of documents into smaller document chunks.

Parameters:

    documents (list[Document]): List of Document objects to split. Required.

Returns:

    list[Document]: List of chunked Document objects.
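Conceptually, this is chunk_text applied to each document in turn, with every resulting chunk wrapped back into a Document. A minimal sketch under the assumption that a Document carries a text field (the Document shape and helper below are hypothetical stand-ins, not beekeeper's actual classes):

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Document:
    # Hypothetical stand-in for beekeeper's Document; assumed to carry text.
    text: str


def chunk_documents_sketch(
    documents: list[Document],
    chunk_text: Callable[[str], list[str]],
) -> list[Document]:
    """Split each document's text and wrap every resulting chunk in a new Document."""
    return [
        Document(text=chunk)
        for doc in documents
        for chunk in chunk_text(doc.text)
    ]
```

The flattening is deliberate: one input document that splits into five chunks contributes five entries to the returned list.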