TokenTextChunker

Bases: BaseTextChunker

This is the simplest splitting method: it splits input text into smaller chunks based on word tokens.

Attributes:

    chunk_size (int): Size of each chunk. Default is 512.
    chunk_overlap (int): Amount of overlap between chunks. Default is 256.
    separator (str): Separator used to split the text into words. Default is "\n\n".

Example
from beekeeper.core.text_chunker import TokenTextChunker

text_chunker = TokenTextChunker()
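The interaction of these attributes can be sketched roughly as follows. This is an illustrative re-implementation of token splitting with overlap, not beekeeper's actual code; `simple_token_chunk` is a hypothetical helper.

```python
def simple_token_chunk(
    text: str,
    chunk_size: int = 512,
    chunk_overlap: int = 256,
    separator: str = "\n\n",
) -> list[str]:
    """Split text into word tokens and regroup them into overlapping chunks."""
    # Split on the separator first, then on whitespace, to get word tokens.
    tokens = [tok for part in text.split(separator) for tok in part.split()]
    # The window advances by chunk_size - chunk_overlap tokens each step
    # (overlap must be smaller than chunk_size for the step to be positive).
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break
    return chunks
```

With the defaults, consecutive chunks share 256 of their 512 tokens, so no token sequence is lost at a chunk boundary.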

chunk_text

chunk_text(text: str) -> list[str]

Split a single string of text into smaller chunks.

Parameters:

    text (str): Input text to split. Required.

Returns:

    list[str]: List of text chunks.

Example
chunks = text_chunker.chunk_text(
    "Beekeeper is a data framework to load any data in one line of code and connect with AI applications."
)

chunk_documents

chunk_documents(documents: list[Document]) -> list[Document]

Split a list of documents into smaller document chunks.

Parameters:

    documents (list[Document]): List of Document objects to split. Required.

Returns:

    list[Document]: List of chunked Document objects.
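Conceptually, this is chunk_text applied to each document in turn, with every resulting chunk wrapped back into a Document. A minimal sketch under the assumption that a Document carries a text field (the Document shape and helper below are hypothetical stand-ins, not beekeeper's actual classes):

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Document:
    # Hypothetical stand-in for beekeeper's Document; assumed to carry text.
    text: str


def chunk_documents_sketch(
    documents: list[Document],
    chunk_text: Callable[[str], list[str]],
) -> list[Document]:
    """Split each document's text and wrap every resulting chunk in a new Document."""
    return [
        Document(text=chunk)
        for doc in documents
        for chunk in chunk_text(doc.text)
    ]
```

The flattening is deliberate: one input document that splits into five chunks contributes five entries to the returned list.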