Directory

DirectoryLoader

Bases: BaseLoader

Loads files from a directory, optionally filtering by file extension and allowing recursive directory traversal.

Attributes:

required_exts (list[str]): List of file extensions to filter by. Only files with these extensions are loaded. Each extension must start with a dot. Defaults to [".pdf", ".docx", ".html"].

recursive (bool): Whether to search subdirectories recursively. Defaults to False.

file_loader (dict[str, Type[BaseLoader]] | None): Custom mapping of file extensions to loader classes. If None, the default loaders are used.

Example
from beekeeper.core.loaders import DirectoryLoader

# Using default loaders
directory_loader = DirectoryLoader()
documents = directory_loader.load_data("/path/to/directory")

# Using custom extensions and recursive traversal
directory_loader = DirectoryLoader(
    required_exts=[".pdf", ".txt"], recursive=True
)
documents = directory_loader.load_data("/path/to/directory")
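
To make the effect of required_exts and recursive concrete, here is a minimal standalone sketch of how files might be selected. It does not use beekeeper and is not the library's implementation: select_files is a hypothetical helper, and the case-insensitive extension matching is an assumption.

```python
from pathlib import Path


def select_files(
    input_dir: str, required_exts: list[str], recursive: bool = False
) -> list[Path]:
    """Return files under input_dir whose suffix is in required_exts.

    Hypothetical helper for illustration only.
    """
    root = Path(input_dir)
    if not root.is_dir():
        raise ValueError(f"Not a directory: {input_dir}")

    # Each extension is expected to start with a dot, e.g. ".pdf".
    for ext in required_exts:
        if not ext.startswith("."):
            raise ValueError(f"Extension must start with a dot: {ext!r}")

    # Assumption: matching ignores case, so ".PDF" matches ".pdf".
    wanted = {ext.lower() for ext in required_exts}

    # rglob descends into subdirectories; glob stays at the top level.
    candidates = root.rglob("*") if recursive else root.glob("*")
    return sorted(
        p for p in candidates if p.is_file() and p.suffix.lower() in wanted
    )
```

With recursive=False, only files directly inside input_dir are considered. With recursive=True, matching files in nested subdirectories are included as well.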

load_data

load_data(input_dir: str, **kwargs: Any) -> list[Document]

Loads data from the specified directory.

Parameters:

input_dir (str, required): Directory path from which to load the documents.

Returns:

list[Document]: A list of documents loaded from the directory.
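
The file_loader attribute maps file extensions to loader classes, and load_data returns a single list of documents. The sketch below shows one way such a mapping could be dispatched to build that list. Everything in it is a stand-in: Doc, PlainTextLoader, and load_with_mapping are hypothetical names, the real BaseLoader and Document interfaces may differ, and skipping files with no mapped loader is an assumption.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Doc:
    """Stand-in for the library's Document type (hypothetical)."""

    text: str
    source: str


class PlainTextLoader:
    """Stand-in loader; the real BaseLoader interface may differ."""

    def load_data(self, path: str) -> list[Doc]:
        return [Doc(text=Path(path).read_text(encoding="utf-8"), source=path)]


def load_with_mapping(paths: list[Path], file_loader: dict[str, type]) -> list[Doc]:
    """Run the loader mapped to each file's extension and collect the results."""
    documents: list[Doc] = []
    for path in paths:
        loader_cls = file_loader.get(path.suffix)
        if loader_cls is None:
            # Assumption: files without a mapped loader are skipped.
            continue
        documents.extend(loader_cls().load_data(str(path)))
    return documents
```

Each loader may return more than one document per file, which is why the results are merged with extend rather than append.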