Directory
DirectoryLoader
Bases: BaseLoader
Loads files from a directory, optionally filtering by file extension and allowing recursive directory traversal.
Attributes:
| Name | Type | Description |
|---|---|---|
| required_exts | list[str] | List of file extensions to filter by. Only files with these extensions will be loaded. Each extension must start with a dot. Defaults to [".pdf", ".docx", ".html"]. |
| recursive | bool | Whether to recursively search subdirectories for files. Defaults to False. |
| file_loader | dict[str, Type[BaseLoader]] \| None | Custom mapping of file extensions to loader classes. If None, default loaders will be used. |
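The `file_loader` mapping pairs each extension with a loader class. The sketch below illustrates that dispatch pattern in isolation; it is not beekeeper's implementation. `TextStubLoader`, `PdfStubLoader`, and `load_with_mapping` are hypothetical stand-ins, and skipping files with no registered extension is an assumption.

```python
from pathlib import Path


class TextStubLoader:
    """Hypothetical stand-in for a loader class; not part of beekeeper."""

    def load_data(self, input_file: str) -> list[str]:
        return [f"text:{Path(input_file).name}"]


class PdfStubLoader:
    """Hypothetical stand-in for a loader class; not part of beekeeper."""

    def load_data(self, input_file: str) -> list[str]:
        return [f"pdf:{Path(input_file).name}"]


def load_with_mapping(paths: list[str], file_loader: dict[str, type]) -> list[str]:
    """Pick a loader class by file extension and collect what each one returns."""
    documents: list[str] = []
    for path in paths:
        loader_cls = file_loader.get(Path(path).suffix)
        if loader_cls is None:
            # Assumed behaviour: files without a registered loader are skipped.
            continue
        documents.extend(loader_cls().load_data(path))
    return documents


mapping = {".txt": TextStubLoader, ".pdf": PdfStubLoader}
docs = load_with_mapping(["a.txt", "b.pdf", "c.csv"], mapping)
```

Keying the mapping on the dotted extension matches the "starts with a dot" convention used for `required_exts`.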
Example
from beekeeper.core.loaders import DirectoryLoader
# Using default loaders
directory_loader = DirectoryLoader()
documents = directory_loader.load_data("/path/to/directory")
# Using custom extensions
directory_loader = DirectoryLoader(
    required_exts=[".pdf", ".txt"], recursive=True
)
documents = directory_loader.load_data("/path/to/directory")
load_data
load_data(input_dir: str, **kwargs: Any) -> list[Document]
Loads data from the specified directory.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| input_dir | str | Directory path from which to load the documents. | required |
Returns:
| Type | Description |
|---|---|
| list[Document] | A list of documents loaded from the directory. |
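To make the effect of `required_exts` and `recursive` concrete, here is a minimal standard-library sketch of extension filtering with optional recursion. It is an illustration under assumptions, not the code `load_data` runs: `find_files` is a hypothetical helper, and exact, case-sensitive suffix matching is assumed.

```python
import tempfile
from pathlib import Path


def find_files(input_dir: str, required_exts: list[str], recursive: bool = False) -> list[Path]:
    """Hypothetical helper: list files under input_dir whose suffix is in required_exts."""
    root = Path(input_dir)
    # rglob descends into subdirectories; glob only looks at the top level.
    candidates = root.rglob("*") if recursive else root.glob("*")
    wanted = set(required_exts)
    # Assumption: suffixes are compared exactly (case-sensitive, leading dot included).
    return sorted(p for p in candidates if p.is_file() and p.suffix in wanted)


# Build a small throwaway tree to compare flat and recursive searches.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.pdf").write_text("x")
    (root / "notes.txt").write_text("x")
    (root / "sub").mkdir()
    (root / "sub" / "b.pdf").write_text("x")

    flat = [p.name for p in find_files(tmp, [".pdf"])]
    deep = [p.name for p in find_files(tmp, [".pdf"], recursive=True)]
```

With `recursive` left at its default, only the top-level `a.pdf` matches; enabling it also picks up `sub/b.pdf`.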