
File

DocxLoader

Bases: BaseLoader

Microsoft Word (.docx) loader.

load_data

load_data(input_file: str, **kwargs: Any) -> list[Document]

Loads data from the specified file.

Parameters:

Name        Type  Description         Default
input_file  str   File path to load.  required

Returns:

Type            Description
list[Document]  A list of Document objects loaded from the file.
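A .docx file is a ZIP archive whose body text lives in word/document.xml as WordprocessingML paragraphs (w:p) made of text runs (w:t). The sketch below shows one way a loader like this could turn that XML into a Document using only the standard library. It is not this loader's implementation: the Document dataclass and the load_docx function are hypothetical stand-ins, and returning a single Document for the whole file is an assumption.

```python
import zipfile
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Any


# Hypothetical stand-in for the library's Document type.
@dataclass
class Document:
    text: str
    metadata: dict[str, Any] = field(default_factory=dict)


# Namespace of the WordprocessingML elements in word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def load_docx(input_file: str) -> list[Document]:
    """Hypothetical loader: one Document holding every paragraph, newline-separated."""
    # zipfile accepts a path or a binary file-like object.
    with zipfile.ZipFile(input_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{W_NS}p"):
        # Join all text runs (<w:t>) that make up the paragraph.
        paragraphs.append("".join(t.text or "" for t in para.iter(f"{W_NS}t")))
    return [Document(text="\n".join(paragraphs), metadata={"source": str(input_file)})]
```

Splitting on paragraphs and joining them with newlines keeps the reading order of the body, including paragraphs inside tables, without needing a third-party package such as python-docx.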

HTMLLoader

Bases: BaseLoader

Load an HTML file and extract the text inside a specific tag.

Attributes:

Name  Type  Description
tag   str   HTML tag to extract text from. Defaults to "section".

load_data

load_data(input_file: str, **kwargs: Any) -> list[Document]

Loads data from the specified file.

Parameters:

Name        Type  Description         Default
input_file  str   File path to load.  required

Returns:

Type            Description
list[Document]  A list of Document objects loaded from the file.
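To illustrate what "extract text from a specific tag" can mean in practice, here is a minimal sketch built on the standard library's html.parser. It assumes one Document per occurrence of the tag (default "section"), with nested occurrences merged into the outermost one; the real loader may group text differently. The Document dataclass, html_to_documents, and load_html are hypothetical names, not part of this API.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser
from typing import Any, Optional


# Hypothetical stand-in for the library's Document type.
@dataclass
class Document:
    text: str
    metadata: dict[str, Any] = field(default_factory=dict)


class _TagTextCollector(HTMLParser):
    """Collects the text inside each outermost occurrence of `tag`."""

    def __init__(self, tag: str) -> None:
        super().__init__()
        self.tag = tag.lower()  # HTMLParser reports tag names in lowercase
        self.depth = 0  # how many `tag` elements are currently open
        self.chunks: list[str] = []
        self.blocks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:  # closed the outermost element: emit its text
                self.blocks.append(" ".join(self.chunks))
                self.chunks = []

    def handle_data(self, data):
        # Keep only non-blank text that sits inside the target tag.
        if self.depth > 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_documents(html: str, tag: str = "section", source: Optional[str] = None) -> list[Document]:
    parser = _TagTextCollector(tag)
    parser.feed(html)
    parser.close()
    return [Document(text=block, metadata={"source": source, "tag": tag}) for block in parser.blocks]


def load_html(input_file: str, tag: str = "section") -> list[Document]:
    """Hypothetical file-based wrapper around html_to_documents."""
    with open(input_file, encoding="utf-8") as f:
        return html_to_documents(f.read(), tag, source=input_file)
```

Text outside the chosen tag (navigation, footers, and so on) is dropped, which is the usual reason to target a content tag such as section.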

JSONLoader

Bases: BaseLoader

JSON loader that selects data from the input with a jq schema.

Attributes:

Name       Type  Description
jq_schema  str   jq schema used to extract the data from the JSON.

load_data

load_data(input_file: str, **kwargs: Any) -> list[Document]

Loads data from the specified file.

Parameters:

Name        Type  Description         Default
input_file  str   File path to load.  required

Returns:

Type            Description
list[Document]  A list of Document objects loaded from the file.
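The jq_schema attribute decides which parts of the JSON become documents. The sketch below does not use jq: it swaps in a deliberately tiny, standard-library selector that understands only ".", ".key", chained keys, and the "[]" iterator, which is enough to show how a schema such as ".items[].title" fans one file out into several Documents. The Document dataclass, select, load_json, the one-Document-per-result mapping, and the "." default are all assumptions for illustration.

```python
import json
from dataclasses import dataclass, field
from typing import Any


# Hypothetical stand-in for the library's Document type.
@dataclass
class Document:
    text: str
    metadata: dict[str, Any] = field(default_factory=dict)


def select(data: Any, schema: str) -> list[Any]:
    """Evaluate a tiny jq-like path such as '.', '.items[]', or '.items[].title'."""
    results = [data]
    for part in schema.strip(".").split("."):
        if not part:
            continue  # '.' on its own selects the whole document
        iterate = part.endswith("[]")
        key = part[:-2] if iterate else part
        next_results = []
        for value in results:
            if key:
                value = value[key]  # raises KeyError if missing; real jq yields null
            if iterate:
                # Like jq's '.[]': iterate array items or object values.
                next_results.extend(value.values() if isinstance(value, dict) else value)
            else:
                next_results.append(value)
        results = next_results
    return results


def load_json(input_file: str, jq_schema: str = ".") -> list[Document]:
    """Hypothetical loader: one Document per selected value."""
    with open(input_file, encoding="utf-8") as f:
        data = json.load(f)
    return [
        # Strings are kept verbatim; other values are serialized back to JSON.
        Document(text=v if isinstance(v, str) else json.dumps(v), metadata={"source": input_file})
        for v in select(data, jq_schema)
    ]
```

A real jq schema supports far more (filters, pipes, object construction); the point here is only the mapping from selected values to documents.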

PDFLoader

Bases: BaseLoader

PDF loader using PyPDF.

load_data

load_data(input_file: str, **kwargs: Any) -> list[Document]

Loads data from the specified file.

Parameters:

Name        Type  Description         Default
input_file  str   File path to load.  required

Returns:

Type            Description
list[Document]  A list of Document objects loaded from the file.
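A PyPDF-based loader typically reads each page with pypdf's PdfReader and calls extract_text() on it. The sketch below assumes one Document per page, tagged with a 1-based page number; this loader may instead merge pages or choose different metadata. The Document dataclass, pages_to_documents, and load_pdf are hypothetical names. pypdf is imported lazily inside load_pdf, so the page-wrapping helper can be used (and tested) without pypdf installed.

```python
from collections.abc import Iterable
from dataclasses import dataclass, field
from typing import Any, Protocol


# Hypothetical stand-in for the library's Document type.
@dataclass
class Document:
    text: str
    metadata: dict[str, Any] = field(default_factory=dict)


class _Page(Protocol):
    """Anything with an extract_text() method, such as a pypdf page."""

    def extract_text(self) -> str: ...


def pages_to_documents(pages: Iterable[_Page], source: str) -> list[Document]:
    """Wrap each page's text in its own Document with a 1-based page number."""
    return [
        Document(text=page.extract_text() or "", metadata={"source": source, "page": number})
        for number, page in enumerate(pages, start=1)
    ]


def load_pdf(input_file: str) -> list[Document]:
    """Hypothetical loader built on pypdf (requires `pip install pypdf`)."""
    from pypdf import PdfReader  # imported lazily; only needed for real PDFs

    reader = PdfReader(input_file)
    return pages_to_documents(reader.pages, source=input_file)
```

Keeping one Document per page preserves page boundaries, which makes it easy to cite a page later; callers that want a single text can join the results.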