File

DocxLoader

Bases: BaseLoader

Microsoft Word (Docx) loader.

load_data(input_file: str, **kwargs: Any) -> list[Document]

Loads data from the specified file.

Parameters:

Name	Type	Description	Default
`input_file`	`str`	File path to load.	required

Returns:

Type	Description
`list[Document]`	A list of `Document` objects loaded from the file.
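
As a rough illustration of what loading a Word file involves, the sketch below reads the paragraph text out of a `.docx` archive using only the Python standard library. It is not this loader's implementation, which may rely on a dedicated library. `Document` and `load_docx` are hypothetical stand-ins introduced here, and the choice to return a single document per file is an assumption.

```python
import xml.etree.ElementTree as ET
import zipfile
from dataclasses import dataclass, field

# WordprocessingML namespace used by the main document part of a .docx file.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


@dataclass
class Document:
    """Hypothetical stand-in for the library's Document type."""

    text: str
    metadata: dict = field(default_factory=dict)


def load_docx(input_file: str) -> list[Document]:
    """Return one Document holding the non-empty paragraphs of a .docx file."""
    # A .docx file is a zip archive; the body text lives in word/document.xml.
    with zipfile.ZipFile(input_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    paragraphs = []
    for para in root.iter(f"{W_NS}p"):
        # A paragraph is split into runs; join the text nodes of every run.
        text = "".join(node.text or "" for node in para.iter(f"{W_NS}t"))
        if text.strip():
            paragraphs.append(text)

    return [Document(text="\n".join(paragraphs), metadata={"source": input_file})]
```

This sketch ignores headers, footers, tables outside the paragraph flow, and styling, so it only approximates what a full loader would return.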

Bases: BaseLoader

Load an HTML file and extract the text inside a specific tag.

Attributes:

Name	Type	Description
`tag`	`str`	HTML tag to extract. Defaults to `section`.

load_data(input_file: str, **kwargs: Any) -> list[Document]

Loads data from the specified file.

Parameters:

Name	Type	Description	Default
`input_file`	`str`	File path to load.	required

Returns:

Type	Description
`list[Document]`	A list of `Document` objects loaded from the file.
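
The tag-based extraction described above can be sketched with the standard library's `html.parser`. This is an illustration under stated assumptions, not the loader's actual code: `Document`, `TagTextExtractor`, and `load_html_tag` are hypothetical names, and emitting one document per matching tag is an assumption. Only the `tag` attribute and its `section` default come from the reference above.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser
from pathlib import Path


@dataclass
class Document:
    """Hypothetical stand-in for the library's Document type."""

    text: str
    metadata: dict = field(default_factory=dict)


class TagTextExtractor(HTMLParser):
    """Collect the text found inside each outermost occurrence of one tag."""

    def __init__(self, tag: str = "section") -> None:
        super().__init__()
        self.tag = tag
        self.depth = 0  # how many target tags are currently open
        self.chunks = []  # finished text blocks, one per outermost tag
        self._buffer = []  # text pieces of the tag being read

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                # Collapse runs of whitespace so the text reads cleanly.
                text = " ".join("".join(self._buffer).split())
                if text:
                    self.chunks.append(text)
                self._buffer = []

    def handle_data(self, data):
        if self.depth > 0:
            self._buffer.append(data)


def load_html_tag(input_file: str, tag: str = "section") -> list[Document]:
    """Return one Document per outermost occurrence of `tag` in the file."""
    parser = TagTextExtractor(tag)
    parser.feed(Path(input_file).read_text(encoding="utf-8"))
    parser.close()
    return [
        Document(text=chunk, metadata={"source": input_file, "tag": tag})
        for chunk in parser.chunks
    ]
```

Text from nested child elements is kept as part of the enclosing tag's text, and content outside the chosen tag is dropped.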

Bases: BaseLoader

JSON loader.

Attributes:

Name	Type	Description
`jq_schema`	`str`	jq schema used to extract data from the JSON file.

load_data(input_file: str, **kwargs: Any) -> list[Document]

Loads data from the specified file.

Parameters:

Name	Type	Description	Default
`input_file`	`str`	File path to load.	required

Returns:

Type	Description
`list[Document]`	A list of `Document` objects loaded from the file.
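
To show how a schema can select the pieces of a JSON file that become documents, here is a minimal sketch. It does not use jq and is not the loader's implementation: the `select` helper understands only a tiny jq-like subset (dot-separated keys, with `[]` after a key to iterate a list), and `Document`, `select`, and `load_json` are hypothetical names introduced for illustration.

```python
import json
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any


@dataclass
class Document:
    """Hypothetical stand-in for the library's Document type."""

    text: str
    metadata: dict = field(default_factory=dict)


def select(data: Any, schema: str) -> list[Any]:
    """Apply a tiny jq-like path such as `.items[].text` (not real jq)."""
    current = [data]
    for part in schema.strip(".").split("."):
        if not part:
            continue  # the identity schema "." selects the whole input
        iterate = part.endswith("[]")
        key = part[:-2] if iterate else part
        selected = []
        for value in current:
            if key:
                value = value[key]
            if iterate:
                selected.extend(value)  # fan out over the list elements
            else:
                selected.append(value)
        current = selected
    return current


def load_json(input_file: str, jq_schema: str = ".") -> list[Document]:
    """Return one Document per value selected by the schema."""
    data = json.loads(Path(input_file).read_text(encoding="utf-8"))
    return [
        # Keep strings as-is; serialize any other value back to JSON text.
        Document(
            text=value if isinstance(value, str) else json.dumps(value),
            metadata={"source": input_file},
        )
        for value in select(data, jq_schema)
    ]
```

A real jq schema supports far more (filters, pipes, slices); the subset here is only meant to make the selection step concrete.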

Bases: BaseLoader

PDF loader using PyPDF.

load_data(input_file: str, **kwargs: Any) -> list[Document]

Loads data from the specified file.

Parameters:

Name	Type	Description	Default
`input_file`	`str`	File path to load.	required

Returns:

Type	Description
`list[Document]`	A list of `Document` objects loaded from the file.