File
DocxLoader
Bases: BaseLoader
Microsoft Word (Docx) loader.
load_data
load_data(input_file: str, **kwargs: Any) -> list[Document]
Loads data from the specified file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| input_file | str | File path to load. | required |
Returns:
| Type | Description |
|---|---|
| list[Document] | A list of Document objects. |
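To illustrate what loading a .docx involves, here is a minimal, standard-library-only sketch. It relies on the fact that a .docx file is a ZIP archive whose body lives in `word/document.xml`. This is not DocxLoader's actual implementation: `load_docx`, the `Document` dataclass, and the `file_path` metadata key are hypothetical stand-ins, and returning the whole file as one Document is an assumption.

```python
import xml.etree.ElementTree as ET
import zipfile
from dataclasses import dataclass, field
from typing import Any

# Namespace of WordprocessingML elements inside word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


@dataclass
class Document:
    """Hypothetical stand-in for the Document type returned by load_data."""
    text: str
    metadata: dict[str, Any] = field(default_factory=dict)


def load_docx(input_file: str) -> list[Document]:
    """Hypothetical sketch: read paragraph text from a .docx using only the stdlib."""
    # A .docx file is a ZIP archive; the main body is stored in word/document.xml.
    with zipfile.ZipFile(input_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    paragraphs = []
    # Each <w:p> is a paragraph; its visible text is split across <w:t> runs.
    # Tabs, breaks, tables, headers and footers are ignored in this sketch.
    for para in root.iter(f"{W_NS}p"):
        text = "".join(node.text or "" for node in para.iter(f"{W_NS}t"))
        if text:
            paragraphs.append(text)

    # Assumption: the whole file becomes a single Document.
    return [Document(text="\n".join(paragraphs), metadata={"file_path": input_file})]
```

Joining the `<w:t>` runs within a paragraph matters because Word often splits a single sentence across several runs when formatting changes mid-sentence.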
HTMLLoader
Bases: BaseLoader
Load an HTML file and extract text from a specific tag.
Attributes:
| Name | Type | Description |
|---|---|---|
| tag | str | HTML tag to extract. Defaults to |
load_data
load_data(input_file: str, **kwargs: Any) -> list[Document]
Loads data from the specified file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| input_file | str | File path to load. | required |
Returns:
| Type | Description |
|---|---|
| list[Document] | A list of Document objects. |
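Extracting text from one tag can be sketched with the standard library's `html.parser`. This is a hedged illustration, not HTMLLoader's code: `TagTextExtractor`, `load_html`, the `Document` dataclass, and the metadata keys are hypothetical. The tag is a required argument here because the real default is not given above.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser
from typing import Any


@dataclass
class Document:
    """Hypothetical stand-in for the Document type returned by load_data."""
    text: str
    metadata: dict[str, Any] = field(default_factory=dict)


class TagTextExtractor(HTMLParser):
    """Hypothetical helper: collect text found inside every occurrence of one tag."""

    def __init__(self, tag: str) -> None:
        super().__init__()
        self.tag = tag.lower()   # HTMLParser reports tag names in lowercase
        self.depth = 0           # > 0 while inside the target tag (handles nesting)
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def load_html(input_file: str, tag: str) -> list[Document]:
    """Hypothetical sketch: return the text inside `tag` as a single Document."""
    parser = TagTextExtractor(tag)
    with open(input_file, encoding="utf-8") as f:
        parser.feed(f.read())
    parser.close()
    # Collapse the whitespace runs left over from the markup.
    text = " ".join("".join(parser.chunks).split())
    return [Document(text=text, metadata={"file_path": input_file, "tag": tag})]
```

Tracking a depth counter rather than a boolean keeps the extractor correct when the target tag is nested inside itself, for example `<div>` within `<div>`.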
JSONLoader
Bases: BaseLoader
JSON loader.
Attributes:
| Name | Type | Description |
|---|---|---|
| jq_schema | str | jq schema to use to extract the data from the JSON. |
load_data
load_data(input_file: str, **kwargs: Any) -> list[Document]
Loads data from the specified file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| input_file | str | File path to load. | required |
Returns:
| Type | Description |
|---|---|
| list[Document] | A list of Document objects. |
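The following is a stdlib-only sketch of what a `jq_schema` does. It is not jq. `apply_schema` is a hypothetical helper that evaluates only a tiny subset of jq path syntax: `.`, `.key`, `[]`, and chains of these such as `.items[].text`. `load_json`, the `Document` dataclass, and the one-Document-per-match design are also assumptions, not JSONLoader's documented behaviour.

```python
import json
import re
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Document:
    """Hypothetical stand-in for the Document type returned by load_data."""
    text: str
    metadata: dict[str, Any] = field(default_factory=dict)


# Tokens of the tiny jq subset handled here: ".name" (object key) or "[]" (iterate array).
_TOKEN = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[\]")


def apply_schema(data: Any, schema: str) -> list[Any]:
    """Evaluate a very small subset of jq paths; NOT a real jq implementation."""
    results = [data]
    rest = schema.strip()
    if rest in ("", "."):
        return results
    # Accept a leading "." before "[]", as in ".[]" or ".[].text".
    if rest.startswith(".["):
        rest = rest[1:]
    pos = 0
    while pos < len(rest):
        m = _TOKEN.match(rest, pos)
        if m is None:
            raise ValueError(f"unsupported schema syntax at {rest[pos:]!r}")
        if m.group(1) is not None:
            # Missing keys yield None, mirroring jq's null.
            results = [item.get(m.group(1)) for item in results]
        else:
            results = [elem for item in results for elem in item]
        pos = m.end()
    return results


def load_json(input_file: str, jq_schema: str) -> list[Document]:
    """Hypothetical sketch: one Document per value selected by the schema."""
    with open(input_file, encoding="utf-8") as f:
        data = json.load(f)
    docs = []
    for value in apply_schema(data, jq_schema):
        # Strings are used as-is; other values are serialized back to JSON text.
        text = value if isinstance(value, str) else json.dumps(value)
        docs.append(Document(text=text, metadata={"file_path": input_file}))
    return docs
```

A real loader would normally delegate to jq itself, for example through the `jq` Python bindings, so that the full expression language is available. The subset here only shows how a schema fans one JSON file out into several records.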
PDFLoader
Bases: BaseLoader
PDF loader using PyPDF.
load_data
load_data(input_file: str, **kwargs: Any) -> list[Document]
Loads data from the specified file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| input_file | str | File path to load. | required |
Returns:
| Type | Description |
|---|---|
| list[Document] | A list of Document objects. |
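PDFLoader is described as using PyPDF. The sketch below assumes the current third-party `pypdf` package (`PdfReader`, `reader.pages`, `page.extract_text()`). It also assumes one Document per page, which the reference above does not specify. `load_pdf`, `pages_to_documents`, the `Document` dataclass, and the `page` metadata key are hypothetical.

```python
from collections.abc import Iterable
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Document:
    """Hypothetical stand-in for the Document type returned by load_data."""
    text: str
    metadata: dict[str, Any] = field(default_factory=dict)


def pages_to_documents(page_texts: Iterable[str], source: str) -> list[Document]:
    """Wrap extracted page text as one Document per page (an assumed design)."""
    return [
        Document(text=text, metadata={"file_path": source, "page": number})
        for number, text in enumerate(page_texts, start=1)
    ]


def load_pdf(input_file: str) -> list[Document]:
    """Hypothetical sketch of a PyPDF-based loader."""
    # pypdf is third-party (pip install pypdf). It is imported inside the function
    # so the pure helper above stays usable without it.
    from pypdf import PdfReader

    reader = PdfReader(input_file)
    # extract_text() reads the page's text layer; scanned, image-only pages may yield "".
    return pages_to_documents((page.extract_text() for page in reader.pages), input_file)
```

Keeping the page number in metadata lets downstream code cite where a passage came from. Concatenating all pages into one Document would be an equally plausible choice.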