Docling

DoclingLoader

Bases: BaseLoader

A document loader that uses the docling library to extract and structure content from various file types including PDF, DOCX, and HTML.

For more information, see Docling.

Attributes:

detached_tables (bool): If True, extracted tables are separated from the main document text and returned as individual documents. Defaults to False.

export_table_format (str): Format used when exporting tables, either "markdown" or "html". Applies only when detached_tables is True. Defaults to "markdown".
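
The effect of these two attributes can be illustrated with a small, self-contained sketch. It does not call the loader or docling. `Document`, `render_table`, and `build_documents` are hypothetical stand-ins written for this example, and rendering inline tables as markdown when detached_tables is False is an assumption, not documented behavior.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Document:
    # Hypothetical stand-in for the library's Document type.
    text: str
    metadata: dict[str, Any] = field(default_factory=dict)


def render_table(rows: list[list[str]], export_table_format: str = "markdown") -> str:
    """Render a table (first row is the header) as markdown or HTML."""
    header, *body = rows
    if export_table_format == "markdown":
        lines = [
            "| " + " | ".join(header) + " |",
            "| " + " | ".join("---" for _ in header) + " |",
        ]
        lines += ["| " + " | ".join(row) + " |" for row in body]
        return "\n".join(lines)
    if export_table_format == "html":
        head = "<tr>" + "".join(f"<th>{c}</th>" for c in header) + "</tr>"
        rest = "".join(
            "<tr>" + "".join(f"<td>{c}</td>" for c in row) + "</tr>" for row in body
        )
        return f"<table>{head}{rest}</table>"
    raise ValueError('export_table_format must be "markdown" or "html"')


def build_documents(
    paragraphs: list[str],
    tables: list[list[list[str]]],
    detached_tables: bool = False,
    export_table_format: str = "markdown",
) -> list[Document]:
    """Mimic the documented split between main text and detached tables."""
    if not detached_tables:
        # Assumption: inline tables stay in the main text, rendered as markdown.
        inline = [render_table(t, "markdown") for t in tables]
        return [Document("\n\n".join(paragraphs + inline))]
    # Detached: one document for the text, then one document per table.
    main = Document("\n\n".join(paragraphs), {"kind": "text"})
    detached = [
        Document(render_table(t, export_table_format), {"kind": "table"})
        for t in tables
    ]
    return [main] + detached
```

In this sketch, one paragraph and one table yield a single document by default, and two documents (text, then table) when detached_tables is True.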

load_data

load_data(input_file: str, **kwargs: Any) -> list[Document]

Loads data from the given input file.

Parameters:

input_file (str): Path of the file to load. Required.

Returns:

list[Document]: A list of Document objects loaded from the file.
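
Because load_data takes a single file path, loading several files means calling it once per file. The helper below is a minimal sketch of that pattern. `load_many` and `SupportsLoadData` are hypothetical names introduced here, and the helper works with any object exposing the load_data signature shown above, so it does not depend on a particular import path for DoclingLoader.

```python
from pathlib import Path
from typing import Any, Protocol


class SupportsLoadData(Protocol):
    # Structural type matching the documented load_data signature.
    def load_data(self, input_file: str, **kwargs: Any) -> list[Any]: ...


def load_many(loader: SupportsLoadData, paths: list[str]) -> list[Any]:
    """Call loader.load_data once per existing file and flatten the results."""
    documents: list[Any] = []
    for path in paths:
        if not Path(path).is_file():
            # Skip missing paths instead of letting the loader fail on them.
            print(f"skipping missing file: {path}")
            continue
        documents.extend(loader.load_data(path))
    return documents
```

With a DoclingLoader instance bound to `loader`, the call would look like `load_many(loader, ["report.pdf", "notes.docx"])`; the file names are placeholders.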