File Loader Configuration
DeepSearcher supports various file loaders to extract and process content from different file formats.
📝 Basic Configuration
config.set_provider_config("file_loader", "(FileLoaderName)", "(Arguments dict)")
📋 Available File Loaders
| Loader | Description | Supported Formats |
|---|---|---|
| UnstructuredLoader | General purpose document loader with broad format support | PDF, DOCX, PPT, HTML, etc. |
| DoclingLoader | Document processing library with extraction capabilities | See documentation |
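As a minimal sketch of how the call pattern and the table fit together, the helper below checks a loader name against the two names listed above before delegating to `set_provider_config`. The helper `select_file_loader` and the constant `KNOWN_FILE_LOADERS` are hypothetical names introduced for illustration, not part of DeepSearcher. The `config` argument is assumed to be any object exposing the `set_provider_config` method shown in Basic Configuration.

```python
# Loader names copied from the table above; anything else is rejected early.
# KNOWN_FILE_LOADERS and select_file_loader are hypothetical, not DeepSearcher APIs.
KNOWN_FILE_LOADERS = ("UnstructuredLoader", "DoclingLoader")


def select_file_loader(config, loader_name, arguments=None):
    """Validate the loader name, then delegate to config.set_provider_config."""
    if loader_name not in KNOWN_FILE_LOADERS:
        raise ValueError(
            f"Unknown file loader {loader_name!r}; expected one of {KNOWN_FILE_LOADERS}"
        )
    # An empty dict mirrors the examples on this page, which pass {} as arguments.
    config.set_provider_config("file_loader", loader_name, dict(arguments or {}))
    return loader_name
```

With a DeepSearcher configuration object in hand, the call would look like `select_file_loader(config, "UnstructuredLoader")`. Failing fast on a misspelled name is a design choice of this sketch; the library may report such errors differently.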
🔍 File Loader Options
Unstructured
Unstructured is a powerful library for extracting content from various document formats.
config.set_provider_config("file_loader", "UnstructuredLoader", {})
Setup Instructions
You can use Unstructured in two ways:
- With API (recommended for production)
  - Set environment variables:
    - UNSTRUCTURED_API_KEY
    - UNSTRUCTURED_API_URL
- Local Processing
  - Simply don't set the API environment variables
  - Install required dependencies:
# Install core dependencies
pip install unstructured-ingest
# For all document formats
pip install "unstructured[all-docs]"
# For specific formats (e.g., PDF only)
pip install "unstructured[pdf]"
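Because the choice between API and local processing depends only on environment variables, it can help to check which mode a process is likely to end up in before loading files. The sketch below is an illustration under one assumption: that API mode requires both variables to be set and non-empty. The loader's actual detection logic may differ, and `unstructured_mode` is a hypothetical helper name.

```python
import os

# Variable names taken from the setup instructions above.
API_ENV_VARS = ("UNSTRUCTURED_API_KEY", "UNSTRUCTURED_API_URL")


def unstructured_mode(env=None):
    """Return "api" if both API variables are set and non-empty, else "local".

    Assumption: both variables are needed for API mode; this is not confirmed
    by the documentation above.
    """
    env = os.environ if env is None else env
    if all(env.get(name) for name in API_ENV_VARS):
        return "api"
    return "local"
```

Calling `unstructured_mode()` with no arguments inspects the current process environment; passing a dict makes the check easy to exercise without touching real settings.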
For more information:
- Unstructured Documentation
- Installation Guide
Docling
Docling provides document processing capabilities with support for multiple formats.
config.set_provider_config("file_loader", "DoclingLoader", {})
Setup Instructions
- Install Docling:
  pip install docling
- For information on supported formats, see the Docling documentation.
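Before selecting a loader, it may be useful to confirm that its package is importable in the current environment. The following sketch uses only the standard library. It assumes the import name matches the pip package name (`docling` for Docling), which is worth verifying for your installation. `missing_modules` is a hypothetical helper name.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be found for import."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in module_names if importlib.util.find_spec(name) is None]
```

For example, `missing_modules(["docling"])` returns an empty list when Docling is installed, and `["docling"]` otherwise.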