File Loader Configuration

DeepSearcher supports various file loaders to extract and process content from different file formats.

📝 Basic Configuration

config.set_provider_config("file_loader", "(FileLoaderName)", "(Arguments dict)")
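As a minimal sketch of how the three arguments fit together, the helper below wraps the call and checks the loader name against the two loaders listed on this page. The names `set_file_loader` and `KNOWN_LOADERS` are hypothetical and not part of DeepSearcher; `config` is assumed to be any object that exposes `set_provider_config`.

```python
# Loader names taken from the "Available File Loaders" table on this page.
KNOWN_LOADERS = {"UnstructuredLoader", "DoclingLoader"}


def set_file_loader(config, loader_name, arguments=None):
    """Hypothetical wrapper: select a file loader on a config object.

    `config` is assumed to expose set_provider_config(role, name, args).
    """
    if loader_name not in KNOWN_LOADERS:
        raise ValueError(f"Unknown file loader: {loader_name!r}")
    # The third argument is a plain dict; the examples on this page pass {}.
    args = dict(arguments or {})
    config.set_provider_config("file_loader", loader_name, args)
    return args
```

In a real project, `config` would presumably be the DeepSearcher configuration object that the rest of this page calls `config`; rejecting unknown names early is a design choice of this sketch, not documented library behavior.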

📋 Available File Loaders

Loader             | Description                                                | Supported Formats
UnstructuredLoader | General purpose document loader with broad format support | PDF, DOCX, PPT, HTML, etc.
DoclingLoader      | Document processing library with extraction capabilities  | See the Docling documentation

🔍 File Loader Options

Unstructured

Unstructured is a powerful library for extracting content from various document formats.

config.set_provider_config("file_loader", "UnstructuredLoader", {})
Setup Instructions

You can use Unstructured in two ways:

  1. With API (recommended for production)

    • Set environment variables:
      • UNSTRUCTURED_API_KEY
      • UNSTRUCTURED_API_URL

  2. Local Processing

    • Simply don't set the API environment variables
    • Install required dependencies:
    # Install core dependencies
    pip install unstructured-ingest
    
    # For all document formats
    pip install "unstructured[all-docs]"
    
    # For specific formats (e.g., PDF only)
    pip install "unstructured[pdf]"
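The choice between the two modes above depends only on the environment, so it can be checked before loading any files. The following is a small sketch of that check; the function name `unstructured_mode` is hypothetical, and treating a partially set pair of variables as local mode is an assumption of this sketch, not documented behavior.

```python
import os


def unstructured_mode(env=None):
    """Return "api" when both API variables are set, otherwise "local"."""
    env = os.environ if env is None else env
    has_key = bool(env.get("UNSTRUCTURED_API_KEY"))
    has_url = bool(env.get("UNSTRUCTURED_API_URL"))
    # Assumption: API mode needs both the key and the URL.
    return "api" if (has_key and has_url) else "local"


if __name__ == "__main__":
    print(f"Unstructured would run in {unstructured_mode()} mode")
```

Passing a plain dict as `env` makes the check easy to exercise without touching the real environment.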
    

For more information, see the Unstructured Documentation and the Installation Guide.

Docling

Docling provides document processing capabilities with support for multiple formats.

config.set_provider_config("file_loader", "DoclingLoader", {})
Setup Instructions
  1. Install Docling:

    pip install docling
    

  2. For information on supported formats, see the Docling documentation.
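Before selecting DoclingLoader, it can help to confirm that the install step above succeeded. The sketch below uses the standard library's `importlib.util.find_spec` to look for the package without importing it; the helper name `is_installed` is hypothetical, and it assumes the package is importable under the top-level name `docling`.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if is_installed("docling"):
        print("docling found: DoclingLoader can be configured")
    else:
        print("docling missing: run 'pip install docling' first")
```

The same check works for the local Unstructured setup by passing `"unstructured"` instead.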