Web Crawler Configuration

DeepSearcher supports various web crawlers to collect data from websites for processing and indexing.

📝 Basic Configuration

config.set_provider_config("web_crawler", "(WebCrawlerName)", "(Arguments dict)")

📋 Available Web Crawlers

| Crawler | Description | Key Feature |
|---|---|---|
| FireCrawlCrawler | Cloud-based web crawling service | Simple API, managed service |
| Crawl4AICrawler | Browser automation crawler | Full JavaScript support |
| JinaCrawler | Content extraction service | High accuracy parsing |
| DoclingCrawler | Doc processing with crawling | Multiple format support |
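
To make the call pattern concrete, here is a minimal sketch of a helper that checks a crawler name against the four listed above and builds the three arguments for `config.set_provider_config`. The helper `web_crawler_args` and the `KNOWN_CRAWLERS` set are hypothetical names introduced for illustration; they are not part of DeepSearcher.

```python
# Crawler names copied from the list of available web crawlers above.
KNOWN_CRAWLERS = {
    "FireCrawlCrawler",
    "Crawl4AICrawler",
    "JinaCrawler",
    "DoclingCrawler",
}


def web_crawler_args(name, arguments=None):
    """Return the three positional arguments for config.set_provider_config.

    Hypothetical helper: it only validates the name and copies the dict.
    """
    if name not in KNOWN_CRAWLERS:
        raise ValueError(
            f"Unknown web crawler {name!r}; expected one of {sorted(KNOWN_CRAWLERS)}"
        )
    # Copy the dict so later changes by the caller do not leak into the config.
    return ("web_crawler", name, dict(arguments or {}))


# Intended use, assuming `config` is your DeepSearcher configuration object:
# config.set_provider_config(*web_crawler_args("JinaCrawler"))
```

Validating the name up front turns a typo such as `"JinaCrawlr"` into an immediate error at configuration time.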

🔍 Web Crawler Options

FireCrawl

FireCrawl is a cloud-based web crawling service designed for AI applications.

Key features:
- Simple API
- Managed Service
- Advanced Parsing

config.set_provider_config("web_crawler", "FireCrawlCrawler", {})
Setup Instructions
  1. Sign up for FireCrawl and get an API key
  2. Set the API key as an environment variable:
    export FIRECRAWL_API_KEY="your_api_key"
    
  3. For more information, see the FireCrawl documentation
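
Since the FireCrawl configuration passes an empty arguments dict, the API key has to come from the environment. The sketch below fails early when `FIRECRAWL_API_KEY` is missing. The function `require_env` is a hypothetical helper, not a DeepSearcher or FireCrawl API.

```python
import os


def require_env(name):
    """Return the value of an environment variable or raise if it is unset or blank."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; export it before configuring the crawler"
        )
    return value


# Intended use, assuming `config` is your DeepSearcher configuration object:
# require_env("FIRECRAWL_API_KEY")
# config.set_provider_config("web_crawler", "FireCrawlCrawler", {})
```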

Crawl4AI

Crawl4AI is a Python package for web crawling with browser automation capabilities.

config.set_provider_config("web_crawler", "Crawl4AICrawler", {"browser_config": {"headless": True, "verbose": True}})
Setup Instructions
  1. Install Crawl4AI:
    pip install crawl4ai
    
  2. Run the setup command:
    crawl4ai-setup
    
  3. For more information, see the Crawl4AI documentation
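
The arguments dict for Crawl4AI nests its options under `browser_config`. As a rough sketch, the hypothetical helper below builds that dict from the two keys shown in the example above (`headless` and `verbose`). Any other keys would be assumptions and are left out.

```python
def crawl4ai_arguments(headless=True, verbose=False):
    """Build the arguments dict for Crawl4AICrawler (hypothetical helper).

    Only the two keys that appear in the example configuration are used.
    """
    return {"browser_config": {"headless": headless, "verbose": verbose}}


# Quiet, headless crawling for batch jobs:
batch_args = crawl4ai_arguments(headless=True, verbose=False)

# Visible browser with verbose logging while debugging a page:
debug_args = crawl4ai_arguments(headless=False, verbose=True)

# Intended use, assuming `config` is your DeepSearcher configuration object:
# config.set_provider_config("web_crawler", "Crawl4AICrawler", batch_args)
```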

Jina Reader

Jina Reader is a service for extracting content from web pages with high accuracy.

config.set_provider_config("web_crawler", "JinaCrawler", {})
Setup Instructions
  1. Get a Jina API key
  2. Set the API key as an environment variable:
    export JINA_API_TOKEN="your_api_key"
    # or
    export JINAAI_API_KEY="your_api_key"
    
  3. For more information, see the Jina Reader documentation
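
Because either variable name is accepted, a quick pre-flight check can report which one is actually set. This is a minimal sketch: `find_jina_key` is a hypothetical helper, and the lookup order (`JINA_API_TOKEN` first) is an assumption made for illustration, not documented precedence.

```python
import os

# Both names come from the setup instructions; the order here is an assumption.
JINA_ENV_NAMES = ("JINA_API_TOKEN", "JINAAI_API_KEY")


def find_jina_key(environ=None):
    """Return (variable_name, value) for the first Jina key found, else None."""
    env = os.environ if environ is None else environ
    for name in JINA_ENV_NAMES:
        value = env.get(name)
        if value:
            return name, value
    return None


# Intended use, assuming `config` is your DeepSearcher configuration object:
# if find_jina_key() is None:
#     raise RuntimeError("Set JINA_API_TOKEN or JINAAI_API_KEY first")
# config.set_provider_config("web_crawler", "JinaCrawler", {})
```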

Docling Crawler

Docling provides web crawling capabilities alongside its document processing features.

config.set_provider_config("web_crawler", "DoclingCrawler", {})
Setup Instructions
  1. Install Docling:
    pip install docling
    
  2. For information on supported formats, see the Docling documentation
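
Docling needs no API key, so the main thing that can go wrong is a missing package. The sketch below checks for it with the standard library before configuring the crawler. `is_installed` is a hypothetical helper, and it assumes the package is importable under the module name `docling`.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Intended use, assuming `config` is your DeepSearcher configuration object
# and that the pip package is importable as `docling`:
# if not is_installed("docling"):
#     raise RuntimeError("Docling is missing; run: pip install docling")
# config.set_provider_config("web_crawler", "DoclingCrawler", {})
```

The same check works for Crawl4AI by passing its module name instead.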