FireCrawl Integration Example
This example demonstrates how to use FireCrawl with DeepSearcher to crawl and extract content from websites.
Overview
FireCrawl is a specialized web crawling service designed for AI applications. This example shows:
- Setting up FireCrawl with DeepSearcher
- Configuring API keys for the service
- Crawling a website and extracting content
- Querying the extracted content
Code Example
import logging
import os
from deepsearcher.offline_loading import load_from_website
from deepsearcher.online_query import query
from deepsearcher.configuration import Configuration, init_config
# Suppress unnecessary logging from third-party libraries
logging.getLogger("httpx").setLevel(logging.WARNING)
# Set API keys (ensure these are set securely in real applications)
os.environ['OPENAI_API_KEY'] = 'sk-***************'
os.environ['FIRECRAWL_API_KEY'] = 'fc-***************'
def main():
    # Step 1: Initialize configuration
    config = Configuration()

    # Set up Vector Database (Milvus) and Web Crawler (FireCrawlCrawler)
    config.set_provider_config("vector_db", "Milvus", {})
    config.set_provider_config("web_crawler", "FireCrawlCrawler", {})

    # Apply the configuration
    init_config(config)

    # Step 2: Load data from a website into Milvus
    website_url = "https://example.com"  # Replace with your target website
    collection_name = "FireCrawl"
    collection_description = "All Milvus Documents"

    # Crawl a single webpage
    load_from_website(urls=website_url, collection_name=collection_name, collection_description=collection_description)

    # Only applicable when using FireCrawl: DeepSearcher can crawl multiple webpages
    # if you set max_depth, limit, and allow_backward_links
    # load_from_website(urls=website_url, max_depth=2, limit=20, allow_backward_links=True, collection_name=collection_name, collection_description=collection_description)

    # Step 3: Query the loaded data
    question = "What is Milvus?"  # Replace with your actual question
    result = query(question)
    print(result)


if __name__ == "__main__":
    main()
Running the Example
- Install DeepSearcher:
pip install deepsearcher
- Sign up for a FireCrawl API key at firecrawl.dev
- Replace the placeholder API keys with your actual keys
- Change website_url to the website you want to crawl
- Run the script:
python load_website_using_firecrawl.py
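Instead of pasting real keys into the script, you may prefer to export them in your shell and have the script check them before it starts crawling. The following is a minimal sketch of such a check. The helper names missing_api_keys and require_api_keys are hypothetical and not part of DeepSearcher; only the two environment variable names come from the example above.

```python
import os

# Environment variables used by the example above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "FIRECRAWL_API_KEY")


def missing_api_keys(env=None, required=REQUIRED_KEYS):
    """Return the names of required keys that are unset or still placeholders.

    Hypothetical helper for illustration; not part of DeepSearcher.
    """
    env = os.environ if env is None else env
    missing = []
    for name in required:
        value = env.get(name, "").strip()
        # Treat empty values and masked placeholders such as 'sk-****' as missing.
        if not value or "*" in value:
            missing.append(name)
    return missing


def require_api_keys(env=None):
    """Raise a RuntimeError that lists every key still missing."""
    missing = missing_api_keys(env)
    if missing:
        raise RuntimeError("Missing API keys: " + ", ".join(missing))
```

Calling require_api_keys() at the top of main() would make the script fail early, with a readable message, before any crawl request is sent.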
Advanced Crawling Options
FireCrawl provides several advanced options for crawling:
- max_depth: Control how many links deep the crawler should go
- limit: Set a maximum number of pages to crawl
- allow_backward_links: Allow the crawler to navigate to parent/sibling pages
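To build intuition for how these three options interact, here is a toy simulation over an in-memory link graph. It is not FireCrawl's implementation and makes no network requests. It assumes a breadth-first crawl, counts depth as the number of links followed from the start URL, and models allow_backward_links=False as "stay under the start URL's path". The function simulate_crawl and the SITE graph are invented for illustration.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical site structure: each URL maps to the links found on that page.
SITE = {
    "https://example.com/docs": [
        "https://example.com/docs/a",
        "https://example.com/docs/b",
        "https://example.com/blog",
    ],
    "https://example.com/docs/a": ["https://example.com/docs/a/deep"],
    "https://example.com/docs/b": [],
    "https://example.com/blog": ["https://example.com/blog/post"],
}


def simulate_crawl(links, start_url, max_depth=1, limit=10, allow_backward_links=False):
    """Breadth-first walk over a link graph; returns visited URLs in visit order."""
    start = urlparse(start_url)
    base_path = start.path if start.path.endswith("/") else start.path + "/"

    def in_scope(url):
        parsed = urlparse(url)
        if parsed.netloc != start.netloc:
            return False  # never leave the starting host in this toy model
        if allow_backward_links:
            return True  # parent and sibling pages are allowed
        path = parsed.path if parsed.path.endswith("/") else parsed.path + "/"
        return path.startswith(base_path)  # only pages below the start URL

    visited = []
    seen = {start_url}
    queue = deque([(start_url, 0)])
    while queue and len(visited) < limit:  # limit caps the number of pages
        url, depth = queue.popleft()
        visited.append(url)
        if depth >= max_depth:  # max_depth caps how far links are followed
            continue
        for target in links.get(url, []):
            if target not in seen and in_scope(target):
                seen.add(target)
                queue.append((target, depth + 1))
    return visited
```

In this model, simulate_crawl(SITE, "https://example.com/docs", max_depth=2) stays inside /docs, while adding allow_backward_links=True also reaches the /blog pages, and a small limit stops the crawl early regardless of depth. The real service may order, count, or scope pages differently, so treat this only as a mental model.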
Key Concepts
- Web Crawling: Extracting content from websites
- Depth Control: Managing how deep the crawler navigates
- URL Processing: Handling multiple pages from a single starting point
- Vector Storage: Storing the crawled content in a vector database for search