Oracle Example

This example demonstrates an advanced setup using path manipulation and detailed token tracking.

Overview

This example shows:

Setting up Python path for importing from the parent directory
Initializing DeepSearcher with default configuration
Loading a PDF document and creating a vector database
Performing a complex query with full result and token tracking
Optional token consumption monitoring

Code Example

import sys, os
from pathlib import Path
script_directory = Path(__file__).resolve().parent.parent
sys.path.append(os.path.abspath(script_directory))

import logging

httpx_logger = logging.getLogger("httpx")  # disable openai's logger output
httpx_logger.setLevel(logging.WARNING)

current_dir = os.path.dirname(os.path.abspath(__file__))

# Customize your config here
from deepsearcher.configuration import Configuration, init_config

config = Configuration()
init_config(config=config)

# Load your local data
# Hint: You can load from a directory or a single file, please execute it in the root directory of the deep searcher project

from deepsearcher.offline_loading import load_from_local_files

load_from_local_files(
    paths_or_directory=os.path.join(current_dir, "data/WhatisMilvus.pdf"),
    collection_name="milvus_docs",
    collection_description="All Milvus Documents",
    # force_new_collection=True, # If you want to drop origin collection and create a new collection every time, set force_new_collection to True
)

# Query
from deepsearcher.online_query import query

question = 'Write a report comparing Milvus with other vector databases.'
answer, retrieved_results, consumed_token = query(question)
print(answer)

# get consumed tokens, about: 2.5~3w tokens when using openai gpt-4o model
# print(f"Consumed tokens: {consumed_token}")

Running the Example

Install DeepSearcher: pip install deepsearcher
Make sure you have the data directory with "WhatisMilvus.pdf" (or change the path)
Run the script: python basic_example_oracle.py

Key Concepts

Path Management: Setting up Python path to import from parent directory
Query Unpacking: Getting full result details (answer, retrieved context, and tokens)
Complex Querying: Asking for a comparative analysis that requires synthesis
Token Economy: Monitoring token usage for cost optimization