MongoDB (mongo)¶
The mongo connector indexes documents from a single MongoDB database. Each
document becomes a searchable record, and each collection gets a sampled schema
preview (Mongo has no fixed schema, so it's inferred from a sample).
How MFS sees it¶
Collections sit under the alias; each exposes its documents and a sampled schema:
mongo://prod-cluster/
└── support_threads/
    ├── documents.jsonl   record_collection → one searchable chunk per document
    └── schema.json       table_schema → sampled field summary
Documents are chunked per-document and need text_fields to become searchable.
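As an illustration of what a sampled schema can look like, here is a minimal sketch that summarizes the top-level fields of the first few documents. The function name `sample_schema`, the sample size, and the output shape are assumptions made for this example; the actual contents of schema.json are not specified here.

```python
from collections import Counter, defaultdict
from itertools import islice


def sample_schema(docs, sample_size=100):
    """Summarize top-level fields seen in the first `sample_size` documents.

    Hypothetical helper: the real schema.json layout may differ.
    """
    type_counts = defaultdict(Counter)
    sampled = 0
    for doc in islice(docs, sample_size):
        sampled += 1
        for field, value in doc.items():
            # Record which Python type each field held in this document.
            type_counts[field][type(value).__name__] += 1
    return {
        "sampled": sampled,
        "fields": {
            field: {"types": dict(counts), "present_in": sum(counts.values())}
            for field, counts in type_counts.items()
        },
    }


# Two documents with different shapes, as is common in a Mongo collection.
docs = [
    {"_id": "a1", "title": "Refund request", "status": "open"},
    {"_id": "a2", "title": "Login issue", "priority": 2},
]
summary = sample_schema(docs)
```

Because only a sample is read, a field that appears solely in unsampled documents would be missing from a summary like this one.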
Credentials¶
A MongoDB connection URI, in either form:
mongodb://user:pass@host:27017/?authSource=admin
mongodb+srv://user:pass@cluster.mongodb.net/?retryWrites=true
For Atlas, copy the SRV URI from Database → Connect → Drivers and substitute
the real password. A read-only user is enough — the connector only runs find().
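A quick local sanity check on the URI can catch a wrong scheme or a missing host before any network call. The sketch below uses only the Python standard library; `describe_uri` is a hypothetical helper, not part of MFS or of any MongoDB driver, and it does not connect to the server.

```python
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"mongodb", "mongodb+srv"}


def describe_uri(uri):
    """Check the URI's form and return its parts without the password."""
    parts = urlsplit(uri)
    if parts.scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    if not parts.hostname:
        raise ValueError("URI has no host")
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "user": parts.username,
        # Report only whether a password is present, never its value.
        "has_password": parts.password is not None,
    }


info = describe_uri("mongodb://user:pass@host:27017/?authSource=admin")
```

This only inspects the string. For a URI that lists several hosts it reports the first one, and it says nothing about whether the credentials are accepted.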
Probe the connection before adding it to MFS; the probe command is listed under Search and browse.
Configuration¶
uri = "env:MONGO_URI"
database = "prod"
cursor_field = "updatedAt" # or _id; enables incremental re-sync
max_read_docs = 100000
[[objects]]
match = "/support_threads"
text_fields = ["title", "messages[].body"]
locator_fields = ["_id"]
metadata_fields = ["status"]
text_fields supports nested paths like messages[].body to pull text out of
arrays of subdocuments.
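To make the path syntax concrete, the sketch below resolves a path such as messages[].body against a document and joins the results into one text blob. It is an illustration under assumptions: `extract_text` and `render_text` are hypothetical names, and the connector's real resolver may treat edge cases (non-string values, nested arrays) differently.

```python
def extract_text(doc, path):
    """Return every string found at `path`, e.g. "messages[].body"."""
    current = [doc]
    for segment in path.split("."):
        is_array = segment.endswith("[]")
        key = segment[:-2] if is_array else segment
        next_values = []
        for node in current:
            if not isinstance(node, dict) or key not in node:
                continue  # field absent from this document: skip it
            value = node[key]
            if is_array:
                # "name[]" fans out over each element of the array.
                if isinstance(value, list):
                    next_values.extend(value)
            else:
                next_values.append(value)
        current = next_values
    return [v for v in current if isinstance(v, str)]


def render_text(doc, text_fields):
    """Concatenate the text from all configured paths, one value per line."""
    parts = []
    for path in text_fields:
        parts.extend(extract_text(doc, path))
    return "\n".join(parts)


doc = {
    "title": "Refund",
    "messages": [{"body": "Hi"}, {"author": "x"}, {"body": "Please escalate"}],
}
text = render_text(doc, ["title", "messages[].body"])
```

The second message has no body, so it contributes nothing, which mirrors how absent fields are skipped when a document's text is rendered.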
Sync and freshness¶
With cursor_field set (updatedAt or _id), re-syncs pull only documents
changed since the last run; deletions are caught by full_scan.
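One common way to implement this kind of incremental read is a watermark: remember the highest cursor value seen and ask only for documents above it. The sketch below shows that idea with a standard MongoDB `$gt` filter. Both helper names are hypothetical, and whether the connector stores its watermark this way is an assumption.

```python
def incremental_filter(cursor_field, last_seen):
    """Build a MongoDB query filter for documents past the watermark."""
    if last_seen is None:
        return {}  # first run: no watermark yet, so read everything
    return {cursor_field: {"$gt": last_seen}}


def advance_watermark(docs, cursor_field, last_seen):
    """Return the highest cursor value seen so far, or the old watermark."""
    values = [d[cursor_field] for d in docs if cursor_field in d]
    if last_seen is not None:
        values.append(last_seen)
    return max(values) if values else None


query = incremental_filter("updatedAt", 5)
new_mark = advance_watermark(
    [{"updatedAt": 7}, {"updatedAt": 3}, {}], "updatedAt", 5
)
```

A deleted document can never match a filter like this, which is consistent with deletions being left to full_scan.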
Search and browse¶
mfs connector probe mongo://prod-cluster --config ./mongo.toml
mfs add mongo://prod-cluster --config ./mongo.toml
mfs search "refund escalation" mongo://prod-cluster/support_threads/documents.jsonl
mfs cat mongo://prod-cluster/support_threads/documents.jsonl --locator '{"_id":"65a3..."}'
Pitfalls¶
- Documents are heterogeneous; fields absent from a given document are simply skipped when rendering its text.
- _id locators use the serialized string form, not ObjectId(...).
- max_read_docs caps large collections and can mark recall partial.
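For the _id pitfall, a small helper can build the --locator value and reject the ObjectId(...) wrapper early. This sketch assumes the serialized form of an ObjectId is its 24-character hex string; `id_locator` is a hypothetical name, and collections that use other _id types would need a different check.

```python
import json
import re

_HEX24 = re.compile(r"[0-9a-f]{24}")


def id_locator(object_id_text):
    """Build a --locator value from an ObjectId's 24-character hex string."""
    text = object_id_text.strip().lower()
    if not _HEX24.fullmatch(text):
        raise ValueError("expected a 24-character hex string, not ObjectId(...)")
    # Compact separators give the {"_id":"..."} form used on the command line.
    return json.dumps({"_id": text}, separators=(",", ":"))


# Placeholder id for illustration only; it does not refer to a real document.
locator = id_locator("0123456789abcdef01234567")
```

The resulting string can be passed to mfs cat via --locator, quoted for the shell.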