BigQuery (bigquery)¶
The bigquery connector indexes BigQuery table rows as searchable records, with a
schema summary per table. Use it to search large analytical tables — a knowledge
base, an events table — by meaning.
How MFS sees it¶
bigquery://analytics/
└── events/
└── tables/
└── user_events/
├── rows.jsonl table_rows → one searchable chunk per row
└── schema.json table_schema → searchable column summary
Rows are chunked per-row and need [[objects]].text_fields to become searchable.
Credentials¶
BigQuery uses Application Default Credentials (ADC) — there is no token in the TOML; the credentials must be visible to the server process. Three common paths:
-
Service account JSON (production): create a service account with
roles/bigquery.dataVieweron the target datasets, then point the server at it: -
gcloud auth application-default login(dev / single-user) — writes~/.config/gcloud/application_default_credentials.json. -
Workload Identity on GKE / Cloud Run — ADC is automatic.
Make sure the BigQuery API is enabled on the project.
Configuration¶
project = "analytics-prod"
datasets = ["events", "kb"]
max_read_rows = 100000
[[objects]]
match = "/kb/tables/articles"
text_fields = ["title", "body_markdown"]
locator_fields = ["article_id"]
Sync and freshness¶
The connector uses the table's modified time as its cursor; deletions are caught
by full_scan. It reads rows via list_rows, so max_read_rows caps large
tables.
Search and browse¶
mfs add bigquery://analytics --config ./bigquery.toml
mfs search "refund event" bigquery://analytics/events/tables/user_events/rows.jsonl
mfs search "email column" bigquery://analytics --kind schema_summary
mfs cat bigquery://analytics/kb/tables/articles/rows.jsonl --locator '{"article_id":"a-123"}'
Pitfalls¶
- ADC must be visible to the server process, not just the CLI shell.
- BigQuery has no primary key for most tables — choose stable
locator_fieldsexplicitly. - Rows need
text_fieldsto be searchable.