One model, many sources

A code repo, a Postgres table, a Slack workspace, a folder of PDFs — they have almost nothing in common:

Source	What one "thing" is	How you reach it	How it tells you something changed
a code repo	a file	read the disk	the file's content hash
a Postgres table	a row	run a SQL query	an `updated_at` column
a Slack channel	a message thread	call the Slack API	the time of the newest message
a folder of PDFs	a file, turned into text	read it, then convert it	its size and modified time, then a hash

If MFS handled each one its own way, every source would need its own indexer and its own search. Instead, MFS makes every source look the same to everything downstream. Once a source is brought in, search, browse, caching, and recovery don't know or care what it really was. This page shows the few things MFS makes identical, the few it lets each source keep its own way, and why that split is what makes adding a new source easy.

They all look like a tree of files

Whatever the source, you walk it with the same `ls`, `cat`, and `tree` you'd use on a folder. A database and a chat workspace browse exactly alike:

$ mfs ls postgres://prod/public
tickets/   users/

$ mfs ls slack://acme/channels
eng-backend__C012345/   general__C067890/

Nothing here is actually a file on disk: a table's rows and a chat's threads certainly aren't. "Looks like a file" just means you can list it and read it, and that's all the rest of MFS needs. Each object also carries a descriptive extension, so a single `cat` knows how to show it:

You see	It is	`cat` shows
`…/tickets/rows.jsonl`	a table's rows	the records
`…/tickets/schema.json`	the table's columns	the schema
`…/pages/install.md`	a converted web or Notion page	Markdown

Every object is one of a few kinds

MFS doesn't try to special-case a thousand formats. It sorts every object into one of a small set of kinds, and the kind — not where it came from — decides how it's made searchable:

Example object	Kind	What it turns into
`engine.py`	code	chunks addressed by line range
a design PDF	document	text, converted and split into chunks
one ticket row	table row	one searchable record
a Slack thread	message thread	one chunk per thread
a screenshot	image	a short description (when image support is on)

There are only a few kinds, so the expensive work behind each one — splitting, embedding, indexing — is built once and reused by every source. Bringing in a new source is mostly a matter of saying which kind each thing is.
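
As a rough illustration of dispatching on kind, the sketch below routes an object to a chunker by its kind alone. The function names, the kind strings, and the 40-line chunk size are assumptions made for this example, not MFS internals; only the `{"lines": [start, end]}` locator shape and the "who: what" thread format come from this page.

```python
from typing import Callable

def chunk_code(text: str, lines_per_chunk: int = 40) -> list[dict]:
    """Split source text into chunks addressed by 1-based, inclusive line ranges."""
    lines = text.splitlines()
    chunks = []
    for start in range(0, len(lines), lines_per_chunk):
        block = lines[start:start + lines_per_chunk]
        chunks.append({
            "locator": {"lines": [start + 1, start + len(block)]},
            "text": "\n".join(block),
        })
    return chunks

def chunk_thread(messages: list[tuple[str, str]]) -> list[dict]:
    """Turn a whole thread into one chunk, each message rendered as 'who: what'."""
    text = "\n".join(f"{who}: {what}" for who, what in messages)
    return [{"locator": {}, "text": text}]

# Hypothetical registry: the kind, not the source, selects the handler.
CHUNKERS: dict[str, Callable[..., list[dict]]] = {
    "code": chunk_code,
    "message_thread": chunk_thread,
}

def make_searchable(kind: str, payload) -> list[dict]:
    """Look up the chunker for a kind and apply it to the object's payload."""
    return CHUNKERS[kind](payload)
```

Under this sketch, a connector for a new source only has to label each object with one of the registered kinds; the chunkers themselves are shared.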

Which part of each object gets searched

For a file or a message this is obvious — the file's text, the message's text, all of it. The interesting case is structured data: a database row has many columns, and you decide which ones are worth searching. You point MFS at three sets of columns per table:

Role	Which columns	For a `tickets` table
The searchable text (what gets embedded)	`text_fields`	`title`, `description`
How to reopen the exact row	`locator_fields`	`id`
Kept alongside for filtering and display	`metadata_fields`	`status`, `priority`, `updated_at`

So one ticket row becomes a record built from its title and description, reopenable by its id, with its status, priority, and `updated_at` carried along. Columns you don't list (an internal flag, a foreign key) are simply ignored. A table with no `text_fields` can still be browsed and searched with `grep`, but has nothing to search by meaning.
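
To make the three roles concrete, here is a minimal sketch of how one row could be turned into a record. The `build_record` helper, the record's dictionary layout, and the sample values are hypothetical; only the three field-list names and the `tickets` columns come from the table above.

```python
def build_record(row: dict, text_fields: list[str],
                 locator_fields: list[str], metadata_fields: list[str]) -> dict:
    """Project one database row onto the three column roles."""
    return {
        # Searchable text: the listed columns joined together, skipping NULLs.
        "text": "\n".join(str(row[f]) for f in text_fields if row.get(f) is not None),
        # Enough to reopen this exact row later.
        "locator": {f: row[f] for f in locator_fields},
        # Carried along for filtering and display, never embedded.
        "metadata": {f: row.get(f) for f in metadata_fields},
    }

# Illustrative row; `internal_flag` is not listed anywhere, so it is dropped.
ticket = {
    "id": 12345,
    "title": "Login fails",
    "description": "500 on POST /login",
    "status": "open",
    "priority": "high",
    "updated_at": "2026-01-05",
    "internal_flag": True,
}

record = build_record(
    ticket,
    text_fields=["title", "description"],
    locator_fields=["id"],
    metadata_fields=["status", "priority", "updated_at"],
)
```

With an empty `text_fields` list the record's `text` is an empty string, which matches the case described above: the row is still browsable but gives semantic search nothing to work with.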

You don't always have to set this up. The chat and SaaS sources ship sensible defaults — a Jira issue indexes its summary, description, and comments; a Slack thread indexes each message as "who: what" — so they work the moment you add them, and the field lists are only there to override. Databases are the case you normally configure, since only you know which columns are worth searching.

And some things deliberately aren't searched:

Object	What happens
code and documents (PDF, Markdown, docx…)	converted to text and embedded — fully searchable
each table's schema	a small summary, so you can search the structure ("which table has an email column?")
an image	embedded only if image descriptions are turned on
raw structured text — `.json`, `.csv`, `.yaml`, `.log`	you can browse and `grep` it, but it isn't embedded for semantic search
a binary, or anything marked not-indexable	kept for browsing only

One command reopens any result

A search result is only useful if you can reopen exactly what it found: a few lines of a file, or one row of a table. So every result comes back with two things: which object it's in, and a small locator for the spot inside it. The same `cat` reopens either one:

# result in a code file → the locator is a line range
mfs cat file://local/repo/engine.py --locator '{"lines":[42,78]}'

# result in a database table → the locator is the row's key
mfs cat postgres://prod/public/tickets/rows.jsonl --locator '{"id":12345}'

You never have to learn a different way to address each source. A result from anywhere is something you reopen with the same command.
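
The idea of one reopen path for both locator shapes can be sketched as follows. This is an illustration only: the `reopen` function is hypothetical, and it assumes line ranges are 1-based and inclusive and that table rows are stored one JSON object per line, as the `rows.jsonl` name suggests.

```python
import json

def reopen(content: str, locator: dict) -> str:
    """Return the part of `content` that a locator points at."""
    if "lines" in locator:
        # Assumption: line ranges are 1-based and inclusive on both ends.
        start, end = locator["lines"]
        return "\n".join(content.splitlines()[start - 1:end])
    # Otherwise treat the locator as key/value pairs identifying one JSONL row.
    for line in content.splitlines():
        if not line.strip():
            continue
        row = json.loads(line)
        if all(row.get(key) == value for key, value in locator.items()):
            return line
    raise KeyError(f"no row matches locator {locator!r}")
```

The caller never branches on the source; the locator's own shape decides how the object is sliced.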

Every source reports changes the same way

When you re-run `mfs add`, MFS redoes only what actually changed. Each kind of source works out "what changed" in its own way, but they all report it back in the same form (added, changed, removed), so the update logic is written once:

Source family	What changed is found by	For example
Files & blobs — file, S3, Drive, web, repo code	re-scanning and comparing content hashes	edit `README.md` and its hash changes, so it's re-indexed; delete it and the full re-scan notices it's gone
Databases & warehouses — Postgres, Mongo, BigQuery…	reading an `updated_at`-style column	12 rows change in a million-row table, and only those 12 are re-pulled
Messages & mail — Slack, Gmail, Feishu…	continuing past the newest message it already has	30 new messages since last time, so only those are fetched; old threads are left alone
Issues, CRM & docs — Jira, Linear, Notion…	checking an "updated" timestamp	reopen a Jira ticket and it's re-indexed on the next sync
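
Two of the strategies in the table can be sketched in a few lines. Both functions are hypothetical illustrations rather than MFS code, and SHA-256 is an assumption: this page only says "content hash". The first compares two snapshots of path-to-hash mappings and reports the shared added/changed/removed form; the second keeps only rows whose `updated_at` is newer than the last sync.

```python
import hashlib
from datetime import datetime

def content_hash(data: bytes) -> str:
    """Hash an object's bytes (SHA-256 chosen here only for illustration)."""
    return hashlib.sha256(data).hexdigest()

def diff_by_hash(previous: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Compare two {path: hash} snapshots and report what changed."""
    return {
        "added": sorted(p for p in current if p not in previous),
        "changed": sorted(p for p in current
                          if p in previous and current[p] != previous[p]),
        "removed": sorted(p for p in previous if p not in current),
    }

def rows_changed_since(rows: list[dict], cursor: datetime) -> list[dict]:
    """Keep only rows whose `updated_at` is strictly newer than the cursor."""
    return [row for row in rows if row["updated_at"] > cursor]
```

Note that the hash comparison detects deletions because it sees the full snapshot, while a timestamp cursor alone only reveals rows that still exist.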

What's built once, and what a new source adds

Everything above is what makes the final claim true: adding a new source is a small job, because all the hard parts already exist. Someone writing a connector for a new tool doesn't build a search engine; they write a thin adapter that answers a few questions about their source, and get everything else for free:

Built once, in MFS — a new source just reuses it	What a new source has to provide
splitting, embedding, image descriptions, summaries	how to connect and sign in
the search index and ranking	the tree layout — what its folders and objects look like
`ls` / `cat` / `grep` / `search` and the HTTP API	which kind each object is
the job queue, caching, crash recovery, deletions	how to read an object, and how to tell what changed

In practice that adapter is a handful of small functions — often a few hundred lines — and the source is then searchable and browsable like everything else. There are a couple of optional extras for sources that can do better, and MFS falls back to its general path when they can't:

a database can answer a `grep` with a SQL `WHERE` clause instead of scanning;
a source with a real timestamp can support `mfs add --since 2026-01-01`.
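
To give a feel for how thin such an adapter can be, here is a sketch of the "few questions" expressed as a Python protocol, with a toy in-memory source that answers them. Every name here (`Connector`, `list_dir`, `kind_of`, `changed_since`, `NotesConnector`) is hypothetical and chosen for illustration; the real MFS connector interface may look quite different.

```python
from typing import Optional, Protocol

class Connector(Protocol):
    """Hypothetical adapter surface: the questions a source must answer."""
    def connect(self) -> None: ...
    def list_dir(self, path: str) -> list[str]: ...
    def kind_of(self, path: str) -> str: ...
    def read(self, path: str) -> str: ...
    def changed_since(self, cursor: Optional[str]) -> list[str]: ...

class NotesConnector:
    """Toy source: notes held in memory as path -> (updated timestamp, body)."""

    def __init__(self, notes: dict[str, tuple[str, str]]):
        self._notes = notes
        self.connected = False

    def connect(self) -> None:
        # A real source would sign in here; the toy one just flips a flag.
        self.connected = True

    def list_dir(self, path: str) -> list[str]:
        # Tree layout: return the immediate children under `path`.
        prefix = path.rstrip("/") + "/"
        children = {p[len(prefix):].split("/")[0]
                    for p in self._notes if p.startswith(prefix)}
        return sorted(children)

    def kind_of(self, path: str) -> str:
        # Every note is Markdown, so every object is a document.
        return "document"

    def read(self, path: str) -> str:
        return self._notes[path][1]

    def changed_since(self, cursor: Optional[str]) -> list[str]:
        # Timestamps are ISO 8601 strings in one fixed UTC format,
        # so plain string comparison orders them chronologically.
        return sorted(p for p, (updated, _) in self._notes.items()
                      if cursor is None or updated > cursor)
```

Nothing in the toy connector touches splitting, embedding, or indexing, which is the point of the split described in the table above.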

That's the whole reason for pinning everything to one model: the common, expensive part is written once, and the list of supported sources grows just by adding thin adapters. It's the same trade that lets a shell drive thousands of programs through a few file commands — make the shared part the same, keep the differences small. See Design philosophy, Architecture, and Connectors.