S3 (s3)

The s3 connector indexes objects under a bucket prefix. Because it speaks the S3 API, the same connector covers AWS S3, Cloudflare R2, Google Cloud Storage (S3 interop), and MinIO. What changes between providers is endpoint_url, plus the region and credentials described below.
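To make the per-provider differences concrete, here is a minimal sketch that collects the settings this page lists for each provider into one dict. The function s3_settings and its parameters are hypothetical and not part of MFS; the endpoint and region values come from the Credentials section below.

```python
from typing import Optional


def s3_settings(provider: str, bucket: str, prefix: str = "",
                region: Optional[str] = None,
                account_id: Optional[str] = None,
                endpoint_url: Optional[str] = None) -> dict:
    """Hypothetical helper: connector settings that vary by provider."""
    settings = {"bucket": bucket, "prefix": prefix}
    if provider == "aws":
        # AWS S3 uses the default endpoint; only the region is needed.
        if region is None:
            raise ValueError("AWS S3 needs a region")
        settings["region"] = region
    elif provider == "r2":
        # Cloudflare R2: a per-account endpoint and region "auto".
        if account_id is None:
            raise ValueError("R2 needs an account_id")
        settings["endpoint_url"] = f"https://{account_id}.r2.cloudflarestorage.com"
        settings["region"] = "auto"
    elif provider == "gcs":
        # Google Cloud Storage through its S3 interoperability endpoint.
        settings["endpoint_url"] = "https://storage.googleapis.com"
    elif provider == "minio":
        # MinIO: whatever URL your own server listens on.
        if endpoint_url is None:
            raise ValueError("MinIO needs an endpoint_url")
        settings["endpoint_url"] = endpoint_url
    else:
        raise ValueError(f"unknown provider: {provider}")
    return settings


print(s3_settings("r2", "acme-docs", "engineering/rfc/", account_id="your-account-id"))
```

Only the endpoint and region branch on the provider; bucket and prefix mean the same thing everywhere, which is why one connector can serve all four.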

How MFS sees it

The tree mirrors object keys under the configured bucket and prefix:

s3://acme-docs/
└── engineering/
    └── rfc/
        ├── rfc-001.md      document
        └── rfc-002.pdf     document

Objects are classified by extension exactly like the file connector: documents and code are converted and embedded; structured text is browse/grep only; other types are browse/export only.
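As a rough illustration of the two rules above (keys map verbatim into the tree, and handling depends only on the extension), here is a sketch. The extension sets and the returned labels are hypothetical placeholders; the real lists are whatever the file connector uses.

```python
from pathlib import PurePosixPath

# Placeholder extension sets (hypothetical): the real lists are the
# file connector's.
DOCUMENT_EXTS = {".md", ".pdf", ".docx"}
CODE_EXTS = {".py", ".go", ".ts", ".rs"}
STRUCTURED_EXTS = {".json", ".yaml", ".yml", ".csv"}


def object_uri(bucket: str, key: str) -> str:
    """The tree mirrors keys verbatim, so the MFS path is bucket + full key."""
    return f"s3://{bucket}/{key}"


def classify(key: str) -> str:
    """Decide handling from the key's extension alone (case-insensitive)."""
    ext = PurePosixPath(key).suffix.lower()
    if ext in DOCUMENT_EXTS or ext in CODE_EXTS:
        return "convert+embed"   # converted and embedded
    if ext in STRUCTURED_EXTS:
        return "browse+grep"     # browse/grep only
    return "browse+export"       # browse/export only


for key in ["engineering/rfc/rfc-001.md", "engineering/rfc/rfc-002.pdf",
            "engineering/rfc/index.json", "engineering/rfc/diagram.png"]:
    print(object_uri("acme-docs", key), classify(key))
```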

Credentials

Pick the path for your provider:

  • AWS S3: an IAM access key (or STS temporary credentials). boto3 reads AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY from the environment automatically, so you can omit them from the TOML. Minimum IAM policy (replace my-bucket with your bucket name):

    {
      "Version": "2012-10-17",
      "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": ["arn:aws:s3:::my-bucket", "arn:aws:s3:::my-bucket/*"]
      }]
    }
    
  • Cloudflare R2: an R2 API token with Object Read permission; set endpoint_url = "https://<account-id>.r2.cloudflarestorage.com" and region = "auto".

  • GCS (S3 interop): an HMAC key (access ID and secret); set endpoint_url = "https://storage.googleapis.com".

  • MinIO: the service's access key and secret key; set endpoint_url to your MinIO server's URL.
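The AWS policy above grants read access to the whole bucket. If you want to scope it to the configured prefix, the sketch below builds either form. The function min_read_policy is hypothetical. The prefix-scoped variant is a standard IAM pattern (ListBucket on the bucket ARN with an s3:prefix condition, GetObject on object ARNs), and it assumes the connector always lists with the configured prefix; a listing of the whole bucket would be denied.

```python
import json
from typing import Optional


def min_read_policy(bucket: str, prefix: Optional[str] = None) -> dict:
    """Build a read-only IAM policy for the s3 connector (hypothetical helper).

    Without a prefix this is the bucket-wide policy from this page.
    With a prefix, GetObject is narrowed to keys under the prefix and
    ListBucket is restricted with the s3:prefix condition key.
    """
    bucket_arn = f"arn:aws:s3:::{bucket}"
    if prefix is None:
        return {
            "Version": "2012-10-17",
            "Statement": [{
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
            }],
        }
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # ListBucket is a bucket-level action: it takes the bucket ARN.
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": bucket_arn,
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}*"]}},
            },
            {
                # GetObject is an object-level action: it takes object ARNs.
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"{bucket_arn}/{prefix}*",
            },
        ],
    }


print(json.dumps(min_read_policy("acme-docs", "engineering/rfc/"), indent=2))
```

Splitting the statement makes the bucket-ARN/object-ARN distinction explicit, which is the usual source of "ListBucket allowed but GetObject denied" errors.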

Configuration

bucket = "acme-docs"
prefix = "engineering/rfc/"
region = "us-west-2"
access_key_id = "env:AWS_ACCESS_KEY_ID"
secret_access_key = "env:AWS_SECRET_ACCESS_KEY"
# endpoint_url = "https://<account-id>.r2.cloudflarestorage.com"   # R2/GCS/MinIO

Sync and freshness

The connector uses each object's ETag as its cursor, so a re-sync re-processes only objects whose ETag has changed. Deleted objects are detected only by a full_scan. In versioned buckets, only the latest version of each object is indexed.
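The cursor logic can be sketched as a diff between the ETags recorded at the last sync and the ETags in the current listing. This is a simplified model, not the connector's code: plan_sync is a hypothetical name, and the sketch assumes an incremental pass may see only part of the bucket, so a missing key is treated as deleted only when full_scan is set.

```python
from typing import Dict, List, Tuple


def plan_sync(cursor: Dict[str, str], listing: Dict[str, str],
              full_scan: bool) -> Tuple[List[str], List[str], Dict[str, str]]:
    """Decide which keys to (re)process and which to drop.

    cursor:  key -> ETag recorded at the last sync
    listing: key -> ETag seen in the current listing
    Returns (to_process, to_delete, new_cursor).
    """
    # New keys and keys whose ETag changed need conversion/embedding again.
    to_process = sorted(k for k, etag in listing.items() if cursor.get(k) != etag)
    new_cursor = dict(cursor)
    new_cursor.update(listing)
    to_delete: List[str] = []
    if full_scan:
        # Only a complete listing proves that a key is gone.
        to_delete = sorted(k for k in cursor if k not in listing)
        for k in to_delete:
            del new_cursor[k]
    return to_process, to_delete, new_cursor


cursor = {"engineering/rfc/rfc-001.md": '"a1"', "engineering/rfc/rfc-002.pdf": '"b1"'}
listing = {"engineering/rfc/rfc-001.md": '"a2"', "engineering/rfc/rfc-003.md": '"c1"'}
print(plan_sync(cursor, listing, full_scan=False))
print(plan_sync(cursor, listing, full_scan=True))
```

Unchanged objects keep their ETag and are skipped, which is what makes repeated syncs cheap; rfc-002.pdf disappears from the index only on the full_scan pass.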

Search and browse

mfs add s3://acme-docs --config ./s3.toml

mfs search "retention policy" s3://acme-docs/engineering/rfc/
mfs cat s3://acme-docs/engineering/rfc/rfc-001.md --range 1:80
mfs export s3://acme-docs/engineering/rfc/rfc-002.pdf /tmp/rfc-002.pdf

Pitfalls

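  • prefix is a literal string match, not a directory: "engineering/rfc" also matches keys such as "engineering/rfc-archive/rfc-000.md". Use a trailing slash when you mean a directory-like prefix.
  • IAM must allow both s3:ListBucket (on the bucket ARN) and s3:GetObject (on the object ARNs) for the scoped bucket/prefix.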
  • Very large PDFs or Office files can be expensive to convert.
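S3 matches the Prefix of a listing against object keys as plain strings, with no notion of path segments. The stdlib sketch below mimics that filtering to show why the trailing slash matters; keys_under and the sample keys are illustrative, not the connector's code.

```python
def keys_under(keys, prefix):
    """S3-style prefix filter: a plain startswith, no path logic."""
    return [k for k in keys if k.startswith(prefix)]


KEYS = [
    "engineering/rfc/rfc-001.md",
    "engineering/rfc/rfc-002.pdf",
    "engineering/rfc-archive/rfc-000.md",
    "engineering/rfcs.md",
]

print(keys_under(KEYS, "engineering/rfc"))   # all four keys
print(keys_under(KEYS, "engineering/rfc/"))  # only the two under rfc/
```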