Claude Code Plugin¶
Automatic persistent memory for Claude Code. No commands to learn, no manual saving -- just install the plugin and Claude remembers what you worked on across sessions.
The plugin is built entirely on Claude Code's own primitives: Hooks for lifecycle events, Skills for intelligent retrieval, and CLI for tool access. No MCP servers, no sidecar services, no extra network round-trips. Everything runs locally as shell scripts, a skill definition, and a Python CLI.
How the pieces fit together¶
```mermaid
graph LR
    subgraph "memsearch (Python library)"
        LIB[Core: chunker, embeddings,<br/>vector store, scanner]
    end
    subgraph "memsearch CLI"
        CLI["CLI commands:<br/>search · index · watch<br/>expand · transcript · config"]
    end
    subgraph "ccplugin (Claude Code Plugin)"
        HOOKS["Shell hooks:<br/>SessionStart · UserPromptSubmit<br/>Stop · SessionEnd"]
        SKILL["Skill:<br/>memory-recall (context: fork)"]
    end
    LIB --> CLI
    CLI --> HOOKS
    CLI --> SKILL
    HOOKS -->|"runs inside"| CC[Claude Code]
    SKILL -->|"subagent"| CC
    style LIB fill:#1a2744,stroke:#6ba3d6,color:#a8b2c1
    style CLI fill:#1a2744,stroke:#e0976b,color:#a8b2c1
    style HOOKS fill:#1a2744,stroke:#7bc67e,color:#a8b2c1
    style CC fill:#2a1a44,stroke:#c97bdb,color:#a8b2c1
```
The memsearch Python library provides the core engine (chunking, embedding, vector storage, search). The memsearch CLI wraps the library into shell-friendly commands. The Claude Code Plugin ties those CLI commands to Claude Code's hook lifecycle and skill system — hooks handle session management and memory capture, while the memory-recall skill handles intelligent retrieval in a forked subagent context.
Without vs. With the Plugin¶
```mermaid
sequenceDiagram
    participant You
    participant Claude as Claude Code
    rect rgb(60, 30, 30)
        note right of You: Without plugin
        You->>Claude: Monday: "Add Redis caching with 5min TTL"
        Claude->>You: ✅ Done — implements caching
        note over Claude: Session ends. Context is gone.
        You->>Claude: Wednesday: "The /orders endpoint is slow"
        Claude->>You: ❌ Suggests solutions from scratch<br/>(forgot about the Redis cache from Monday)
    end
    rect rgb(20, 50, 30)
        note right of You: With plugin
        You->>Claude: Monday: "Add Redis caching with 5min TTL"
        Claude->>You: ✅ Done — implements caching
        note over Claude: Plugin auto-summarizes → memory/2026-02-10.md
        You->>Claude: Wednesday: "The /orders endpoint is slow"
        note over Claude: Plugin injects: "Added Redis caching<br/>middleware with 5min TTL..."
        Claude->>You: ✅ "We already have Redis caching —<br/>let me add the /orders endpoint to it"
    end
```
When Is This Useful?¶
- Picking up where you left off. You debugged an auth issue yesterday but didn't finish. Today Claude remembers the root cause, which files you touched, and what you tried — no re-explaining needed.
- Recalling past decisions. "Why did we switch from JWT to session cookies?" Claude can trace back to the original conversation where the trade-offs were discussed, thanks to the 3-layer progressive disclosure that drills from summary → full section → original transcript.
- Long-running projects. Over days or weeks of development, architectural context accumulates automatically. Claude stays aware of your codebase conventions, past refactors, and resolved issues without you having to maintain a manual changelog.
Quick Start¶
Install from Marketplace (recommended)¶
```
# 1. In Claude Code, add the marketplace and install the plugin
/plugin marketplace add zilliztech/memsearch
/plugin install memsearch

# 2. Have a conversation, then exit. Check your memories:
cat .memsearch/memory/$(date +%Y-%m-%d).md

# 3. Start a new session -- Claude automatically remembers!
```
Note: The plugin defaults to the ONNX bge-m3 embedding model -- no API key required, and it runs locally on CPU. This model was selected through a comprehensive benchmark of 12+ models on bilingual memory retrieval. If memsearch is not already installed, the plugin will install `memsearch[onnx]` automatically via `uvx` on first run. To use a different embedding provider (e.g. OpenAI), set it with `memsearch config set embedding.provider openai` and export the required API key.

First-time download: On the first session, the ONNX model (~558 MB) is downloaded from HuggingFace Hub in the background. If your first session appears to hang or memory search is unavailable, the model is most likely still downloading. You can pre-download it manually:
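For example, with the `huggingface-cli` tool that ships with `huggingface_hub` -- a sketch only: `BAAI/bge-m3` is the upstream model repository, but the exact repository ID for the ONNX int8 variant memsearch pulls may differ.

```shell
# Pre-fetch the model so the first session doesn't block on the download.
# NOTE: "BAAI/bge-m3" is an assumption; the ONNX int8 build memsearch
# actually uses may come from a different repository.
huggingface-cli download BAAI/bge-m3
```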
If the download is slow or stuck, set the HuggingFace mirror first:
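Assuming the model is fetched from HuggingFace Hub as described above, this means exporting `HF_ENDPOINT` (the standard `huggingface_hub` override) before starting the session, e.g. pointing at the hf-mirror.com community mirror:

```shell
# Route HuggingFace Hub downloads through a mirror endpoint
export HF_ENDPOINT=https://hf-mirror.com
```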
Memory Storage¶
All memories live in .memsearch/memory/ inside your project directory.
Directory Structure¶
```
your-project/
├── .memsearch/
│   ├── .watch.pid          <-- singleton watcher PID file
│   └── memory/
│       ├── 2026-02-07.md   <-- daily memory log
│       ├── 2026-02-08.md
│       └── 2026-02-09.md   <-- today's session summaries
└── ... (your project files)
```
Example Memory File¶
A typical daily memory file (2026-02-09.md) looks like this:
```markdown
## Session 14:30

### 14:30
<!-- session:abc123def turn:ghi789jkl transcript:/home/user/.claude/projects/.../abc123def.jsonl -->

- Implemented caching system with Redis L1 and in-process LRU L2
- Fixed N+1 query issue in order-service using selectinload
- Decided to use Prometheus counters for cache hit/miss metrics

## Session 17:45

### 17:45
<!-- session:mno456pqr turn:stu012vwx transcript:/home/user/.claude/projects/.../mno456pqr.jsonl -->

- Debugged React hydration mismatch caused by Date.now() during SSR
- Added comprehensive test suite for the caching middleware
- Reviewed PR #42: approved with minor naming suggestions
```
Each file accumulates all sessions from that day. The format is plain markdown -- human-readable, grep-able, and git-friendly.
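Because memories are ordinary files, standard Unix tools work on them directly. A self-contained illustration (the first two lines only fabricate a sample memory file so the `grep` has something to match):

```shell
# Setup: fabricate one sample daily memory file
mkdir -p .memsearch/memory
echo "- Implemented caching system with Redis" > .memsearch/memory/2026-02-09.md

# Plain-text search across all memories -- no index or CLI needed
grep -rl "Redis" .memsearch/memory/
# -> .memsearch/memory/2026-02-09.md
```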
Markdown Is the Source of Truth¶
The Milvus vector index is a derived cache that can be rebuilt at any time:
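A sketch of a full rebuild using the `index` subcommand (per the CLI reference, `--force` re-indexes every file regardless of what is already stored):

```shell
# Re-derive the entire vector index from the markdown source of truth
memsearch index .memsearch/memory/ --force
```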
This means:
- No data loss. Even if Milvus is corrupted or deleted, your memories are safe in `.md` files.
- Portable. Copy `.memsearch/memory/` to another machine and rebuild the index.
- Auditable. You can read, edit, or delete any memory entry with a text editor.
- Git-friendly. Commit your memory files to version control for a complete project history.
Comparison with claude-mem¶
claude-mem is another memory solution for Claude Code. Here is a detailed comparison:
| Aspect | memsearch | claude-mem |
|---|---|---|
| Architecture | 4 shell hooks + 1 skill + 1 watch process | 5 JS hooks + 1 skill + MCP tools + Express worker service (port 37777) + React viewer |
| Integration | Native hooks + skill + CLI -- no MCP, no sidecar service | Hooks + skill + MCP tools + HTTP worker service |
| Memory recall | Skill in forked subagent -- `memory-recall` runs in `context: fork`, intermediate results stay isolated from main context | Skill + MCP hybrid -- `mem-search` skill for auto-recall, plus 5 MCP tools (search, timeline, get_observations, save_memory, ...) for explicit access |
| Progressive disclosure | 3-layer in subagent: search → expand → transcript, all in forked context -- only curated summary reaches main conversation | 3-layer: `mem-search` skill for auto-recall; MCP tools for explicit drill-down |
| Session capture | 1 async `claude -p --model haiku` call at session end | AI observation compression on every tool use (PostToolUse hook) + session summary |
| Vector backend | Milvus -- hybrid search (dense + BM25 + RRF), scales from embedded to distributed cluster | ChromaDB -- dense only; SQLite FTS5 for keyword search (separate, not fused) |
| Embedding model | Pluggable: OpenAI, Google, Voyage, Ollama, local, ONNX (default: bge-m3 int8) | Fixed: all-MiniLM-L6-v2 (384-dim, WASM backend) |
| Storage format | Transparent `.md` files -- human-readable, git-friendly | SQLite database + ChromaDB binary |
| Data portability | Copy `.memsearch/memory/*.md` and rebuild index | Export from SQLite + ChromaDB |
| Runtime dependency | Python (memsearch CLI) + `claude` CLI | Node.js / Bun + Express worker service |
| Context window cost | No MCP tool definitions; skill runs in forked context -- only curated summary enters main context | MCP tool definitions permanently loaded + each MCP tool call/result consumes main context |
The Key Difference: Forked Subagent vs. MCP Tools¶
Both projects use hooks for session lifecycle and skills for memory recall. The architectural divergence is in how retrieval interacts with the main context window.
memsearch runs memory recall in a forked subagent (context: fork). The memory-recall skill gets its own isolated context window -- all search, expand, and transcript operations happen there. Only the curated summary is returned to the main conversation. This means: (1) intermediate search results never pollute the main context, (2) multi-step retrieval is autonomous, and (3) no MCP tool definitions consume context tokens.
claude-mem combines a mem-search skill with MCP tools (search, timeline, get_observations, save_memory). The MCP tools give Claude explicit control over memory access in the main conversation, at the cost of tool definitions permanently consuming context tokens. The PostToolUse hook also records every tool call as an observation, providing richer per-action granularity but incurring more API calls.
The other key difference is storage philosophy: memsearch treats markdown files as the source of truth (human-readable, git-friendly, rebuildable), while claude-mem uses SQLite + ChromaDB (opaque but structured, with richer queryable metadata).
Comparison with Claude's Native Memory¶
Claude Code has built-in memory features: CLAUDE.md files and auto-memory (the /memory command). Here is why memsearch provides a stronger solution:
| Aspect | Claude Native Memory | memsearch |
|---|---|---|
| Storage | Single `CLAUDE.md` file (or per-project) | Unlimited daily `.md` files with full history |
| Recall mechanism | File is loaded at session start (no search) | Skill-based semantic search -- Claude auto-invokes when context is needed |
| Granularity | One monolithic file, manually edited | Per-session bullet points, automatically generated |
| Search | None -- Claude reads the whole file or nothing | Hybrid semantic search (dense + BM25) returning top-k relevant chunks |
| History depth | Limited to what fits in one file | Unlimited -- every session is logged, every entry is searchable |
| Automatic capture | `/memory` command requires manual intervention | Fully automatic -- hooks capture every session |
| Progressive disclosure | None -- entire file is loaded into context | 3-layer model (L1 auto-inject, L2 expand, L3 transcript) minimizes context usage |
| Deduplication | Manual -- user must avoid adding duplicates | SHA-256 content hashing prevents duplicate embeddings |
| Portability | Tied to Claude Code's internal format | Standard markdown files, usable with any tool |
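The deduplication idea can be sketched with a plain `sha256sum` call -- a minimal illustration of content hashing, not memsearch's actual implementation:

```shell
# Identical chunk text always produces the identical hash, so re-indexing
# an unchanged section maps to an existing embedding instead of a duplicate
printf '%s' "- Implemented caching system with Redis" | sha256sum | cut -d' ' -f1
```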
Why This Matters¶
CLAUDE.md is a blunt instrument: it loads the entire file into context at session start, regardless of relevance. As the file grows, it wastes context window on irrelevant information and eventually hits size limits. There is no search -- Claude cannot selectively recall a specific decision from three weeks ago.
memsearch solves this with skill-based semantic search and progressive disclosure. When Claude judges that historical context would help, it auto-invokes the memory-recall skill, which runs in a forked subagent and autonomously searches, expands, and curates relevant memories. History can grow indefinitely without degrading performance, because the vector index handles the filtering. And the three-layer model (search → expand → transcript) runs entirely in the subagent, keeping the main context window clean.
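Roughly, the subagent's three layers map onto these CLI calls (the `<chunk_hash>` and `<jsonl>` placeholders come from the output of the previous layer; the skill chooses the real values autonomously):

```shell
# L1 -- semantic search returns top-k chunks with hashes and anchors
memsearch search "redis caching ttl" --top-k 5 --json-output

# L2 -- expand one hit into its full markdown section
memsearch expand <chunk_hash>

# L3 -- read the original conversation behind a session anchor
memsearch transcript <jsonl>
```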
Plugin Files¶
The plugin lives in the ccplugin/ directory at the root of the memsearch repository:
```
ccplugin/
├── .claude-plugin/
│   └── plugin.json            # Plugin manifest (name, version, description)
├── hooks/
│   ├── hooks.json             # Hook definitions (4 lifecycle hooks)
│   ├── common.sh              # Shared setup: env, PATH, memsearch detection, watch management
│   ├── session-start.sh       # Start watch + write session heading + inject cold-start context
│   ├── user-prompt-submit.sh  # Lightweight systemMessage hint ("[memsearch] Memory available")
│   ├── stop.sh                # Parse transcript -> haiku summary -> append to daily .md
│   ├── parse-transcript.sh    # Deterministic JSONL-to-text parser with truncation
│   └── session-end.sh         # Stop watch process (cleanup)
├── scripts/
│   └── derive-collection.sh   # Derive per-project collection name from project path
└── skills/
    └── memory-recall/
        └── SKILL.md           # Memory retrieval skill (context: fork subagent)
```
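As a hypothetical sketch of what per-project collection derivation might look like (the real `derive-collection.sh` may use a different scheme -- both the path and the hashing choice here are illustrative):

```shell
# Hash the absolute project path into a short, stable, Milvus-safe
# collection name, so each project gets its own collection
project_path="/home/user/my-project"   # hypothetical path
hash=$(printf '%s' "$project_path" | sha256sum | cut -c1-12)
echo "mem_${hash}"
```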
File Descriptions¶
| File | Purpose |
|---|---|
| `plugin.json` | Claude Code plugin manifest. Declares the plugin name (`memsearch`), version, and description. |
| `hooks.json` | Defines the 4 lifecycle hooks (SessionStart, UserPromptSubmit, Stop, SessionEnd) with their types, timeouts, and async flags. |
| `common.sh` | Shared shell library sourced by all hooks. Handles stdin JSON parsing, PATH setup, memsearch binary detection (prefers PATH, falls back to `uv run`), memory directory management, and the watch singleton (start/stop with PID file and orphan cleanup). |
| `session-start.sh` | SessionStart hook implementation. Starts the watcher, writes the session heading, and reads recent memory files for cold-start context injection. |
| `user-prompt-submit.sh` | UserPromptSubmit hook implementation. Returns a lightweight `systemMessage` hint to keep Claude aware of the memory system. No search -- retrieval is handled by the memory-recall skill. |
| `stop.sh` | Stop hook implementation. Extracts the transcript path, validates it, delegates parsing to `parse-transcript.sh`, calls Haiku for summarization (with `CLAUDECODE=` to bypass nested session detection), and appends the result with session anchors to the daily memory file. |
| `parse-transcript.sh` | Standalone transcript parser. Extracts the last turn (last user question + all responses to EOF) from a JSONL transcript using Python 3. Outputs with role labels (`[Human]`, `[Claude Code]`, `[Claude Code calls tool]`, `[Tool output]`/`[Tool error]`) so the summarizer treats it as a third-party transcript. Skips progress, thinking, and file-history-snapshot entries. No `jq` dependency. Used by `stop.sh`. |
| `session-end.sh` | SessionEnd hook implementation. Calls `stop_watch` to terminate the background watcher and clean up. |
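For orientation, a single entry in `hooks.json` might look like the following -- an illustrative sketch following Claude Code's hook-configuration shape, not a copy of the plugin's actual file (the timeout value here is invented):

```json
{
  "hooks": {
    "Stop": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "${CLAUDE_PLUGIN_ROOT}/hooks/stop.sh",
            "timeout": 120
          }
        ]
      }
    ]
  }
}
```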
The memsearch CLI¶
The plugin is built entirely on the memsearch CLI -- every hook is a shell script calling memsearch subcommands. Here are the commands most relevant to the plugin:
| Command | Used By | What It Does |
|---|---|---|
| `search <query>` | memory-recall skill | Semantic search over indexed memories (`--top-k` for result count, `--json-output` for JSON) |
| `watch <paths>` | SessionStart hook | Background watcher that auto-indexes on file changes (1500ms debounce) |
| `index <paths>` | Manual / rebuild | One-shot index of markdown files (`--force` to re-index all) |
| `expand <chunk_hash>` | memory-recall skill (L2) | Show full markdown section around a chunk, with anchor metadata |
| `transcript <jsonl>` | memory-recall skill (L3) | Parse Claude Code JSONL transcript into readable conversation turns |
| `config init` | Quick Start | Interactive config wizard for first-time setup |
| `stats` | Manual | Show index statistics (collection size, chunk count) |
| `reset` | Manual | Drop all indexed data (requires `--yes` to confirm) |
For the full CLI reference, see the CLI Reference page.
Development Mode¶
For contributors or if you want to modify the plugin locally: