Claude Code Plugin¶
Automatic persistent memory for Claude Code. No commands to learn, no manual saving -- just install the plugin and Claude remembers what you worked on across sessions.
The plugin is built entirely on Claude Code's own primitives: Hooks for lifecycle events, CLI for tool access, and Agent for autonomous decisions. No MCP servers, no sidecar services, no extra network round-trips. Everything runs locally as shell scripts and a Python CLI.
What It Does¶
When you launch Claude Code with the memsearch plugin:
- Every session is remembered. When Claude finishes responding, a Haiku model summarizes the exchange and appends it to a daily markdown log (`YYYY-MM-DD.md`).
- Every prompt triggers recall. Before Claude sees your message, a semantic search runs against all past memories and injects the most relevant ones into context.
- No manual intervention. You never need to run a command, tag a memory, or tell Claude to "remember this". The hooks handle everything.
The result: Claude has a persistent, searchable, ever-growing memory -- without you lifting a finger.
Quick Start¶
Install from Marketplace (recommended)¶
# 1. Install the memsearch CLI
pip install memsearch
# 2. (Optional) Initialize config
memsearch config init
# 3. In Claude Code, add the marketplace and install the plugin
/plugin marketplace add zilliztech/memsearch
/plugin install memsearch
# 4. Have a conversation, then exit. Check your memories:
cat .memsearch/memory/$(date +%Y-%m-%d).md
# 5. Start a new session -- Claude automatically remembers!
Development mode¶
For contributors or if you want to modify the plugin:
git clone https://github.com/zilliztech/memsearch.git
pip install memsearch
claude --plugin-dir ./memsearch/ccplugin
How It Works¶
The plugin hooks into 4 Claude Code lifecycle events. A singleton memsearch watch process runs in the background, keeping the vector index in sync with markdown files as they change.
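If you want to confirm the watcher is actually alive, you can check its PID file by hand. This is a quick sketch only; the PID file path comes from the directory layout shown later on this page.

```bash
# Is the singleton watcher running? The PID file lives at .memsearch/.watch.pid.
if [ -f .memsearch/.watch.pid ] && kill -0 "$(cat .memsearch/.watch.pid)" 2>/dev/null; then
  echo "memsearch watch is running (pid $(cat .memsearch/.watch.pid))"
else
  echo "memsearch watch is not running"
fi
```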
Lifecycle Diagram¶
stateDiagram-v2
[*] --> SessionStart
SessionStart --> WatchRunning: start memsearch watch
SessionStart --> InjectRecent: load recent memories
state WatchRunning {
[*] --> Watching
Watching --> Reindex: file changed
Reindex --> Watching: done
}
InjectRecent --> Prompting
state Prompting {
[*] --> UserInput
UserInput --> SemanticSearch: memsearch search
SemanticSearch --> InjectMemories: top-k results
InjectMemories --> ClaudeThinks
ClaudeThinks --> Summary: haiku summarize
Summary --> WriteMD: write YYYY-MM-DD.md
WriteMD --> UserInput: next turn
}
Prompting --> SessionEnd: user exits
SessionEnd --> StopWatch: stop memsearch watch
StopWatch --> [*]
Hook Summary¶
The plugin defines exactly 4 hooks, all declared in hooks/hooks.json:
| Hook | Type | Async | Timeout | What It Does |
|---|---|---|---|---|
| SessionStart | command | no | 10s | Start memsearch watch singleton, write session heading to today's .md, inject recent memories and Memory Tools instructions via additionalContext |
| UserPromptSubmit | command | no | 15s | Semantic search on user prompt (skip if < 10 chars), inject top-3 relevant memories with chunk_hash IDs via additionalContext |
| Stop | command | yes | 120s | Parse transcript with parse-transcript.sh, call claude -p --model haiku to summarize, append summary with session/turn anchors to daily .md |
| SessionEnd | command | no | 10s | Stop the memsearch watch background process (cleanup) |
What Each Hook Does¶
SessionStart¶
Fires once when a Claude Code session begins. This hook:
- Starts the watcher. Launches `memsearch watch .memsearch/memory/` as a singleton background process (PID file lock prevents duplicates). The watcher monitors markdown files and auto-re-indexes on changes with a 1500ms debounce.
- Writes a session heading. Appends `## Session HH:MM` to today's memory file (`.memsearch/memory/YYYY-MM-DD.md`), creating the file if it does not exist.
- Injects recent memories. Reads the last 30 lines from the 2 most recent daily logs. If memsearch is available, also runs `memsearch search "recent session summary" --top-k 3` for semantic results.
- Injects Memory Tools instructions. Tells Claude about the `memsearch expand` and `memsearch transcript` commands for progressive disclosure (L2 and L3).
All of this is returned as additionalContext in the hook output JSON.
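In script form, that return value is a single JSON object printed to stdout. The sketch below is illustrative only: it assumes Claude Code's `hookSpecificOutput` / `additionalContext` output shape and omits the real context-building logic in session-start.sh.

```bash
# Illustrative sketch: handing assembled context back to Claude Code from a
# SessionStart hook. $CONTEXT stands in for the recent-memory text built above.
CONTEXT="## Recent Memories
- [memory/2026-02-08.md] Implemented caching system with Redis L1 ..."

jq -n --arg ctx "$CONTEXT" \
  '{hookSpecificOutput: {hookEventName: "SessionStart", additionalContext: $ctx}}'
```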
UserPromptSubmit¶
Fires on every user prompt before Claude processes it. This hook:
- Extracts the prompt from the hook input JSON.
- Skips short prompts (under 10 characters) -- greetings and single words are not worth searching.
- Runs semantic search. Calls `memsearch search "$PROMPT" --top-k 3 --json-output`.
- Formats results as a compact index with source file, heading, a 200-character preview, and the `chunk_hash` for each result.
- Injects as context. Returns formatted results under a `## Relevant Memories` heading via `additionalContext`.
This is the key mechanism that makes memory recall automatic -- Claude does not need to decide to search; it simply receives relevant context on every prompt.
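Stripped of error handling, the hook body reduces to a few lines of shell. This is a sketch, not the real user-prompt-submit.sh; the `.prompt` input field and the `.source`, `.text`, and `.chunk_hash` names in the search JSON are assumptions for illustration.

```bash
# Sketch of the UserPromptSubmit flow (JSON field names are illustrative).
INPUT=$(cat)                                    # hook input JSON arrives on stdin
PROMPT=$(echo "$INPUT" | jq -r '.prompt // ""')

# Skip short prompts; greetings and single words are not worth a search.
[ "${#PROMPT}" -lt 10 ] && exit 0

# Top-3 semantic search, formatted as a compact index with each chunk_hash.
MEMORIES=$(memsearch search "$PROMPT" --top-k 3 --json-output |
  jq -r '.[] | "- [\(.source)] \(.text[0:200])...\n  `chunk_hash: \(.chunk_hash)`"')

jq -n --arg ctx "## Relevant Memories
$MEMORIES" \
  '{hookSpecificOutput: {hookEventName: "UserPromptSubmit", additionalContext: $ctx}}'
```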
Stop¶
Fires after Claude finishes each response. Runs asynchronously so it does not block the user. This hook:
- Guards against recursion. Checks `stop_hook_active` to prevent infinite loops (since the hook itself calls `claude -p`).
- Validates the transcript. Skips if the transcript file is missing or has fewer than 3 lines.
- Parses the transcript. Calls `parse-transcript.sh`, which:
    - Takes the last 200 lines of the JSONL transcript
    - Truncates user/assistant text to 500 characters each
    - Extracts tool names with input summaries
    - Skips `file-history-snapshot` entries
- Summarizes with Haiku. Pipes the parsed transcript to `claude -p --model haiku --no-session-persistence` with a system prompt that requests 3-8 bullet points focusing on decisions, problems solved, code changes, and key findings.
- Appends to daily log. Writes a `### HH:MM` sub-heading with an HTML comment anchor containing the session ID, turn UUID, and transcript path. The watcher detects the file change and auto-indexes the new content.
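Reduced to its core, the summarize-and-append step looks roughly like the sketch below. It assumes `$TRANSCRIPT_PATH`, `$SESSION_ID`, and `$TURN_UUID` have already been extracted from the hook input, and that parse-transcript.sh takes the transcript path as its argument; the real stop.sh also applies the recursion guard and validation described above.

```bash
# Sketch of the Stop hook's summarize-and-append step (variables assumed set;
# path resolution to the plugin's hooks/ directory omitted).
MEMORY_FILE=".memsearch/memory/$(date +%Y-%m-%d).md"

SUMMARY=$(parse-transcript.sh "$TRANSCRIPT_PATH" |
  claude -p "Summarize this session in 3-8 bullet points: decisions, problems solved, code changes, key findings." \
    --model haiku --no-session-persistence)

{
  echo ""
  echo "### $(date +%H:%M)"
  echo "<!-- session:$SESSION_ID turn:$TURN_UUID transcript:$TRANSCRIPT_PATH -->"
  echo "$SUMMARY"
} >> "$MEMORY_FILE"
# The running watcher notices the file change and re-indexes the new content.
```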
SessionEnd¶
Fires when the user exits Claude Code. Simply calls stop_watch to kill the memsearch watch process and clean up the PID file, including a sweep for any orphaned processes.
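A stripped-down version of that cleanup looks roughly like this (illustrative only; the real stop_watch lives in common.sh):

```bash
# Illustrative sketch of the watcher cleanup performed at SessionEnd.
PID_FILE=".memsearch/.watch.pid"
if [ -f "$PID_FILE" ]; then
  kill "$(cat "$PID_FILE")" 2>/dev/null || true
  rm -f "$PID_FILE"
fi
# Sweep for orphaned watchers left behind by crashed sessions.
pkill -f "memsearch watch" 2>/dev/null || true
```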
Progressive Disclosure¶
Memory retrieval uses a three-layer progressive disclosure model. Layer 1 is fully automatic; layers 2 and 3 are available on demand when Claude needs more context.
graph TD
L1["L1: Auto-injected<br/>(UserPromptSubmit hook)"] --> L2["L2: On-demand expand<br/>(memsearch expand)"]
L2 --> L3["L3: Transcript drill-down<br/>(memsearch transcript)"]
style L1 fill:#2a3a5c,stroke:#6ba3d6,color:#a8b2c1
style L2 fill:#2a3a5c,stroke:#e0976b,color:#a8b2c1
style L3 fill:#2a3a5c,stroke:#d66b6b,color:#a8b2c1
L1: Auto-Injected (Automatic)¶
On every user prompt, the UserPromptSubmit hook injects the top-3 semantic search results. Each result includes:
- Source file and heading
- A 200-character content preview
- The `chunk_hash` identifier
This happens transparently -- no action from Claude or the user is required.
Example injection:
## Relevant Memories
- [memory/2026-02-08.md:Session 14:30] Implemented caching system with Redis L1
and in-process LRU L2. Fixed N+1 query issue in order-service using selectinload...
`chunk_hash: a1b2c3d4e5f6`
L2: On-Demand Expand¶
When an L1 preview is not enough, Claude can run the memsearch expand command to retrieve the full markdown section surrounding a chunk:
# Show full section
memsearch expand a1b2c3d4e5f6
# JSON output with anchor metadata (for programmatic L3 drill-down)
memsearch expand a1b2c3d4e5f6 --json-output
# Show N lines of context before/after instead of the full section
memsearch expand a1b2c3d4e5f6 --lines 10
The output includes the full markdown content plus the embedded anchor metadata, which links to the original session transcript.
L3: Transcript Drill-Down¶
When Claude needs the original conversation verbatim -- for instance, to recall exact code snippets or error messages -- it can drill into the JSONL transcript:
# Show an index of all turns in a session
memsearch transcript /path/to/session.jsonl
# Show context around a specific turn (prefix match on UUID)
memsearch transcript /path/to/session.jsonl --turn bffc0c1b --context 3
# JSON output for programmatic use
memsearch transcript /path/to/session.jsonl --turn bffc0c1b --json-output
Session Anchors¶
Each memory summary includes an HTML comment anchor that links the chunk back to its source session, enabling the L2-to-L3 drill-down:
### 14:30
<!-- session:abc123def turn:ghi789jkl transcript:/home/user/.claude/projects/.../abc123def.jsonl -->
- Implemented caching system with Redis L1 and in-process LRU L2
- Fixed N+1 query issue in order-service using selectinload
- Decided to use Prometheus counters for cache hit/miss metrics
The anchor contains three fields:
| Field | Description |
|---|---|
| `session` | Claude Code session ID (also the JSONL filename without extension) |
| `turn` | UUID of the last user turn in the session |
| `transcript` | Absolute path to the JSONL transcript file |
Claude extracts these fields from memsearch expand --json-output and uses them to call memsearch transcript for L3 access.
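Chained together, the L2-to-L3 drill-down is just two commands. The jq field names (`.transcript`, `.turn`) are assumptions for illustration; the chunk hash is the example one from above.

```bash
# L2: expand the chunk and capture its anchor metadata as JSON.
ANCHOR=$(memsearch expand a1b2c3d4e5f6 --json-output)

# Pull the transcript path and turn UUID out of the anchor
# (field names here are illustrative assumptions).
TRANSCRIPT=$(echo "$ANCHOR" | jq -r '.transcript')
TURN=$(echo "$ANCHOR" | jq -r '.turn')

# L3: read the original conversation around that turn.
memsearch transcript "$TRANSCRIPT" --turn "$TURN" --context 3
```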
Memory Storage¶
All memories live in .memsearch/memory/ inside your project directory.
Directory Structure¶
your-project/
├── .memsearch/
│ ├── .watch.pid <-- singleton watcher PID file
│ └── memory/
│ ├── 2026-02-07.md <-- daily memory log
│ ├── 2026-02-08.md
│ └── 2026-02-09.md <-- today's session summaries
└── ... (your project files)
Example Memory File¶
A typical daily memory file (2026-02-09.md) looks like this:
## Session 14:30
### 14:30
<!-- session:abc123def turn:ghi789jkl transcript:/home/user/.claude/projects/.../abc123def.jsonl -->
- Implemented caching system with Redis L1 and in-process LRU L2
- Fixed N+1 query issue in order-service using selectinload
- Decided to use Prometheus counters for cache hit/miss metrics
## Session 17:45
### 17:45
<!-- session:mno456pqr turn:stu012vwx transcript:/home/user/.claude/projects/.../mno456pqr.jsonl -->
- Debugged React hydration mismatch caused by Date.now() during SSR
- Added comprehensive test suite for the caching middleware
- Reviewed PR #42: approved with minor naming suggestions
Each file accumulates all sessions from that day. The format is plain markdown -- human-readable, grep-able, and git-friendly.
Markdown Is the Source of Truth¶
The Milvus vector index is a derived cache that can be rebuilt at any time:
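```bash
# One-shot rebuild: re-index every memory markdown file from scratch
# (see `memsearch index` in the CLI table below; --force re-indexes all files).
memsearch index .memsearch/memory/ --force
```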
This means:
- No data loss. Even if Milvus is corrupted or deleted, your memories are safe in `.md` files.
- Portable. Copy `.memsearch/memory/` to another machine and rebuild the index.
- Auditable. You can read, edit, or delete any memory entry with a text editor.
- Git-friendly. Commit your memory files to version control for a complete project history.
Comparison with claude-mem¶
claude-mem is another memory solution for Claude Code. Here is a detailed comparison:
| Aspect | memsearch | claude-mem |
|---|---|---|
| Architecture | 4 shell hooks + 1 watch process | Node.js/Bun worker service + Express server + React UI |
| Integration | Native hooks + CLI (zero IPC overhead) | MCP server (stdio); tool definitions permanently consume context window |
| Memory recall | Automatic -- semantic search on every prompt via hook | Agent-driven -- Claude must explicitly call MCP search tool |
| Progressive disclosure | 3-layer, auto-triggered: hook injects top-k (L1), then `expand` (L2), then `transcript` (L3) | 3-layer, all manual: `search`, `timeline`, `get_observations` all require explicit tool calls |
| Session summary cost | 1 `claude -p --model haiku` call, runs async | Observation on every tool use + session summary (more API calls at scale) |
| Vector backend | Milvus -- hybrid search (dense + BM25), scales from embedded to distributed cluster | Chroma -- dense only, limited scaling path |
| Storage format | Transparent `.md` files -- human-readable, git-friendly | Opaque SQLite + Chroma binary |
| Index sync | `memsearch watch` singleton -- auto-debounced background sync | Automatic observation writes, but no unified background sync |
| Data portability | Copy `.memsearch/memory/*.md` and rebuild | Export from SQLite + Chroma |
| Runtime dependency | Python (memsearch CLI) + `claude` CLI | Node.js + Bun + MCP runtime |
| Context window cost | Minimal -- hook injects only top-k results as plain text | MCP tool definitions always loaded + each tool call/result consumes context |
| Cost per session | ~1 Haiku call for summary | Multiple Claude API calls for observation compression |
The Key Insight: Automatic vs. Agent-Driven Recall¶
The fundamental architectural difference is when memory recall happens.
memsearch injects relevant memories into every prompt via hooks. Claude does not need to decide whether to search -- it simply receives relevant context before processing each message. This means memories are never missed due to Claude forgetting to look them up. Progressive disclosure starts automatically at L1 (the hook injects top-k results), and only deeper layers (L2 expand, L3 transcript) require explicit CLI calls from the agent.
claude-mem gives Claude MCP tools to search, explore timelines, and fetch observations. All three layers require Claude to proactively decide to invoke them. While this is more flexible (Claude controls when and what to recall), it means memories are only retrieved when Claude thinks to ask. In practice, Claude often does not call the search tool unless the conversation explicitly references past work -- which means relevant context can be silently lost.
The difference is analogous to push vs. pull: memsearch pushes memories to Claude on every turn, while claude-mem requires Claude to pull them on demand.
Comparison with Claude's Native Memory¶
Claude Code has built-in memory features: CLAUDE.md files and auto-memory (the /memory command). Here is why memsearch provides a stronger solution:
| Aspect | Claude Native Memory | memsearch |
|---|---|---|
| Storage | Single `CLAUDE.md` file (or per-project) | Unlimited daily `.md` files with full history |
| Recall mechanism | File is loaded at session start (no search) | Semantic search on every prompt (embedding-based) |
| Granularity | One monolithic file, manually edited | Per-session bullet points, automatically generated |
| Search | None -- Claude reads the whole file or nothing | Hybrid semantic search (dense + BM25) returning top-k relevant chunks |
| History depth | Limited to what fits in one file | Unlimited -- every session is logged, every entry is searchable |
| Automatic capture | `/memory` command requires manual intervention | Fully automatic -- hooks capture every session |
| Progressive disclosure | None -- entire file is loaded into context | 3-layer model (L1 auto-inject, L2 expand, L3 transcript) minimizes context usage |
| Deduplication | Manual -- user must avoid adding duplicates | SHA-256 content hashing prevents duplicate embeddings |
| Portability | Tied to Claude Code's internal format | Standard markdown files, usable with any tool |
Why This Matters¶
CLAUDE.md is a blunt instrument: it loads the entire file into context at session start, regardless of relevance. As the file grows, it wastes context window on irrelevant information and eventually hits size limits. There is no search -- Claude cannot selectively recall a specific decision from three weeks ago.
memsearch solves this with semantic search and progressive disclosure. Instead of loading everything, it injects only the top-k most relevant memories for each specific prompt. History can grow indefinitely without degrading performance, because the vector index handles the filtering. And the three-layer model means Claude starts with lightweight previews and only drills deeper when needed, keeping context window usage minimal.
Plugin Files¶
The plugin lives in the ccplugin/ directory at the root of the memsearch repository:
ccplugin/
├── .claude-plugin/
│ └── plugin.json # Plugin manifest (name, version, description)
└── hooks/
├── hooks.json # Hook definitions (4 lifecycle hooks)
├── common.sh # Shared setup: env, PATH, memsearch detection, watch management
├── session-start.sh # Start watch + write session heading + inject memories & tools
├── user-prompt-submit.sh # Semantic search on prompt -> inject memories with chunk_hash
├── stop.sh # Parse transcript -> haiku summary -> append to daily .md
├── parse-transcript.sh # Deterministic JSONL-to-text parser with truncation
└── session-end.sh # Stop watch process (cleanup)
File Descriptions¶
| File | Purpose |
|---|---|
| `plugin.json` | Claude Code plugin manifest. Declares the plugin name (`memsearch`), version, and description. |
| `hooks.json` | Defines the 4 lifecycle hooks (SessionStart, UserPromptSubmit, Stop, SessionEnd) with their types, timeouts, and async flags. |
| `common.sh` | Shared shell library sourced by all hooks. Handles stdin JSON parsing, PATH setup, memsearch binary detection (prefers PATH, falls back to `uv run`), memory directory management, and the watch singleton (start/stop with PID file and orphan cleanup). |
| `session-start.sh` | SessionStart hook implementation. Starts the watcher, writes the session heading, reads recent memory files, runs a semantic search for recent context, and injects Memory Tools instructions. |
| `user-prompt-submit.sh` | UserPromptSubmit hook implementation. Extracts the user prompt, runs `memsearch search` with `--top-k 3 --json-output`, and formats results with `chunk_hash` for progressive disclosure. |
| `stop.sh` | Stop hook implementation. Extracts the transcript path, validates it, delegates parsing to `parse-transcript.sh`, calls Haiku for summarization, and appends the result with session anchors to the daily memory file. |
| `parse-transcript.sh` | Standalone transcript parser. Processes the last 200 lines of a JSONL transcript, truncates content to 500 characters, extracts tool call summaries, and skips `file-history-snapshot` entries. Used by `stop.sh`. |
| `session-end.sh` | SessionEnd hook implementation. Calls `stop_watch` to terminate the background watcher and clean up. |
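The binary-detection part of common.sh matters because it decides whether the hooks can do anything at all. The sketch below is a simplified illustration of the behavior described above (prefer a binary on PATH, fall back to `uv run`); the quiet-exit fallback is an assumption, not the actual code.

```bash
# Simplified sketch of memsearch detection: prefer a binary on PATH,
# otherwise fall back to running it through uv.
if command -v memsearch >/dev/null 2>&1; then
  MEMSEARCH="memsearch"
elif command -v uv >/dev/null 2>&1; then
  MEMSEARCH="uv run memsearch"
else
  exit 0   # assumption: with no memsearch available, the hook exits quietly
fi

$MEMSEARCH search "recent session summary" --top-k 3
```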
The memsearch CLI¶
The plugin is built entirely on the memsearch CLI -- every hook is a shell script calling memsearch subcommands. Here are the commands most relevant to the plugin:
| Command | Used By | What It Does |
|---|---|---|
| `search <query>` | UserPromptSubmit hook | Semantic search over indexed memories (`--top-k` for result count, `--json-output` for JSON) |
| `watch <paths>` | SessionStart hook | Background watcher that auto-indexes on file changes (1500ms debounce) |
| `index <paths>` | Manual / rebuild | One-shot index of markdown files (`--force` to re-index all) |
| `expand <chunk_hash>` | Agent (L2 disclosure) | Show the full markdown section around a chunk, with anchor metadata |
| `transcript <jsonl>` | Agent (L3 disclosure) | Parse a Claude Code JSONL transcript into readable conversation turns |
| `config init` | Quick Start | Interactive config wizard for first-time setup |
| `stats` | Manual | Show index statistics (collection size, chunk count) |
| `reset` | Manual | Drop all indexed data (requires `--yes` to confirm) |
For the full CLI reference, see the CLI Reference page.