
Teaching Your AI Agent to Actually Remember Things

A comprehensive guide to fixing memory problems in OpenClaw (and similar AI agent frameworks)


You know that feeling when you explain something to someone, and the next day they look at you like you're speaking in tongues? Now imagine that someone is your AI assistant—the one you've been training for weeks to understand your projects, preferences, and workflows.

"Wait, what blog are we building again?"

Screams internally.

If you've spent any time with AI agents like OpenClaw, you've probably hit this wall. The agent forgets context between sessions. It loses track of decisions made yesterday. It asks you to re-explain things you've discussed a dozen times. It's like working with a very smart amnesiac.

This isn't a flaw in the AI model itself—it's an architectural challenge. And after researching the problem extensively (10 YouTube videos, multiple articles, and a lot of trial and error), we found solutions that actually work.

This guide covers two complementary approaches:

  1. Behavioral changes — Simple file-based systems that enforce memory discipline
  2. True-Recall Local — A real-time session capture system with semantic search

Both are free, run locally, and don't require any cloud services.


The Problem: Why AI Agents Forget

Before we fix anything, let's understand what's actually broken.

Three Failure Modes

1. Missed Writes

The AI decides what's important enough to save. There's no guarantee it will persist something you consider critical. Maybe it saves "user prefers dark mode" but forgets "we decided to use PostgreSQL instead of MySQL for the production database."

2. Missed Retrieval

Even when memories exist, the AI must choose to search for them. If it doesn't think to look, it won't find them. It's like having a filing cabinet you forget exists.

3. Compaction Loss

Long conversations get compressed to fit context limits. Details disappear mid-session. You're 45 minutes into explaining a complex architecture, and suddenly the AI asks, "So what are we building?"

As one researcher put it:

"Memory persistence and retrieval are optional behaviors controlled by prompts and model heuristics. There is simply no guarantee that information will be persisted or reloaded when needed."

The AI isn't stupid. The system just doesn't enforce remembering.


Part 1: The 5-Layer Memory Architecture

This is the "behavioral fix"—it doesn't require installing new software, just disciplined use of files that already exist (or that you'll create).

Think of it like giving Kermit a really good planner. He's still Kermit, but now he writes things down.

The Five Layers

| Layer | Purpose | Volatility |
|-------|---------|------------|
| 1. Session Context | Current conversation | High (compacted) |
| 2. CONTEXT.md | Working memory | Medium (daily updates) |
| 3. Daily Notes | Raw detailed record | Low (append-only) |
| 4. Long-term Memory | Distilled knowledge | Very low |
| 5. Semantic Search | Active retrieval | Permanent |

The key insight: Layer 1 is unreliable. Everything important needs to flow down to Layers 2-5.

Layer 2: The CONTEXT.md File

This is the game-changer. Create a file called CONTEXT.md (or memory/CONTEXT.md) that acts as your AI's "working memory."

# CONTEXT.md — Working Memory

**Last updated:** 2026-03-01 10:00 EST

---

## 🔴 In Progress
- Building user authentication system (JWT + refresh tokens)
- Debugging image upload timeout issue

## 🟡 Pending
- Waiting on Matt's approval for database migration
- DNS propagation for new subdomain (~24 hours)

## 📌 Recent Decisions
- 2026-03-01: Using PostgreSQL instead of MySQL (performance)
- 2026-02-28: Authentication will use httpOnly cookies

## 💬 Today's Discussions
- Discussed rate limiting strategy
- Reviewed PR #47 (approved with minor changes)

---

## Active Projects

| Project | Status | Notes |
|---------|--------|-------|
| Blog API | 🔴 Active | Auth system in progress |
| Dashboard | 🟡 Paused | Waiting on design mockups |

The emojis aren't just decorative—they create scannable structure:

  • 🔴 = I'm actively working on this
  • 🟡 = Blocked or waiting
  • 📌 = A decision was made (don't re-debate this)
  • 💬 = We talked about this today

The Iron Rule: Write Immediately

Here's where most memory systems fail: the AI waits too long to write things down.

| Event | Action | When |
|-------|--------|------|
| User gives instructions | Add to CONTEXT.md + daily log | Immediately |
| Task completed | Remove from CONTEXT.md, log completion | Immediately |
| Decision made | Add to 📌 Recent Decisions | Immediately |
| Waiting on something | Add to 🟡 Pending | Immediately |
| Casual chat | Don't write | |

Why immediately? Because compaction is coming. If the AI waits until "later" to save important context, later might never come. The conversation gets compressed, details vanish, and the AI cheerfully forgets everything.

Add this rule to your agent's instructions (AGENTS.md or equivalent):

## Write Discipline (IRON RULE)

Write **immediately** — compaction erases unwritten context:
- Instructions given → CONTEXT.md 🔴 + daily log
- Task completed → remove from CONTEXT.md + daily log  
- Decision made → CONTEXT.md 📌 + daily log
- Waiting on something → CONTEXT.md 🟡
- Casual chat → don't write
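If you give the agent a small tool for these writes (or script them yourself), most of them come down to one operation: add a bullet under a named section of CONTEXT.md. A minimal sketch, assuming the `## 🔴 In Progress`-style headings from the example above; `add_to_section` is a hypothetical helper, not an OpenClaw API:

```python
def add_to_section(markdown: str, heading: str, item: str) -> str:
    """Insert '- item' after the last bullet of the '## ' section whose title contains `heading`.

    Hypothetical helper: assumes each section's bullets sit directly under its heading.
    """
    lines = markdown.splitlines()
    # Find the heading line, e.g. '## 📌 Recent Decisions'
    start = next(
        (i for i, line in enumerate(lines) if line.startswith("## ") and heading in line),
        None,
    )
    if start is None:
        raise ValueError(f"No section matching {heading!r}")
    # Skip past the section's existing bullets so new items land at the end
    insert_at = start + 1
    while insert_at < len(lines) and lines[insert_at].startswith("- "):
        insert_at += 1
    lines.insert(insert_at, f"- {item}")
    return "\n".join(lines) + "\n"
```

Removing a finished task is the mirror image (drop the matching bullet). Keeping both as plain string edits means the file stays readable and easy to diff.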

Layer 3: Daily Notes

Create dated files for raw logging:

memory/
├── 2026-02-28.md
├── 2026-03-01.md
└── CONTEXT.md

The daily notes capture everything in detail. CONTEXT.md is the distilled version of what's currently relevant.

Example daily note structure:

# 2026-03-01 (Sunday)

## Summary
Authentication system development. Resolved database timeout issues.

## Log

- **09:15** — Started JWT implementation. Using jose library.
- **10:30** — Hit CORS issue with refresh token endpoint. Fixed by adding credentials: 'include'.
- **14:00** — Matt approved database migration. Running in staging.
- **15:45** — Migration complete. All tests passing.
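Appending to the daily note is easy to automate. A sketch of a hypothetical `append_log` helper that follows the layout above (dated title, timestamped bullets). It assumes a `memory/` directory next to CONTEXT.md and only creates the Log section, leaving the Summary section out:

```python
from datetime import datetime
from pathlib import Path
from typing import Optional

DASH = "\u2014"  # the em dash used as a separator in the log lines above


def append_log(memory_dir: Path, text: str, now: Optional[datetime] = None) -> Path:
    """Append a timestamped bullet to memory/YYYY-MM-DD.md and return its path.

    Hypothetical helper: on the first write of the day it creates the file
    with a '# YYYY-MM-DD (Weekday)' title and an empty '## Log' section.
    """
    now = now or datetime.now()
    path = memory_dir / f"{now:%Y-%m-%d}.md"
    if not path.exists():
        memory_dir.mkdir(parents=True, exist_ok=True)
        # %A is the weekday name in the current locale (English in the C locale)
        path.write_text(f"# {now:%Y-%m-%d} ({now:%A})\n\n## Log\n\n", encoding="utf-8")
    # Append mode never rewrites earlier entries, so the note stays append-only
    with path.open("a", encoding="utf-8") as f:
        f.write(f"- **{now:%H:%M}** {DASH} {text}\n")
    return path
```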

Layer 4: Long-term Memory (MEMORY.md)

This is the "graduated" knowledge—things important enough to persist forever.

# MEMORY.md - Long-Term Memory

**Last Pruned:** 2026-02-28

## Critical Rules
- Always use feature branches, never push to main
- Request PR review from Matt on all changes
- PostgreSQL for production, SQLite for local dev

## Infrastructure
- Production API: api.example.com
- Staging: staging-api.example.com
- Database: AWS RDS (PostgreSQL 15)

## Lessons Learned
- 2026-02-15: Timeouts were caused by missing connection pooling
- 2026-02-22: Image uploads need presigned URLs for S3

Layer 5: Semantic Search

Most frameworks (including OpenClaw) have built-in semantic search. The key is actually using it.

Add to your agent instructions:

## Memory Recall (ENFORCE)

Before answering questions about prior work, decisions, or preferences:
1. Run memory_search first
2. Check daily notes for recent context
3. Don't guess. Don't say "I think we discussed..."
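`memory_search` is OpenClaw's own tool, so there's nothing to reproduce there, but step 2 (checking recent daily notes) works even without semantic search. A sketch of a plain keyword scan over the newest dated notes; `grep_recent_notes` and its `max_notes` cutoff are illustrative assumptions:

```python
import re
from pathlib import Path
from typing import List, Tuple

# Daily notes are named YYYY-MM-DD.md; CONTEXT.md and other files are ignored
DAILY_NOTE = re.compile(r"^\d{4}-\d{2}-\d{2}\.md$")


def grep_recent_notes(memory_dir: Path, keyword: str, max_notes: int = 7) -> List[Tuple[str, str]]:
    """Return (filename, line) pairs containing `keyword` (case-insensitive), newest note first.

    A keyword scan, not semantic search. Hypothetical helper.
    """
    # ISO dates sort lexicographically in chronological order
    notes = sorted(
        (p for p in memory_dir.glob("*.md") if DAILY_NOTE.match(p.name)),
        key=lambda p: p.name,
        reverse=True,
    )[:max_notes]
    needle = keyword.lower()
    hits = []
    for note in notes:
        for line in note.read_text(encoding="utf-8").splitlines():
            if needle in line.lower():
                hits.append((note.name, line.strip()))
    return hits
```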

Pre-Compaction Memory Flush

This technique is pure gold. Before the context window gets compacted, the AI silently saves critical insights.

Add to your agent instructions:

## Pre-compaction Behavior

When context is getting long, silently save important context to CONTEXT.md before it's lost. Use NO_REPLY to avoid disrupting the conversation.

The AI monitors its own context usage and proactively writes things down before they disappear. Like a student taking notes before the professor erases the whiteboard.
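OpenClaw decides internally when to compact, and this post doesn't cover how it measures context usage. If you're wiring up a similar trigger in your own agent loop, a common rough approximation is about four characters per token for English text. A sketch; the ratio, the 80% threshold, and both function names are assumptions:

```python
from typing import List


def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English prose (rounded up)
    return (len(text) + 3) // 4


def should_flush(messages: List[str], context_limit: int, threshold: float = 0.8) -> bool:
    """True once the estimated conversation size reaches `threshold` of the context window."""
    used = sum(estimate_tokens(m) for m in messages)
    return used >= threshold * context_limit
```

A real tokenizer gives exact counts; the heuristic only needs to fire early enough that the flush happens before compaction does.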

Session Start Ritual

Every new session should begin with reading the memory files:

## Session Start (MANDATORY)

Before ANY first response, read:
1. `memory/CONTEXT.md` — what's in progress NOW
2. `memory/YYYY-MM-DD.md` — today's log (if exists)
3. `MEMORY.md` — long-term knowledge

This ensures the AI wakes up knowing what it's supposed to be working on, not with a cheerful "Hello! How can I help you today?" when you're mid-project.
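If your framework doesn't load these files for you, the ritual amounts to concatenating them in that order and skipping any that don't exist yet. A sketch, assuming the files live under a single workspace directory; `build_startup_context` is hypothetical:

```python
from datetime import date
from pathlib import Path
from typing import Optional


def build_startup_context(workspace: Path, today: Optional[date] = None) -> str:
    """Join CONTEXT.md, today's daily note, and MEMORY.md, in that order.

    Missing files (e.g. no log yet today) are skipped rather than treated as errors.
    Hypothetical helper: OpenClaw's own loading mechanism may differ.
    """
    today = today or date.today()
    files = [
        workspace / "memory" / "CONTEXT.md",
        workspace / "memory" / f"{today:%Y-%m-%d}.md",
        workspace / "MEMORY.md",
    ]
    parts = []
    for path in files:
        if path.is_file():
            # Label each section with its relative path so the agent knows the source
            rel = path.relative_to(workspace).as_posix()
            parts.append(f"=== {rel} ===\n{path.read_text(encoding='utf-8').strip()}")
    return "\n\n".join(parts)
```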


Part 2: True-Recall Local

The behavioral fixes help a lot, but they still depend on the AI choosing to write things down. What if we captured everything automatically?

True-Recall Local is a daemon that watches your session files in real-time, generates embeddings for each turn, and stores them in a searchable vector database. Later, you (or your AI) can semantically search everything that was ever discussed.

It's like giving Miss Piggy a complete transcript of every conversation, indexed and searchable. "Moi said what about that database schema?"

Architecture

Session Files (.jsonl)
  ↓
Watcher Daemon (session_watcher.py)
  ↓
Clean content, strip metadata
  ↓
Generate embeddings (LM Studio / Ollama / OpenAI)
  ↓
Store in SQLite-vec
  ↓
Searchable via search_live.py

Prerequisites

You'll need:

  1. Python 3.10+ with these packages:

    • sqlite-vec — Vector search extension for SQLite
    • requests — HTTP client
  2. An embedding provider (choose one):

    • LM Studio (recommended, free, local) — Run text-embedding-nomic-embed-text-v1.5
    • Ollama — Run nomic-embed-text or similar
    • OpenAI API — Use text-embedding-3-small (costs money)
  3. SQLite — You probably already have this

Setup

Step 1: Install Dependencies

pip install sqlite-vec requests

Step 2: Set Up Your Embedding Provider

Option A: LM Studio (Recommended)

  1. Download LM Studio
  2. Download an embedding model (search for "nomic embed")
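  3. Start the server on a port. The scripts below default to 12344; LM Studio's own default is 1234, so either change the port or set EMBEDDING_URL to match.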
  4. Verify it works:
curl http://localhost:12344/v1/models

Option B: Ollama

ollama pull nomic-embed-text
ollama serve

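Endpoint: http://localhost:11434/v1/embeddings. The scripts below append /v1/embeddings themselves, so set EMBEDDING_URL=http://localhost:11434 and EMBEDDING_MODEL=nomic-embed-text.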

Option C: OpenAI API

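The scripts below call the embedding endpoint without an Authorization header, so to use OpenAI you'll also need to swap in the get_embedding variant under Alternative Embedding Providers. Set your API key as an environment variable: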

export OPENAI_API_KEY=sk-your-key-here
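Before wiring up the watcher, it's worth confirming your endpoint answers and checking how many dimensions it returns, since that number goes into the table definition in Step 3. A stdlib-only sketch of an OpenAI-compatible embeddings request; `embed` is a hypothetical name. The optional bearer header is what a hosted API like OpenAI expects, while local servers such as LM Studio and Ollama don't need it:

```python
import json
import urllib.request
from typing import List, Optional


def embed(text: str, base_url: str, model: str, api_key: Optional[str] = None) -> List[float]:
    """POST to an OpenAI-compatible /v1/embeddings endpoint and return one vector."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Hosted APIs authenticate with a bearer token; local servers skip this
        headers["Authorization"] = f"Bearer {api_key}"
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/embeddings",
        data=json.dumps({"input": text, "model": model}).encode("utf-8"),
        headers=headers,
        method="POST",
    )
    # urlopen raises HTTPError on 4xx/5xx responses
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read())["data"][0]["embedding"]
```

For example, `len(embed("hello", "http://localhost:12344", "text-embedding-nomic-embed-text-v1.5"))` should come back as 768 for the nomic model; pass `api_key=os.environ["OPENAI_API_KEY"]` with `https://api.openai.com` to check the OpenAI side.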

Step 3: Create the Database Tables

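Find your OpenClaw SQLite database (usually ~/.openclaw/memory/main.sqlite) and run the SQL below with the sqlite-vec extension loaded. The stock sqlite3 shell doesn't include the vec0 module, so unless you .load the extension first, the CREATE VIRTUAL TABLE statement fails with "no such module: vec0".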

-- Metadata table
CREATE TABLE sessions_live (
  id TEXT PRIMARY KEY,
  session_id TEXT NOT NULL,
  role TEXT NOT NULL,           -- user/assistant
  content TEXT NOT NULL,
  timestamp TEXT NOT NULL,
  created_at INTEGER NOT NULL
);

-- Vector table (768 dims for nomic, adjust if using different model)
CREATE VIRTUAL TABLE sessions_live_vec USING vec0(
  id TEXT PRIMARY KEY,
  embedding FLOAT[768]
);

-- Indexes for performance
CREATE INDEX idx_sessions_live_timestamp ON sessions_live(timestamp);
CREATE INDEX idx_sessions_live_session ON sessions_live(session_id);

For OpenAI embeddings (1536 dimensions), change FLOAT[768] to FLOAT[1536].
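The vec0 table only exists once the sqlite-vec extension is loaded, so one convenient option is to apply the schema from Python, the same way the scripts below connect. A sketch that parameterizes the vector size and adds IF NOT EXISTS so it's safe to re-run; `schema_sql` and `create_tables` are hypothetical helpers, and this assumes the SQL above works as written with sqlite-vec:

```python
import sqlite3
from pathlib import Path


def schema_sql(dim: int) -> str:
    """The Step 3 schema with the vector size as a parameter (768 for nomic, 1536 for OpenAI)."""
    return f"""
    CREATE TABLE IF NOT EXISTS sessions_live (
      id TEXT PRIMARY KEY,
      session_id TEXT NOT NULL,
      role TEXT NOT NULL,
      content TEXT NOT NULL,
      timestamp TEXT NOT NULL,
      created_at INTEGER NOT NULL
    );
    CREATE VIRTUAL TABLE IF NOT EXISTS sessions_live_vec USING vec0(
      id TEXT PRIMARY KEY,
      embedding FLOAT[{dim}]
    );
    CREATE INDEX IF NOT EXISTS idx_sessions_live_timestamp ON sessions_live(timestamp);
    CREATE INDEX IF NOT EXISTS idx_sessions_live_session ON sessions_live(session_id);
    """


def create_tables(db_path: Path, dim: int = 768) -> None:
    # Imported here so schema_sql() is usable even where sqlite-vec isn't installed
    import sqlite_vec

    db = sqlite3.connect(str(db_path))
    try:
        db.enable_load_extension(True)
        sqlite_vec.load(db)  # registers the vec0 module on this connection
        db.enable_load_extension(False)
        db.executescript(schema_sql(dim))
        db.commit()
    finally:
        db.close()
```

Run it once, e.g. `create_tables(Path.home() / ".openclaw/memory/main.sqlite", dim=768)`, or with `dim=1536` for OpenAI.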

Step 4: Create the Watcher Daemon

Save this as scripts/session_watcher.py:

#!/usr/bin/env python3
"""
Session Watcher — Real-time capture to SQLite-vec via embeddings
"""

import os
import sys
import json
import time
import re
import signal
import sqlite3
import hashlib
import requests
from pathlib import Path
from datetime import datetime, timezone
from typing import Dict, Any, Optional, List

# Configuration — adjust these for your setup
EMBEDDING_URL = os.getenv("EMBEDDING_URL", "http://localhost:12344")
EMBEDDING_MODEL = os.getenv("EMBEDDING_MODEL", "text-embedding-nomic-embed-text-v1.5")
SESSIONS_DIR = Path(os.getenv("SESSIONS_DIR", Path.home() / ".openclaw/agents/main/sessions"))
SQLITE_PATH = Path(os.getenv("SQLITE_PATH", Path.home() / ".openclaw/memory/main.sqlite"))

# State
running = True
processed_positions: Dict[str, int] = {}


def signal_handler(signum, frame):
    global running
    print(f"\n[{datetime.now().strftime('%H:%M:%S')}] Shutting down...")
    running = False


def get_embedding(text: str) -> Optional[List[float]]:
    """Get embedding from local model"""
    try:
        response = requests.post(
            f"{EMBEDDING_URL}/v1/embeddings",
            json={"input": text, "model": EMBEDDING_MODEL},
            timeout=30
        )
        response.raise_for_status()
        return response.json()["data"][0]["embedding"]
    except Exception as e:
        print(f"[ERROR] Embedding failed: {e}", file=sys.stderr)
        return None


def clean_content(text: str) -> str:
    """Strip metadata, markdown, thinking tags for cleaner embeddings"""
    # Remove metadata JSON blocks
    text = re.sub(r'Conversation info \(untrusted metadata\):\s*```json\s*\{[\s\S]*?\}\s*```', '', text)
    
    # Remove System: prefix lines
    text = re.sub(r'System: \[\d{4}-\d{2}-\d{2} [^\]]+\][^\n]*\n?', '', text)
    
    # Remove thinking tags
    text = re.sub(r'\[thinking:[^\]]*\]', '', text)
    
    # Remove code blocks (keep the code for context though)
    # text = re.sub(r'```[\s\S]*?```', '', text)  # Uncomment to remove code
    
    # Remove markdown formatting but keep text
    text = re.sub(r'\*\*([^*]+)\*\*', r'\1', text)
    text = re.sub(r'\*([^*]+)\*', r'\1', text)
    text = re.sub(r'`([^`]+)`', r'\1', text)
    
    # Clean whitespace
    text = re.sub(r'\n{3,}', '\n\n', text)
    text = re.sub(r'[ \t]+', ' ', text)
    
    return text.strip()


def parse_turn(line: str) -> Optional[Dict[str, Any]]:
    """Extract turn data from JSONL line"""
    try:
        entry = json.loads(line.strip())
    except json.JSONDecodeError:
        return None
    
    if entry.get('type') != 'message' or 'message' not in entry:
        return None
    
    msg = entry['message']
    role = msg.get('role')
    
    # Skip non-conversation roles
    if role not in ('user', 'assistant'):
        return None
    
    # Extract content (join text blocks with newlines so adjacent blocks don't run together)
    content = ""
    if isinstance(msg.get('content'), list):
        content = "\n".join(
            item['text'] for item in msg['content']
            if isinstance(item, dict) and isinstance(item.get('text'), str)
        )
    elif isinstance(msg.get('content'), str):
        content = msg['content']
    
    if not content:
        return None
    
    # Clean and validate
    content = clean_content(content)
    if not content or len(content) < 10:
        return None
    
    # Truncate if too long (embedding models have limits)
    content = content[:2000]
    
    return {
        'role': role,
        'content': content,
        'timestamp': entry.get('timestamp', datetime.now(timezone.utc).isoformat())
    }


def get_db_connection() -> sqlite3.Connection:
    """Get SQLite connection with vec extension loaded"""
    import sqlite_vec
    
    db = sqlite3.connect(str(SQLITE_PATH))
    db.enable_load_extension(True)
    sqlite_vec.load(db)
    db.enable_load_extension(False)
    return db


def store_turn(db: sqlite3.Connection, turn_id: str, session_id: str, 
               turn: Dict[str, Any], embedding: List[float]) -> bool:
    """Store turn and embedding to database"""
    try:
        # Check if already exists
        existing = db.execute("SELECT 1 FROM sessions_live WHERE id = ?", (turn_id,)).fetchone()
        if existing:
            return False
        
        # Insert metadata
        db.execute("""
            INSERT INTO sessions_live 
            (id, session_id, role, content, timestamp, created_at)
            VALUES (?, ?, ?, ?, ?, ?)
        """, (
            turn_id,
            session_id,
            turn['role'],
            turn['content'],
            turn['timestamp'],
            int(time.time())
        ))
        
        # Insert vector
        db.execute("""
            INSERT INTO sessions_live_vec (id, embedding)
            VALUES (?, ?)
        """, (turn_id, json.dumps(embedding)))
        
        db.commit()
        return True
    except Exception as e:
        # Undo a half-finished insert (metadata without vector) so the turn can be retried
        db.rollback()
        print(f"[ERROR] DB insert failed: {e}", file=sys.stderr)
        return False


def get_latest_session() -> Optional[Path]:
    """Find most recently modified session file"""
    if not SESSIONS_DIR.exists():
        return None
    
    files = list(SESSIONS_DIR.glob("*.jsonl"))
    if not files:
        return None
    
    return max(files, key=lambda p: p.stat().st_mtime)


def process_session_file(db: sqlite3.Connection, session_file: Path) -> int:
    """Process new, complete lines from a session file"""
    session_id = session_file.stem
    file_key = str(session_file)
    
    last_pos = processed_positions.get(file_key, 0)
    
    # Handle file rotation/truncation
    try:
        current_size = session_file.stat().st_size
        if current_size < last_pos:
            last_pos = 0
    except OSError:
        return 0
    
    turns_stored = 0
    
    try:
        # Binary mode: f.tell() is a reliable byte offset at any point
        with open(session_file, 'rb') as f:
            f.seek(last_pos)
            
            while True:
                raw = f.readline()
                # Stop at EOF, or at a line the agent hasn't finished writing;
                # it is re-read from the same offset on the next poll
                if not raw.endswith(b'\n'):
                    break
                
                turn = parse_turn(raw.decode('utf-8', errors='replace'))
                if turn:
                    # Generate unique ID
                    turn_id = hashlib.sha256(
                        f"{session_id}:{turn['timestamp']}:{turn['content'][:50]}".encode()
                    ).hexdigest()[:16]
                    
                    # Skip the embedding call for turns already stored (e.g. after a restart)
                    if not db.execute("SELECT 1 FROM sessions_live WHERE id = ?", (turn_id,)).fetchone():
                        embedding = get_embedding(turn['content'])
                        if not embedding:
                            # Embedding server unavailable: retry this line on the next poll
                            break
                        
                        if store_turn(db, turn_id, session_id, turn, embedding):
                            turns_stored += 1
                            icon = "👤" if turn['role'] == 'user' else "🤖"
                            print(f"[{datetime.now().strftime('%H:%M:%S')}] {icon} {turn['content'][:60]}...")
                
                # Only advance past lines that were fully handled
                last_pos = f.tell()
    
    except Exception as e:
        print(f"[ERROR] Processing {session_file.name}: {e}", file=sys.stderr)
    
    processed_positions[file_key] = last_pos
    return turns_stored


def watch_loop():
    """Main watch loop"""
    print(f"[{datetime.now().strftime('%H:%M:%S')}] Starting session watcher...")
    print(f"  Sessions: {SESSIONS_DIR}")
    print(f"  Database: {SQLITE_PATH}")
    print(f"  Embeddings: {EMBEDDING_URL}")
    print()
    
    db = get_db_connection()
    current_session = None
    
    while running:
        try:
            latest = get_latest_session()
            
            if latest is None:
                time.sleep(1)
                continue
            
            if current_session != latest:
                print(f"[{datetime.now().strftime('%H:%M:%S')}] Watching: {latest.name}")
                current_session = latest
            
            process_session_file(db, latest)
            time.sleep(0.5)
        
        except KeyboardInterrupt:
            break
        except Exception as e:
            print(f"[ERROR] Watch loop: {e}", file=sys.stderr)
            time.sleep(1)
    
    db.close()
    print(f"[{datetime.now().strftime('%H:%M:%S')}] Watcher stopped.")


def main():
    signal.signal(signal.SIGINT, signal_handler)
    signal.signal(signal.SIGTERM, signal_handler)
    
    # Wait for embedding server with retries
    print("Connecting to embedding server...")
    attempt = 0
    while running:
        try:
            response = requests.get(f"{EMBEDDING_URL}/v1/models", timeout=5)
            response.raise_for_status()
            print("✅ Embedding server connected")
            break
        except Exception as e:
            attempt += 1
            if attempt % 6 == 1:
                print(f"⏳ Waiting for embedding server (attempt {attempt})...")
            time.sleep(10)
    
    if not running:
        sys.exit(0)
    
    # Verify database
    try:
        db = get_db_connection()
        db.execute("SELECT 1 FROM sessions_live LIMIT 1")
        db.execute("SELECT 1 FROM sessions_live_vec LIMIT 1")
        db.close()
        print("✅ Database ready")
    except Exception as e:
        print(f"❌ Database error: {e}")
        print("Have you created the sessions_live tables?")
        sys.exit(1)
    
    print()
    watch_loop()


if __name__ == "__main__":
    main()
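The core trick in the watcher is incremental reading: remember a byte offset per file and only read what was appended since the last poll. One edge case matters for a file that's still being written: the last line may be only partly flushed when you read it, and if the offset moves past it, that turn is lost. A standalone sketch of the technique that leaves incomplete lines for the next poll; `read_new_lines` is a hypothetical helper:

```python
from pathlib import Path
from typing import List, Tuple


def read_new_lines(path: Path, offset: int) -> Tuple[List[str], int]:
    """Return complete lines appended after `offset`, plus the offset to resume from.

    A trailing line with no newline yet is left unread so the next poll sees it whole.
    If the file shrank (truncated or replaced), reading restarts from the beginning.
    """
    if path.stat().st_size < offset:
        offset = 0
    lines = []
    with path.open("rb") as f:  # binary mode: tell() is a plain byte offset
        f.seek(offset)
        while True:
            raw = f.readline()
            if not raw.endswith(b"\n"):  # EOF (b"") or a partial line
                break
            lines.append(raw.decode("utf-8", errors="replace").rstrip("\r\n"))
            offset = f.tell()
    return lines, offset
```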

Step 5: Create the Search Script

Save this as scripts/search_live.py:

#!/usr/bin/env python3
"""
Search Live Sessions — Query the sessions_live_vec table
"""

import sys
import json
import os
import sqlite3
import requests
from pathlib import Path

# Configuration
EMBEDDING_URL = os.getenv("EMBEDDING_URL", "http://localhost:12344")
EMBEDDING_MODEL = os.getenv("EMBEDDING_MODEL", "text-embedding-nomic-embed-text-v1.5")
SQLITE_PATH = Path(os.getenv("SQLITE_PATH", Path.home() / ".openclaw/memory/main.sqlite"))


def get_embedding(text: str):
    """Get embedding from local model"""
    response = requests.post(
        f"{EMBEDDING_URL}/v1/embeddings",
        json={"input": text, "model": EMBEDDING_MODEL},
        timeout=30
    )
    response.raise_for_status()
    return response.json()["data"][0]["embedding"]


def search(query: str, limit: int = 5):
    """Search sessions_live for relevant turns"""
    import sqlite_vec
    
    embedding = get_embedding(query)
    
    db = sqlite3.connect(str(SQLITE_PATH))
    db.enable_load_extension(True)
    sqlite_vec.load(db)
    db.enable_load_extension(False)
    
    results = db.execute("""
        SELECT 
            s.role,
            s.content,
            s.timestamp,
            vec_distance_cosine(v.embedding, ?) as distance
        FROM sessions_live_vec v
        JOIN sessions_live s ON v.id = s.id
        ORDER BY distance ASC
        LIMIT ?
    """, (json.dumps(embedding), limit)).fetchall()
    
    db.close()
    return results


def main():
    if len(sys.argv) < 2:
        print("Usage: search_live.py <query> [limit]")
        print("Example: search_live.py 'what did we discuss about authentication'")
        sys.exit(1)
    
    query = sys.argv[1]
    limit = int(sys.argv[2]) if len(sys.argv) > 2 else 5
    
    print(f"Searching for: {query}\n")
    
    results = search(query, limit)
    
    for i, (role, content, timestamp, distance) in enumerate(results, 1):
        icon = "👤" if role == 'user' else "🤖"
        similarity = 1 - distance
        print(f"{i}. [{similarity:.2%}] {icon} {timestamp[:16]}")
        print(f"   {content[:200]}...")
        print()


if __name__ == "__main__":
    main()
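`vec_distance_cosine` returns cosine distance, i.e. 1 minus cosine similarity, which is why the script prints `1 - distance` as the match percentage. A pure-Python sketch of the same arithmetic, handy for sanity-checking a result by hand (illustrative only, not how sqlite-vec computes it internally):

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity: 0.0 for the same direction, 1.0 orthogonal, 2.0 opposite."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cannot compare a zero vector")
    return 1.0 - dot / norm
```

So the percentages in the search output are cosine similarities, not probabilities.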

Step 6: Run the Watcher

Quick start (manual):

python3 scripts/session_watcher.py

Background process:

nohup python3 scripts/session_watcher.py >> /tmp/session-watcher.log 2>&1 &

macOS launchd (recommended for always-on):

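Create ~/Library/LaunchAgents/com.openclaw.session-watcher.plist. In ProgramArguments, use the python3 you installed sqlite-vec into (which python3 will tell you); some macOS Python builds can't load SQLite extensions at all, and the watcher fails at enable_load_extension with those.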

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.openclaw.session-watcher</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/bin/python3</string>
        <string>/path/to/your/scripts/session_watcher.py</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
    <key>KeepAlive</key>
    <true/>
    <key>StandardOutPath</key>
    <string>/tmp/session-watcher.log</string>
    <key>StandardErrorPath</key>
    <string>/tmp/session-watcher.err</string>
    <key>EnvironmentVariables</key>
    <dict>
        <key>EMBEDDING_URL</key>
        <string>http://localhost:12344</string>
    </dict>
</dict>
</plist>

Load it:

launchctl load ~/Library/LaunchAgents/com.openclaw.session-watcher.plist

Linux systemd:

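Create ~/.config/systemd/user/session-watcher.service (the per-user location; /etc/systemd/user/ also works, but installs the unit for every user and needs root):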

[Unit]
Description=OpenClaw Session Watcher
After=network.target

[Service]
Type=simple
ExecStart=/usr/bin/python3 /path/to/scripts/session_watcher.py
Restart=always
RestartSec=10
Environment=EMBEDDING_URL=http://localhost:12344

[Install]
WantedBy=default.target

Reload systemd so it sees the new unit, then enable and start it:

systemctl --user daemon-reload
systemctl --user enable session-watcher
systemctl --user start session-watcher

Step 7: Search Your Sessions

python3 scripts/search_live.py "what did we discuss about authentication"

Output:

Searching for: what did we discuss about authentication

1. [87%] 🤖 2026-03-01 09:15
   We decided to use JWT with refresh tokens for authentication. The access token...

2. [82%] 👤 2026-03-01 09:10
   Let's implement authentication for the API. I'm thinking JWT but open to suggestions...

3. [76%] 🤖 2026-02-28 14:30
   For the auth system, we should consider httpOnly cookies to prevent XSS attacks...

Integrating with Your AI Agent

Add this to your agent instructions so it uses the search before answering:

## Memory Search (ENFORCE)

Before answering questions about prior work:
1. Search sessions_live first:
   ```bash
   python3 scripts/search_live.py "query" 5
   ```
2. Then use built-in memory_search
3. Don't guess. Don't say "I think we discussed..."

Alternative Embedding Providers

If you don't want to run LM Studio or Ollama locally, here are alternatives:

OpenAI API

Modify the scripts to use OpenAI:

import openai

def get_embedding(text: str):
    response = openai.embeddings.create(
        input=text,
        model="text-embedding-3-small"
    )
    return response.data[0].embedding

Remember to change the vector dimension to 1536.

Cost: ~$0.00002 per 1K tokens (very cheap, but not free)

Hugging Face Inference API

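import os
import requests

HF_TOKEN = os.environ["HF_TOKEN"]  # your Hugging Face access token

def get_embedding(text: str):
    response = requests.post(
        "https://api-inference.huggingface.co/pipeline/feature-extraction/sentence-transformers/all-MiniLM-L6-v2",
        headers={"Authorization": f"Bearer {HF_TOKEN}"},
        json={"inputs": text},
        timeout=30
    )
    response.raise_for_status()
    return response.json()

Dimension: 384 (update your SQL accordingly). Hugging Face has changed its hosted inference URLs over time, so confirm the current endpoint on the model page before relying on this one.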

Self-Hosted sentence-transformers

from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')

def get_embedding(text: str):
    return model.encode(text).tolist()

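This runs entirely locally without needing a server process. Install it with pip install sentence-transformers; all-MiniLM-L6-v2 also produces 384-dimensional vectors. Whichever alternative you choose, also remove or adapt the embedding-server wait loop at the top of main() in session_watcher.py, because it polls EMBEDDING_URL/v1/models and won't start watching until that request succeeds.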


Other Solutions Worth Knowing

If you want a more turnkey solution, these plugins might help:

Claude-Mem Plugin

Auto-captures tool usage and decisions, compresses them, and injects relevant context. Claims 95% token savings. Works specifically with Claude Code.

Mem0 Plugin

Moves memory control out of the agent loop into the system layer. Auto-captures without the AI deciding what's important.

Cognee Plugin

Builds a knowledge graph of entities and relationships. Better at reasoning about connections between facts.


Quick Reference

Files to Create/Modify

| File | Purpose |
|------|---------|
| memory/CONTEXT.md | Working memory (current state) |
| memory/YYYY-MM-DD.md | Daily logs |
| MEMORY.md | Long-term knowledge |
| AGENTS.md | Agent instructions with memory rules |
| scripts/session_watcher.py | Real-time capture daemon |
| scripts/search_live.py | Semantic search helper |

Quick Commands

# Start watcher
nohup python3 scripts/session_watcher.py >> /tmp/session-watcher.log 2>&1 &

# Check if running
ps aux | grep session_watcher | grep -v grep

# Search memories
python3 scripts/search_live.py "your query"

# View logs
tail -f /tmp/session-watcher.log

# Stop watcher
pkill -f session_watcher.py

The Takeaway

AI agents don't have bad memory—they have optional memory. The model is smart enough to remember; the system just doesn't enforce it.

The fixes are straightforward:

  1. Structure your memory files (CONTEXT.md, daily notes, MEMORY.md)
  2. Enforce write discipline (write immediately, not later)
  3. Require memory search before answering
  4. Capture everything automatically (True-Recall daemon)

After implementing these changes, our agent went from "Wait, what project are we working on?" to waking up each session knowing exactly what's in progress, what decisions were made yesterday, and what's blocked.

It's the difference between Gonzo forgetting where he put his chickens and actually having a working filing system. The chickens are still chaotic, but at least we know where they are.


A Note on Timing

Full disclosure: we're only about 24 hours into running this new configuration. The session watcher has captured 200+ turns, the behavioral changes are in place, and the agent is waking up with context instead of amnesia.

Is it perfect? Too early to say. But the early signs are promising—no more "what project are we working on?" moments, and the semantic search is pulling relevant context from yesterday's conversations.

We'll update this post as we learn more. For now, consider this a "works on our machine" with cautious optimism.


Questions? Improvements? Found a bug? Drop a comment below.

Teaching Your AI Agent to Actually Remember Things · Matt Rowe