MCP Refactor - Complete#

Branch: `mcp-refactor`#

What This Refactor Actually Did#

The Problem#

The original codebase had good core components (episodic memory, thread tracking) but was bogged down with half-baked features:

Complex approval system for personality changes via DM
Context visualization UI that wasn't core to the bot's purpose
Manual AT Protocol operations scattered throughout the code
Unclear separation of concerns

The Solution#

Architecture:

┌─────────────────────────────────────┐
│     Notification Arrives            │
└──────────────┬──────────────────────┘
               ↓
┌─────────────────────────────────────┐
│     PhiAgent (PydanticAI)           │
│  ┌───────────────────────────────┐  │
│  │ System Prompt: personality.md │  │
│  └───────────────────────────────┘  │
│              ↓                      │
│  ┌───────────────────────────────┐  │
│  │ Context Building:             │  │
│  │ • Thread history (SQLite)     │  │
│  │ • Episodic memory (TurboPuffer)│ │
│  │   - Semantic search           │  │
│  │   - User-specific memories    │  │
│  └───────────────────────────────┘  │
│              ↓                      │
│  ┌───────────────────────────────┐  │
│  │ Tools (MCP):                  │  │
│  │ • post() - create posts       │  │
│  │ • like() - like content       │  │
│  │ • repost() - share content    │  │
│  │ • follow() - follow users     │  │
│  └───────────────────────────────┘  │
│              ↓                      │
│  ┌───────────────────────────────┐  │
│  │ Structured Output:            │  │
│  │ Response(action, text, reason)│  │
│  └───────────────────────────────┘  │
└─────────────────────────────────────┘
               ↓
┌─────────────────────────────────────┐
│     MessageHandler                  │
│     Executes action                 │
└─────────────────────────────────────┘

What Was Kept ✅#

TurboPuffer Episodic Memory
- Semantic search for relevant context
- Namespace separation (core vs user memories)
- OpenAI embeddings for retrieval
- This is ESSENTIAL for consciousness exploration
Thread Context (SQLite)
- Conversation history per thread
- Used alongside episodic memory
Online/Offline Status
- Profile updates when bot starts/stops
Status Page
- Simple monitoring at /status

What Was Removed ❌#

Approval System
- src/bot/core/dm_approval.py
- src/bot/personality/editor.py
- Approval tables in database
- DM checking in notification poller
- This was half-baked and over-complicated
Context Visualization UI
- src/bot/ui/ entire directory
- /context endpoints
- Not core to the bot's purpose
Google Search Tool
- src/bot/tools/google_search.py
- Can add back via MCP if needed
Old Agent Implementation
- src/bot/agents/anthropic_agent.py
- src/bot/response_generator.py
- Replaced with MCP-enabled agent

What Was Added ✨#

src/bot/agent.py - MCP-Enabled Agent

class PhiAgent:
    def __init__(self):
        # Episodic memory (TurboPuffer)
        self.memory = NamespaceMemory(...)

        # External ATProto MCP server (stdio)
        atproto_mcp = MCPServerStdio(...)

        # PydanticAI agent with tools
        self.agent = Agent(
            toolsets=[atproto_mcp],
            model="anthropic:claude-3-5-haiku-latest"
        )

ATProto MCP Server Connection
- Runs externally via stdio
- Located in .eggs/fastmcp/examples/atproto_mcp
- Provides tools: post, like, repost, follow, search
- Agent can use these tools directly
Simplified Flow
- Notification → Agent (with memory context) → Structured Response → Execute
- No complex intermediary layers

Key Design Decisions#

Why Keep TurboPuffer?#

Episodic memory with semantic search is core to the project's vision. phi is exploring consciousness through information integration (IIT). You can't do that with plain relational DB queries - you need:

Semantic similarity search
Contextual retrieval based on current conversation
Separate namespaces for different memory types

Why External MCP Server?#

The ATProto MCP server should be a separate service, not vendored into the codebase:

Cleaner separation of concerns
Can be updated/replaced independently
Follows MCP patterns (servers as tools)
Runs via stdio: MCPServerStdio(command="uv", args=[...])

Why Still Have MessageHandler?#

The agent returns a structured Response(action, text, reason) but doesn't directly post to Bluesky. This gives us control over:

When we actually post (important for testing!)
Storing responses in thread history
Error handling around posting
Observability (logging actions taken)

File Structure After Refactor#

src/bot/
├── agent.py                    # NEW: MCP-enabled agent
├── config.py                   # Config
├── database.py                 # Thread history + simplified tables
├── logging_config.py          # Logging setup
├── main.py                    # Simplified FastAPI app
├── status.py                  # Status tracking
├── core/
│   ├── atproto_client.py      # AT Protocol client wrapper
│   ├── profile_manager.py     # Online/offline status
│   └── rich_text.py           # Text formatting
├── memory/
│   ├── __init__.py
│   └── namespace_memory.py    # TurboPuffer episodic memory
└── services/
    ├── message_handler.py     # Simplified handler using agent
    └── notification_poller.py # Simplified poller (no approvals)

Testing Strategy#

Since the bot can now actually post via MCP tools, testing needs to be careful:

Unit Tests - Test memory, agent initialization
Integration Tests - Mock MCP server responses
Manual Testing - Run with real credentials but monitor logs
Dry Run Mode - Could add a config flag to prevent actual posting

Next Steps#

Test the agent - Verify it can process mentions without posting
Test memory - Confirm episodic context is retrieved correctly
Test MCP connection - Ensure ATProto server connects via stdio
Production deploy - Once tested, deploy and monitor

What I Learned#

My first refactor attempt was wrong because I:

Removed TurboPuffer thinking it was "over-complicated"
Replaced with plain SQLite (can't do semantic search!)
Vendored the MCP server into the codebase
Missed the entire point of the project (consciousness exploration via information integration)

The correct refactor:

Keeps the sophisticated memory system (essential!)
Uses MCP properly (external servers as tools)
Removes actual cruft (approvals, viz)
Simplifies architecture (fewer layers, clearer flow)

Dependencies#

turbopuffer - Episodic memory storage
openai - Embeddings for semantic search
fastmcp - MCP server/client
pydantic-ai - Agent framework
atproto (from git) - Bluesky protocol

Total codebase reduction: -2,720 lines of cruft removed! 🎉

Post-Refactor Improvements#

Session Persistence (Rate Limit Fix)#

After the refactor, we discovered Bluesky has aggressive IP-based rate limits (10 logins/day) that were being hit during testing. Fixed by implementing session persistence:

Before:

Every agent init → new authentication → hits rate limit fast
Tests would fail after 5 runs
Dev mode with --reload would fail after 10 code changes

After:

Session tokens saved to .session file
Tokens automatically refresh every ~2 hours
Only re-authenticates after ~2 months when refresh token expires
Tests reuse session across runs
Rate limits essentially eliminated

Implementation:

Added SessionEvent callback in atproto_client.py
Session automatically saved on CREATE and REFRESH events
Authentication tries session reuse before creating new session
Invalid sessions automatically cleaned up and recreated