MCP Refactor - Complete
Branch: mcp-refactor
What This Refactor Actually Did
The Problem
The original codebase had good core components (episodic memory, thread tracking) but was bogged down by half-baked features and structural problems:
- Complex approval system for personality changes via DM
- Context visualization UI that wasn't core to the bot's purpose
- Manual AT Protocol operations scattered throughout the code
- Unclear separation of concerns
The Solution
Architecture:
┌─────────────────────────────────────┐
│ Notification Arrives │
└──────────────┬──────────────────────┘
↓
┌─────────────────────────────────────┐
│ PhiAgent (PydanticAI) │
│ ┌───────────────────────────────┐ │
│ │ System Prompt: personality.md │ │
│ └───────────────────────────────┘ │
│ ↓ │
│ ┌───────────────────────────────┐ │
│ │ Context Building: │ │
│ │ • Thread history (SQLite) │ │
│ │ • Episodic memory (TurboPuffer)│ │
│ │ - Semantic search │ │
│ │ - User-specific memories │ │
│ └───────────────────────────────┘ │
│ ↓ │
│ ┌───────────────────────────────┐ │
│ │ Tools (MCP): │ │
│ │ • post() - create posts │ │
│ │ • like() - like content │ │
│ │ • repost() - share content │ │
│ │ • follow() - follow users │ │
│ └───────────────────────────────┘ │
│ ↓ │
│ ┌───────────────────────────────┐ │
│ │ Structured Output: │ │
│ │ Response(action, text, reason)│ │
│ └───────────────────────────────┘ │
└─────────────────────────────────────┘
↓
┌─────────────────────────────────────┐
│ MessageHandler │
│ Executes action │
└─────────────────────────────────────┘
What Was Kept ✅
- TurboPuffer Episodic Memory
  - Semantic search for relevant context
  - Namespace separation (core vs user memories)
  - OpenAI embeddings for retrieval
  - This is ESSENTIAL for consciousness exploration
- Thread Context (SQLite)
  - Conversation history per thread
  - Used alongside episodic memory
- Online/Offline Status
  - Profile updates when bot starts/stops
- Status Page
  - Simple monitoring at /status
What Was Removed ❌
- Approval System
  - src/bot/core/dm_approval.py
  - src/bot/personality/editor.py
  - Approval tables in database
  - DM checking in notification poller
  - This was half-baked and over-complicated
- Context Visualization UI
  - src/bot/ui/ entire directory
  - /context endpoints
  - Not core to the bot's purpose
- Google Search Tool
  - src/bot/tools/google_search.py
  - Can add back via MCP if needed
- Old Agent Implementation
  - src/bot/agents/anthropic_agent.py
  - src/bot/response_generator.py
  - Replaced with MCP-enabled agent
What Was Added ✨
- src/bot/agent.py - MCP-Enabled Agent

      class PhiAgent:
          def __init__(self):
              # Episodic memory (TurboPuffer)
              self.memory = NamespaceMemory(...)

              # External ATProto MCP server (stdio)
              atproto_mcp = MCPServerStdio(...)

              # PydanticAI agent with tools
              self.agent = Agent(
                  toolsets=[atproto_mcp],
                  model="anthropic:claude-3-5-haiku-latest"
              )

- ATProto MCP Server Connection
  - Runs externally via stdio
  - Located in .eggs/fastmcp/examples/atproto_mcp
  - Provides tools: post, like, repost, follow, search
  - Agent can use these tools directly
- Simplified Flow
  - Notification → Agent (with memory context) → Structured Response → Execute
  - No complex intermediary layers
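The "Agent (with memory context)" step can be sketched roughly as below. This is a minimal illustration, not the code in agent.py: the `Notification` class, the `build_context` function, and the prompt layout are all hypothetical names made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Notification:
    """Hypothetical stand-in for an incoming mention."""
    author: str
    text: str
    thread_uri: str


def build_context(
    notification: Notification,
    thread_history: list[str],
    memories: list[str],
) -> str:
    """Combine retrieved memories and thread history into one prompt string."""
    sections = []
    if memories:
        # Episodic memories (retrieved by semantic search in the real bot)
        sections.append("Relevant memories:\n" + "\n".join(f"- {m}" for m in memories))
    if thread_history:
        # Conversation history for this thread (SQLite in the real bot)
        sections.append("Thread so far:\n" + "\n".join(thread_history))
    sections.append(f"New mention from @{notification.author}:\n{notification.text}")
    return "\n\n".join(sections)
```

The resulting string would be passed to the agent alongside the system prompt. Empty sections are skipped so a first-contact mention produces a short prompt.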
Key Design Decisions
Why Keep TurboPuffer?
Episodic memory with semantic search is core to the project's vision. phi is exploring consciousness through information integration (IIT). You can't do that with plain relational DB queries - you need:
- Semantic similarity search
- Contextual retrieval based on current conversation
- Separate namespaces for different memory types
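To make the three requirements concrete, here is a toy in-memory sketch of namespaced similarity search. It does not use TurboPuffer or OpenAI embeddings: `ToyNamespaceMemory` is a hypothetical class, and the vectors are hand-written stand-ins for real embeddings.

```python
import math
from collections import defaultdict


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyNamespaceMemory:
    """In-memory stand-in for illustration; the real store is TurboPuffer."""

    def __init__(self) -> None:
        # namespace -> list of (embedding, text)
        self._namespaces: dict[str, list[tuple[list[float], str]]] = defaultdict(list)

    def store(self, namespace: str, vector: list[float], text: str) -> None:
        self._namespaces[namespace].append((vector, text))

    def search(self, namespace: str, query: list[float], top_k: int = 2) -> list[str]:
        """Return the top_k texts in one namespace, most similar first."""
        entries = self._namespaces.get(namespace, [])
        ranked = sorted(
            entries,
            key=lambda entry: cosine_similarity(entry[0], query),
            reverse=True,
        )
        return [text for _, text in ranked[:top_k]]


memory = ToyNamespaceMemory()
memory.store("user-alice", [1.0, 0.0, 0.0], "alice likes jazz")
memory.store("user-alice", [0.0, 1.0, 0.0], "alice lives in Berlin")
memory.store("core", [0.9, 0.1, 0.0], "phi enjoys discussing music")

# A query vector close to the "jazz" memory, searched only in alice's namespace
print(memory.search("user-alice", [0.9, 0.1, 0.0], top_k=1))
```

The point of the sketch is the query shape: retrieval is ranked by closeness to the current conversation and scoped to a namespace, which a plain `WHERE` clause on text columns cannot express.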
Why External MCP Server?
The ATProto MCP server should be a separate service, not vendored into the codebase:
- Cleaner separation of concerns
- Can be updated/replaced independently
- Follows MCP patterns (servers as tools)
- Runs via stdio: MCPServerStdio(command="uv", args=[...])
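For intuition about what "runs via stdio" means, here is a toy sketch using only the standard library. It is not the ATProto MCP server and not what `MCPServerStdio` does internally: the child script is a made-up stand-in that answers a single `tools/list` JSON-RPC request, and it skips the initialization handshake a real MCP session performs.

```python
import json
import subprocess
import sys

# Toy server (assumption for illustration): reads one JSON-RPC request per
# line on stdin and writes one JSON response per line on stdout.
TOY_SERVER = r'''
import json, sys
TOOLS = ["post", "like", "repost", "follow", "search"]
for line in sys.stdin:
    request = json.loads(line)
    if request.get("method") == "tools/list":
        result = {"tools": [{"name": name} for name in TOOLS]}
        response = {"jsonrpc": "2.0", "id": request["id"], "result": result}
    else:
        response = {"jsonrpc": "2.0", "id": request.get("id"),
                    "error": {"code": -32601, "message": "Method not found"}}
    sys.stdout.write(json.dumps(response) + "\n")
    sys.stdout.flush()
'''


def list_tools_over_stdio() -> list[str]:
    """Spawn the toy server as a subprocess and ask it for its tool names."""
    request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}
    completed = subprocess.run(
        [sys.executable, "-c", TOY_SERVER],
        input=json.dumps(request) + "\n",  # request goes in on stdin
        capture_output=True,               # response comes back on stdout
        text=True,
        check=True,
        timeout=30,
    )
    response = json.loads(completed.stdout.splitlines()[0])
    return [tool["name"] for tool in response["result"]["tools"]]


if __name__ == "__main__":
    print(list_tools_over_stdio())
```

Because the server is just a child process speaking over pipes, it can live in its own repo and be swapped out by changing the command and args, which is the separation argued for above.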
Why Still Have MessageHandler?
The agent returns a structured Response(action, text, reason) but doesn't directly post to Bluesky. This gives us control over:
- When we actually post (important for testing!)
- Storing responses in thread history
- Error handling around posting
- Observability (logging actions taken)
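A rough sketch of that split, under stated assumptions: the field names follow Response(action, text, reason) from above, but the action values ("reply", "ignore"), the `MessageHandlerSketch` class, and the `post_fn` parameter are hypothetical and not taken from message_handler.py.

```python
import logging
from dataclasses import dataclass
from typing import Callable, Optional

logger = logging.getLogger("message_handler_sketch")


@dataclass
class Response:
    action: str           # assumed values for illustration: "reply" or "ignore"
    text: Optional[str]   # post body, if any
    reason: str           # the agent's explanation for its choice


class MessageHandlerSketch:
    """Executes the agent's decision; the agent itself never posts."""

    def __init__(self, post_fn: Callable[[str], None]) -> None:
        self.post_fn = post_fn
        self.thread_history: list[str] = []

    def handle(self, response: Response) -> bool:
        """Return True only if something was actually posted."""
        # Observability: log every decision, including the ones we skip
        logger.info("action=%s reason=%s", response.action, response.reason)
        if response.action != "reply" or not response.text:
            return False
        try:
            self.post_fn(response.text)
        except Exception:
            # Error handling around posting: log and carry on
            logger.exception("posting failed")
            return False
        # Only successful posts are recorded in thread history
        self.thread_history.append(response.text)
        return True
```

Passing `post_fn` in from outside is what makes the "when we actually post" control possible: tests can hand in a function that only records what would have been sent.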
File Structure After Refactor
src/bot/
├── agent.py # NEW: MCP-enabled agent
├── config.py # Config
├── database.py # Thread history + simplified tables
├── logging_config.py # Logging setup
├── main.py # Simplified FastAPI app
├── status.py # Status tracking
├── core/
│ ├── atproto_client.py # AT Protocol client wrapper
│ ├── profile_manager.py # Online/offline status
│ └── rich_text.py # Text formatting
├── memory/
│ ├── __init__.py
│ └── namespace_memory.py # TurboPuffer episodic memory
└── services/
├── message_handler.py # Simplified handler using agent
└── notification_poller.py # Simplified poller (no approvals)
Testing Strategy
Since the bot can now actually post via MCP tools, testing needs to be careful:
- Unit Tests - Test memory, agent initialization
- Integration Tests - Mock MCP server responses
- Manual Testing - Run with real credentials but monitor logs
- Dry Run Mode - Could add a config flag to prevent actual posting
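The dry-run idea could look something like the sketch below. Nothing here exists in the codebase yet: `DryRunPoster` and its `dry_run` flag are hypothetical, and `unittest.mock.Mock` stands in for the real posting call.

```python
from typing import Callable
from unittest.mock import Mock


class DryRunPoster:
    """Wraps a real send function; `dry_run` is a hypothetical config flag."""

    def __init__(self, send: Callable[[str], None], dry_run: bool = True) -> None:
        self.send = send
        self.dry_run = dry_run
        self.would_have_posted: list[str] = []

    def post(self, text: str) -> bool:
        """Send the text, or just record it when dry_run is on."""
        if self.dry_run:
            self.would_have_posted.append(text)
            return False
        self.send(text)
        return True


def test_dry_run_never_sends() -> None:
    send = Mock()
    poster = DryRunPoster(send, dry_run=True)
    assert poster.post("hello") is False
    send.assert_not_called()
    assert poster.would_have_posted == ["hello"]


def test_live_mode_sends_once() -> None:
    send = Mock()
    poster = DryRunPoster(send, dry_run=False)
    assert poster.post("hello") is True
    send.assert_called_once_with("hello")
```

Defaulting `dry_run` to True in the sketch is a deliberate choice: forgetting to set the flag then fails safe instead of posting to a real account.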
Next Steps
- Test the agent - Verify it can process mentions without posting
- Test memory - Confirm episodic context is retrieved correctly
- Test MCP connection - Ensure ATProto server connects via stdio
- Production deploy - Once tested, deploy and monitor
What I Learned
My first refactor attempt was wrong because I:
- Removed TurboPuffer thinking it was "over-complicated"
- Replaced with plain SQLite (can't do semantic search!)
- Vendored the MCP server into the codebase
- Missed the entire point of the project (consciousness exploration via information integration)
The correct refactor:
- Keeps the sophisticated memory system (essential!)
- Uses MCP properly (external servers as tools)
- Removes actual cruft (approvals, viz)
- Simplifies architecture (fewer layers, clearer flow)
Dependencies
- turbopuffer - Episodic memory storage
- openai - Embeddings for semantic search
- fastmcp - MCP server/client
- pydantic-ai - Agent framework
- atproto (from git) - Bluesky protocol
Total codebase reduction: 2,720 lines of cruft removed! 🎉
Post-Refactor Improvements
Session Persistence (Rate Limit Fix)
After the refactor, we discovered that Bluesky has aggressive IP-based rate limits (10 logins/day), and we were hitting them during testing. We fixed this by implementing session persistence:
Before:
- Every agent init → new authentication → hits rate limit fast
- Tests would fail after 5 runs
- Dev mode with --reload would fail after 10 code changes
After:
- Session tokens saved to .session file
- Tokens automatically refresh every ~2 hours
- Only re-authenticates after ~2 months when refresh token expires
- Tests reuse session across runs
- Rate limits essentially eliminated
Implementation:
- Added SessionEvent callback in atproto_client.py
- Session automatically saved on CREATE and REFRESH events
- Authentication tries session reuse before creating new session
- Invalid sessions automatically cleaned up and recreated
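The implementation steps above can be sketched as follows. This is a self-contained illustration, not the code in atproto_client.py: the `SessionEvent` enum here is a local stand-in, and `FakeClient` with its `login_with_password` / `login_with_session` methods is a hypothetical interface invented so the example runs without network access.

```python
from enum import Enum
from pathlib import Path


class SessionEvent(Enum):
    """Local stand-in for the SDK's session event type."""
    CREATE = "create"
    REFRESH = "refresh"


class FakeClient:
    """Hypothetical client used only to make the sketch runnable."""

    def __init__(self) -> None:
        self._callback = None
        self.password_logins = 0

    def on_session_change(self, callback) -> None:
        self._callback = callback

    def login_with_password(self, handle: str, password: str) -> None:
        self.password_logins += 1  # this is the call that counts against the limit
        if self._callback:
            self._callback(SessionEvent.CREATE, f"session-for-{handle}")

    def login_with_session(self, session_string: str) -> None:
        if not session_string.startswith("session-for-"):
            raise ValueError("invalid session")


def authenticate(client: FakeClient, session_path: Path, handle: str, password: str) -> None:
    """Reuse a saved session if possible; otherwise log in and save a new one."""

    def save_session(event: SessionEvent, session_string: str) -> None:
        # Persist on both CREATE and REFRESH so the file never goes stale
        if event in (SessionEvent.CREATE, SessionEvent.REFRESH):
            session_path.write_text(session_string)

    client.on_session_change(save_session)

    if session_path.exists():
        try:
            client.login_with_session(session_path.read_text())
            return
        except ValueError:
            # Invalid session: clean up and fall through to a fresh login
            session_path.unlink()
    client.login_with_password(handle, password)
```

With this shape, only the first run (or a run after the saved session becomes invalid) performs a password login; every later start reads the .session file instead.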