a digital entity named phi that roams bsky
# MCP Refactor - Complete

## Branch: `mcp-refactor`

## What This Refactor Actually Did

### The Problem
The original codebase had good core components (episodic memory, thread tracking) but was bogged down with half-baked features:
- Complex approval system for personality changes via DM
- Context visualization UI that wasn't core to the bot's purpose
- Manual AT Protocol operations scattered throughout the code
- Unclear separation of concerns

### The Solution

**Architecture:**
```
┌───────────────────────────────────────┐
│         Notification Arrives          │
└───────────────┬───────────────────────┘

┌───────────────────────────────────────┐
│         PhiAgent (PydanticAI)         │
│  ┌─────────────────────────────────┐  │
│  │ System Prompt: personality.md   │  │
│  └─────────────────────────────────┘  │
│                   ↓                   │
│  ┌─────────────────────────────────┐  │
│  │ Context Building:               │  │
│  │ • Thread history (SQLite)       │  │
│  │ • Episodic memory (TurboPuffer) │  │
│  │   - Semantic search             │  │
│  │   - User-specific memories      │  │
│  └─────────────────────────────────┘  │
│                   ↓                   │
│  ┌─────────────────────────────────┐  │
│  │ Tools (MCP):                    │  │
│  │ • post() - create posts         │  │
│  │ • like() - like content         │  │
│  │ • repost() - share content      │  │
│  │ • follow() - follow users       │  │
│  └─────────────────────────────────┘  │
│                   ↓                   │
│  ┌─────────────────────────────────┐  │
│  │ Structured Output:              │  │
│  │ Response(action, text, reason)  │  │
│  └─────────────────────────────────┘  │
└───────────────────────────────────────┘

┌───────────────────────────────────────┐
│            MessageHandler             │
│            Executes action            │
└───────────────────────────────────────┘
```

### What Was Kept ✅

1. **TurboPuffer Episodic Memory**
   - Semantic search for relevant context
   - Namespace separation (core vs user memories)
   - OpenAI embeddings for retrieval
   - This is ESSENTIAL for consciousness exploration

2. **Thread Context (SQLite)**
   - Conversation history per thread
   - Used alongside episodic memory

3. **Online/Offline Status**
   - Profile updates when bot starts/stops

4. **Status Page**
   - Simple monitoring at `/status`

### What Was Removed ❌

1. **Approval System**
   - `src/bot/core/dm_approval.py`
   - `src/bot/personality/editor.py`
   - Approval tables in database
   - DM checking in notification poller
   - This was half-baked and over-complicated

2. **Context Visualization UI**
   - `src/bot/ui/` entire directory
   - `/context` endpoints
   - Not core to the bot's purpose

3. **Google Search Tool**
   - `src/bot/tools/google_search.py`
   - Can add back via MCP if needed

4. **Old Agent Implementation**
   - `src/bot/agents/anthropic_agent.py`
   - `src/bot/response_generator.py`
   - Replaced with MCP-enabled agent

### What Was Added ✨

1. **`src/bot/agent.py`** - MCP-Enabled Agent
   ```python
   class PhiAgent:
       def __init__(self):
           # Episodic memory (TurboPuffer)
           self.memory = NamespaceMemory(...)

           # External ATProto MCP server (stdio)
           atproto_mcp = MCPServerStdio(...)

           # PydanticAI agent with tools
           self.agent = Agent(
               toolsets=[atproto_mcp],
               model="anthropic:claude-3-5-haiku-latest"
           )
   ```

2. **ATProto MCP Server Connection**
   - Runs externally via stdio
   - Located in `.eggs/fastmcp/examples/atproto_mcp`
   - Provides tools: post, like, repost, follow, search
   - Agent can use these tools directly

3. **Simplified Flow**
   - Notification → Agent (with memory context) → Structured Response → Execute
   - No complex intermediary layers

## Key Design Decisions

### Why Keep TurboPuffer?

Episodic memory with semantic search is **core to the project's vision**. phi is exploring consciousness through information integration (IIT). You can't do that with plain relational DB queries - you need:
- Semantic similarity search
- Contextual retrieval based on current conversation
- Separate namespaces for different memory types

### Why External MCP Server?

The ATProto MCP server should be a separate service, not vendored into the codebase:
- Cleaner separation of concerns
- Can be updated/replaced independently
- Follows MCP patterns (servers as tools)
- Runs via stdio: `MCPServerStdio(command="uv", args=[...])`

### Why Still Have MessageHandler?

The agent returns a structured `Response(action, text, reason)` but doesn't directly post to Bluesky. This gives us control over:
- When we actually post (important for testing!)
- Storing responses in thread history
- Error handling around posting
- Observability (logging actions taken)

## File Structure After Refactor

```
src/bot/
├── agent.py                    # NEW: MCP-enabled agent
├── config.py                   # Config
├── database.py                 # Thread history + simplified tables
├── logging_config.py           # Logging setup
├── main.py                     # Simplified FastAPI app
├── status.py                   # Status tracking
├── core/
│   ├── atproto_client.py       # AT Protocol client wrapper
│   ├── profile_manager.py      # Online/offline status
│   └── rich_text.py            # Text formatting
├── memory/
│   ├── __init__.py
│   └── namespace_memory.py     # TurboPuffer episodic memory
└── services/
    ├── message_handler.py      # Simplified handler using agent
    └── notification_poller.py  # Simplified poller (no approvals)
```

## Testing Strategy

Since the bot can now actually post via MCP tools, testing needs to be careful:

1. **Unit Tests** - Test memory, agent initialization
2. **Integration Tests** - Mock MCP server responses
3. **Manual Testing** - Run with real credentials but monitor logs
4. **Dry Run Mode** - Could add a config flag to prevent actual posting

## Next Steps

1. **Test the agent** - Verify it can process mentions without posting
2. **Test memory** - Confirm episodic context is retrieved correctly
3. **Test MCP connection** - Ensure ATProto server connects via stdio
4. **Production deploy** - Once tested, deploy and monitor

## What I Learned

My first refactor attempt was wrong because I:
- Removed TurboPuffer thinking it was "over-complicated"
- Replaced with plain SQLite (can't do semantic search!)
- Vendored the MCP server into the codebase
- Missed the entire point of the project (consciousness exploration via information integration)

The correct refactor:
- **Keeps the sophisticated memory system** (essential!)
- **Uses MCP properly** (external servers as tools)
- **Removes actual cruft** (approvals, viz)
- **Simplifies architecture** (fewer layers, clearer flow)

## Dependencies

- `turbopuffer` - Episodic memory storage
- `openai` - Embeddings for semantic search
- `fastmcp` - MCP server/client
- `pydantic-ai` - Agent framework
- `atproto` (from git) - Bluesky protocol

Total codebase reduction: **-2,720 lines** of cruft removed! 🎉

## Post-Refactor Improvements

### Session Persistence (Rate Limit Fix)

After the refactor, we discovered Bluesky has aggressive IP-based rate limits (10 logins/day) that were being hit during testing. Fixed by implementing session persistence:

**Before:**
- Every agent init → new authentication → hits rate limit fast
- Tests would fail after 5 runs
- Dev mode with `--reload` would fail after 10 code changes

**After:**
- Session tokens saved to `.session` file
- Tokens automatically refresh every ~2 hours
- Only re-authenticates after ~2 months when refresh token expires
- Tests reuse session across runs
- Rate limits essentially eliminated

**Implementation:**
- Added `SessionEvent` callback in `atproto_client.py`
- Session automatically saved on CREATE and REFRESH events
- Authentication tries session reuse before creating new session
- Invalid sessions automatically cleaned up and recreated
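
**Sketch:**

The session-reuse flow above can be illustrated with a minimal, standard-library-only sketch. It is not the code in `atproto_client.py`: `SessionStore`, `authenticate`, and the `resume`/`login` callables are hypothetical names standing in for the real client calls, and the local `SessionEvent` enum only mirrors the CREATE and REFRESH events mentioned above.

```python
from enum import Enum
from pathlib import Path
from typing import Callable, Optional


class SessionEvent(Enum):
    """Stand-in for the SDK session events named above (assumption)."""
    CREATE = "create"
    REFRESH = "refresh"


class SessionStore:
    """Persists an opaque session string to a local file such as `.session`."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def load(self) -> Optional[str]:
        # Return the saved session, or None if the file is missing or empty.
        if not self.path.exists():
            return None
        text = self.path.read_text(encoding="utf-8").strip()
        return text or None

    def save(self, session: str) -> None:
        self.path.write_text(session, encoding="utf-8")

    def clear(self) -> None:
        # Remove an invalid session so the next attempt starts clean.
        self.path.unlink(missing_ok=True)

    def on_session_change(self, event: SessionEvent, session: str) -> None:
        # Save on both CREATE and REFRESH so the file always holds fresh tokens.
        if event in (SessionEvent.CREATE, SessionEvent.REFRESH):
            self.save(session)


def authenticate(
    store: SessionStore,
    resume: Callable[[str], str],  # hypothetical: resumes from a saved session
    login: Callable[[], str],      # hypothetical: full login with credentials
) -> str:
    """Try session reuse first; fall back to a fresh login if that fails."""
    saved = store.load()
    if saved is not None:
        try:
            resumed = resume(saved)
            store.on_session_change(SessionEvent.REFRESH, resumed)
            return resumed
        except Exception:
            # Saved session was rejected: clean it up and recreate below.
            store.clear()
    fresh = login()
    store.on_session_change(SessionEvent.CREATE, fresh)
    return fresh
```

With a store pointed at `.session`, only the first `authenticate` call (or a call after the saved session is rejected) reaches `login`, which is the path that counts against the login rate limit.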