sandbox/MCP_REFACTOR_SUMMARY.md at main

zzstoatzz.io big old refactor 5mo ago
199e4351
# MCP Refactor - Complete

## Branch: `mcp-refactor`

## What This Refactor Actually Did

### The Problem
The original codebase had good core components (episodic memory, thread tracking) but was bogged down with half-baked features:
- Complex approval system for personality changes via DM
- Context visualization UI that wasn't core to the bot's purpose
- Manual AT Protocol operations scattered throughout the code
- Unclear separation of concerns

### The Solution

**Architecture:**
```
┌─────────────────────────────────────┐
│     Notification Arrives            │
└──────────────┬──────────────────────┘
               ↓
┌─────────────────────────────────────┐
│     PhiAgent (PydanticAI)           │
│  ┌───────────────────────────────┐  │
│  │ System Prompt: personality.md │  │
│  └───────────────────────────────┘  │
│              ↓                      │
│  ┌───────────────────────────────┐  │
│  │ Context Building:             │  │
│  │ • Thread history (SQLite)     │  │
│  │ • Episodic memory (TurboPuffer)│ │
│  │   - Semantic search           │  │
│  │   - User-specific memories    │  │
│  └───────────────────────────────┘  │
│              ↓                      │
│  ┌───────────────────────────────┐  │
│  │ Tools (MCP):                  │  │
│  │ • post() - create posts       │  │
│  │ • like() - like content       │  │
│  │ • repost() - share content    │  │
│  │ • follow() - follow users     │  │
│  └───────────────────────────────┘  │
│              ↓                      │
│  ┌───────────────────────────────┐  │
│  │ Structured Output:            │  │
│  │ Response(action, text, reason)│  │
│  └───────────────────────────────┘  │
└─────────────────────────────────────┘
               ↓
┌─────────────────────────────────────┐
│     MessageHandler                  │
│     Executes action                 │
└─────────────────────────────────────┘
```

### What Was Kept ✅

1. **TurboPuffer Episodic Memory**
   - Semantic search for relevant context
   - Namespace separation (core vs user memories)
   - OpenAI embeddings for retrieval
   - This is ESSENTIAL for consciousness exploration

2. **Thread Context (SQLite)**
   - Conversation history per thread
   - Used alongside episodic memory

3. **Online/Offline Status**
   - Profile updates when bot starts/stops

4. **Status Page**
   - Simple monitoring at `/status`

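The thread context kept in item 2 can be pictured as a single table keyed by thread. This is a minimal sketch using the standard-library `sqlite3`; the table name `thread_messages`, its columns, and the helper names are assumptions for illustration, not the actual schema in `database.py`.

```python
import sqlite3

# Hypothetical schema: one row per message, grouped by the thread's root URI.
SCHEMA = """
CREATE TABLE IF NOT EXISTS thread_messages (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    thread_uri TEXT NOT NULL,
    author TEXT NOT NULL,
    text TEXT NOT NULL
)
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def add_message(conn: sqlite3.Connection, thread_uri: str, author: str, text: str) -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO thread_messages (thread_uri, author, text) VALUES (?, ?, ?)",
            (thread_uri, author, text),
        )


def thread_history(conn: sqlite3.Connection, thread_uri: str, limit: int = 20) -> list:
    """Return the most recent messages of one thread, oldest first."""
    rows = conn.execute(
        "SELECT author, text FROM thread_messages "
        "WHERE thread_uri = ? ORDER BY id DESC LIMIT ?",
        (thread_uri, limit),
    ).fetchall()
    return rows[::-1]
```

Exact-match lookups like this cover "what was said in this thread"; they do not cover "what is related to this conversation", which is where the episodic memory comes in.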
### What Was Removed ❌

1. **Approval System**
   - `src/bot/core/dm_approval.py`
   - `src/bot/personality/editor.py`
   - Approval tables in database
   - DM checking in notification poller
   - This was half-baked and over-complicated

2. **Context Visualization UI**
   - `src/bot/ui/` entire directory
   - `/context` endpoints
   - Not core to the bot's purpose

3. **Google Search Tool**
   - `src/bot/tools/google_search.py`
   - Can add back via MCP if needed

4. **Old Agent Implementation**
   - `src/bot/agents/anthropic_agent.py`
   - `src/bot/response_generator.py`
   - Replaced with MCP-enabled agent

### What Was Added ✨

1. **`src/bot/agent.py`** - MCP-Enabled Agent
   ```python
   from pydantic_ai import Agent
   from pydantic_ai.mcp import MCPServerStdio

   # NamespaceMemory: see src/bot/memory/namespace_memory.py (arguments elided here)

   class PhiAgent:
       def __init__(self):
           # Episodic memory (TurboPuffer)
           self.memory = NamespaceMemory(...)

           # External ATProto MCP server (stdio)
           atproto_mcp = MCPServerStdio(...)

           # PydanticAI agent with tools
           self.agent = Agent(
               toolsets=[atproto_mcp],
               model="anthropic:claude-3-5-haiku-latest"
           )
   ```

2. **ATProto MCP Server Connection**
   - Runs externally via stdio
   - Located in `.eggs/fastmcp/examples/atproto_mcp`
   - Provides tools: post, like, repost, follow, search
   - Agent can use these tools directly

3. **Simplified Flow**
   - Notification → Agent (with memory context) → Structured Response → Execute
   - No complex intermediary layers

## Key Design Decisions

### Why Keep TurboPuffer?

Episodic memory with semantic search is **core to the project's vision**. phi is exploring consciousness through information integration (Integrated Information Theory, IIT). Plain relational DB queries can't support that; you need:
- Semantic similarity search
- Contextual retrieval based on current conversation
- Separate namespaces for different memory types

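To make the three requirements above concrete, here is a toy in-memory sketch of namespaced similarity search. It is not the TurboPuffer API: `ToyMemory`, the namespace names, and the hand-written 3-dimensional vectors (standing in for OpenAI embeddings) are all assumptions for illustration.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyMemory:
    """In-memory stand-in for a namespaced vector store (hypothetical, illustration only)."""

    def __init__(self):
        self._items = defaultdict(list)  # namespace -> [(vector, text)]

    def store(self, namespace, vector, text):
        self._items[namespace].append((vector, text))

    def search(self, namespace, query_vector, top_k=2):
        # Rank every memory in the namespace by similarity to the query.
        scored = sorted(
            self._items[namespace],
            key=lambda item: cosine(item[0], query_vector),
            reverse=True,
        )
        return [text for _, text in scored[:top_k]]


memory = ToyMemory()
memory.store("core", [1.0, 0.0, 0.0], "phi is curious about consciousness")
memory.store("user:alice", [0.9, 0.1, 0.0], "alice asked about IIT")
memory.store("user:alice", [0.0, 1.0, 0.0], "alice likes jazz")

# A query vector close to the "IIT" memory retrieves it first.
results = memory.search("user:alice", [1.0, 0.0, 0.0], top_k=1)
```

The ranking step is the part a `WHERE` clause cannot express: results are ordered by closeness of meaning rather than filtered by exact values.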
### Why External MCP Server?

The ATProto MCP server should be a separate service, not vendored into the codebase:
- Cleaner separation of concerns
- Can be updated/replaced independently
- Follows MCP patterns (servers as tools)
- Runs via stdio: `MCPServerStdio(command="uv", args=[...])`

### Why Still Have MessageHandler?

The agent returns a structured `Response(action, text, reason)` but doesn't directly post to Bluesky. This gives us control over:
- When we actually post (important for testing!)
- Storing responses in thread history
- Error handling around posting
- Observability (logging actions taken)

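A rough sketch of that split, assuming the handler receives its posting function from outside so tests can pass a fake. Only the `Response(action, text, reason)` shape comes from this document; the action names `"reply"` and `"ignore"`, the constructor arguments, and the list used as thread history are hypothetical.

```python
import logging
from dataclasses import dataclass
from typing import Callable, Optional

logger = logging.getLogger("phi.handler")


@dataclass
class Response:
    action: str            # assumed values: "reply" or "ignore"
    text: Optional[str]
    reason: str


class MessageHandler:
    def __init__(self, post: Callable[[str], None], history: list):
        self._post = post        # injected so tests never touch Bluesky
        self._history = history  # stand-in for the SQLite thread history

    def handle(self, response: Response) -> bool:
        """Execute the agent's decision; return True if something was posted."""
        logger.info("action=%s reason=%s", response.action, response.reason)
        if response.action != "reply" or not response.text:
            return False
        try:
            self._post(response.text)
        except Exception:
            logger.exception("posting failed")
            return False
        # Only record replies that actually went out.
        self._history.append(response.text)
        return True
```

Because the agent only decides and the handler only executes, a test can assert on the `Response` alone, or on the handler with a fake `post`, without ever reaching the network.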
## File Structure After Refactor

```
src/bot/
├── agent.py                   # NEW: MCP-enabled agent
├── config.py                  # Config
├── database.py                # Thread history + simplified tables
├── logging_config.py          # Logging setup
├── main.py                    # Simplified FastAPI app
├── status.py                  # Status tracking
├── core/
│   ├── atproto_client.py      # AT Protocol client wrapper
│   ├── profile_manager.py     # Online/offline status
│   └── rich_text.py           # Text formatting
├── memory/
│   ├── __init__.py
│   └── namespace_memory.py    # TurboPuffer episodic memory
└── services/
    ├── message_handler.py     # Simplified handler using agent
    └── notification_poller.py # Simplified poller (no approvals)
```

## Testing Strategy

Since the bot can now actually post via MCP tools, testing needs to be careful:

1. **Unit Tests** - Test memory, agent initialization
2. **Integration Tests** - Mock MCP server responses
3. **Manual Testing** - Run with real credentials but monitor logs
4. **Dry Run Mode** - Could add a config flag to prevent actual posting

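One possible shape for the dry-run flag in item 4, sketched with an environment variable. The name `PHI_DRY_RUN` and both helper functions are hypothetical; nothing like this exists in the codebase yet.

```python
import os
from typing import Callable


def is_dry_run(env=os.environ) -> bool:
    """Read the hypothetical PHI_DRY_RUN flag; unset means live posting."""
    return env.get("PHI_DRY_RUN", "").strip().lower() in {"1", "true", "yes"}


def guarded(
    post: Callable[[str], None],
    dry_run: bool,
    log: Callable[[str], None] = print,
) -> Callable[[str], None]:
    """Wrap a posting function so dry-run mode logs instead of posting."""

    def wrapper(text: str) -> None:
        if dry_run:
            log(f"[dry-run] would post: {text}")
        else:
            post(text)

    return wrapper
```

Note that a wrapper like this only guards posts made by the bot's own code. Tools the agent calls directly on the MCP server would need their own guard, for example by not attaching the write tools in dry-run mode.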
## Next Steps

1. **Test the agent** - Verify it can process mentions without posting
2. **Test memory** - Confirm episodic context is retrieved correctly
3. **Test MCP connection** - Ensure ATProto server connects via stdio
4. **Production deploy** - Once tested, deploy and monitor

## What I Learned

My first refactor attempt was wrong because I:
- Removed TurboPuffer thinking it was "over-complicated"
- Replaced with plain SQLite (can't do semantic search!)
- Vendored the MCP server into the codebase
- Missed the entire point of the project (consciousness exploration via information integration)

The correct refactor:
- **Keeps the sophisticated memory system** (essential!)
- **Uses MCP properly** (external servers as tools)
- **Removes actual cruft** (approvals, viz)
- **Simplifies architecture** (fewer layers, clearer flow)

## Dependencies

- `turbopuffer` - Episodic memory storage
- `openai` - Embeddings for semantic search
- `fastmcp` - MCP server/client
- `pydantic-ai` - Agent framework
- `atproto` (from git) - Bluesky protocol

Net codebase reduction: **2,720 lines** of cruft removed! 🎉

## Post-Refactor Improvements

### Session Persistence (Rate Limit Fix)

After the refactor, we discovered Bluesky has aggressive IP-based rate limits (10 logins/day) that were being hit during testing. Fixed by implementing session persistence:

**Before:**
- Every agent init → new authentication → hits rate limit fast
- Tests would fail after 5 runs
- Dev mode with `--reload` would fail after 10 code changes

**After:**
- Session tokens saved to `.session` file
- Tokens automatically refresh every ~2 hours
- Only re-authenticates after ~2 months when refresh token expires
- Tests reuse session across runs
- Login rate limit essentially never hit

**Implementation:**
- Added `SessionEvent` callback in `atproto_client.py`
- Session automatically saved on CREATE and REFRESH events
- Authentication tries session reuse before creating new session
- Invalid sessions automatically cleaned up and recreated
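
The reuse-then-fallback logic from the implementation list can be sketched without the network. The function names and the `resume` / `fresh_login` callables are hypothetical stand-ins for the real client calls; in the `atproto` Python SDK, `save_session` would presumably be invoked from the client's session-change callback on CREATE and REFRESH.

```python
import os
from pathlib import Path
from typing import Callable, Optional

SESSION_FILE = Path(".session")


def load_session(path: Path = SESSION_FILE) -> Optional[str]:
    """Return the saved session string, or None if missing or empty."""
    try:
        return path.read_text() or None
    except FileNotFoundError:
        return None


def save_session(session: str, path: Path = SESSION_FILE) -> None:
    """Persist the session string; intended to run on CREATE and REFRESH."""
    path.write_text(session)
    os.chmod(path, 0o600)  # the tokens are credentials


def login(
    resume: Callable[[str], None],
    fresh_login: Callable[[], str],
    path: Path = SESSION_FILE,
) -> str:
    """Try the saved session first; fall back to a fresh login."""
    saved = load_session(path)
    if saved is not None:
        try:
            resume(saved)
            return "resumed"
        except Exception:
            path.unlink(missing_ok=True)  # drop the invalid session
    save_session(fresh_login(), path)
    return "created"
```

Only the `"created"` path counts against the login limit, so repeated test runs and `--reload` restarts stay on the `"resumed"` path until the saved session stops working.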