a digital entity named phi that roams bsky

MCP Refactor - Complete#

Branch: mcp-refactor#

What This Refactor Actually Did#

The Problem#

The original codebase had good core components (episodic memory, thread tracking) but was bogged down with half-baked features:

  • Complex approval system for personality changes via DM
  • Context visualization UI that wasn't core to the bot's purpose
  • Manual AT Protocol operations scattered throughout the code
  • Unclear separation of concerns

The Solution#

Architecture:

┌─────────────────────────────────────┐
│     Notification Arrives            │
└──────────────┬──────────────────────┘
               ↓
┌─────────────────────────────────────┐
│     PhiAgent (PydanticAI)           │
│  ┌───────────────────────────────┐  │
│  │ System Prompt: personality.md │  │
│  └───────────────────────────────┘  │
│              ↓                      │
│  ┌───────────────────────────────┐  │
│  │ Context Building:             │  │
│  │ • Thread history (SQLite)     │  │
│  │ • Episodic memory (TurboPuffer)│ │
│  │   - Semantic search           │  │
│  │   - User-specific memories    │  │
│  └───────────────────────────────┘  │
│              ↓                      │
│  ┌───────────────────────────────┐  │
│  │ Tools (MCP):                  │  │
│  │ • post() - create posts       │  │
│  │ • like() - like content       │  │
│  │ • repost() - share content    │  │
│  │ • follow() - follow users     │  │
│  └───────────────────────────────┘  │
│              ↓                      │
│  ┌───────────────────────────────┐  │
│  │ Structured Output:            │  │
│  │ Response(action, text, reason)│  │
│  └───────────────────────────────┘  │
└─────────────────────────────────────┘
               ↓
┌─────────────────────────────────────┐
│     MessageHandler                  │
│     Executes action                 │
└─────────────────────────────────────┘

What Was Kept ✅#

  1. TurboPuffer Episodic Memory

    • Semantic search for relevant context
    • Namespace separation (core vs user memories)
    • OpenAI embeddings for retrieval
    • This is ESSENTIAL for consciousness exploration
  2. Thread Context (SQLite)

    • Conversation history per thread
    • Used alongside episodic memory
  3. Online/Offline Status

    • Profile updates when bot starts/stops
  4. Status Page

    • Simple monitoring at /status

What Was Removed ❌#

  1. Approval System

    • src/bot/core/dm_approval.py
    • src/bot/personality/editor.py
    • Approval tables in database
    • DM checking in notification poller
    • This was half-baked and over-complicated
  2. Context Visualization UI

    • src/bot/ui/ entire directory
    • /context endpoints
    • Not core to the bot's purpose
  3. Google Search Tool

    • src/bot/tools/google_search.py
    • Can add back via MCP if needed
  4. Old Agent Implementation

    • src/bot/agents/anthropic_agent.py
    • src/bot/response_generator.py
    • Replaced with MCP-enabled agent

What Was Added ✨#

  1. src/bot/agent.py - MCP-Enabled Agent

    class PhiAgent:
        def __init__(self):
            # Episodic memory (TurboPuffer)
            self.memory = NamespaceMemory(...)
    
            # External ATProto MCP server (stdio)
            atproto_mcp = MCPServerStdio(...)
    
            # PydanticAI agent with tools
            self.agent = Agent(
                toolsets=[atproto_mcp],
                model="anthropic:claude-3-5-haiku-latest"
            )
    
  2. ATProto MCP Server Connection

    • Runs externally via stdio
    • Located in .eggs/fastmcp/examples/atproto_mcp
    • Provides tools: post, like, repost, follow, search
    • Agent can use these tools directly
  3. Simplified Flow

    • Notification → Agent (with memory context) → Structured Response → Execute
    • No complex intermediary layers

Key Design Decisions#

Why Keep TurboPuffer?#

Episodic memory with semantic search is core to the project's vision. phi is exploring consciousness through information integration (IIT). You can't do that with plain relational DB queries - you need:

  • Semantic similarity search
  • Contextual retrieval based on current conversation
  • Separate namespaces for different memory types

Why External MCP Server?#

The ATProto MCP server should be a separate service, not vendored into the codebase:

  • Cleaner separation of concerns
  • Can be updated/replaced independently
  • Follows MCP patterns (servers as tools)
  • Runs via stdio: MCPServerStdio(command="uv", args=[...])

Why Still Have MessageHandler?#

The agent returns a structured Response(action, text, reason) but doesn't directly post to Bluesky. This gives us control over:

  • When we actually post (important for testing!)
  • Storing responses in thread history
  • Error handling around posting
  • Observability (logging actions taken)

File Structure After Refactor#

src/bot/
├── agent.py                    # NEW: MCP-enabled agent
├── config.py                   # Config
├── database.py                 # Thread history + simplified tables
├── logging_config.py          # Logging setup
├── main.py                    # Simplified FastAPI app
├── status.py                  # Status tracking
├── core/
│   ├── atproto_client.py      # AT Protocol client wrapper
│   ├── profile_manager.py     # Online/offline status
│   └── rich_text.py           # Text formatting
├── memory/
│   ├── __init__.py
│   └── namespace_memory.py    # TurboPuffer episodic memory
└── services/
    ├── message_handler.py     # Simplified handler using agent
    └── notification_poller.py # Simplified poller (no approvals)

Testing Strategy#

Since the bot can now actually post via MCP tools, testing needs to be careful:

  1. Unit Tests - Test memory, agent initialization
  2. Integration Tests - Mock MCP server responses
  3. Manual Testing - Run with real credentials but monitor logs
  4. Dry Run Mode - Could add a config flag to prevent actual posting

Next Steps#

  1. Test the agent - Verify it can process mentions without posting
  2. Test memory - Confirm episodic context is retrieved correctly
  3. Test MCP connection - Ensure ATProto server connects via stdio
  4. Production deploy - Once tested, deploy and monitor

What I Learned#

My first refactor attempt was wrong because I:

  • Removed TurboPuffer thinking it was "over-complicated"
  • Replaced with plain SQLite (can't do semantic search!)
  • Vendored the MCP server into the codebase
  • Missed the entire point of the project (consciousness exploration via information integration)

The correct refactor:

  • Keeps the sophisticated memory system (essential!)
  • Uses MCP properly (external servers as tools)
  • Removes actual cruft (approvals, viz)
  • Simplifies architecture (fewer layers, clearer flow)

Dependencies#

  • turbopuffer - Episodic memory storage
  • openai - Embeddings for semantic search
  • fastmcp - MCP server/client
  • pydantic-ai - Agent framework
  • atproto (from git) - Bluesky protocol

Total codebase reduction: -2,720 lines of cruft removed! 🎉

Post-Refactor Improvements#

Session Persistence (Rate Limit Fix)#

After the refactor, we discovered Bluesky has aggressive IP-based rate limits (10 logins/day) that were being hit during testing. Fixed by implementing session persistence:

Before:

  • Every agent init → new authentication → hits rate limit fast
  • Tests would fail after 5 runs
  • Dev mode with --reload would fail after 10 code changes

After:

  • Session tokens saved to .session file
  • Tokens automatically refresh every ~2 hours
  • Only re-authenticates after ~2 months when refresh token expires
  • Tests reuse session across runs
  • Rate limits essentially eliminated

Implementation:

  • Added SessionEvent callback in atproto_client.py
  • Session automatically saved on CREATE and REFRESH events
  • Authentication tries session reuse before creating new session
  • Invalid sessions automatically cleaned up and recreated