Running an ATProto Relay for ATCR Hold Discovery#
This document explains what it takes to run an ATProto relay for indexing ATCR hold records, including infrastructure requirements, configuration, and trade-offs.
Overview#
What is an ATProto Relay?#
An ATProto relay is a service that:
- Subscribes to multiple PDS hosts and aggregates their data streams
- Outputs a combined "firehose" event stream for real-time network updates
- Validates data integrity and identity signatures
- Provides discovery endpoints like `com.atproto.sync.listReposByCollection`
The relay acts as a network-wide indexer, making it possible to discover which DIDs have records of specific types (collections).
Why ATCR Needs a Relay#
ATCR uses hold captain records (io.atcr.hold.captain) stored in hold PDSs to enable hold discovery. The listReposByCollection endpoint allows AppViews to efficiently discover all holds in the network without crawling every PDS individually.
The problem: Standard Bluesky relays appear to only index collections from did:plc DIDs, not did:web DIDs. Since ATCR holds use did:web (e.g., did:web:hold01.atcr.io), they aren't discoverable via Bluesky's public relays.
Recommended Approach: Phased Implementation#
ATCR's discovery needs evolve as the network grows. Start simple, scale as needed.
MVP: Minimal Discovery Service#
For initial deployment with a small number of holds (dozens, not thousands), build a lightweight custom discovery service focused solely on io.atcr.* collections.
Why Minimal Service for MVP?#
- Scope: Only index `io.atcr.*` collections (manifests, tags, captain/crew, sailor profiles)
- Opt-in: Only crawls PDSs that explicitly call `requestCrawl`
- Small scale: Dozens of holds, not millions of users
- Simple storage: SQLite sufficient for current scale
- Cost-effective: $5-10/month VPS
Architecture#
Inbound endpoints:
```
POST /xrpc/com.atproto.sync.requestCrawl
  → Hold registers itself for crawling

GET /xrpc/com.atproto.sync.listReposByCollection?collection=io.atcr.hold.captain
  → AppView discovers holds
```
Outbound (client to PDS):
1. com.atproto.repo.describeRepo → verify PDS exists
2. com.atproto.sync.getRepo → fetch full CAR file (initial backfill)
3. com.atproto.sync.subscribeRepos → WebSocket for real-time updates
4. Parse events → extract io.atcr.* records → index in SQLite
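The extract-and-filter logic in step 4 reduces to splitting the record path and checking the collection NSID prefix. A minimal sketch (the `atcrRecordKey` helper name is illustrative, not part of the codebase):

```go
package main

import (
	"fmt"
	"strings"
)

// atcrRecordKey splits a repo record path ("<collection>/<rkey>") and reports
// whether the record belongs to an io.atcr.* collection worth indexing.
func atcrRecordKey(path string) (collection, rkey string, ok bool) {
	parts := strings.SplitN(path, "/", 2)
	if len(parts) != 2 || parts[1] == "" {
		return "", "", false // malformed path
	}
	if !strings.HasPrefix(parts[0], "io.atcr.") {
		return "", "", false // not an ATCR collection
	}
	return parts[0], parts[1], true
}

func main() {
	c, r, ok := atcrRecordKey("io.atcr.hold.captain/self")
	fmt.Println(c, r, ok)
	_, _, ok = atcrRecordKey("app.bsky.feed.post/3k2aexample")
	fmt.Println(ok)
}
```

The same predicate serves both the CAR backfill and the live event stream, so it is worth keeping in one place.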
Data flow:
Initial crawl (on requestCrawl):
1. Hold POSTs requestCrawl → service queues crawl job
2. Service fetches getRepo (CAR file) from hold's PDS for backfill
3. Service parses CAR using indigo libraries
4. Service extracts io.atcr.* records (captain, crew, manifests, etc.)
5. Service stores: (did, collection, rkey, record_data) in SQLite
6. Service opens WebSocket to subscribeRepos for this DID
7. Service stores cursor for reconnection handling
Ongoing updates (WebSocket):
1. Receive commit events via subscribeRepos WebSocket
2. Parse event, filter to io.atcr.* collections only
3. Update indexed_records incrementally (insert/update/delete)
4. Update cursor after processing each event
5. On disconnect: reconnect with stored cursor to resume
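On reconnect (step 5), the stored cursor is passed as a query parameter so the PDS replays the events the service missed, per `com.atproto.sync.subscribeRepos`. A sketch of the URL construction (`subscribeURL` is a hypothetical helper; the negative-cursor convention for "no stored cursor" is this sketch's own):

```go
package main

import (
	"fmt"
	"net/url"
)

// subscribeURL builds the subscribeRepos WebSocket URL, resuming from the
// stored cursor when one exists (cursor < 0 means "start from live tail").
func subscribeURL(hostname string, cursor int64) string {
	u := url.URL{
		Scheme: "wss",
		Host:   hostname,
		Path:   "/xrpc/com.atproto.sync.subscribeRepos",
	}
	if cursor >= 0 {
		q := u.Query()
		q.Set("cursor", fmt.Sprintf("%d", cursor))
		u.RawQuery = q.Encode()
	}
	return u.String()
}

func main() {
	fmt.Println(subscribeURL("hold01.atcr.io", 42))
	fmt.Println(subscribeURL("hold01.atcr.io", -1))
}
```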
Discovery (AppView query):
1. AppView GETs listReposByCollection?collection=io.atcr.hold.captain
2. Service queries SQLite WHERE collection='io.atcr.hold.captain'
3. Service returns list of DIDs with that collection
Implementation Requirements#
Technologies:
- Go (reuse indigo libraries for CAR parsing and WebSocket)
- SQLite (sufficient for dozens/hundreds of holds)
- Standard HTTP server + WebSocket client
Core components:
- HTTP handlers (`cmd/atcr-discovery/handlers/`):
  - `requestCrawl` - queue crawl jobs
  - `listReposByCollection` - query indexed collections
- Crawler (`pkg/discovery/crawler.go`):
  - Fetch CAR files from PDSs for initial backfill
  - Parse with `github.com/bluesky-social/indigo/repo`
  - Extract records, filter to `io.atcr.*` only
- WebSocket subscriber (`pkg/discovery/subscriber.go`):
  - WebSocket client for `com.atproto.sync.subscribeRepos`
  - Event parsing and filtering
  - Cursor management and persistence
  - Automatic reconnection with resume
- Storage (`pkg/discovery/storage.go`):
  - SQLite schema for indexed records
  - Indexes on (collection, did) for fast queries
  - Cursor storage for reconnection
- Worker (`pkg/discovery/worker.go`):
  - Background crawl job processor
  - WebSocket connection manager
  - Health monitoring for subscriptions
Database schema:
```sql
CREATE TABLE indexed_records (
    did         TEXT NOT NULL,
    collection  TEXT NOT NULL,
    rkey        TEXT NOT NULL,
    record_data TEXT NOT NULL,              -- JSON
    indexed_at  TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (did, collection, rkey)
);

CREATE INDEX idx_collection ON indexed_records(collection);
CREATE INDEX idx_did ON indexed_records(did);

CREATE TABLE crawl_queue (
    id              INTEGER PRIMARY KEY AUTOINCREMENT,
    hostname        TEXT NOT NULL UNIQUE,
    did             TEXT,
    status          TEXT DEFAULT 'pending', -- pending, in_progress, subscribed, failed
    last_crawled_at TIMESTAMP,
    created_at      TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE subscriptions (
    did           TEXT PRIMARY KEY,
    hostname      TEXT NOT NULL,
    cursor        INTEGER,                  -- Last processed sequence number
    status        TEXT DEFAULT 'active',    -- active, disconnected, failed
    last_event_at TIMESTAMP,
    created_at    TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at    TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```
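Against this schema, the two hot-path statements are the upsert shared by backfill and live events, and the lookup behind `listReposByCollection`. Hedged sketches (SQLite syntax; `ON CONFLICT ... DO UPDATE` requires SQLite 3.24+):

```sql
-- Upsert used by both backfill and live commit events
INSERT INTO indexed_records (did, collection, rkey, record_data)
VALUES (?, ?, ?, ?)
ON CONFLICT (did, collection, rkey)
DO UPDATE SET record_data = excluded.record_data,
              indexed_at  = CURRENT_TIMESTAMP;

-- Query backing listReposByCollection
SELECT DISTINCT did FROM indexed_records WHERE collection = ?;
```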
Leveraging indigo libraries:
```go
import (
	"bytes"
	"context"
	"fmt"
	"strings"

	comatproto "github.com/bluesky-social/indigo/api/atproto"
	"github.com/bluesky-social/indigo/events"
	"github.com/bluesky-social/indigo/events/schedulers/sequential"
	"github.com/bluesky-social/indigo/repo"
	"github.com/gorilla/websocket"
	"github.com/ipfs/go-cid"
)

// Initial backfill: parse the CAR file fetched via getRepo
r, err := repo.ReadRepoFromCar(ctx, bytes.NewReader(carData))
if err != nil {
	return err
}

// Iterate all records in the repo
err = r.ForEach(ctx, "", func(path string, nodeCid cid.Cid) error {
	// path is "<collection>/<rkey>", e.g. "io.atcr.hold.captain/self"
	parts := strings.SplitN(path, "/", 2)
	if len(parts) != 2 {
		return nil // skip invalid paths
	}
	collection, rkey := parts[0], parts[1]

	// Filter to io.atcr.* only
	if !strings.HasPrefix(collection, "io.atcr.") {
		return nil
	}

	// GetRecord returns the record as a cbor-gen value; serialize it
	// (e.g. to JSON) before handing it to the storage layer
	_, rec, err := r.GetRecord(ctx, path)
	if err != nil {
		return err
	}

	// store.IndexRecord is the service's own storage helper
	return store.IndexRecord(did, collection, rkey, rec)
})

// WebSocket subscription: listen for live updates
wsURL := fmt.Sprintf("wss://%s/xrpc/com.atproto.sync.subscribeRepos", hostname)
conn, _, err := websocket.DefaultDialer.Dial(wsURL, nil)
if err != nil {
	return err
}

// Handle commit events; each op carries a "<collection>/<rkey>" path, and
// record bytes for creates/updates travel in evt.Blocks (a CAR slice)
rsc := &events.RepoStreamCallbacks{
	RepoCommit: func(evt *comatproto.SyncSubscribeRepos_Commit) error {
		for _, op := range evt.Ops {
			parts := strings.SplitN(op.Path, "/", 2)
			if len(parts) != 2 || !strings.HasPrefix(parts[0], "io.atcr.") {
				continue
			}
			collection, rkey := parts[0], parts[1]

			switch op.Action {
			case "create", "update":
				rr, err := repo.ReadRepoFromCar(ctx, bytes.NewReader(evt.Blocks))
				if err != nil {
					return err
				}
				_, rec, err := rr.GetRecord(ctx, op.Path)
				if err != nil {
					return err
				}
				if err := store.IndexRecord(evt.Repo, collection, rkey, rec); err != nil {
					return err
				}
			case "delete":
				if err := store.DeleteRecord(evt.Repo, collection, rkey); err != nil {
					return err
				}
			}
		}
		// Persist the cursor so reconnects can resume from evt.Seq
		return store.UpdateCursor(evt.Repo, evt.Seq)
	},
}

// Process the stream (newer indigo versions also take a logger argument)
sched := sequential.NewScheduler(hostname, rsc.EventHandler)
return events.HandleRepoStream(ctx, conn, sched)
```
Infrastructure Requirements#
Minimum specs:
- 1 vCPU
- 1-2GB RAM
- 20GB SSD
- Minimal bandwidth (<1GB/day for dozens of holds)
Estimated cost:
- Hetzner CX11: €4.15/month (~$5/month)
- DigitalOcean Basic: $6/month
- Fly.io: ~$5-10/month
Deployment:
```bash
# Build
go build -o atcr-discovery ./cmd/atcr-discovery

# Run
export DATABASE_PATH="/var/lib/atcr-discovery/discovery.db"
export HTTP_ADDR=":8080"
./atcr-discovery
```
Limitations#
What it does NOT do:
- ❌ Serve an outbound `subscribeRepos` firehose (AppViews query via `listReposByCollection` instead)
- ❌ Full MST validation (trusts PDS validation)
- ❌ Scale to millions of accounts (SQLite limits)
- ❌ Multi-instance deployment (single process with SQLite)
When to migrate to full relay: When you have 1000+ holds, need PostgreSQL, or multi-instance deployment.
Future Scale: Full Relay (Sync v1.1)#
When ATCR grows beyond dozens of holds and needs real-time indexing, migrate to Bluesky's relay v1.1 implementation.
When to Upgrade#
Indicators:
- 100+ holds requesting frequent crawls
- Need real-time updates (re-crawl latency too high)
- Multiple AppView instances need coordinated discovery
- SQLite performance becomes bottleneck
Relay v1.1 Characteristics#
Released May 2025, this is Bluesky's current reference implementation.
Key features:
- Non-archival: Doesn't mirror full repository data, only processes firehose
- WebSocket subscriptions: Real-time updates from PDSs
- Scalable: 2 vCPU, 12GB RAM handles ~100M accounts
- PostgreSQL: Required for production scale
- Admin UI: Web dashboard for management
Source: github.com/bluesky-social/indigo/cmd/relay
Migration Path#
Step 1: Deploy relay v1.1
```bash
git clone https://github.com/bluesky-social/indigo.git
cd indigo
go build -o relay ./cmd/relay
export DATABASE_URL="postgres://relay:password@localhost:5432/atcr_relay"
./relay --admin-password="secure-password"
```
Step 2: Migrate data
- Export indexed records from SQLite
- Trigger crawls in relay for all known holds
- Verify relay indexes correctly
Step 3: Update AppView configuration
```bash
# Point to new relay
export ATCR_RELAY_ENDPOINT="https://relay.atcr.io"
```
Step 4: Decommission minimal service
- Monitor relay for stability
- Shut down old discovery service
Infrastructure Requirements (Full Relay)#
Minimum specs:
- 2 vCPU cores
- 12GB RAM
- 100GB SSD
- 30 Mbps bandwidth
Estimated cost:
- Hetzner: ~$30-40/month
- DigitalOcean: ~$50/month (with managed PostgreSQL)
- Fly.io: ~$35-50/month
Collection Indexing: The collectiondir Microservice#
The com.atproto.sync.listReposByCollection endpoint is not part of the relay core. It's provided by a separate microservice called collectiondir.
What is collectiondir?#
- Separate service that indexes collections for efficient discovery
- Optional: Not required by the ATProto spec, but very useful for AppViews
- Deployed alongside relay by Bluesky's public instances
Current Limitation: did:plc Only?#
Based on testing, Bluesky's public relays (with collectiondir) appear to:
- ✅ Index `io.atcr.*` collections from `did:plc` DIDs
- ❌ NOT index `io.atcr.*` collections from `did:web` DIDs
This means:
- ATCR manifests from users (did:plc) are discoverable
- ATCR hold captain records (did:web) are NOT discoverable
- The relay still stores all data (CAR file includes did:web records)
- The issue is specifically with indexing for `listReposByCollection`
Configuring collectiondir#
Documentation on configuring collectiondir is sparse. Possible approaches:
- Fork and modify: Clone indigo repo, modify collectiondir to index all DIDs
- Configuration file: Check if collectiondir accepts whitelist/configuration for indexed collections
- No filtering: Default behavior might be to index everything, but Bluesky's deployment filters
Action item: Review indigo/cmd/collectiondir source code to understand configuration options.
Multi-Relay Strategy#
Holds can request crawls from multiple relays simultaneously, which enables setups like the following.
Scenario: Bluesky + ATCR Relays#
Setup:
- Hold deploys with embedded PDS at `did:web:hold01.atcr.io`
- Hold creates captain record (`io.atcr.hold.captain/self`)
- Hold requests crawl from both:
  - Bluesky relay: `https://bsky.network/xrpc/com.atproto.sync.requestCrawl`
  - ATCR relay: `https://relay.atcr.io/xrpc/com.atproto.sync.requestCrawl`
Result:
- ✅ Bluesky relay indexes social posts (if hold owner posts)
- ✅ ATCR relay indexes hold captain records
- ✅ AppViews query ATCR relay for hold discovery
- ✅ Independent networks - Bluesky posts work regardless of ATCR relay
Request Crawl Script#
The existing script can be modified to support multiple relays:
```bash
#!/bin/bash
# deploy/request-crawl.sh

HOSTNAME=$1
BLUESKY_RELAY=${2:-"https://bsky.network"}
ATCR_RELAY=${3:-"https://relay.atcr.io"}

echo "Requesting crawl for $HOSTNAME from Bluesky relay..."
curl -X POST "$BLUESKY_RELAY/xrpc/com.atproto.sync.requestCrawl" \
  -H "Content-Type: application/json" \
  -d "{\"hostname\": \"$HOSTNAME\"}"

echo "Requesting crawl for $HOSTNAME from ATCR relay..."
curl -X POST "$ATCR_RELAY/xrpc/com.atproto.sync.requestCrawl" \
  -H "Content-Type: application/json" \
  -d "{\"hostname\": \"$HOSTNAME\"}"
```
Usage:
```bash
./deploy/request-crawl.sh hold01.atcr.io
```
Deployment: Minimal Discovery Service#
1. Infrastructure Setup#
Provision VPS:
- Hetzner CX11, DigitalOcean Basic, or Fly.io
- Public domain (e.g., `discovery.atcr.io`)
- TLS certificate (Let's Encrypt)
Configure reverse proxy (optional - nginx):
```nginx
upstream discovery {
    server 127.0.0.1:8080;
}

server {
    listen 443 ssl http2;
    server_name discovery.atcr.io;

    ssl_certificate     /etc/letsencrypt/live/discovery.atcr.io/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/discovery.atcr.io/privkey.pem;

    location / {
        proxy_pass http://discovery;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```
2. Build and Deploy#
```bash
# Clone ATCR repo
git clone https://github.com/atcr-io/atcr.git
cd atcr

# Build discovery service
go build -o atcr-discovery ./cmd/atcr-discovery

# Run
export DATABASE_PATH="/var/lib/atcr-discovery/discovery.db"
export HTTP_ADDR=":8080"
export CRAWL_INTERVAL="12h"
./atcr-discovery
```
3. Update Hold Startup#
Each hold should request crawl on startup:
```bash
# In hold startup script or environment
export ATCR_DISCOVERY_URL="https://discovery.atcr.io"

# Request crawl from both Bluesky and ATCR
curl -X POST "https://bsky.network/xrpc/com.atproto.sync.requestCrawl" \
  -H "Content-Type: application/json" \
  -d "{\"hostname\": \"$HOLD_PUBLIC_URL\"}"

curl -X POST "$ATCR_DISCOVERY_URL/xrpc/com.atproto.sync.requestCrawl" \
  -H "Content-Type: application/json" \
  -d "{\"hostname\": \"$HOLD_PUBLIC_URL\"}"
```
4. Update AppView Configuration#
Point AppView discovery worker to the discovery service:
```bash
# In .env.appview or environment
export ATCR_RELAY_ENDPOINT="https://discovery.atcr.io"
export ATCR_HOLD_DISCOVERY_ENABLED="true"
export ATCR_HOLD_DISCOVERY_INTERVAL="6h"
```
5. Monitor and Maintain#
Monitoring:
- Check crawl queue status
- Monitor SQLite database size
- Track failed crawls
Maintenance:
- Re-crawl on schedule (every 6-24 hours)
- Prune stale records (>7 days old)
- Backup SQLite database regularly
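The scheduled prune from the maintenance list can be a single statement (the 7-day window matches the retention suggested above; SQLite's `datetime` modifier syntax):

```sql
-- Prune records that have not been re-indexed in the last 7 days
DELETE FROM indexed_records
WHERE indexed_at < datetime('now', '-7 days');
```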
Trade-Offs and Considerations#
Running Your Own Relay#
Pros:
- ✅ Full control over indexing (can index `did:web` holds)
- ✅ No dependency on third-party relay policies
- ✅ Can customize collection filters for ATCR-specific needs
- ✅ Relatively lightweight with modern relay implementation
Cons:
- ❌ Infrastructure cost (~$30-50/month minimum)
- ❌ Operational overhead (monitoring, updates, backups)
- ❌ Need to maintain as network grows
- ❌ Single point of failure for discovery (unless multi-relay)
Alternatives to Running a Relay#
1. Direct Registration API#
Holds POST to AppView on startup to register themselves:
Pros:
- ✅ Simplest implementation
- ✅ No relay infrastructure needed
- ✅ Immediate registration (no crawl delay)
Cons:
- ❌ Ties holds to specific AppView instances
- ❌ Breaks decentralized discovery model
- ❌ Each AppView has different hold registry
2. Static Discovery File#
Maintain https://atcr.io/.well-known/holds.json:
Pros:
- ✅ No infrastructure beyond static hosting
- ✅ All AppViews share same registry
- ✅ Simple to implement
Cons:
- ❌ Manual process (PRs/issues to add holds)
- ❌ Not real-time discovery
- ❌ Centralized control point
3. Hybrid Approach#
Combine multiple discovery mechanisms:
```go
func (w *HoldDiscoveryWorker) DiscoverHolds(ctx context.Context) error {
	// 1. Fetch static registry
	staticHolds := w.fetchStaticRegistry()

	// 2. Query relay (if available)
	relayHolds := w.queryRelay(ctx)

	// 3. Accept direct registrations
	registeredHolds := w.getDirectRegistrations()

	// Merge and deduplicate
	allHolds := mergeHolds(staticHolds, relayHolds, registeredHolds)

	// Cache in database
	for _, hold := range allHolds {
		w.cacheHold(hold)
	}
	return nil
}
```
Pros:
- ✅ Multiple discovery paths (resilient)
- ✅ Gradual migration to relay-based discovery
- ✅ Supports both centralized bootstrap and decentralized growth
Cons:
- ❌ More complex implementation
- ❌ Potential for stale data if sources conflict
Recommendations for ATCR#
Phase 1: MVP (Now - 1000 holds)#
Build minimal discovery service with WebSocket (~$5-10/month):
- Implement `requestCrawl` + `listReposByCollection` endpoints
- Initial backfill via `getRepo` (CAR file parsing)
- Real-time updates via WebSocket `subscribeRepos`
- SQLite storage with cursor management
- Filter to `io.atcr.*` collections only
Deliverables:
- `cmd/atcr-discovery` service
- SQLite schema with cursor storage
- CAR file parser (indigo libraries)
- WebSocket subscriber with reconnection
- Deployment scripts
Cost: ~$5-10/month VPS
Why: Minimal infrastructure, real-time updates, full control over indexing, sufficient for hundreds of holds.
Phase 2: Migrate to Full Relay (1000+ holds)#
Deploy Bluesky relay v1.1 when scaling needed (~$30-50/month):
- Set up PostgreSQL database
- Deploy indigo relay with admin UI
- Migrate indexed data from SQLite
- Configure for `io.atcr.*` collection filtering (if possible)
- Handle thousands of concurrent WebSocket connections
Cost: ~$30-50/month
Why: Proven scalability to 100M+ accounts, standardized protocol, community support, production-ready infrastructure.
Phase 3: Multi-Relay Federation (Future)#
Decentralized relay network:
- Multiple ATCR relays operated independently
- AppViews query multiple relays (fallback/redundancy)
- Holds request crawls from all known ATCR relays
- Cross-relay synchronization (optional)
Why: No single point of failure, fully decentralized discovery, geographic distribution.
Next Steps#
For MVP Implementation#
- Create `cmd/atcr-discovery` package structure
  - HTTP handlers for XRPC endpoints (`requestCrawl`, `listReposByCollection`)
  - Crawler with indigo CAR parsing for initial backfill
  - WebSocket subscriber for real-time updates
  - SQLite storage layer with cursor management
  - Background worker for managing subscriptions
- Database schema
  - `indexed_records` table for collection data
  - `crawl_queue` table for crawl job management
  - `subscriptions` table for WebSocket cursor tracking
  - Indexes for efficient queries
- WebSocket implementation
  - Use `github.com/bluesky-social/indigo/events` for event handling
  - Implement reconnection logic with cursor resume
  - Filter events to `io.atcr.*` collections only
  - Health monitoring for active subscriptions
- Testing strategy
  - Unit tests for CAR parsing
  - Unit tests for event filtering
  - Integration tests with mock PDSs and WebSocket
  - Connection failure and reconnection testing
  - Load testing with SQLite
- Deployment
  - Dockerfile for discovery service
  - Deployment scripts (systemd, docker-compose)
  - Monitoring setup (logs, metrics, WebSocket health)
  - Alert on subscription failures
- Documentation
  - API documentation for XRPC endpoints
  - Deployment guide
  - Troubleshooting guide (WebSocket connection issues)
Open Questions#
- CAR parsing edge cases: How to handle malformed CAR files or invalid records?
- WebSocket reconnection: What's the optimal backoff strategy for reconnection attempts?
- Subscription management: How many concurrent WebSocket connections can SQLite handle?
- Rate limiting: Should discovery service rate-limit requestCrawl to prevent abuse?
- Authentication: Should requestCrawl require authentication, or remain open?
- Cursor storage: Should cursors be persisted immediately or batched for performance?
- Monitoring: What metrics are most important for operational visibility (active subs, event rate, lag)?
- Error handling: When a WebSocket dies, should we re-backfill via getRepo or trust cursor resume?