# QuickDID - Development Guide for Claude ## Overview QuickDID is a high-performance AT Protocol identity resolution service written in Rust. It provides bidirectional handle-to-DID and DID-to-handle resolution with multi-layer caching (Redis, SQLite, in-memory), queue processing, metrics support, proactive cache refreshing, and real-time cache updates via Jetstream consumer. ## Configuration QuickDID follows the 12-factor app methodology and uses environment variables exclusively for configuration. There are no command-line arguments except for `--version` and `--help`. Configuration is validated at startup, and the service will exit with specific error codes if validation fails: - `error-quickdid-config-1`: Missing required environment variable - `error-quickdid-config-2`: Invalid configuration value - `error-quickdid-config-3`: Invalid TTL value (must be positive) - `error-quickdid-config-4`: Invalid timeout value (must be positive) ## Common Commands ### Building and Running ```bash # Build the project cargo build # Run in debug mode (requires environment variables) HTTP_EXTERNAL=localhost:3007 cargo run # Run tests cargo test # Type checking cargo check # Linting cargo clippy # Show version cargo run -- --version # Show help cargo run -- --help ``` ### Development with VS Code The project includes a `.vscode/launch.json` configuration for debugging with Redis integration. Use the "Debug executable 'quickdid'" launch configuration. ## Architecture ### Core Components 1. **Handle Resolution** (`src/handle_resolver/`) - `BaseHandleResolver`: Core resolution using DNS and HTTP - `RateLimitedHandleResolver`: Semaphore-based rate limiting with optional timeout - `CachingHandleResolver`: In-memory caching layer with bidirectional support - `RedisHandleResolver`: Redis-backed persistent caching with bidirectional lookups - `SqliteHandleResolver`: SQLite-backed persistent caching with bidirectional support - `ProactiveRefreshResolver`: Automatically refreshes cache entries before expiration - All resolvers implement `HandleResolver` trait with: - `resolve`: Handle-to-DID resolution - `purge`: Remove entries by handle or DID - `set`: Manually update handle-to-DID mappings - Uses binary serialization via `HandleResolutionResult` for space efficiency - Resolution stack: Cache → ProactiveRefresh (optional) → RateLimited (optional) → Base → DNS/HTTP - Includes resolution timing measurements for metrics 2. **Binary Serialization** (`src/handle_resolution_result.rs`) - Compact storage format using bincode - Strips DID prefixes for did:web and did:plc methods - Stores: timestamp (u64), method type (i16), payload (String) 3. **Queue System** (`src/queue/`) - Supports MPSC (in-process), Redis, SQLite, and no-op adapters - `HandleResolutionWork` items processed asynchronously - Redis uses reliable queue pattern (LPUSH/RPOPLPUSH/LREM) - SQLite provides persistent queue with work shedding capabilities 4. **HTTP Server** (`src/http/`) - XRPC endpoints for AT Protocol compatibility - Health check endpoint - Static file serving from configurable directory (default: www) - Serves .well-known files as static content - CORS headers support for cross-origin requests - Cache-Control headers with configurable max-age and stale directives - ETag support with configurable seed for cache invalidation 5. **Metrics System** (`src/metrics.rs`) - Pluggable metrics publishing with StatsD support - Tracks counters, gauges, and timings - Configurable tags for environment/service identification - No-op adapter for development environments - Metrics for Jetstream event processing 6. **Jetstream Consumer** (`src/jetstream_handler.rs`) - Consumes AT Protocol firehose events via WebSocket - Processes Account events (purges deleted/deactivated accounts) - Processes Identity events (updates handle-to-DID mappings) - Automatic reconnection with exponential backoff - Comprehensive metrics for event processing - Spawned as cancellable task using task manager ## Key Technical Details ### DID Method Types - `did:web`: Web-based DIDs, prefix stripped for storage - `did:plc`: PLC directory DIDs, prefix stripped for storage - Other DID methods stored with full identifier ### Redis Integration - **Bidirectional Caching**: - Stores both handle→DID and DID→handle mappings - Uses MetroHash64 for key generation - Binary data storage for efficiency - Automatic synchronization of both directions - **Queuing**: Reliable queue with processing/dead letter queues - **Key Prefixes**: Configurable via `QUEUE_REDIS_PREFIX` environment variable ### Handle Resolution Flow 1. Check cache (Redis/SQLite/in-memory based on configuration) 2. If cache miss and rate limiting enabled: - Acquire semaphore permit (with optional timeout) - If timeout configured and exceeded, return error 3. Perform DNS TXT lookup or HTTP well-known query 4. Cache result with appropriate TTL in both directions (handle→DID and DID→handle) 5. Return DID or error ### Cache Management Operations - **Purge**: Removes entries by either handle or DID - Uses `atproto_identity::resolve::parse_input` for identifier detection - Removes both handle→DID and DID→handle mappings - Chains through all resolver layers - **Set**: Manually updates handle-to-DID mappings - Updates both directions in cache - Normalizes handles to lowercase - Chains through all resolver layers ## Environment Variables ### Required - `HTTP_EXTERNAL`: External hostname for service endpoints (e.g., `localhost:3007`) ### Optional - Core Configuration - `HTTP_PORT`: Server port (default: 8080) - `PLC_HOSTNAME`: PLC directory hostname (default: plc.directory) - `RUST_LOG`: Logging level (e.g., debug, info) - `STATIC_FILES_DIR`: Directory for serving static files (default: www) ### Optional - Caching - `REDIS_URL`: Redis connection URL for caching - `SQLITE_URL`: SQLite database URL for caching (e.g., `sqlite:./quickdid.db`) - `CACHE_TTL_MEMORY`: TTL for in-memory cache in seconds (default: 600) - `CACHE_TTL_REDIS`: TTL for Redis cache in seconds (default: 7776000) - `CACHE_TTL_SQLITE`: TTL for SQLite cache in seconds (default: 7776000) ### Optional - Queue Configuration - `QUEUE_ADAPTER`: Queue type - 'mpsc', 'redis', 'sqlite', 'noop', or 'none' (default: mpsc) - `QUEUE_REDIS_PREFIX`: Redis key prefix for queues (default: queue:handleresolver:) - `QUEUE_WORKER_ID`: Worker ID for queue operations (default: worker1) - `QUEUE_BUFFER_SIZE`: Buffer size for MPSC queue (default: 1000) - `QUEUE_SQLITE_MAX_SIZE`: Max queue size for SQLite work shedding (default: 10000) - `QUEUE_REDIS_TIMEOUT`: Redis blocking timeout in seconds (default: 5) - `QUEUE_REDIS_DEDUP_ENABLED`: Enable queue deduplication to prevent duplicate handles (default: false) - `QUEUE_REDIS_DEDUP_TTL`: TTL for deduplication keys in seconds (default: 60) ### Optional - Rate Limiting - `RESOLVER_MAX_CONCURRENT`: Maximum concurrent handle resolutions (default: 0 = disabled) - `RESOLVER_MAX_CONCURRENT_TIMEOUT_MS`: Timeout for acquiring rate limit permit in ms (default: 0 = no timeout) ### Optional - HTTP Cache Control - `CACHE_MAX_AGE`: Max-age for Cache-Control header in seconds (default: 86400) - `CACHE_STALE_IF_ERROR`: Stale-if-error directive in seconds (default: 172800) - `CACHE_STALE_WHILE_REVALIDATE`: Stale-while-revalidate directive in seconds (default: 86400) - `CACHE_MAX_STALE`: Max-stale directive in seconds (default: 86400) - `ETAG_SEED`: Seed value for ETag generation (default: application version) ### Optional - Metrics - `METRICS_ADAPTER`: Metrics adapter type - 'noop' or 'statsd' (default: noop) - `METRICS_STATSD_HOST`: StatsD host and port (required when METRICS_ADAPTER=statsd, e.g., localhost:8125) - `METRICS_STATSD_BIND`: Bind address for StatsD UDP socket (default: [::]:0 for IPv6, can use 0.0.0.0:0 for IPv4) - `METRICS_PREFIX`: Prefix for all metrics (default: quickdid) - `METRICS_TAGS`: Comma-separated tags (e.g., env:prod,service:quickdid) ### Optional - Proactive Refresh - `PROACTIVE_REFRESH_ENABLED`: Enable proactive cache refreshing (default: false) - `PROACTIVE_REFRESH_THRESHOLD`: Refresh when TTL remaining is below this threshold (0.0-1.0, default: 0.8) ### Optional - Jetstream Consumer - `JETSTREAM_ENABLED`: Enable Jetstream consumer for real-time cache updates (default: false) - `JETSTREAM_HOSTNAME`: Jetstream WebSocket hostname (default: jetstream.atproto.tools) ## Error Handling All error strings must use this format: error-quickdid-- :
Current error domains and examples: * `config`: Configuration errors (e.g., error-quickdid-config-1 Missing required environment variable) * `resolve`: Handle resolution errors (e.g., error-quickdid-resolve-1 Failed to resolve subject) * `queue`: Queue operation errors (e.g., error-quickdid-queue-1 Failed to push to queue) * `cache`: Cache-related errors (e.g., error-quickdid-cache-1 Redis pool creation failed) * `result`: Serialization errors (e.g., error-quickdid-result-1 System time error) * `task`: Task processing errors (e.g., error-quickdid-task-1 Queue adapter health check failed) Errors should be represented as enums using the `thiserror` library. Avoid creating new errors with the `anyhow!(...)` or `bail!(...)` macro. ## Testing ### Running Tests ```bash # Run all tests cargo test # Run with Redis integration tests TEST_REDIS_URL=redis://localhost:6379 cargo test # Run specific test module cargo test handle_resolver::tests ``` ### Test Coverage Areas - Handle resolution with various DID methods - Binary serialization/deserialization - Redis caching and expiration with bidirectional lookups - Queue processing logic - HTTP endpoint responses - Jetstream event handler processing - Purge and set operations across resolver layers ## Development Patterns ### Error Handling - Uses strongly-typed errors with `thiserror` for all modules - Each error has a unique identifier following the pattern `error-quickdid--` - Graceful fallbacks when Redis/SQLite is unavailable - Detailed tracing for debugging - Avoid using `anyhow!()` or `bail!()` macros - use proper error types instead ### Performance Optimizations - Binary serialization reduces storage by ~40% - MetroHash64 for fast key generation - Connection pooling for Redis - Configurable TTLs for cache entries - Rate limiting via semaphore-based concurrency control - HTTP caching with ETag and Cache-Control headers - Resolution timing metrics for performance monitoring ### Code Style - Follow existing Rust idioms and patterns - Use `tracing` for logging, not `println!` - Prefer `Arc` for shared state across async tasks - Handle errors explicitly, avoid `.unwrap()` in production code - Use `httpdate` crate for HTTP date formatting (not `chrono`) ## Common Tasks ### Adding a New DID Method 1. Update `DidMethodType` enum in `handle_resolution_result.rs` 2. Modify `parse_did()` and `to_did()` methods 3. Add test cases for the new method type ### Modifying Cache TTL - For in-memory: Set `CACHE_TTL_MEMORY` environment variable - For Redis: Set `CACHE_TTL_REDIS` environment variable - For SQLite: Set `CACHE_TTL_SQLITE` environment variable ### Configuring Metrics 1. Set `METRICS_ADAPTER=statsd` and `METRICS_STATSD_HOST=localhost:8125` 2. Configure tags with `METRICS_TAGS=env:prod,service:quickdid` 3. Use Telegraf + TimescaleDB for aggregation (see `docs/telegraf-timescaledb-metrics-guide.md`) 4. Railway deployment resources available in `railway-resources/telegraf/` ### Debugging Resolution Issues 1. Enable debug logging: `RUST_LOG=debug` 2. Check Redis cache: - Handle lookup: `redis-cli GET "handle:"` - DID lookup: `redis-cli GET "handle:"` (same key format) 3. Check SQLite cache: `sqlite3 quickdid.db "SELECT * FROM handle_resolution_cache;"` 4. Monitor queue processing in logs 5. Check rate limiting: Look for "Rate limit permit acquisition timed out" errors 6. Verify DNS/HTTP connectivity to AT Protocol infrastructure 7. Monitor metrics for resolution timing and cache hit rates 8. Check Jetstream consumer status: - Look for "Jetstream consumer" log entries - Monitor `jetstream.*` metrics - Check reconnection attempts in logs ## Dependencies - `atproto-identity`: Core AT Protocol identity resolution - `atproto-jetstream`: AT Protocol Jetstream event consumer - `bincode`: Binary serialization - `deadpool-redis`: Redis connection pooling - `metrohash`: Fast non-cryptographic hashing - `tokio`: Async runtime - `axum`: Web framework - `httpdate`: HTTP date formatting (replacing chrono) - `cadence`: StatsD metrics client - `thiserror`: Error handling