AT Protocol indexer with flexible filtering, XRPC queries, and a cursor-backed event stream, built on fjall
at-protocol atproto indexer rust fjall

[docs] update agent instructions with latest architecture and style guide

ptr.pet 042fabda df488e88

verified
+15 -6
AGENTS.md
@@ -35,9 +35,10 @@
 
 Hydrant consists of several components:
 - **[`hydrant::ingest::firehose`]**: Connects to an upstream Firehose (Relay) and filters events. It manages the transition between discovery and synchronization.
-- **[`hydrant::ingest::worker`]**: Processes buffered Firehose messages concurrently. Verifies signatures, updates repository state, detects gaps for backfill, and persists records.
+- **[`hydrant::ingest::worker`]**: Processes buffered Firehose messages concurrently using sharded workers. Verifies signatures, updates repository state (handling account status events like deactivations), detects gaps for backfill, and persists records.
 - **[`hydrant::crawler`]**: Periodically enumerates the network via `com.atproto.sync.listRepos` to discover new repositories when in full-network mode.
-- **[`hydrant::backfill`]**: A dedicated worker that fetches full repository CAR files from PDS instances when a new repo is detected.
+- **[`hydrant::resolver`]**: Manages DID resolution and key lookups. Supports multiple PLC directory sources with failover and caching.
+- **[`hydrant::backfill`]**: A dedicated worker that fetches full repository CAR files. Uses LIFO prioritization and adaptive concurrency to manage backfill load efficiently.
 - **[`hydrant::api`]**: An Axum-based XRPC server implementing repository read methods (`getRecord`, `listRecords`) and system stats. It also provides a event stream API via WebSockets.
 - **Persistence worker** (in `src/main.rs`): Manages periodic background flushes of the LSM-tree and cursor state.
 
@@ -48,10 +49,14 @@
 ## General conventions
 
 ### Correctness over convenience
-- Model the full error space—no shortcuts or simplified error handling.
 - Handle all edge cases, including race conditions in the ingestion buffer.
 - Use the type system to encode correctness constraints.
 - Prefer compile-time guarantees over runtime checks where possible.
+
+### Error handling
+- **Typed Errors**: Define custom error enums (e.g. `ResolverError`, `IngestError`) when callers need to handle specific cases (like rate limits or retries).
+- **Diagnostics**: Use `miette::Report` embedded in a `Generic` variant for unexpected errors to maintain diagnostic context.
+- **Type Preservation**: Avoid erasing error types with `.into_diagnostic()` in valid code paths; only use it at the top-level application boundary or when the error is truly unrecoverable and needs no special handling.
 
 ### Production-grade engineering
 - Use `miette` for diagnostic-driven error reporting.
@@ -69,26 +74,30 @@
 
 Hydrant uses multiple `fjall` keyspaces:
 - `repos`: Maps `{DID}` -> `RepoState` (MessagePack).
-- `records`: Maps `{DID}|{Collection}|{RKey}` -> `{CID}` (String).
+- `records`: Partitioned by collection. Maps `{DID}|{RKey}` -> `{CID}` (Binary).
 - `blocks`: Maps `{CID}` -> `Block Data` (Raw CBOR).
 - `events`: Maps `{ID}` (u64) -> `StoredEvent` (MessagePack). This is the source for the JSON stream API.
 - `cursors`: Maps `firehose_cursor` or `crawler_cursor` -> `Value` (u64/i64 BE Bytes).
-- `pending`: Index of DIDs awaiting backfill.
+- `pending`: Queue of `{Timestamp}|{DID}` -> `Empty` (Backfill queue).
 - `resync`: Maps `{DID}` -> `ResyncState` (MessagePack) for retry logic/tombstones.
+- `resync_buffer`: Maps `{DID}|{Rev}` -> `Commit` (MessagePack). Used to buffer live events during backfill.
 - `counts`: Maps `k|{NAME}` or `r|{DID}|{COL}` -> `Count` (u64 BE Bytes).
 
 ## Safe commands
 
 ### Testing
 - `nu tests/repo_sync_integrity.nu` - Runs the full integration test suite using Nushell. This builds the binary, starts a temporary instance, performs a backfill against a real PDS, and verifies record integrity.
+- `nu tests/verify_crawler.nu` - Verifies full-network crawler functionality using a mock relay.
 - `nu tests/stream_test.nu` - Tests WebSocket streaming functionality. Verifies both live event streaming during backfill and historical replay with cursor.
 - `nu tests/authenticated_stream_test.nu` - Tests authenticated event streaming. Verifies that create, update, and delete actions on a real account are correctly streamed by Hydrant in the correct order. Requires `TEST_REPO` and `TEST_PASSWORD` in `.env`.
 - `nu tests/debug_endpoints.nu` - Tests debug/introspection endpoints (`/debug/iter`, `/debug/get`) and verifies DB content and serialization.
 
 ## Rust code style
 
-- Always try to use variable substitution in `format!` like macros (eg. logging macros like `info!`, `debug!`) like so: `format!("error: {err}")`.
+- Prefer variable substitution in `format!` like macros (eg. logging macros like `info!`, `debug!`) like so: `format!("error: {err}")`.
 - Prefer using let-guard (eg. `let Some(val) = res else { continue; }`) over nested ifs where it makes sense (eg. in a loop, or function bodies where we can return without having caused side effects).
+- Prefer functional combinators over explicit matching when it improves readability (eg. `.then_some()`, `.map()`, `.ok_or_else()`).
+- Prefer iterator chains (`.filter_map()`, `.flat_map()`) over explicit loops for data transformation.
 
 ## Commit message style
 
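The new `hydrant::resolver` entry in this diff says resolution supports multiple PLC directory sources with failover and caching. The following is a minimal sketch of that pattern, not Hydrant's implementation. It assumes each source is just a closure from DID to a document string. `Source`, `source`, and `Resolver` are hypothetical names, errors are plain strings for brevity, and the cache never evicts.

```rust
use std::collections::HashMap;

/// Hypothetical DID-document source; a real one would query a PLC directory
/// over HTTP. Errors are plain strings only to keep the sketch short.
pub type Source = Box<dyn Fn(&str) -> Result<String, String>>;

/// Hypothetical helper that boxes a closure as a `Source`.
pub fn source<F>(f: F) -> Source
where
    F: Fn(&str) -> Result<String, String> + 'static,
{
    Box::new(f)
}

/// Hypothetical resolver: tries sources in order (failover) and memoizes
/// successful lookups. This sketch has no cache eviction or expiry.
pub struct Resolver {
    sources: Vec<Source>,
    cache: HashMap<String, String>,
}

impl Resolver {
    pub fn new(sources: Vec<Source>) -> Self {
        Self { sources, cache: HashMap::new() }
    }

    /// Returns the cached document if present; otherwise falls through the
    /// sources until one succeeds and caches that result. If every source
    /// fails, returns all of their errors in order.
    pub fn resolve(&mut self, did: &str) -> Result<String, Vec<String>> {
        if let Some(doc) = self.cache.get(did) {
            return Ok(doc.clone());
        }
        let mut errors = Vec::new();
        for src in &self.sources {
            match src(did) {
                Ok(doc) => {
                    self.cache.insert(did.to_owned(), doc.clone());
                    return Ok(doc);
                }
                Err(err) => errors.push(err),
            }
        }
        Err(errors)
    }
}
```

Ordering the sources gives a simple primary/backup policy. A production resolver would also want expiry, so that rotated keys are eventually picked up.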
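The new "Error handling" section asks for typed error enums when callers must react to specific cases, with a `Generic` variant for unexpected failures. The sketch below illustrates that shape under stated assumptions. The `ResolverError` variants and the `resolve_with_retry` helper are hypothetical. A boxed `std::error::Error` stands in for `miette::Report` so the example needs no external crates.

```rust
use std::error::Error;
use std::fmt;

/// Illustrative typed error; the variants are assumptions, not Hydrant's.
#[derive(Debug)]
pub enum ResolverError {
    /// Upstream asked us to back off: a case callers handle by retrying.
    RateLimited { retry_after_secs: u64 },
    /// The DID does not resolve: callers treat this as permanent.
    NotFound(String),
    /// Catch-all for unexpected failures. The guideline wraps a
    /// `miette::Report` here; a boxed std error keeps this dependency-free.
    Generic(Box<dyn Error + Send + Sync>),
}

impl fmt::Display for ResolverError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::RateLimited { retry_after_secs } => {
                write!(f, "rate limited, retry after {retry_after_secs}s")
            }
            Self::NotFound(did) => write!(f, "could not resolve {did}"),
            Self::Generic(err) => write!(f, "unexpected error: {err}"),
        }
    }
}

impl Error for ResolverError {}

/// Hypothetical caller: retries only the typed `RateLimited` case, up to
/// `max_retries` times, and returns every other outcome unchanged so the
/// concrete error type is preserved. A real caller would also sleep for
/// `retry_after_secs`; this sketch does not.
pub fn resolve_with_retry<F>(mut attempt: F, max_retries: u32) -> Result<String, ResolverError>
where
    F: FnMut() -> Result<String, ResolverError>,
{
    let mut retries = 0;
    loop {
        match attempt() {
            Err(ResolverError::RateLimited { .. }) if retries < max_retries => retries += 1,
            other => return other,
        }
    }
}
```

Because the error type is not erased, the retry policy is a plain pattern match. A caller never has to inspect message strings to decide whether to retry.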
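This diff turns the `pending` keyspace into a queue keyed by `{Timestamp}|{DID}` and describes backfill as using LIFO prioritization. The diff does not say how the timestamp is encoded. If it is a big-endian `u64`, which is an assumption here, then newest-first order falls out of the key layout: the most recent entry is simply the last key. In this sketch a `BTreeMap` stands in for the fjall keyspace, and `Pending`, `pending_key`, `parse_pending_key`, and `pop_newest` are hypothetical helpers.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory stand-in for the `pending` keyspace: an ordered map
/// over raw byte keys iterates in the same lexicographic order as an LSM-tree.
pub type Pending = BTreeMap<Vec<u8>, ()>;

/// Hypothetical encoder for `{Timestamp}|{DID}`. Assumption: the timestamp is a
/// big-endian u64, so byte order matches numeric (chronological) order.
pub fn pending_key(ts: u64, did: &str) -> Vec<u8> {
    let mut key = Vec::with_capacity(9 + did.len());
    key.extend_from_slice(&ts.to_be_bytes());
    key.push(b'|');
    key.extend_from_slice(did.as_bytes());
    key
}

/// Splits a key built by `pending_key`; returns `None` if it is malformed.
pub fn parse_pending_key(key: &[u8]) -> Option<(u64, &str)> {
    let mut ts = [0u8; 8];
    ts.copy_from_slice(key.get(..8)?);
    let did = key.get(8..)?.strip_prefix(b"|")?;
    Some((u64::from_be_bytes(ts), std::str::from_utf8(did).ok()?))
}

/// LIFO: the most recently queued DID has the largest key, so pop from the end.
/// A malformed key is still removed from the queue but reported as `None`.
pub fn pop_newest(pending: &mut Pending) -> Option<(u64, String)> {
    let (key, ()) = pending.pop_last()?;
    parse_pending_key(&key).map(|(ts, did)| (ts, did.to_owned()))
}
```

The timestamp is fixed-width, so the `|` separator is not needed to find where the DID starts. It is kept only to match the documented layout.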
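The new `resync_buffer` keyspace holds live commits under `{DID}|{Rev}` while a repo is being backfilled. The sketch below shows one way such a buffer could be drained once backfill finishes. The `Commit` struct, the helper names, and the rule that commits at or below the backfilled rev are discarded are all assumptions for illustration. The sketch relies on two properties: atproto revs are TIDs, whose string order matches their time order, and all keys sharing the `{DID}|` prefix form one contiguous run.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a buffered live commit (the real value is a
/// MessagePack-encoded `Commit`).
#[derive(Debug, Clone, PartialEq)]
pub struct Commit {
    pub rev: String,
}

/// Hypothetical helper building a `{DID}|{Rev}` key.
pub fn buffer_key(did: &str, rev: &str) -> String {
    format!("{did}|{rev}")
}

/// Hypothetical drain step: once `did` is backfilled up to `backfill_rev`,
/// remove all of its buffered commits and return the ones newer than that
/// snapshot, oldest first.
pub fn drain_buffered(
    buffer: &mut BTreeMap<String, Commit>,
    did: &str,
    backfill_rev: &str,
) -> Vec<Commit> {
    let prefix = format!("{did}|");
    // Keys sharing the prefix are contiguous in sorted order, so a range scan
    // from the prefix can stop at the first key that no longer matches.
    let keys: Vec<String> = buffer
        .range(prefix.clone()..)
        .take_while(|(key, _)| key.starts_with(&prefix))
        .map(|(key, _)| key.clone())
        .collect();
    keys.iter()
        .filter_map(|key| buffer.remove(key))
        .filter(|commit| commit.rev.as_str() > backfill_rev)
        .collect()
}
```

The keys are collected first and removed afterwards. This avoids mutating the map while a range iterator still borrows it.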
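The `counts` keyspace stores `u64` big-endian values under `k|{NAME}` or `r|{DID}|{COL}` keys. The updated style guide favors inline `format!` arguments, combinators, and iterator chains. The sketch below decodes per-repo counts in that style. It assumes keys are plain UTF-8 bytes, and `count_key`, `parse_repo_count`, and `totals_by_collection` are hypothetical names.

```rust
use std::collections::BTreeMap;
use std::convert::TryInto;

/// Hypothetical encoder for an `r|{DID}|{COL}` key, using inline `format!`
/// arguments. Assumption: keys are stored as plain UTF-8 bytes.
pub fn count_key(did: &str, col: &str) -> Vec<u8> {
    format!("r|{did}|{col}").into_bytes()
}

/// Decodes an `r|{DID}|{COL}` entry and its u64 big-endian count.
/// `k|{NAME}` keys and malformed entries yield `None` via `?` and `.ok()`.
pub fn parse_repo_count(key: &[u8], value: &[u8]) -> Option<(String, String, u64)> {
    let rest = std::str::from_utf8(key).ok()?.strip_prefix("r|")?;
    // DIDs cannot contain '|', so the first separator ends the DID.
    let (did, col) = rest.split_once('|')?;
    let count = u64::from_be_bytes(value.try_into().ok()?);
    Some((did.to_owned(), col.to_owned(), count))
}

/// Sums counts per collection across all repos with an iterator chain
/// (`filter_map` + `fold`) instead of an explicit loop.
pub fn totals_by_collection<'a, I>(entries: I) -> BTreeMap<String, u64>
where
    I: IntoIterator<Item = (&'a [u8], &'a [u8])>,
{
    entries
        .into_iter()
        .filter_map(|(key, value)| parse_repo_count(key, value))
        .fold(BTreeMap::new(), |mut totals, (_did, col, count)| {
            *totals.entry(col).or_insert(0) += count;
            totals
        })
}
```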