at protocol indexer with flexible filtering, xrpc queries, and a cursor-backed event stream, built on fjall
at-protocol atproto indexer rust fjall

[docs] update agents.md

ptr.pet 68107d5a 90e4116e

verified
+17 -18
AGENTS.md
@@ -22,25 +22,27 @@

 ## Project overview

-Hydrant is an AT Protocol indexer built on the `fjall` LSM-tree engine. It supports both full-network indexing and efficient targeted indexing (filtered by DID), while maintaining full Firehose compatibility.
+Hydrant is an AT Protocol indexer built on the `fjall` database. It supports both full-network indexing and filtered indexing (eg. by DID).

 Key design goals:
 - Ingestion via the `fjall` storage engine.
 - Content-Addressable Storage (CAS) for IPLD blocks.
 - Reliable backfill mechanism with buffered live-event replay.
 - Efficient binary storage using MessagePack (`rmp-serde`).
-- Native integration with the `jacquard` suite of ATProto crates.
+- Uses `jacquard` suite of ATProto crates.

 ## System architecture

-Hydrant consists of several concurrent components:
-- **Ingestor**: Connects to an upstream Firehose (Relay) and filters events. It manages the transition between discovery and synchronization.
-- **Crawler**: Periodically enumerates the network via `com.atproto.sync.listRepos` to discover new repositories when in full-network mode.
-- **Backfill worker**: A dedicated worker that fetches full repository CAR files from PDS instances when a new repo is detected.
-- **API server**: An Axum-based XRPC server implementing repository read methods (`getRecord`, `listRecords`) and system stats. It also provides a TAP-compatible JSON stream API via WebSockets.
-- **Persistence worker**: Manages periodic background flushes of the LSM-tree and cursor state.
+Hydrant consists of several components:
+- **[`hydrant::ingest::firehose`]**: Connects to an upstream Firehose (Relay) and filters events. It manages the transition between discovery and synchronization.
+- **[`hydrant::ingest::worker`]**: Processes buffered Firehose messages concurrently. Verifies signatures, updates repository state, detects gaps for backfill, and persists records.
+- **[`hydrant::crawler`]**: Periodically enumerates the network via `com.atproto.sync.listRepos` to discover new repositories when in full-network mode.
+- **[`hydrant::backfill`]**: A dedicated worker that fetches full repository CAR files from PDS instances when a new repo is detected.
+- **[`hydrant::api`]**: An Axum-based XRPC server implementing repository read methods (`getRecord`, `listRecords`) and system stats. It also provides a event stream API via WebSockets.
+- **Persistence worker** (in `src/main.rs`): Manages periodic background flushes of the LSM-tree and cursor state.

 ### Lazy event inflation
+
 To minimize latency in `apply_commit` and the backfill worker, events are stored in a compact `StoredEvent` format. The expansion into full TAP-compatible JSON (including fetching record content from the CAS and DAG-CBOR parsing) is performed lazily within the WebSocket stream handler.

 ## General conventions
@@ -52,7 +54,7 @@
 - Prefer compile-time guarantees over runtime checks where possible.

 ### Production-grade engineering
-- Use `miette` for rich, diagnostic-driven error reporting.
+- Use `miette` for diagnostic-driven error reporting.
 - Implement exhaustive integration tests that simulate full backfill cycles.
 - Adhere to lowercase comments and sentence case in documentation.
 - Avoid unnecessary comments if the code is self-documenting.
@@ -60,31 +62,28 @@
 ### Storage and serialization
 - **State**: Use `rmp-serde` (MessagePack) for all internal state (`RepoState`, `ErrorState`, `StoredEvent`).
 - **Blocks**: Store IPLD blocks as raw DAG-CBOR bytes in the CAS. This avoids expensive transcoding and allows direct serving of block content.
-- **Cursors**: Store cursors as plain UTF-8 strings for visibility and manual debugging.
+- **Cursors**: Store cursors as big-endian bytes (`u64`/`i64`).
 - **Keyspaces**: Use the `keys.rs` module to maintain consistent composite key formats.

 ## Database schema (keyspaces)

 Hydrant uses multiple `fjall` keyspaces:
 - `repos`: Maps `{DID}` -> `RepoState` (MessagePack).
-- `records`: Maps `{DID}\x00{Collection}\x00{RKey}` -> `{CID}` (String).
+- `records`: Maps `{DID}|{Collection}|{RKey}` -> `{CID}` (String).
 - `blocks`: Maps `{CID}` -> `Block Data` (Raw CBOR).
 - `events`: Maps `{ID}` (u64) -> `StoredEvent` (MessagePack). This is the source for the JSON stream API.
-- `cursors`: Maps `firehose_cursor` or `crawler_cursor` -> `Value` (String).
+- `cursors`: Maps `firehose_cursor` or `crawler_cursor` -> `Value` (u64/i64 BE Bytes).
 - `pending`: Index of DIDs awaiting backfill.
-- `errors`: Maps `{DID}` -> `ErrorState` (MessagePack) for retry logic.
-- `buffer`: Maps `{DID}\x00{SEQ}` -> `Buffered Commit` (MessagePack).
+- `resync`: Maps `{DID}` -> `ResyncState` (MessagePack) for retry logic/tombstones.
+- `counts`: Maps `k|{NAME}` or `r|{DID}|{COL}` -> `Count` (u64 BE Bytes).

 ## Safe commands

-### Compilation and linting
-- `cargo check` - fast validation of changes.
-- `cargo clippy` - ensure idiomatic Rust code.
-
 ### Testing
 - `nu tests/repo_sync_integrity.nu` - Runs the full integration test suite using Nushell. This builds the binary, starts a temporary instance, performs a backfill against a real PDS, and verifies record integrity.
 - `nu tests/stream_test.nu` - Tests WebSocket streaming functionality. Verifies both live event streaming during backfill and historical replay with cursor.
 - `nu tests/authenticated_stream_test.nu` - Tests authenticated event streaming. Verifies that create, update, and delete actions on a real account are correctly streamed by Hydrant in the correct order. Requires `TEST_REPO` and `TEST_PASSWORD` in `.env`.
+- `nu tests/debug_endpoints.nu` - Tests debug/introspection endpoints (`/debug/iter`, `/debug/get`) and verifies DB content and serialization.

 ## Rust code style

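
Note on the two storage-format changes documented in this diff (the `|`-separated `records` key and the big-endian cursor values): a minimal Rust sketch of what they could look like. The helper names (`record_key`, `collection_prefix`, `encode_cursor`, `decode_cursor`) are hypothetical and are not taken from Hydrant's `keys.rs`. The sketch also assumes that no key component contains `|`, and it uses only the standard library rather than `fjall` itself.

```rust
// Hypothetical helpers illustrating the documented formats; not Hydrant's code.

/// Separator between composite key parts, as written in the `records` schema line.
const SEP: u8 = b'|';

/// Builds a `{DID}|{Collection}|{RKey}` key.
/// Assumption: none of the three parts contains `|`.
fn record_key(did: &str, collection: &str, rkey: &str) -> Vec<u8> {
    let mut key = Vec::with_capacity(did.len() + collection.len() + rkey.len() + 2);
    key.extend_from_slice(did.as_bytes());
    key.push(SEP);
    key.extend_from_slice(collection.as_bytes());
    key.push(SEP);
    key.extend_from_slice(rkey.as_bytes());
    key
}

/// Prefix shared by every record key of one collection in one repo
/// (`{DID}|{Collection}|`), suitable for a prefix scan.
fn collection_prefix(did: &str, collection: &str) -> Vec<u8> {
    record_key(did, collection, "")
}

/// Encodes a cursor as 8 big-endian bytes.
fn encode_cursor(seq: u64) -> [u8; 8] {
    seq.to_be_bytes()
}

/// Decodes a cursor, returning `None` if the stored value is not exactly 8 bytes
/// (for example, a leftover UTF-8 string cursor from the old format).
fn decode_cursor(bytes: &[u8]) -> Option<u64> {
    if bytes.len() != 8 {
        return None;
    }
    let mut arr = [0u8; 8];
    arr.copy_from_slice(bytes);
    Some(u64::from_be_bytes(arr))
}

fn main() {
    let key = record_key("did:plc:example", "app.bsky.feed.post", "3kabc");
    assert_eq!(key, b"did:plc:example|app.bsky.feed.post|3kabc".to_vec());
    assert!(key.starts_with(&collection_prefix("did:plc:example", "app.bsky.feed.post")));

    // Big-endian bytes compare in the same order as the numbers they encode,
    // so a byte-ordered store iterates u64 keys in numeric order.
    assert!(encode_cursor(255) < encode_cursor(256));
    // Little-endian bytes do not have this property.
    assert!(255u64.to_le_bytes() > 256u64.to_le_bytes());

    assert_eq!(decode_cursor(&encode_cursor(42)), Some(42));
    assert_eq!(decode_cursor(b"42"), None);
}
```

The ordering property matters for the `u64`-keyed `events` keyspace more than for single-value cursors. It holds for `u64` as shown. For `i64`, plain big-endian two's-complement bytes sort correctly only among non-negative values, because negative values have the high bit set and compare greater bytewise.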