···22222323## Project overview
24242525-Hydrant is an AT Protocol indexer built on the `fjall` LSM-tree engine. It supports both full-network indexing and efficient targeted indexing (filtered by DID), while maintaining full Firehose compatibility.
2525+Hydrant is an AT Protocol indexer built on the `fjall` database. It supports both full-network indexing and filtered indexing (eg. by DID).
26262727Key design goals:
2828- Ingestion via the `fjall` storage engine.
2929- Content-Addressable Storage (CAS) for IPLD blocks.
3030- Reliable backfill mechanism with buffered live-event replay.
3131- Efficient binary storage using MessagePack (`rmp-serde`).
3232-- Native integration with the `jacquard` suite of ATProto crates.
3232+- Uses `jacquard` suite of ATProto crates.
33333434## System architecture
35353636-Hydrant consists of several concurrent components:
3737-- **Ingestor**: Connects to an upstream Firehose (Relay) and filters events. It manages the transition between discovery and synchronization.
3838-- **Crawler**: Periodically enumerates the network via `com.atproto.sync.listRepos` to discover new repositories when in full-network mode.
3939-- **Backfill worker**: A dedicated worker that fetches full repository CAR files from PDS instances when a new repo is detected.
4040-- **API server**: An Axum-based XRPC server implementing repository read methods (`getRecord`, `listRecords`) and system stats. It also provides a TAP-compatible JSON stream API via WebSockets.
4141-- **Persistence worker**: Manages periodic background flushes of the LSM-tree and cursor state.
3636+Hydrant consists of several components:
3737+- **[`hydrant::ingest::firehose`]**: Connects to an upstream Firehose (Relay) and filters events. It manages the transition between discovery and synchronization.
3838+- **[`hydrant::ingest::worker`]**: Processes buffered Firehose messages concurrently. Verifies signatures, updates repository state, detects gaps for backfill, and persists records.
3939+- **[`hydrant::crawler`]**: Periodically enumerates the network via `com.atproto.sync.listRepos` to discover new repositories when in full-network mode.
4040+- **[`hydrant::backfill`]**: A dedicated worker that fetches full repository CAR files from PDS instances when a new repo is detected.
4141+- **[`hydrant::api`]**: An Axum-based XRPC server implementing repository read methods (`getRecord`, `listRecords`) and system stats. It also provides a event stream API via WebSockets.
4242+- **Persistence worker** (in `src/main.rs`): Manages periodic background flushes of the LSM-tree and cursor state.
42434344### Lazy event inflation
4545+4446To minimize latency in `apply_commit` and the backfill worker, events are stored in a compact `StoredEvent` format. The expansion into full TAP-compatible JSON (including fetching record content from the CAS and DAG-CBOR parsing) is performed lazily within the WebSocket stream handler.
45474648## General conventions
···5254- Prefer compile-time guarantees over runtime checks where possible.
53555456### Production-grade engineering
5555-- Use `miette` for rich, diagnostic-driven error reporting.
5757+- Use `miette` for diagnostic-driven error reporting.
5658- Implement exhaustive integration tests that simulate full backfill cycles.
5759- Adhere to lowercase comments and sentence case in documentation.
5860- Avoid unnecessary comments if the code is self-documenting.
···6062### Storage and serialization
6163- **State**: Use `rmp-serde` (MessagePack) for all internal state (`RepoState`, `ErrorState`, `StoredEvent`).
6264- **Blocks**: Store IPLD blocks as raw DAG-CBOR bytes in the CAS. This avoids expensive transcoding and allows direct serving of block content.
6363-- **Cursors**: Store cursors as plain UTF-8 strings for visibility and manual debugging.
6565+- **Cursors**: Store cursors as big-endian bytes (`u64`/`i64`).
6466- **Keyspaces**: Use the `keys.rs` module to maintain consistent composite key formats.
65676668## Database schema (keyspaces)
67696870Hydrant uses multiple `fjall` keyspaces:
6971- `repos`: Maps `{DID}` -> `RepoState` (MessagePack).
7070-- `records`: Maps `{DID}\x00{Collection}\x00{RKey}` -> `{CID}` (String).
7272+- `records`: Maps `{DID}|{Collection}|{RKey}` -> `{CID}` (String).
7173- `blocks`: Maps `{CID}` -> `Block Data` (Raw CBOR).
7274- `events`: Maps `{ID}` (u64) -> `StoredEvent` (MessagePack). This is the source for the JSON stream API.
7373-- `cursors`: Maps `firehose_cursor` or `crawler_cursor` -> `Value` (String).
7575+- `cursors`: Maps `firehose_cursor` or `crawler_cursor` -> `Value` (u64/i64 BE Bytes).
7476- `pending`: Index of DIDs awaiting backfill.
7575-- `errors`: Maps `{DID}` -> `ErrorState` (MessagePack) for retry logic.
7676-- `buffer`: Maps `{DID}\x00{SEQ}` -> `Buffered Commit` (MessagePack).
7777+- `resync`: Maps `{DID}` -> `ResyncState` (MessagePack) for retry logic/tombstones.
7878+- `counts`: Maps `k|{NAME}` or `r|{DID}|{COL}` -> `Count` (u64 BE Bytes).
77797880## Safe commands
79818080-### Compilation and linting
8181-- `cargo check` - fast validation of changes.
8282-- `cargo clippy` - ensure idiomatic Rust code.
8383-8482### Testing
8583- `nu tests/repo_sync_integrity.nu` - Runs the full integration test suite using Nushell. This builds the binary, starts a temporary instance, performs a backfill against a real PDS, and verifies record integrity.
8684- `nu tests/stream_test.nu` - Tests WebSocket streaming functionality. Verifies both live event streaming during backfill and historical replay with cursor.
8785- `nu tests/authenticated_stream_test.nu` - Tests authenticated event streaming. Verifies that create, update, and delete actions on a real account are correctly streamed by Hydrant in the correct order. Requires `TEST_REPO` and `TEST_PASSWORD` in `.env`.
8686+- `nu tests/debug_endpoints.nu` - Tests debug/introspection endpoints (`/debug/iter`, `/debug/get`) and verifies DB content and serialization.
88878988## Rust code style
9089