···

## Project overview

-Hydrant is an AT Protocol indexer built on the `fjall` LSM-tree engine. It supports both full-network indexing and efficient targeted indexing (filtered by DID), while maintaining full Firehose compatibility.

Key design goals:
- Ingestion via the `fjall` storage engine.
- Content-Addressable Storage (CAS) for IPLD blocks.
- Reliable backfill mechanism with buffered live-event replay.
- Efficient binary storage using MessagePack (`rmp-serde`).
-- Native integration with the `jacquard` suite of ATProto crates.

## System architecture

-Hydrant consists of several concurrent components:
-- **Ingestor**: Connects to an upstream Firehose (Relay) and filters events. It manages the transition between discovery and synchronization.
-- **Crawler**: Periodically enumerates the network via `com.atproto.sync.listRepos` to discover new repositories when in full-network mode.
-- **Backfill worker**: A dedicated worker that fetches full repository CAR files from PDS instances when a new repo is detected.
-- **API server**: An Axum-based XRPC server implementing repository read methods (`getRecord`, `listRecords`) and system stats. It also provides a TAP-compatible JSON stream API via WebSockets.
-- **Persistence worker**: Manages periodic background flushes of the LSM-tree and cursor state.

### Lazy event inflation
To minimize latency in `apply_commit` and the backfill worker, events are stored in a compact `StoredEvent` format. The expansion into full TAP-compatible JSON (including fetching record content from the CAS and DAG-CBOR parsing) is performed lazily within the WebSocket stream handler.

## General conventions
···
- Prefer compile-time guarantees over runtime checks where possible.

### Production-grade engineering
-- Use `miette` for rich, diagnostic-driven error reporting.
- Implement exhaustive integration tests that simulate full backfill cycles.
- Adhere to lowercase comments and sentence case in documentation.
- Avoid unnecessary comments if the code is self-documenting.
···
### Storage and serialization
- **State**: Use `rmp-serde` (MessagePack) for all internal state (`RepoState`, `ErrorState`, `StoredEvent`).
- **Blocks**: Store IPLD blocks as raw DAG-CBOR bytes in the CAS. This avoids expensive transcoding and allows direct serving of block content.
-- **Cursors**: Store cursors as plain UTF-8 strings for visibility and manual debugging.
- **Keyspaces**: Use the `keys.rs` module to maintain consistent composite key formats.

## Database schema (keyspaces)

Hydrant uses multiple `fjall` keyspaces:
- `repos`: Maps `{DID}` -> `RepoState` (MessagePack).
-- `records`: Maps `{DID}\x00{Collection}\x00{RKey}` -> `{CID}` (String).
- `blocks`: Maps `{CID}` -> `Block Data` (Raw CBOR).
- `events`: Maps `{ID}` (u64) -> `StoredEvent` (MessagePack). This is the source for the JSON stream API.
-- `cursors`: Maps `firehose_cursor` or `crawler_cursor` -> `Value` (String).
- `pending`: Index of DIDs awaiting backfill.
-- `errors`: Maps `{DID}` -> `ErrorState` (MessagePack) for retry logic.
-- `buffer`: Maps `{DID}\x00{SEQ}` -> `Buffered Commit` (MessagePack).

## Safe commands

-### Compilation and linting
-- `cargo check` - fast validation of changes.
-- `cargo clippy` - ensure idiomatic Rust code.
-
### Testing
- `nu tests/repo_sync_integrity.nu` - Runs the full integration test suite using Nushell. This builds the binary, starts a temporary instance, performs a backfill against a real PDS, and verifies record integrity.
- `nu tests/stream_test.nu` - Tests WebSocket streaming functionality. Verifies both live event streaming during backfill and historical replay with cursor.
- `nu tests/authenticated_stream_test.nu` - Tests authenticated event streaming. Verifies that create, update, and delete actions on a real account are correctly streamed by Hydrant in the correct order. Requires `TEST_REPO` and `TEST_PASSWORD` in `.env`.

## Rust code style
···

## Project overview

+Hydrant is an AT Protocol indexer built on the `fjall` database. It supports both full-network indexing and filtered indexing (e.g. by DID).

Key design goals:
- Ingestion via the `fjall` storage engine.
- Content-Addressable Storage (CAS) for IPLD blocks.
- Reliable backfill mechanism with buffered live-event replay.
- Efficient binary storage using MessagePack (`rmp-serde`).
+- Uses the `jacquard` suite of ATProto crates.

## System architecture

+Hydrant consists of several components:
+- **[`hydrant::ingest::firehose`]**: Connects to an upstream Firehose (Relay) and filters events. It manages the transition between discovery and synchronization.
+- **[`hydrant::ingest::worker`]**: Processes buffered Firehose messages concurrently. Verifies signatures, updates repository state, detects gaps for backfill, and persists records.
+- **[`hydrant::crawler`]**: Periodically enumerates the network via `com.atproto.sync.listRepos` to discover new repositories when in full-network mode.
+- **[`hydrant::backfill`]**: A dedicated worker that fetches full repository CAR files from PDS instances when a new repo is detected.
+- **[`hydrant::api`]**: An Axum-based XRPC server implementing repository read methods (`getRecord`, `listRecords`) and system stats. It also provides an event stream API via WebSockets.
+- **Persistence worker** (in `src/main.rs`): Manages periodic background flushes of the LSM-tree and cursor state.

### Lazy event inflation
+
To minimize latency in `apply_commit` and the backfill worker, events are stored in a compact `StoredEvent` format. The expansion into full TAP-compatible JSON (including fetching record content from the CAS and DAG-CBOR parsing) is performed lazily within the WebSocket stream handler.
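
The split between a compact stored form and a read-time expansion can be sketched roughly as follows. This is an illustration only, not Hydrant's code: the field names, the `InflatedEvent` type, the `inflate` function, and the in-memory `HashMap` standing in for the CAS are all hypothetical, and the real handler would additionally decode DAG-CBOR and emit JSON.

```rust
use std::collections::HashMap;

/// Hypothetical compact form: the record is referenced by CID only,
/// so the write path never copies or parses the record body.
#[derive(Debug, Clone, PartialEq)]
struct StoredEvent {
    id: u64,
    did: String,
    path: String,        // assumed "collection/rkey" layout
    cid: Option<String>, // None for deletes in this sketch
}

/// Hypothetical expanded form handed to a stream consumer.
#[derive(Debug, PartialEq)]
struct InflatedEvent {
    id: u64,
    did: String,
    path: String,
    record: Option<Vec<u8>>, // raw block bytes looked up from the CAS
}

/// Resolve the record body at read time. Only the stream handler calls
/// this, so ingestion latency does not depend on the CAS lookup.
fn inflate(ev: &StoredEvent, cas: &HashMap<String, Vec<u8>>) -> InflatedEvent {
    InflatedEvent {
        id: ev.id,
        did: ev.did.clone(),
        path: ev.path.clone(),
        record: ev.cid.as_ref().and_then(|cid| cas.get(cid).cloned()),
    }
}
```

In this shape, an event whose block is missing from the CAS still inflates, with `record` left as `None`; whether Hydrant treats that case as an error is not stated in this document.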

## General conventions
···
- Prefer compile-time guarantees over runtime checks where possible.

### Production-grade engineering
+- Use `miette` for diagnostic-driven error reporting.
- Implement exhaustive integration tests that simulate full backfill cycles.
- Adhere to lowercase comments and sentence case in documentation.
- Avoid unnecessary comments if the code is self-documenting.
···
### Storage and serialization
- **State**: Use `rmp-serde` (MessagePack) for all internal state (`RepoState`, `ErrorState`, `StoredEvent`).
- **Blocks**: Store IPLD blocks as raw DAG-CBOR bytes in the CAS. This avoids expensive transcoding and allows direct serving of block content.
+- **Cursors**: Store cursors as big-endian bytes (`u64`/`i64`).
- **Keyspaces**: Use the `keys.rs` module to maintain consistent composite key formats.
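
A minimal sketch of the big-endian cursor encoding, assuming a `u64` sequence number. The helper names `encode_cursor` and `decode_cursor` are hypothetical and are not taken from Hydrant's source. For unsigned values, big-endian bytes compare lexicographically in the same order as the numbers themselves, which suits an ordered key-value store; negative `i64` values would need extra handling to sort correctly, which this sketch does not cover.

```rust
use std::convert::TryFrom;

/// Encode a cursor as 8 big-endian bytes (hypothetical helper).
fn encode_cursor(seq: u64) -> [u8; 8] {
    seq.to_be_bytes()
}

/// Decode a stored cursor; returns `None` unless the slice is exactly 8 bytes.
fn decode_cursor(bytes: &[u8]) -> Option<u64> {
    let arr = <[u8; 8]>::try_from(bytes).ok()?;
    Some(u64::from_be_bytes(arr))
}
```

Returning `Option` rather than panicking keeps a corrupted or legacy value (for example, a cursor written in the earlier string format) from crashing the reader.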

## Database schema (keyspaces)

Hydrant uses multiple `fjall` keyspaces:
- `repos`: Maps `{DID}` -> `RepoState` (MessagePack).
+- `records`: Maps `{DID}|{Collection}|{RKey}` -> `{CID}` (String).
- `blocks`: Maps `{CID}` -> `Block Data` (Raw CBOR).
- `events`: Maps `{ID}` (u64) -> `StoredEvent` (MessagePack). This is the source for the JSON stream API.
+- `cursors`: Maps `firehose_cursor` or `crawler_cursor` -> `Value` (u64/i64 BE Bytes).
- `pending`: Index of DIDs awaiting backfill.
+- `resync`: Maps `{DID}` -> `ResyncState` (MessagePack) for retry logic/tombstones.
+- `counts`: Maps `k|{NAME}` or `r|{DID}|{COL}` -> `Count` (u64 BE Bytes).
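
The `|`-separated composite keys listed above could be built and parsed along these lines. This is a sketch, not the contents of `keys.rs`: the function names are hypothetical, and it assumes that DIDs, collection NSIDs, and record keys never contain `|`.

```rust
/// Build a `records` key: `{DID}|{Collection}|{RKey}` (hypothetical helper).
fn record_key(did: &str, collection: &str, rkey: &str) -> Vec<u8> {
    format!("{}|{}|{}", did, collection, rkey).into_bytes()
}

/// Prefix shared by every record of one collection in one repo.
/// The trailing `|` stops `app.bsky.feed.post` from also matching
/// a longer collection name such as `app.bsky.feed.postgate`.
fn collection_prefix(did: &str, collection: &str) -> Vec<u8> {
    format!("{}|{}|", did, collection).into_bytes()
}

/// Build a per-collection `counts` key: `r|{DID}|{COL}`.
fn collection_count_key(did: &str, collection: &str) -> Vec<u8> {
    format!("r|{}|{}", did, collection).into_bytes()
}

/// Split a `records` key back into (did, collection, rkey).
fn parse_record_key(key: &[u8]) -> Option<(&str, &str, &str)> {
    let s = std::str::from_utf8(key).ok()?;
    let mut parts = s.splitn(3, '|');
    Some((parts.next()?, parts.next()?, parts.next()?))
}
```

Because the store keeps keys sorted, a `listRecords`-style query can be served by scanning every key that starts with `collection_prefix(did, collection)`.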

## Safe commands

### Testing
- `nu tests/repo_sync_integrity.nu` - Runs the full integration test suite using Nushell. This builds the binary, starts a temporary instance, performs a backfill against a real PDS, and verifies record integrity.
- `nu tests/stream_test.nu` - Tests WebSocket streaming functionality. Verifies both live event streaming during backfill and historical replay with cursor.
- `nu tests/authenticated_stream_test.nu` - Tests authenticated event streaming. Verifies that create, update, and delete actions on a real account are correctly streamed by Hydrant in the correct order. Requires `TEST_REPO` and `TEST_PASSWORD` in `.env`.
+- `nu tests/debug_endpoints.nu` - Tests debug/introspection endpoints (`/debug/iter`, `/debug/get`) and verifies DB content and serialization.

## Rust code style