# hydrant

`hydrant` is an AT Protocol indexer built on the `fjall` database. it's meant to be flexible, supporting both full-network indexing and filtered indexing (e.g., by DID). it also answers XRPC queries like `com.atproto.sync.getRepo` and `com.atproto.repo.listRecords`, which enables many more use cases than just providing an event stream.

## configuration

`hydrant` is configured via environment variables. all variables are prefixed with `HYDRANT_`.

| variable | default | description |
| :--- | :--- | :--- |
| `DATABASE_PATH` | `./hydrant.db` | path to the database folder. |
| `LOG_LEVEL` | `info` | log level (e.g., `debug`, `info`, `warn`, `error`). |
| `RELAY_HOST` | `wss://relay.fire.hose.cam` | websocket URL of the upstream firehose relay. |
| `PLC_URL` | `https://plc.wtf` | base URL(s) of the PLC directory (comma-separated for multiple). |
| `FULL_NETWORK` | `false` | if `true`, discovers and indexes all repositories in the network. |
| `FIREHOSE_WORKERS` | `64` | number of concurrent workers for processing firehose events. |
| `BACKFILL_CONCURRENCY_LIMIT` | `128` | maximum number of concurrent backfill tasks. |
| `VERIFY_SIGNATURES` | `full` | signature verification level: `full`, `backfill-only`, or `none`. |
| `CURSOR_SAVE_INTERVAL` | `10` | interval (in seconds) to save the firehose cursor. |
| `REPO_FETCH_TIMEOUT` | `300` | timeout (in seconds) for fetching repositories. |
| `CACHE_SIZE` | `256` | size of the database cache in MB. |
| `IDENTITY_CACHE_SIZE` | `100000` | number of identity entries to cache. |
| `API_PORT` | `3000` | port for the API server. |
| `ENABLE_DEBUG` | `false` | enable debug endpoints. |
| `DEBUG_PORT` | `3001` | port for debug endpoints (if enabled). |
| `NO_LZ4_COMPRESSION` | `false` | disable lz4 compression for storage. |
| `DISABLE_FIREHOSE` | `false` | disable firehose ingestion. |
| `DISABLE_BACKFILL` | `false` | disable backfill processing. |
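
for example, a filtered (non-full-network) deployment might use an environment file like this (values are illustrative):

```
HYDRANT_DATABASE_PATH=/var/lib/hydrant/db
HYDRANT_FULL_NETWORK=false
HYDRANT_VERIFY_SIGNATURES=backfill-only
HYDRANT_API_PORT=3000
HYDRANT_LOG_LEVEL=debug
```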

## api

### management

- `POST /repo/add`: register a DID, start backfilling, and subscribe to updates.
  - body: `{ "dids": ["did:plc:..."] }`
- `POST /repo/remove`: unregister a DID and delete all associated data.
  - body: `{ "dids": ["did:plc:..."] }`
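
as a sketch, registering a repo from python might look like this (the DID is just an example, and the `urlopen` call assumes a hydrant instance on the default port):

```python
import json
from urllib.request import Request, urlopen

def add_repos_request(base_url: str, dids: list[str]) -> Request:
    """Build the POST /repo/add request; the body shape matches the docs above."""
    body = json.dumps({"dids": dids}).encode()
    return Request(f"{base_url}/repo/add", data=body,
                   headers={"Content-Type": "application/json"})

req = add_repos_request("http://localhost:3000", ["did:plc:ewvi7nxzyoun6zhxrhs64oiz"])
# urlopen(req) would start the backfill against a running instance
```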

### data access (xrpc)

`hydrant` implements some AT Protocol XRPC endpoints for reading data:

- `com.atproto.repo.getRecord`: retrieve a single record by collection and rkey.
- `com.atproto.repo.listRecords`: list records in a collection, with pagination.
- `systems.gaze.hydrant.countRecords`: count records in a collection.

### event stream

- `GET /stream`: subscribe to the event stream.
  - query parameters:
    - `cursor` (optional): start streaming from a specific event ID.

### stats

- `GET /stats`: get aggregate counts of repos, records, events, and errors.

## vs `tap`

while [`tap`](https://github.com/bluesky-social/indigo/tree/main/cmd/tap) is designed primarily as a firehose consumer and relay, `hydrant` is more flexible: it lets you query the database for records directly, and it provides an ordered view of events, so a cursor can be used to fetch events from a specific point in time.

### stream behavior

the `WS /stream` (hydrant) and `WS /channel` (tap) endpoints have different designs:

| aspect | `tap` (`/channel`) | `hydrant` (`/stream`) |
| :--- | :--- | :--- |
| distribution | sharded work queue: events are load-balanced across connected clients. if 5 clients connect, each receives ~20% of events. | broadcast: every connected client receives a full copy of the event stream. if 5 clients connect, all 5 receive 100% of events. |
| cursors | server-managed: clients ACK messages. the server tracks progress and redelivers unacked messages. | client-managed: the client provides `?cursor=123`. the server streams from that point. |
| backfill | integrated queue: backfill events are mixed into the live queue and prioritized by the server. | unified log: backfill simply inserts "historical" events (`live: false`) into the global event log. streaming is just reading this log sequentially. |
| event types | `record`, `identity` (includes status) | `record`, `identity` (handle), `account` (status) |
| persistence | **full**: all events are stored and replayable. | **hybrid**: `record` events are persisted/replayable. `identity`/`account` events are ephemeral/live-only. |