 # hydrant

+`hydrant` is an AT Protocol indexer built on the `fjall` database. it's meant to be a flexible indexer, supporting both full-network indexing and filtered indexing (e.g., by DID). it also allows querying through XRPC endpoints such as `com.atproto.sync.getRepo` and `com.atproto.repo.listRecords`, which should enable many more use cases than providing an event stream alone.
+
 ## configuration

-environment variables:
-- `HYDRANT_DATABASE_PATH`: path to database folder (default: `./hydrant.db`)
-- `HYDRANT_RELAY_HOST`: relay WebSocket URL (default: `wss://relay.fire.hose.cam`)
-- `HYDRANT_PLC_URL`: base URL of the PLC directory (default: `https://plc.wtf`).
-- `HYDRANT_FULL_NETWORK`: if set to `true`, the indexer will discover and index all repos it sees.
-- `HYDRANT_CURSOR_SAVE_INTERVAL`: how often to save the Firehose cursor (default: `10s`).
+`hydrant` is configured via environment variables. all variables are prefixed with `HYDRANT_`.
+
+| variable | default | description |
+| :--- | :--- | :--- |
+| `DATABASE_PATH` | `./hydrant.db` | path to the database folder. |
+| `LOG_LEVEL` | `info` | log level (e.g., `debug`, `info`, `warn`, `error`). |
+| `RELAY_HOST` | `wss://relay.fire.hose.cam` | websocket URL of the upstream firehose relay. |
+| `PLC_URL` | `https://plc.wtf` | base URL(s) of the PLC directory (comma-separated for multiple). |
+| `FULL_NETWORK` | `false` | if `true`, discovers and indexes all repositories in the network. |
+| `FIREHOSE_WORKERS` | `64` | number of concurrent workers for processing firehose events. |
+| `BACKFILL_CONCURRENCY_LIMIT` | `128` | maximum number of concurrent backfill tasks. |
+| `VERIFY_SIGNATURES` | `full` | signature verification level: `full`, `backfill-only`, or `none`. |
+| `CURSOR_SAVE_INTERVAL` | `10` | interval (in seconds) to save the firehose cursor. |
+| `REPO_FETCH_TIMEOUT` | `300` | timeout (in seconds) for fetching repositories. |
+| `CACHE_SIZE` | `256` | size of the database cache in MB. |
+| `IDENTITY_CACHE_SIZE` | `100000` | number of identity entries to cache. |
+| `API_PORT` | `3000` | port for the API server. |
+| `ENABLE_DEBUG` | `false` | enable debug endpoints. |
+| `DEBUG_PORT` | `3001` | port for debug endpoints (if enabled). |
+| `NO_LZ4_COMPRESSION` | `false` | disable lz4 compression for storage. |
+| `DISABLE_FIREHOSE` | `false` | disable firehose ingestion. |
+| `DISABLE_BACKFILL` | `false` | disable backfill processing. |
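
the prefix-plus-default behaviour described in the table can be sketched roughly as below. this is only an illustration: `cfg_or` and its `lookup` parameter are hypothetical names, not hydrant's actual `cfg!` macro, and treating a set-but-unparsable value as an error is an assumption.

```rust
use std::str::FromStr;

/// prefix shared by every variable in the table.
const PREFIX: &str = "HYDRANT_";

/// hypothetical helper (not hydrant's `cfg!` macro): looks up
/// `HYDRANT_<name>` through `lookup`, parses the value, and falls back to
/// `default` when the variable is unset. a value that is set but cannot be
/// parsed is reported as an error (an assumption made for this sketch).
fn cfg_or<T, F>(lookup: F, name: &str, default: T) -> Result<T, String>
where
    T: FromStr,
    F: Fn(&str) -> Option<String>,
{
    let key = format!("{PREFIX}{name}");
    match lookup(&key) {
        None => Ok(default),
        Some(raw) => raw
            .trim()
            .parse::<T>()
            .map_err(|_| format!("invalid value for {key}: {raw:?}")),
    }
}
```

to read the real process environment, `lookup` could be `|key: &str| std::env::var(key).ok()`; passing the lookup in as a closure keeps the helper easy to test without touching global state.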
+
+## api
+
+### management
+
+- `POST /repo/add`: register a DID, start backfilling, and subscribe to updates.
+  - body: `{ "dids": ["did:plc:..."] }`
+- `POST /repo/remove`: unregister a DID and delete all associated data.
+  - body: `{ "dids": ["did:plc:..."] }`
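
a minimal, std-only sketch of what a client could send to these two endpoints. `dids_body` and `management_request` are hypothetical helpers; the `Content-Type: application/json` header and the host value are assumptions, since the readme only shows the body shape.

```rust
/// builds the `{ "dids": [...] }` body using only the standard library.
/// escaping is minimal (backslashes and quotes); a real client would more
/// likely use a JSON library.
fn dids_body(dids: &[&str]) -> String {
    let items: Vec<String> = dids
        .iter()
        .map(|did| format!("\"{}\"", did.replace('\\', "\\\\").replace('"', "\\\"")))
        .collect();
    format!("{{\"dids\":[{}]}}", items.join(","))
}

/// renders a raw HTTP/1.1 request for `/repo/add` or `/repo/remove`.
/// `host` (e.g. `localhost:3000`) and the JSON content type are assumptions.
fn management_request(host: &str, path: &str, dids: &[&str]) -> String {
    let body = dids_body(dids);
    format!(
        "POST {path} HTTP/1.1\r\nHost: {host}\r\nContent-Type: application/json\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{body}",
        body.len()
    )
}
```

the resulting string could be written to a `std::net::TcpStream` connected to the API port, or the body alone could be handed to whatever HTTP client is already in use.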
+
+### data access (xrpc)
+
+`hydrant` implements some AT Protocol XRPC endpoints for reading data:
+
+- `com.atproto.repo.getRecord`: retrieve a single record by collection and rkey.
+- `com.atproto.repo.listRecords`: list records in a collection, with pagination.
+- `systems.gaze.hydrant.countRecords`: count records in a collection.
+
+### event stream
+
+- `GET /stream`: subscribe to the event stream.
+  - query parameters:
+    - `cursor` (optional): start streaming from a specific event ID.
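
a small sketch of how a client might build the stream URL with the optional `cursor`. `stream_url` is a hypothetical helper; the `ws://` scheme, the `base` host, and treating event IDs as `u64` are assumptions.

```rust
/// builds the `/stream` URL, adding `?cursor=` only when a cursor is known.
/// `base` is a host and port such as `localhost:3000` (assumed for illustration).
fn stream_url(base: &str, cursor: Option<u64>) -> String {
    match cursor {
        // resume from a previously saved event ID
        Some(id) => format!("ws://{base}/stream?cursor={id}"),
        // no cursor supplied: omit the query parameter entirely
        None => format!("ws://{base}/stream"),
    }
}
```

a client would persist the ID of the last event it processed and pass it back as `Some(id)` when reconnecting.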
+
+### stats
+
+- `GET /stats`: get aggregate counts of repos, records, events, and errors.
+
+## vs `tap`
+
+while [`tap`](https://github.com/bluesky-social/indigo/tree/main/cmd/tap) is designed primarily as a firehose consumer and relay, `hydrant` is more flexible: it lets you query the database directly for records, and it provides an ordered view of events, so a cursor can be used to fetch events from a specific point in time.
+
+### stream behavior
+
+the `WS /stream` (hydrant) and `WS /channel` (tap) endpoints have different designs:
+
+| aspect | `tap` (`/channel`) | `hydrant` (`/stream`) |
+| :--- | :--- | :--- |
+| distribution | sharded work queue: events are load-balanced across connected clients. If 5 clients connect, each receives ~20% of events. | broadcast: every connected client receives a full copy of the event stream. If 5 clients connect, all 5 receive 100% of events. |
+| cursors | server-managed: clients ACK messages. The server tracks progress and redelivers unacked messages. | client-managed: client provides `?cursor=123`. The server streams from that point. |
+| backfill | integrated queue: backfill events are mixed into the live queue and prioritized by the server. | unified log: backfill simply inserts "historical" events (`live: false`) into the global event log. streaming is just reading this log sequentially. |
+| event types | `record`, `identity` (includes status) | `record`, `identity` (handle), `account` (status) |
+| persistence | **full**: all events are stored and replayable. | **hybrid**: `record` events are persisted/replayable. `identity`/`account` are ephemeral/live-only. |
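
the distribution and cursor rows can be illustrated with a toy model. this is not code from either project: events are plain `u64` IDs, round-robin stands in for whatever load balancing `tap` actually uses, and the cursor is assumed to be inclusive.

```rust
/// sharded work queue (the `tap` column): each event goes to exactly one client.
/// round-robin is a stand-in; the table only says events are load-balanced.
fn shard(events: &[u64], clients: usize) -> Vec<Vec<u64>> {
    assert!(clients > 0, "need at least one client");
    let mut out = vec![Vec::new(); clients];
    for (i, ev) in events.iter().enumerate() {
        out[i % clients].push(*ev);
    }
    out
}

/// broadcast (the `hydrant` column): every client receives a full copy.
fn broadcast(events: &[u64], clients: usize) -> Vec<Vec<u64>> {
    vec![events.to_vec(); clients]
}

/// client-managed cursor: read an ordered log starting at `cursor`.
/// whether hydrant's cursor is inclusive is an assumption of this sketch.
fn read_from(log: &[u64], cursor: u64) -> &[u64] {
    // the log is sorted by ID, so a binary search finds the start position
    let start = log.partition_point(|id| *id < cursor);
    &log[start..]
}
```

with 100 events and 5 clients, `shard` hands each client 20 events while `broadcast` hands each client all 100, which matches the ~20% versus 100% figures in the table.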
+1-1
src/config.rs
         .unwrap_or_else(|| Ok(vec![Url::parse("https://plc.wtf").unwrap()]))?;

     let full_network = cfg!("FULL_NETWORK", false);
-    let backfill_concurrency_limit = cfg!("BACKFILL_CONCURRENCY_LIMIT", 32usize);
+    let backfill_concurrency_limit = cfg!("BACKFILL_CONCURRENCY_LIMIT", 128usize);
     let cursor_save_interval = cfg!("CURSOR_SAVE_INTERVAL", 10, sec);
     let repo_fetch_timeout = cfg!("REPO_FETCH_TIMEOUT", 300, sec);
