at protocol indexer with flexible filtering, xrpc queries, and a cursor-backed event stream, built on fjall


hydrant

hydrant is an AT Protocol indexer built on the fjall database. it's meant to be a flexible indexer, supporting both full-network indexing and filtered indexing (e.g., by DID). it also supports querying through XRPC methods such as com.atproto.sync.getRepo and com.atproto.repo.listRecords, which should allow many more use cases than an event stream alone.

configuration

hydrant is configured via environment variables. all variables are prefixed with HYDRANT_; the names below are listed without the prefix (e.g., API_PORT is set as HYDRANT_API_PORT).

| variable | default | description |
|---|---|---|
| DATABASE_PATH | ./hydrant.db | path to the database folder. |
| LOG_LEVEL | info | log level (e.g., debug, info, warn, error). |
| RELAY_HOST | wss://relay.fire.hose.cam | websocket URL of the upstream firehose relay. |
| PLC_URL | https://plc.wtf | base URL(s) of the PLC directory (comma-separated for multiple). |
| FULL_NETWORK | false | if true, discovers and indexes all repositories in the network. |
| FIREHOSE_WORKERS | 64 | number of concurrent workers for processing firehose events. |
| BACKFILL_CONCURRENCY_LIMIT | 128 | maximum number of concurrent backfill tasks. |
| VERIFY_SIGNATURES | full | signature verification level: full, backfill-only, or none. |
| CURSOR_SAVE_INTERVAL | 10 | interval (in seconds) to save the firehose cursor. |
| REPO_FETCH_TIMEOUT | 300 | timeout (in seconds) for fetching repositories. |
| CACHE_SIZE | 256 | size of the database cache in MB. |
| IDENTITY_CACHE_SIZE | 100000 | number of identity entries to cache. |
| API_PORT | 3000 | port for the API server. |
| ENABLE_DEBUG | false | enable debug endpoints. |
| DEBUG_PORT | 3001 | port for debug endpoints (if enabled). |
| NO_LZ4_COMPRESSION | false | disable lz4 compression for storage. |
| DISABLE_FIREHOSE | false | disable firehose ingestion. |
| DISABLE_BACKFILL | false | disable backfill processing. |
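
to illustrate how the prefix and defaults fit together, here is a minimal sketch of reading a few of these variables with their documented defaults. it is not hydrant's own config loader: the `Config` struct, the `get` helper, and the choice to reject unparsable values (rather than fall back to the default) are all assumptions made for the example, and booleans are assumed to be spelled exactly `true` or `false`.

```rust
use std::collections::HashMap;
use std::str::FromStr;

/// Hypothetical subset of the settings in the table; field names are illustrative.
#[derive(Debug, PartialEq)]
pub struct Config {
    pub database_path: String,
    pub relay_host: String,
    pub full_network: bool,
    pub firehose_workers: usize,
    pub api_port: u16,
}

/// Look up `HYDRANT_<name>` and parse it, falling back to `default` when the
/// variable is unset. A value that is set but fails to parse is an error.
fn get<T: FromStr>(vars: &HashMap<String, String>, name: &str, default: T) -> Result<T, String> {
    let key = format!("HYDRANT_{name}");
    match vars.get(&key) {
        None => Ok(default),
        Some(raw) => raw
            .trim()
            .parse()
            .map_err(|_| format!("invalid value for {key}: {raw:?}")),
    }
}

impl Config {
    /// Build a config from any key/value source, using the defaults from the table.
    pub fn from_vars(vars: &HashMap<String, String>) -> Result<Self, String> {
        Ok(Self {
            database_path: get(vars, "DATABASE_PATH", "./hydrant.db".to_string())?,
            relay_host: get(vars, "RELAY_HOST", "wss://relay.fire.hose.cam".to_string())?,
            full_network: get(vars, "FULL_NETWORK", false)?,
            firehose_workers: get(vars, "FIREHOSE_WORKERS", 64)?,
            api_port: get(vars, "API_PORT", 3000)?,
        })
    }

    /// Read the process environment (panics on non-UTF-8 variables, as
    /// `std::env::vars` does).
    pub fn from_env() -> Result<Self, String> {
        let vars: HashMap<String, String> = std::env::vars().collect();
        Self::from_vars(&vars)
    }
}
```

with an empty environment, `Config::from_env()` yields the defaults above; setting `HYDRANT_API_PORT=8080` changes only `api_port`.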

api

management

  • POST /repo/add: register a DID, start backfilling, and subscribe to updates.
    • body: { "dids": ["did:plc:..."] }
  • POST /repo/remove: unregister a DID and delete all associated data.
    • body: { "dids": ["did:plc:..."] }
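
both management endpoints take the same body shape, so a client can share one helper for it. the sketch below builds that JSON body and formats a raw HTTP/1.1 POST request as a string. the function names are hypothetical, the `Content-Type: application/json` header is an assumption about what the server expects, and actually sending the request (e.g., over a `TcpStream` to the API port) is left out.

```rust
/// Escape a string for use inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build the `{ "dids": [...] }` body used by /repo/add and /repo/remove.
pub fn dids_body(dids: &[&str]) -> String {
    let items: Vec<String> = dids
        .iter()
        .map(|d| format!("\"{}\"", json_escape(d)))
        .collect();
    format!("{{\"dids\":[{}]}}", items.join(","))
}

/// Format a raw HTTP/1.1 POST request. `host` and `path` come from the caller;
/// the Content-Type header is an assumption, not taken from hydrant's docs.
pub fn post_request(host: &str, path: &str, body: &str) -> String {
    format!(
        "POST {path} HTTP/1.1\r\nHost: {host}\r\nContent-Type: application/json\r\nContent-Length: {len}\r\nConnection: close\r\n\r\n{body}",
        len = body.len()
    )
}
```

for example, `post_request("localhost:3000", "/repo/add", &dids_body(&[did]))` produces a request for registering a single DID against the default API port, where `did` holds a full DID string.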

data access (xrpc)

hydrant implements some AT Protocol XRPC endpoints for reading data:

  • com.atproto.repo.getRecord: retrieve a single record by collection and rkey.
  • com.atproto.repo.listRecords: list records in a collection, with pagination.
  • systems.gaze.hydrant.countRecords: count records in a collection.
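
as a rough illustration of calling these endpoints, the sketch below builds XRPC query URLs. it assumes hydrant serves them under the usual AT Protocol `/xrpc/<nsid>` path on the API port, which this README does not state explicitly. `xrpc_url` and `encode` are hypothetical helpers; the `repo`, `collection`, and `limit` parameters come from the com.atproto.repo.listRecords lexicon, while the parameters accepted by systems.gaze.hydrant.countRecords are not documented here and are left to the caller.

```rust
/// Percent-encode a query component, leaving RFC 3986 unreserved characters as-is.
fn encode(value: &str) -> String {
    let mut out = String::new();
    for b in value.bytes() {
        match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(b as char)
            }
            _ => out.push_str(&format!("%{b:02X}")),
        }
    }
    out
}

/// Build an XRPC query URL of the form `<base>/xrpc/<nsid>?k=v&...`.
/// The `/xrpc/` mount point is the AT Protocol convention and is assumed here.
pub fn xrpc_url(base: &str, nsid: &str, params: &[(&str, &str)]) -> String {
    let mut url = format!("{}/xrpc/{}", base.trim_end_matches('/'), nsid);
    for (i, (key, value)) in params.iter().enumerate() {
        url.push(if i == 0 { '?' } else { '&' });
        url.push_str(&encode(key));
        url.push('=');
        url.push_str(&encode(value));
    }
    url
}
```

a paginated listing would call `xrpc_url` again with an extra `("cursor", ...)` pair taken from the previous response.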

event stream

  • GET /stream: subscribe to the event stream.
    • query parameters:
      • cursor (optional): start streaming from a specific event ID.
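
because the cursor is supplied by the client, a consumer needs to remember the last event it handled and pass it back when reconnecting. a minimal sketch of that bookkeeping follows. `CursorTracker` is a hypothetical type, event IDs are assumed to be unsigned integers, and whether the server treats the cursor as inclusive or exclusive is not specified here, so a real client should tolerate seeing the last event again after a reconnect.

```rust
/// Tracks the highest event ID a client has finished processing.
#[derive(Debug, Default)]
pub struct CursorTracker {
    last_seen: Option<u64>,
}

impl CursorTracker {
    /// Record an event ID after it has been processed. The stored cursor
    /// never moves backwards, even if IDs arrive out of order.
    pub fn observe(&mut self, event_id: u64) {
        self.last_seen = Some(self.last_seen.map_or(event_id, |prev| prev.max(event_id)));
    }

    /// URL to (re)connect with. Before any event has been seen, the
    /// `cursor` parameter is omitted entirely.
    pub fn stream_url(&self, base: &str) -> String {
        let base = base.trim_end_matches('/');
        match self.last_seen {
            Some(id) => format!("{base}/stream?cursor={id}"),
            None => format!("{base}/stream"),
        }
    }
}
```

persisting `last_seen` to disk between runs would let a consumer resume after a restart instead of only after a dropped connection.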

stats

  • GET /stats: get aggregate counts of repos, records, events, and errors.

vs tap

while tap is designed primarily as a firehose consumer and relay, hydrant is flexible: it lets you query the database directly for records, and it provides an ordered view of events, so you can use a cursor to fetch events from a specific point in time.

stream behavior

the WS /stream (hydrant) and WS /channel (tap) endpoints have different designs:

| aspect | tap (/channel) | hydrant (/stream) |
|---|---|---|
| distribution | sharded work queue: events are load-balanced across connected clients. If 5 clients connect, each receives ~20% of events. | broadcast: every connected client receives a full copy of the event stream. If 5 clients connect, all 5 receive 100% of events. |
| cursors | server-managed: clients ACK messages. The server tracks progress and redelivers unacked messages. | client-managed: client provides ?cursor=123. The server streams from that point. |
| backfill | integrated queue: backfill events are mixed into the live queue and prioritized by the server. | unified log: backfill simply inserts "historical" events (live: false) into the global event log. streaming is just reading this log sequentially. |
| event types | record, identity (includes status) | record, identity (handle), account (status) |
| persistence | full: all events are stored and replayable. | hybrid: record events are persisted/replayable. identity/account are ephemeral/live-only. |
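
the "unified log" and "broadcast" rows can be pictured with a toy in-memory model: backfilled and live events share one append-only log, and every reader scans it from its own cursor. this is only a sketch of the idea, not hydrant's storage code. `Event`, `EventLog`, the IDs starting at 1, and the exclusive cursor semantics are all assumptions of the model.

```rust
/// One entry in the toy log. `live` is false for backfilled ("historical") events.
#[derive(Debug, Clone, PartialEq)]
pub struct Event {
    pub id: u64,
    pub live: bool,
    pub payload: String,
}

/// Append-only log with monotonically increasing IDs.
#[derive(Debug, Default)]
pub struct EventLog {
    events: Vec<Event>,
}

impl EventLog {
    /// Append an event and return its ID (IDs start at 1 in this model).
    pub fn append(&mut self, live: bool, payload: &str) -> u64 {
        let id = self.events.len() as u64 + 1;
        self.events.push(Event { id, live, payload: payload.to_string() });
        id
    }

    /// Return every event with an ID greater than `cursor`, in log order.
    /// Readers do not consume entries, so any number of clients can read
    /// the same range independently.
    pub fn read_after(&self, cursor: u64) -> &[Event] {
        // IDs are sorted, so a binary search finds the first unread entry.
        let start = self.events.partition_point(|e| e.id <= cursor);
        &self.events[start..]
    }
}
```

two clients calling `read_after(0)` both see the full log, which is the broadcast behavior in the table; a sharded work queue would instead hand each event to exactly one of them.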