kind of like tap but different and in rust

[docs] update readme and agents for filter api

ptr.pet 68df12bf a0a9359a

verified
+76 -10
+2 -1
AGENTS.md
@@ -39,7 +39,7 @@
  - **[`hydrant::crawler`]**: Periodically enumerates the network via `com.atproto.sync.listRepos` to discover new repositories when in full-network mode.
  - **[`hydrant::resolver`]**: Manages DID resolution and key lookups. Supports multiple PLC directory sources with failover and caching.
  - **[`hydrant::backfill`]**: A dedicated worker that fetches full repository CAR files. Uses LIFO prioritization and adaptive concurrency to manage backfill load efficiently.
- - **[`hydrant::api`]**: An Axum-based XRPC server implementing repository read methods (`getRecord`, `listRecords`) and system stats. It also provides a event stream API via WebSockets.
+ - **[`hydrant::api`]**: An Axum-based XRPC server implementing repository read methods (`getRecord`, `listRecords`) and system stats. It also provides a WebSocket event stream and a filter management API (`GET`/`PATCH /filter`) for configuring indexing mode, DID lists, signals, and collection patterns.
  - **Persistence worker** (in `src/main.rs`): Manages periodic background flushes of the LSM-tree and cursor state.

  ### Lazy event inflation
@@ -82,6 +82,7 @@
  - `resync`: Maps `{DID}` -> `ResyncState` (MessagePack) for retry logic/tombstones.
  - `resync_buffer`: Maps `{DID}|{Rev}` -> `Commit` (MessagePack). Used to buffer live events during backfill.
  - `counts`: Maps `k|{NAME}` or `r|{DID}|{COL}` -> `Count` (u64 BE Bytes).
+ - `filter`: Stores filter config: mode key `m` -> `FilterMode` (MessagePack), and set entries for DIDs (`d|{DID}`), signals (`s|{NSID}`), collections (`c|{NSID}`), and excludes (`x|{DID}`) -> empty value.

  ## Safe commands
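The `filter` keyspace layout described in the AGENTS.md change can be illustrated with a small key builder. This is a minimal sketch, not code from hydrant: `FilterSet`, `MODE_KEY`, and `set_entry_key` are hypothetical names introduced here, and only the `m` mode key and the `d|`, `s|`, `c|`, `x|` prefixes come from the diff.

```rust
/// Hypothetical enum naming the four sets kept in the `filter` keyspace.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum FilterSet {
    Dids,
    Signals,
    Collections,
    Excludes,
}

impl FilterSet {
    /// One-character key prefix, as listed in the AGENTS.md entry.
    fn prefix(self) -> char {
        match self {
            FilterSet::Dids => 'd',
            FilterSet::Signals => 's',
            FilterSet::Collections => 'c',
            FilterSet::Excludes => 'x',
        }
    }
}

/// Key under which the serialized mode is stored (per the diff: `m`).
const MODE_KEY: &[u8] = b"m";

/// Builds a set-entry key of the form `{prefix}|{item}`.
/// The stored value for such a key is empty, so membership is simply
/// "does the key exist".
fn set_entry_key(set: FilterSet, item: &str) -> Vec<u8> {
    format!("{}|{}", set.prefix(), item).into_bytes()
}
```

Under this layout, listing one set would amount to a prefix scan (for example over `d|`), which is presumably why each set gets its own prefix; that reading is an inference from the key shapes, not something the diff states.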
+74 -9
README.md
@@ -1,6 +1,6 @@
  # hydrant

- `hydrant` is an AT Protocol indexer built on the `fjall` database. it's meant to be a flexible indexer, supporting both full-network indexing and filtered indexing (e.g., by DID), also allowing querying with XRPC's like `com.atproto.sync.getRepo`, `com.atproto.repo.listRecords`, and so on, which should allow many more usecases compared to just providing an event stream.
+ `hydrant` is an AT Protocol indexer built on the `fjall` database. it's meant to be a flexible indexer, supporting both full-network indexing and filtered indexing (e.g., by DID), also allowing querying with XRPCs and providing an ordered event stream with cursor support.

  ## configuration

@@ -40,18 +40,83 @@

  ### management

- - `POST /repo/add`: register a DID, start backfilling, and subscribe to updates.
-   - body: `{ "dids": ["did:plc:..."] }`
- - `POST /repo/remove`: unregister a DID and delete all associated data.
-   - body: `{ "dids": ["did:plc:..."] }`
+ - `GET /filter`: get the current filter configuration.
+ - `PATCH /filter`: update the filter configuration.
+
+ #### filter mode
+
+ the `mode` field controls what gets indexed:
+
+ | mode | behaviour |
+ | :--- | :--- |
+ | `dids` | only index repositories explicitly listed in `dids`. new accounts seen on the firehose are ignored unless they are in the list. |
+ | `signal` | like `dids`, but also auto-discovers and backfills any account whose firehose commit touches a collection matching one of the `signals` patterns. |
+ | `full` | index the entire network. `dids` and `signals` are ignored for discovery, but `excludes` and `collections` still apply. |
+
+ #### fields
+
+ | field | type | description |
+ | :--- | :--- | :--- |
+ | `mode` | `"dids"` \| `"signal"` \| `"full"` | indexing mode (see above). |
+ | `dids` | set update | set of DIDs to explicitly track. in `dids` and `signal` modes, always processed regardless of signal matching. adding an untracked DID enqueues a backfill. |
+ | `signals` | set update | NSID patterns (e.g. `app.bsky.feed.post` or `app.bsky.*`) that trigger auto-discovery in `signal` mode. |
+ | `collections` | set update | NSID patterns used to filter which records are stored. if empty, all collections are stored. applies in all modes. |
+ | `excludes` | set update | set of DIDs to always skip, regardless of mode. checked before any other filter logic. |
+
+ #### set updates
+
+ each set field accepts one of two forms:
+
+ - **replace**: an array replaces the entire set — `["did:plc:abc", "did:plc:xyz"]`
+ - **patch**: an object maps items to `true` (add) or `false` (remove) — `{"did:plc:abc": true, "did:plc:xyz": false}`
+
+ #### NSID patterns
+
+ `signals` and `collections` support an optional `.*` suffix to match an entire namespace:
+
+ - `app.bsky.feed.post` — exact match only
+ - `app.bsky.feed.*` — matches any collection under `app.bsky.feed`

  ### data access (xrpc)

- `hydrant` implements some AT Protocol XRPC endpoints for reading data:
+ `hydrant` implements the following XRPC endpoints under `/xrpc/`:
+
+ #### `com.atproto.repo.getRecord`
+
+ retrieve a single record by its AT-URI components.
+
+ | param | required | description |
+ | :--- | :--- | :--- |
+ | `repo` | yes | DID or handle of the repository. |
+ | `collection` | yes | NSID of the collection. |
+ | `rkey` | yes | record key. |
+
+ returns the record value, its CID, and its AT-URI. responds with `RecordNotFound` if not present.
+
+ #### `com.atproto.repo.listRecords`
+
+ list records in a collection, newest-first by default.
+
+ | param | required | description |
+ | :--- | :--- | :--- |
+ | `repo` | yes | DID or handle of the repository. |
+ | `collection` | yes | NSID of the collection. |
+ | `limit` | no | max records to return (default `50`, max `100`). |
+ | `cursor` | no | opaque cursor for pagination (from a previous response). |
+ | `reverse` | no | if `true`, iterates oldest-first. |

- - `com.atproto.repo.getRecord`: retrieve a single record by collection and rkey.
- - `com.atproto.repo.listRecords`: list records in a collection, with pagination.
- - `systems.gaze.hydrant.countRecords`: count records in a collection.
+ returns `{ records, cursor }`. if `cursor` is present there are more results.
+
+ #### `systems.gaze.hydrant.countRecords`
+
+ return the total number of stored records in a collection.
+
+ | param | required | description |
+ | :--- | :--- | :--- |
+ | `identifier` | yes | DID or handle of the repository. |
+ | `collection` | yes | NSID of the collection. |
+
+ returns `{ count }`.

  ### event stream
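The set-update and NSID-pattern rules introduced in the README diff can be sketched in a few lines of Rust. This is an illustration under stated assumptions, not hydrant's implementation: `SetUpdate`, `apply_set_update`, `nsid_matches`, and `collection_is_stored` are hypothetical names, and the wildcard is assumed to match any NSID that extends the prefix by at least one further segment (so `app.bsky.*` would also match `app.bsky.feed.post`), which the README implies but does not spell out.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical in-memory form of the two request shapes the README describes:
/// an array (replace) or an object of item -> bool (patch).
enum SetUpdate {
    Replace(Vec<String>),
    Patch(BTreeMap<String, bool>),
}

/// Applies a set update: replace swaps the whole set,
/// patch adds items mapped to `true` and removes items mapped to `false`.
fn apply_set_update(set: &mut BTreeSet<String>, update: SetUpdate) {
    match update {
        SetUpdate::Replace(items) => {
            *set = items.into_iter().collect();
        }
        SetUpdate::Patch(changes) => {
            for (item, add) in changes {
                if add {
                    set.insert(item);
                } else {
                    set.remove(&item);
                }
            }
        }
    }
}

/// Returns true if `nsid` matches `pattern`.
/// Assumption: a trailing `.*` matches anything strictly under the prefix;
/// without the suffix the match is exact.
fn nsid_matches(pattern: &str, nsid: &str) -> bool {
    match pattern.strip_suffix(".*") {
        Some(prefix) => nsid
            .strip_prefix(prefix)
            .map_or(false, |rest| rest.starts_with('.') && rest.len() > 1),
        None => pattern == nsid,
    }
}

/// `collections` semantics from the fields table: an empty set stores everything.
fn collection_is_stored(collections: &BTreeSet<String>, nsid: &str) -> bool {
    collections.is_empty() || collections.iter().any(|p| nsid_matches(p, nsid))
}
```

With these helpers, the patch body `{"did:plc:abc": true, "did:plc:xyz": false}` from the README maps to a `SetUpdate::Patch` that inserts the first DID and removes the second, leaving any other members of the set untouched.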