···
 Hydrant consists of several components:
 - **[`hydrant::ingest::firehose`]**: Connects to an upstream Firehose (Relay) and filters events. It manages the transition between discovery and synchronization.
 - **[`hydrant::ingest::worker`]**: Processes buffered Firehose messages concurrently using sharded workers. Verifies signatures, updates repository state (handling account status events like deactivations), detects gaps for backfill, and persists records.
-- **[`hydrant::crawler`]**: Periodically enumerates the network via `com.atproto.sync.listRepos` to discover new repositories when in full-network mode.
+- **[`hydrant::crawler`]**: Periodically enumerates the network via `com.atproto.sync.listRepos` to discover new repositories. In `Full` mode it is enabled by default; in `Filter` mode it is opt-in via `HYDRANT_ENABLE_CRAWLER`.
 - **[`hydrant::resolver`]**: Manages DID resolution and key lookups. Supports multiple PLC directory sources with failover and caching.
 - **[`hydrant::backfill`]**: A dedicated worker that fetches full repository CAR files. Uses LIFO prioritization and adaptive concurrency to manage backfill load efficiently.
-- **[`hydrant::api`]**: An Axum-based XRPC server implementing repository read methods (`getRecord`, `listRecords`) and system stats. It also provides a WebSocket event stream and a filter management API (`GET`/`PATCH /filter`) for configuring indexing mode, DID lists, signals, and collection patterns.
-- **Persistence worker** (in `src/main.rs`): Manages periodic background flushes of the LSM-tree and cursor state.
+- **[`hydrant::api`]**: An Axum-based XRPC server implementing repository read methods (`getRecord`, `listRecords`) and system stats. It also provides a WebSocket event stream and management APIs:
+  - `/filter` (`GET`/`PATCH`): Configure indexing mode, signals, and collection patterns.
+  - `/repos` (`GET`/`PUT`/`DELETE`): Bulk repository management using NDJSON or JSON arrays.
+- **Persistence worker** (in `src/main.rs`): Manages periodic background flushes of the LSM-tree and cursor state.

 ### Lazy event inflation
···
 - `resync`: Maps `{DID}` -> `ResyncState` (MessagePack) for retry logic/tombstones.
 - `resync_buffer`: Maps `{DID}|{Rev}` -> `Commit` (MessagePack). Used to buffer live events during backfill.
 - `counts`: Maps `k|{NAME}` or `r|{DID}|{COL}` -> `Count` (u64 BE bytes).
-- `filter`: Stores filter config: mode key `m` -> `FilterMode` (MessagePack), and set entries for DIDs (`d|{DID}`), signals (`s|{NSID}`), collections (`c|{NSID}`), and excludes (`x|{DID}`) -> empty value.
+- `filter`: Stores filter config, handled by the `db::filter` module: mode key `m` -> `FilterMode` (MessagePack), and set entries for signals (`s|{NSID}`), collections (`c|{NSID}`), and excludes (`x|{DID}`) -> empty value.

 ## Safe commands
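The `filter` partition's key layout above follows the shape of the `filter_key` helper that appears later in this diff: a one-byte prefix, a `|` separator, then the value bytes. A minimal standalone sketch:

```rust
// minimal sketch of how set-entry keys in the `filter` partition are
// assembled: one-byte prefix, `|` separator, then the value bytes.
// (the mode lives under the bare key `m`, not built by this helper.)
fn filter_key(prefix: u8, val: &str) -> Vec<u8> {
    let mut key = Vec::with_capacity(2 + val.len());
    key.push(prefix);
    key.push(b'|');
    key.extend_from_slice(val.as_bytes());
    key
}

fn main() {
    // a signal entry for an NSID, and an exclude entry for a DID
    let signal = filter_key(b's', "app.bsky.feed.post");
    let exclude = filter_key(b'x', "did:plc:example");
    assert_eq!(signal, b"s|app.bsky.feed.post".to_vec());
    assert_eq!(exclude, b"x|did:plc:example".to_vec());
}
```

Because every set entry shares this `{prefix}|{value}` shape, enumerating one set is a single prefix scan over the partition.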
README.md (+15 -10)
···
 | `ENABLE_DEBUG` | `false` | enable debug endpoints. |
 | `DEBUG_PORT` | `3001` | port for debug endpoints (if enabled). |
 | `NO_LZ4_COMPRESSION` | `false` | disable lz4 compression for storage. |
-| `DISABLE_FIREHOSE` | `false` | disable firehose ingestion. |
-| `DISABLE_BACKFILL` | `false` | disable backfill processing. |
+| `ENABLE_FIREHOSE` | `true` | whether to ingest relay subscriptions. |
+| `ENABLE_BACKFILL` | `true` | whether to backfill from PDS instances. |
+| `ENABLE_CRAWLER` | `false` in `filter` mode, `true` in `full` mode | whether to actively query the network for unknown repositories. |
 | `DB_WORKER_THREADS` | `4` (`8` if full network) | database worker threads. |
 | `DB_MAX_JOURNALING_SIZE_MB` | `512` (`1024` if full network) | max database journaling size in MB. |
 | `DB_PENDING_MEMTABLE_SIZE_MB` | `64` (`192` if full network) | pending memtable size in MB. |
···
 | mode | behaviour |
 | :--- | :--- |
-| `dids` | only index repositories explicitly listed in `dids`. new accounts seen on the firehose are ignored unless they are in the list. |
-| `signal` | like `dids`, but also auto-discovers and backfills any account whose firehose commit touches a collection matching one of the `signals` patterns. |
-| `full` | index the entire network. `dids` and `signals` are ignored for discovery, but `excludes` and `collections` still apply. |
+| `filter` | auto-discovers and backfills any account whose firehose commit touches a collection matching one of the `signals` patterns. individual repositories can also be tracked explicitly via the `/repos` endpoint, regardless of signal matching. |
+| `full` | index the entire network. `signals` is ignored for discovery, but `excludes` and `collections` still apply. |

 #### fields

 | field | type | description |
 | :--- | :--- | :--- |
-| `mode` | `"dids"` \| `"signal"` \| `"full"` | indexing mode (see above). |
-| `dids` | set update | set of DIDs to explicitly track. in `dids` and `signal` modes, always processed regardless of signal matching. adding an untracked DID enqueues a backfill. |
-| `signals` | set update | NSID patterns (e.g. `app.bsky.feed.post` or `app.bsky.*`) that trigger auto-discovery in `signal` mode. |
+| `mode` | `"filter"` \| `"full"` | indexing mode (see above). |
+| `signals` | set update | NSID patterns (e.g. `app.bsky.feed.post` or `app.bsky.*`) that trigger auto-discovery in `filter` mode. |
 | `collections` | set update | NSID patterns used to filter which records are stored. if empty, all collections are stored. applies in all modes. |
 | `excludes` | set update | set of DIDs to always skip, regardless of mode. checked before any other filter logic. |
···
 - `app.bsky.feed.post` — exact match only
 - `app.bsky.feed.*` — matches any collection under `app.bsky.feed`

+### repository management
+
+- `GET /repos`: get an NDJSON stream of all repositories and their sync status.
+- `PUT /repos`: explicitly track repositories. accepts an NDJSON body of `{"did": "..."}` objects (or a JSON array of the same).
+- `DELETE /repos`: untrack repositories. accepts the same body format. optionally include `"deleteData": true` to also purge the repository from the database.
+
 ### data access (xrpc)

 `hydrant` implements the following XRPC endpoints under `/xrpc/`:

 #### `com.atproto.repo.getRecord`

-retrieve a single record by its AT-URI components.
+retrieve a single record by its AT URI components.

 | param | required | description |
 | :--- | :--- | :--- |
···
 | `collection` | yes | NSID of the collection. |
 | `rkey` | yes | record key. |

-returns the record value, its CID, and its AT-URI. responds with `RecordNotFound` if not present.
+returns the record value, its CID, and its AT URI. responds with `RecordNotFound` if not present.

 #### `com.atproto.repo.listRecords`
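The NDJSON body that `PUT /repos` and `DELETE /repos` accept is one `{"did": "..."}` object per line. A hedged sketch of building such a body (illustrative only; a real client would use an HTTP library and a JSON serializer rather than manual formatting):

```rust
// sketch: build the NDJSON request body for `PUT /repos` — one
// `{"did": "..."}` object per line. assumes DIDs contain no characters
// that need JSON escaping, which holds for well-formed `did:plc:` DIDs.
fn repos_ndjson(dids: &[&str]) -> String {
    dids.iter()
        .map(|did| format!("{{\"did\":\"{did}\"}}"))
        .collect::<Vec<_>>()
        .join("\n")
}

fn main() {
    let body = repos_ndjson(&["did:plc:aaa", "did:plc:bbb"]);
    assert_eq!(body, "{\"did\":\"did:plc:aaa\"}\n{\"did\":\"did:plc:bbb\"}");
    println!("{body}");
}
```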
src/api/mod.rs
···
 mod debug;
 pub mod filter;
+pub mod repos;
 pub mod stats;
 mod stream;
 pub mod xrpc;
···
         .route("/stream", get(stream::handle_stream))
         .merge(xrpc::router())
         .merge(filter::router())
+        .merge(repos::router())
         .with_state(state)
         .layer(TraceLayer::new_for_http())
         .layer(CorsLayer::permissive());
src/filter.rs
···
-use std::sync::Arc;
-
-use arc_swap::ArcSwap;
-use fjall::Keyspace;
-use miette::{IntoDiagnostic, Result};
-use serde::{Deserialize, Serialize};
-use smol_str::SmolStr;
-
-pub const MODE_KEY: &[u8] = b"m";
-pub const DID_PREFIX: u8 = b'd';
-pub const SIGNAL_PREFIX: u8 = b's';
-pub const COLLECTION_PREFIX: u8 = b'c';
-pub const EXCLUDE_PREFIX: u8 = b'x';
-pub const SEP: u8 = b'|';
+use serde::{Deserialize, Serialize};
+use smol_str::SmolStr;
+use std::sync::Arc;
+
+pub type FilterHandle = Arc<arc_swap::ArcSwap<FilterConfig>>;
+
+pub fn new_handle(config: FilterConfig) -> FilterHandle {
+    Arc::new(arc_swap::ArcSwap::new(Arc::new(config)))
+}
+
+/// apply a bool patch or set replacement for a single set update.
+#[derive(Debug, Clone, Serialize, Deserialize)]
+#[serde(untagged)]
+pub enum SetUpdate {
+    /// replace the entire set with this list
+    Set(Vec<String>),
+    /// patch: true = add, false = remove
+    Patch(std::collections::HashMap<String, bool>),
+}

 #[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
 #[serde(rename_all = "snake_case")]
 pub enum FilterMode {
-    Dids = 0,
-    Signal = 1,
+    Filter = 0,
     Full = 2,
 }
···
         }
     }

-    pub fn load(ks: &Keyspace) -> Result<Self> {
-        let mode = ks
-            .get(MODE_KEY)
-            .into_diagnostic()?
-            .map(|v| rmp_serde::from_slice(&v).into_diagnostic())
-            .transpose()?
-            .unwrap_or(FilterMode::Dids);
-
-        let mut config = Self::new(mode);
-
-        let signal_prefix = [SIGNAL_PREFIX, SEP];
-        for guard in ks.prefix(signal_prefix) {
-            let (k, _) = guard.into_inner().into_diagnostic()?;
-            let val = std::str::from_utf8(&k[signal_prefix.len()..]).into_diagnostic()?;
-            config.signals.push(SmolStr::new(val));
-        }
-
-        let col_prefix = [COLLECTION_PREFIX, SEP];
-        for guard in ks.prefix(col_prefix) {
-            let (k, _) = guard.into_inner().into_diagnostic()?;
-            let val = std::str::from_utf8(&k[col_prefix.len()..]).into_diagnostic()?;
-            config.collections.push(SmolStr::new(val));
-        }
-
-        Ok(config)
-    }
-
     /// returns true if the collection matches the content filter.
     /// if collections is empty, all collections match.
     pub fn matches_collection(&self, collection: &str) -> bool {
···
         collection == pattern
     }
 }
-
-pub type FilterHandle = Arc<ArcSwap<FilterConfig>>;
-
-pub fn new_handle(config: FilterConfig) -> FilterHandle {
-    Arc::new(ArcSwap::new(Arc::new(config)))
-}
-
-/// apply a bool patch or set replacement for a single set update.
-#[derive(Debug, Deserialize)]
-#[serde(untagged)]
-pub enum SetUpdate {
-    /// replace the entire set with this list
-    Set(Vec<String>),
-    /// patch: true = add, false = remove
-    Patch(std::collections::HashMap<String, bool>),
-}
-
-pub fn filter_key(prefix: u8, val: &str) -> Vec<u8> {
-    let mut key = Vec::with_capacity(2 + val.len());
-    key.push(prefix);
-    key.push(SEP);
-    key.extend_from_slice(val.as_bytes());
-    key
-}
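The body of `matches_collection` is collapsed in the diff above; only its final `collection == pattern` line is visible. The documented NSID pattern semantics (`app.bsky.feed.post` matches exactly, `app.bsky.feed.*` matches anything nested under `app.bsky.feed`) can be sketched as follows. This is an approximation of the described behaviour, not the crate's actual code:

```rust
// standalone sketch of the documented NSID pattern semantics:
// a trailing `.*` matches any collection strictly under the prefix,
// otherwise the pattern must match exactly.
fn matches_pattern(collection: &str, pattern: &str) -> bool {
    if let Some(prefix) = pattern.strip_suffix(".*") {
        // `app.bsky.feed.*` matches `app.bsky.feed.post`, but neither
        // `app.bsky.feed` itself nor `app.bsky.feedext.post`
        collection
            .strip_prefix(prefix)
            .is_some_and(|rest| rest.starts_with('.'))
    } else {
        // exact match only
        collection == pattern
    }
}

fn main() {
    assert!(matches_pattern("app.bsky.feed.post", "app.bsky.feed.post"));
    assert!(matches_pattern("app.bsky.feed.like", "app.bsky.feed.*"));
    assert!(!matches_pattern("app.bsky.graph.follow", "app.bsky.feed.*"));
    assert!(!matches_pattern("app.bsky.feedext.post", "app.bsky.feed.*"));
}
```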
src/ingest/firehose.rs (+22 -22)
···
-use crate::db::{self, Db, keys};
+use crate::db;
 use crate::filter::{FilterHandle, FilterMode};
 use crate::ingest::{BufferTx, IngestMessage};
 use crate::state::AppState;
···
     async fn should_process(&self, did: &Did<'_>) -> Result<bool> {
         let filter = self.filter.load();

-        let excl_key = crate::filter::filter_key(crate::filter::EXCLUDE_PREFIX, did.as_str());
+        let excl_key =
+            crate::db::filter::filter_key(crate::db::filter::EXCLUDE_PREFIX, did.as_str());
         if self
             .state
             .db
···
         match filter.mode {
             FilterMode::Full => Ok(true),
-            FilterMode::Dids | FilterMode::Signal => {
-                let did_key = crate::filter::filter_key(crate::filter::DID_PREFIX, did.as_str());
-                if self
-                    .state
-                    .db
-                    .filter
-                    .contains_key(&did_key)
-                    .into_diagnostic()?
-                {
-                    debug!("{did} is in DID allowlist, processing");
-                    return Ok(true);
-                }
-                let known =
-                    Db::contains_key(self.state.db.repos.clone(), keys::repo_key(did)).await?;
-                if known {
-                    debug!("{did} is a known repo, processing");
-                } else {
-                    debug!(
-                        "{did} is unknown — passing to worker for signal check (mode={:?})",
-                        filter.mode
-                    );
-                }
-                Ok(known || filter.mode == FilterMode::Signal)
+            FilterMode::Filter => {
+                let repo_key = crate::db::keys::repo_key(did);
+                if let Some(state_bytes) = self.state.db.repos.get(&repo_key).into_diagnostic()? {
+                    let repo_state: crate::types::RepoState =
+                        rmp_serde::from_slice(&state_bytes).into_diagnostic()?;
+
+                    if repo_state.tracked {
+                        debug!("{did} is a tracked repo, processing");
+                        return Ok(true);
+                    } else {
+                        debug!("{did} is known but explicitly untracked, skipping");
+                        return Ok(false);
+                    }
+                }
+
+                if !filter.signals.is_empty() {
+                    debug!("{did} is unknown — passing to worker for signal check");
+                    Ok(true)
+                } else {
+                    debug!("{did} is unknown and no signals configured, skipping");
+                    Ok(false)
+                }
             }
         }
     }
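The `Filter`-mode branch of `should_process` above reduces to a small decision table; this standalone sketch restates it (the names here are illustrative, not the crate's types — the real code reads repo state from the database and inspects its `tracked` flag):

```rust
// decision table for Filter-mode firehose admission, as a sketch.
#[derive(Clone, Copy)]
enum RepoKnowledge {
    Tracked,   // repo state exists and is tracked
    Untracked, // repo state exists but was explicitly untracked
    Unknown,   // no repo state stored yet
}

fn process_in_filter_mode(repo: RepoKnowledge, have_signals: bool) -> bool {
    match repo {
        RepoKnowledge::Tracked => true,    // explicitly tracked: always process
        RepoKnowledge::Untracked => false, // known but untracked: always skip
        // unknown repos only go to the worker when a signal check is possible
        RepoKnowledge::Unknown => have_signals,
    }
}

fn main() {
    assert!(process_in_filter_mode(RepoKnowledge::Tracked, false));
    assert!(!process_in_filter_mode(RepoKnowledge::Untracked, true));
    assert!(process_in_filter_mode(RepoKnowledge::Unknown, true));
    assert!(!process_in_filter_mode(RepoKnowledge::Unknown, false));
}
```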
src/ingest/worker.rs (+7 -2)
···
         match &account.status {
             Some(AccountStatus::Deleted) => {
                 debug!("account {did} deleted, wiping data");
-                ops::delete_repo(ctx.batch, &ctx.state.db, did, repo_state)?;
+                crate::ops::delete_repo(ctx.batch, &ctx.state.db, did, &repo_state)?;
                 return Ok(RepoProcessResult::Deleted);
             }
             status => {
···
         let Some(state_bytes) = ctx.state.db.repos.get(&repo_key).into_diagnostic()? else {
             let filter = ctx.state.filter.load();

-            if filter.mode == FilterMode::Signal {
+            if filter.mode == FilterMode::Filter && !filter.signals.is_empty() {
                 let commit = match msg {
                     SubscribeReposMessage::Commit(c) => c,
                     _ => return Ok(RepoProcessResult::Syncing(None)),
···
             return Ok(RepoProcessResult::Syncing(None));
         };
         let mut repo_state = crate::db::deser_repo_state(&state_bytes)?.into_static();
+
+        if !repo_state.tracked {
+            debug!("ignoring active status for {did} as it is explicitly untracked");
+            return Ok(RepoProcessResult::Syncing(None));
+        }

         // if we are backfilling or it is new, DON'T mark it as synced yet
         // the backfill worker will do that when it finishes
src/main.rs (+32 -20)
···
-use futures::{FutureExt, TryFutureExt, future::BoxFuture};
+use futures::{FutureExt, future::BoxFuture};
 use hydrant::config::{Config, SignatureVerification};
-use hydrant::crawler::Crawler;
 use hydrant::db::{self, set_firehose_cursor};
 use hydrant::ingest::firehose::FirehoseIngestor;
 use hydrant::state::AppState;
···
     let filter_ks = state.db.filter.clone();
     let inner = state.db.inner.clone();
     tokio::task::spawn_blocking(move || {
-        use hydrant::filter::{FilterMode, MODE_KEY};
+        use hydrant::db::filter::MODE_KEY;
+        use hydrant::filter::FilterMode;
         let mut batch = inner.batch();
         batch.insert(
             &filter_ks,
···
     let (buffer_tx, buffer_rx) = mpsc::unbounded_channel();
     let state = Arc::new(state);

-    if !cfg.disable_backfill {
+    if cfg.enable_backfill {
         tokio::spawn({
             let state = state.clone();
             let timeout = cfg.repo_fetch_timeout;
···
         }
     });

-    if let hydrant::filter::FilterMode::Full | hydrant::filter::FilterMode::Signal =
-        state.filter.load().mode
-    {
-        tokio::spawn(
-            Crawler::new(
-                state.clone(),
-                cfg.relay_host.clone(),
-                cfg.crawler_max_pending_repos,
-                cfg.crawler_resume_pending_repos,
-            )
-            .run()
-            .inspect_err(|e| {
-                error!("crawler died: {e}");
-                db::check_poisoned_report(&e);
-            }),
-        );
+    let should_run_crawler = match cfg.enable_crawler {
+        Some(true) => true,
+        Some(false) => false,
+        None => state.filter.load().mode == hydrant::filter::FilterMode::Full,
+    };
+
+    if should_run_crawler {
+        info!("starting crawler ({:?})", state.filter.load().mode);
+        let state_clone = state.clone();
+        let relay_host_clone = cfg.relay_host.clone();
+        let crawler_max_pending = cfg.crawler_max_pending_repos;
+        let crawler_resume_pending = cfg.crawler_resume_pending_repos;
+        tokio::spawn(async move {
+            // the crawler is responsible for finding new repos
+            let crawler = hydrant::crawler::Crawler::new(
+                state_clone,
+                relay_host_clone,
+                crawler_max_pending,
+                crawler_resume_pending,
+            );
+            if let Err(e) = crawler.run().await {
+                error!("crawler error: {e}");
+                db::check_poisoned_report(&e);
+            }
+        });
+    } else {
+        info!("crawler disabled by config or filter mode");
     }

-    let mut tasks = if !cfg.disable_firehose {
+    let mut tasks = if cfg.enable_firehose {
         let firehose_worker = std::thread::spawn({
             let state = state.clone();
             let handle = tokio::runtime::Handle::current();
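The crawler gating above in miniature: an explicit `ENABLE_CRAWLER` setting wins, and when it is unset the crawler runs only in full mode. A standalone sketch (the real check reads `cfg.enable_crawler` and the loaded filter mode):

```rust
// tri-state default resolution for the crawler: explicit config beats
// the mode-derived default.
fn should_run_crawler(enable_crawler: Option<bool>, full_mode: bool) -> bool {
    match enable_crawler {
        Some(explicit) => explicit, // explicit override always wins
        None => full_mode,          // unset: on in full mode, off in filter mode
    }
}

fn main() {
    assert!(should_run_crawler(None, true)); // full mode: on by default
    assert!(!should_run_crawler(None, false)); // filter mode: off by default
    assert!(should_run_crawler(Some(true), false)); // opt-in in filter mode
    assert!(!should_run_crawler(Some(false), true)); // opt-out in full mode
}
```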
···
     # 4. add repo to hydrant (backfill trigger)
     print $"adding repo ($did) to tracking..."
     try {
+        http put -t application/json $"($url)/repos" [ { did: ($did) } ]
     } catch {
         print "warning: failed to add repo (might already be tracked), continuing..."
     }
tests/common.nu (+14 -13)
···
 # build the hydrant binary
 export def build-hydrant [] {
     print "building hydrant..."
-    cargo build --release --quiet
+    cargo build --release
     "target/release/hydrant"
 }
···
     let log_file = $"($db_path)/hydrant.log"
     print $"starting hydrant - logs at ($log_file)..."

-    let pid = (
-        with-env {
-            HYDRANT_DATABASE_PATH: ($db_path),
-            HYDRANT_FULL_NETWORK: "false",
-            HYDRANT_API_PORT: ($port | into string),
-            HYDRANT_ENABLE_DEBUG: "true",
-            HYDRANT_DEBUG_PORT: ($port + 1 | into string),
-            HYDRANT_LOG_LEVEL: "debug"
-        } {
-            sh -c $"($binary) >($log_file) 2>&1 & echo $!" | str trim | into int
-        }
-    )
+    let hydrant_vars = ($env | transpose k v | where k =~ "HYDRANT_" | reduce -f {} { |it, acc| $acc | upsert $it.k $it.v })
+    let env_vars = {
+        HYDRANT_DATABASE_PATH: ($db_path),
+        HYDRANT_FULL_NETWORK: "false",
+        HYDRANT_API_PORT: ($port | into string),
+        HYDRANT_ENABLE_DEBUG: "true",
+        HYDRANT_DEBUG_PORT: ($port + 1 | into string),
+        HYDRANT_LOG_LEVEL: "debug"
+    } | merge $hydrant_vars
+
+    let pid = (with-env $env_vars {
+        sh -c $"($binary) >($log_file) 2>&1 & echo $!" | str trim | into int
+    })

     print $"hydrant started with pid: ($pid)"
     { pid: $pid, log: $log_file }
tests/debug_endpoints.nu (+3 -2)
···
     if (wait-for-api $url) {
         # Trigger backfill to populate some data
         print $"adding repo ($did) to tracking..."
-        http patch -t application/json $"($url)/filter" { dids: { ($did): true } }
+        http put -t application/json $"($url)/repos" [ { did: ($did) } ]

         if (wait-for-backfill $url) {
             print "backfill complete, testing debug endpoints"
···
             # 2. Test /debug/get with that key (sent as string)
             print "testing /debug/get"
-            let get_res = http get $"($debug_url)/debug/get?partition=records&key=($key_str)"
+            let encoded_key = ($key_str | url encode)
+            let get_res = http get $"($debug_url)/debug/get?partition=records&key=($encoded_key)"

             if $get_res.value != $value_cid {
                 print $"FAILED: /debug/get returned different value. expected: ($value_cid), got: ($get_res.value)"
tests/repo_sync_integrity.nu (+1 -1)
···
     if (wait-for-api $url) {
         # track the repo via API
         print $"adding repo ($did) to tracking..."
-        http patch -t application/json $"($url)/filter" { dids: { ($did): true } }
+        http put -t application/json $"($url)/repos" [ { did: ($did) } ]

         if (wait-for-backfill $url) {
             # Run both consistency checks
tests/signal_filter_test.nu (+19 -15)
···
     exit 1
 }

-let port = 3007
+let port = 3011
 let url = $"http://localhost:($port)"
 let db_path = (mktemp -d -t hydrant_signal_test.XXXXXX)
-let collection = "app.bsky.feed.post"
+
+let random_str = (random chars -l 6)
+let collection = $"systems.hydrant.test.($random_str)"

 print $"database path: ($db_path)"
···
 print "authenticated"

 let binary = build-hydrant
+$env.HYDRANT_RELAY_HOST = "wss://bsky.network/"
 let instance = start-hydrant $binary $db_path $port

 mut test_passed = false

 if (wait-for-api $url) {
-    # configure signal mode: index app.bsky.feed.post from anyone on the network
-    print "configuring signal mode..."
+    # configure filter mode: index the test collection from anyone on the network
+    print "configuring filter mode..."
     http patch -t application/json $"($url)/filter" {
-        mode: "signal",
+        mode: "filter",
         signals: [$collection]
     }
···
     let filter = (http get $"($url)/filter")
     print $"filter state: ($filter | to json)"

-    if $filter.mode != "signal" {
-        print "FAILED: mode was not set to signal"
+    if $filter.mode != "filter" {
+        print "FAILED: mode was not set to filter"
     } else if not ($filter.signals | any { |s| $s == $collection }) {
         print $"FAILED: ($collection) not in signals"
     } else {
···
         # wait a moment for the firehose to connect and the filter to take effect
         sleep 3sec

         let timestamp = (date now | format date "%Y-%m-%dT%H:%M:%SZ")
         let record_data = {
             "$type": $collection,
-            text: $"hydrant signal filter test ($timestamp)",
+            text: $"hydrant signal filter test ($timestamp) - bsky.network relay",
             createdAt: $timestamp
         }

         print "creating post..."
         let create_res = (http post -t application/json -H ["Authorization" $"Bearer ($jwt)"] $"($pds_url)/xrpc/com.atproto.repo.createRecord" {
             repo: $did,
             collection: $collection,
+            validate: false,
             record: $record_data
         })
         let rkey = ($create_res.uri | split row "/" | last)
         print $"created: ($create_res.uri)"

-        # give hydrant time to receive and process the firehose event
-        sleep 5sec
+        # give hydrant time to receive and process the firehose event and backfill
+        sleep 10sec

         # verify the record was indexed
         print "checking indexed record..."