# sync verification (sync 1.1)

how relays and downstream services verify firehose commits without trusting the source.

source: [atproto sync spec](https://atproto.com/specs/sync), [repository spec](https://atproto.com/specs/repository), [event stream spec](https://atproto.com/specs/event-stream), [cryptography spec](https://atproto.com/specs/cryptography)

## the inductive proof chain

the core idea: instead of re-fetching the full repo to verify each change, you verify that each commit is a valid *transition* from the previous state. this only requires tracking two values per DID:

- `rev`: the TID of the last verified commit
- `data`: the MST root CID of the last verified commit

### base case

establish ground truth by fetching the full repo (via `getRepo` CAR export), verifying the MST structure, and verifying the commit signature. now you *know* the repo state is correct at this point.

### inductive step

for each subsequent `#commit` from the firehose:

1. **check chain continuity**: the event's `since` must match your stored `rev`, and `prevData` must match your stored `data`
2. **verify signature**: re-encode the commit without `sig`, SHA-256 hash, verify ECDSA
3. **MST inversion**: apply the `ops` in *reverse* against the partial MST from the CAR blocks. if the ops are complete, the resulting root CID must equal `prevData`
4. **update state**: store the new `rev` and `data`

if step 1 or 3 fails, the chain is broken → mark the repo as desynchronized and re-fetch.

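
the four steps above can be sketched as a small per-DID state machine. this is a minimal illustration, not any relay's actual code: `ChainTracker`, `RepoState`, `CommitEvent`, and the two injected callables `verify_sig` and `invert_ops` are hypothetical names, and the real signature check and MST inversion are deliberately left to the caller. returning `"reject"` on a bad signature is also an assumption; the text above only says what happens when step 1 or 3 fails.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Set


@dataclass
class RepoState:
    rev: str   # TID of the last verified commit
    data: str  # MST root CID of the last verified commit


@dataclass
class CommitEvent:
    did: str
    rev: str                  # rev of the new signed commit
    data: str                 # MST root CID of the new signed commit
    since: Optional[str]      # unsigned: rev of the preceding commit
    prev_data: Optional[str]  # unsigned: MST root CID of the preceding commit


class ChainTracker:
    """Tracks (rev, data) per DID and applies the inductive step."""

    def __init__(
        self,
        verify_sig: Callable[[CommitEvent], bool],           # hypothetical hook
        invert_ops: Callable[[CommitEvent], Optional[str]],  # hypothetical hook
    ) -> None:
        self.verify_sig = verify_sig
        self.invert_ops = invert_ops
        self.states: Dict[str, RepoState] = {}
        self.desynced: Set[str] = set()

    def seed(self, did: str, rev: str, data: str) -> None:
        # base case: call this only after a full getRepo fetch was verified
        self.states[did] = RepoState(rev, data)
        self.desynced.discard(did)

    def apply(self, ev: CommitEvent) -> str:
        state = self.states.get(ev.did)
        if state is None or ev.did in self.desynced:
            return "resync"  # no verified ground truth to build on
        # step 1: chain continuity
        if ev.since != state.rev or ev.prev_data != state.data:
            self.desynced.add(ev.did)
            return "resync"
        # step 2: signature (assumption: a bad signature rejects the event)
        if not self.verify_sig(ev):
            return "reject"
        # step 3: inverting the ops must reproduce the previous root
        if self.invert_ops(ev) != ev.prev_data:
            self.desynced.add(ev.did)
            return "resync"
        # step 4: advance the stored state
        self.states[ev.did] = RepoState(ev.rev, ev.data)
        return "ok"
```

keeping the crypto and MST code behind callables keeps the chain logic small enough to test on its own; a caller would seed the tracker after a verified full fetch and route `"resync"` results into its recovery path.
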

## commit event fields that matter

the signed commit object (in the CAR) has:

| field | notes |
|---|---|
| `did` | account DID |
| `version` | always `3` (v1 dead, v2 legacy-compatible) |
| `data` | CID of MST root: **this is what the proof chain tracks** |
| `rev` | TID, must increase monotonically |
| `prev` | virtually always `null` in v3, vestigial from v2 |
| `sig` | ECDSA signature over all other fields encoded as DAG-CBOR |

the firehose `#commit` event adds:

| field | notes |
|---|---|
| `since` | rev of the *preceding* commit (chain link) |
| `prevData` | MST root CID of the preceding commit (chain link) |
| `blocks` | CAR slice: only changed blocks, max 2 MB |
| `ops` | up to 200 record operations |

each `repoOp` has:

| field | notes |
|---|---|
| `action` | `create`, `update`, or `delete` |
| `path` | `<collection>/<rkey>` |
| `cid` | new record CID (null for deletes) |
| `prev` | previous record CID (for updates/deletes; required for inversion) |

**important**: `since`/`prevData` are *unsigned*: they're not in the signed commit object. `prevData` is still *verifiable* via MST inversion: the receiver recomputes the previous root from the signed new tree and the `ops`, so a wrong value shows up as a root mismatch. `since` has no such proof; it is only checked against the `rev` you already stored.

## MST inversion (the "trick")

from the spec:

> the trick to this process is record operation inversion. `#commit` messages contain both a repo diff (CAR slice), and an array of record operations. the operations can be applied in reverse against a copy of the partial repo tree contained in the diff blocks. if the list of operations is complete, the root of the tree should be exactly that of the previous commit object of the repository.

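
the inversion the spec describes can be sketched on a flat map from record path to record CID. this is a deliberate simplification and an assumption of the example: a real implementation works on a partial MST and compares root CIDs, while this sketch only shows how each op is undone and which consistency check goes with it. `RepoOp`, `InversionError`, and `invert_ops` are hypothetical names, and the CIDs are placeholder strings.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class RepoOp:
    action: str          # "create", "update", or "delete"
    path: str            # "<collection>/<rkey>"
    cid: Optional[str]   # new record CID (None for deletes)
    prev: Optional[str]  # previous record CID (updates/deletes)


class InversionError(Exception):
    """Raised when an op is inconsistent with the tree it claims to produce."""


def invert_ops(new_tree: Dict[str, str], ops: List[RepoOp]) -> Dict[str, str]:
    """Undo `ops` against a copy of the post-commit tree.

    Returns the reconstructed pre-commit mapping. A flat dict stands in
    for the partial MST here, so no root CID is computed.
    """
    tree = dict(new_tree)  # work on a copy, leave the input untouched
    for op in reversed(ops):
        if op.action == "create":
            # undo a create: the key must exist and hold exactly op.cid
            if op.cid is None or tree.pop(op.path, None) != op.cid:
                raise InversionError(f"create mismatch at {op.path}")
        elif op.action == "update":
            # undo an update: the current value must be op.cid, restore op.prev
            if op.cid is None or op.prev is None or tree.get(op.path) != op.cid:
                raise InversionError(f"update mismatch at {op.path}")
            tree[op.path] = op.prev
        elif op.action == "delete":
            # undo a delete: the key must be absent, re-insert op.prev
            if op.prev is None or op.path in tree:
                raise InversionError(f"delete mismatch at {op.path}")
            tree[op.path] = op.prev
        else:
            raise InversionError(f"unknown action {op.action!r}")
    return tree


# usage: one create, one update, one delete in a single commit
after = {"app.bsky.feed.post/a": "cid-a2", "app.bsky.feed.post/c": "cid-c"}
ops = [
    RepoOp("create", "app.bsky.feed.post/c", "cid-c", None),
    RepoOp("update", "app.bsky.feed.post/a", "cid-a2", "cid-a1"),
    RepoOp("delete", "app.bsky.feed.post/b", None, "cid-b"),
]
before = invert_ops(after, ops)
```

in the real protocol the final comparison is between the recomputed MST root CID and `prevData`, not between two dicts; the per-op checks are the same idea.
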

concretely, for each op applied in reverse:

- **create** → delete that key from the new tree, verify removed CID matches `op.cid`
- **update** → put `op.prev` back, verify displaced CID matches `op.cid`
- **delete** → re-insert `op.prev`, verify key didn't already exist

the CAR contains a *partial* MST: only the nodes that changed. unchanged subtrees become "stubs" (just their CID). after inversion, the root is computed bottom-up: stubs contribute their known CIDs, loaded nodes are serialized and hashed. if it matches `prevData`, the transition is proven valid.

## what fails if data is tampered with

| tampering | detection |
|---|---|
| modified record content | CAR block hash ≠ CID |
| forged commit (wrong DID, rev, data) | signature verification fails |
| wrong MST structure | inverted root ≠ `prevData` |
| extra/missing operations | inverted root ≠ `prevData`, or inversion mismatch |
| op claims to create X but X isn't in tree | `deleteReturn` returns null |
| op touches unchanged subtree not in CAR | stub error (partial tree) |
| high-S signature malleability | low-S check rejects it |

## chain break → resync

when the chain breaks (mismatched `since`/`prevData`, or a `#sync` event):

1. mark the repo as `desynchronized`
2. queue incoming events for this DID (don't drop them)
3. fetch the full CAR, **from the upstream relay first** (not the PDS), to avoid a thundering herd
4. verify and reconcile state
5. replay queued events

from the spec:

> if many services attempt to re-synchronize a repository at the same time, the upstream PDS host may be overwhelmed with a 'thundering herd' of requests. to mitigate this, receiving services should first attempt to fetch the repo CAR file from their direct upstream (often a relay instance).

## `#sync` events

sent when repo state has been reset or is ambiguous (e.g. account reactivation after data corruption). contains only the commit block, not the full repo.

> note that the repository *contents* are not included in the sync event: the `blocks` field only contains the repo commit object. downstream services would need to fetch the full repo CAR file to re-synchronize.

## `#account` events

| field | notes |
|---|---|
| `active` | whether the repo can be redistributed |
| `status` | `takendown`, `suspended`, `deleted`, `deactivated`, `desynchronized`, `throttled` |

the spec is clear: non-active accounts' content should not be redistributed. this means `listReposByCollection` should filter by active status.

> when an account status is non-`active`, the content that hosts should not redistributed includes: repository exports (CAR files), repo records, transformed records ('views', embeds, etc), blobs, transformed blobs (thumbnails, etc)

account events are **hop-by-hop**: they describe status *at the emitting service*, not globally.

## validation checklist (from spec)

what a relay **should** do for each `#commit`:

1. verify commit signature (refresh identity on initial failure)
2. verify event fields match the signed commit in `blocks`
3. verify `blocks` against `ops` and `prevData` via MST inversion
4. check `since` against stored `rev`; mismatch → out-of-sync
5. check `prevData` against stored `data`; mismatch → out-of-sync
6. ignore events with `rev ≤ stored_rev`
7. reject events with future `rev` (beyond clock drift window)
8. ignore events for non-active accounts
9. do NOT validate records against lexicons (relay-specific)

## cryptographic details

- two curves: P-256 (`secp256r1`) and secp256k1; implementations must support both
- low-S normalization **required** for both curves
- signing: DAG-CBOR encode unsigned commit → SHA-256 hash (binary) → ECDSA sign
- the CID of a signed commit uses the *signed* DAG-CBOR encoding (codec `0x71`)
- public keys: compressed 33-byte points, multicodec-prefixed (`0x80 0x24` for P-256, `0xe7 0x01` for secp256k1), then multibase-encoded (`z` + base58btc)

## cursor semantics

sequence numbers are per-service, per-endpoint. reconnection rules:

- no cursor → start from current position
- cursor in rollback window → replay from that point
- cursor too old → info message, then replay entire rollback window
- cursor in the future → `FutureCursor` error, close connection

a relay should track both "last received seq" (for reconnection) and "high water mark" (for persistence after processing completes).

## implementations

| | zat/zlay | lightrail (fig) | collectiondir (indigo) |
|---|---|---|---|
| signature verification | yes (P-256 + secp256k1) | not yet (resolver ready) | no |
| MST inversion | yes (`verifyCommitDiff`) | not yet (MST parsing exists) | no |
| per-DID ordering | caller responsibility | `CommitDispatcher` enforces | n/a |
| prev chain tracking | yes (postgres, CAS upsert) | `RepoPrev` storage built | no |
| chain continuity checks | yes (log-only, metrics) | not yet | no |
| account status | dual status (local + upstream) | tracked, not filtered at query | append-only (no removal) |
| resync on discontinuity | not yet | architecture ready | n/a |

zat has the cryptographic and structural verification. zlay (march 2026) runs chain continuity detection in observation mode: logging breaks and counting them via prometheus, not yet enforcing. `verifyCommitDiff` is wired but behind a config flag; production uses `verifyCommitCar` (signature-only). see [inductive-proof/relay-integration.md](./inductive-proof/relay-integration.md) for details.

lightrail has the operational scheduling and recovery. collectiondir trusts the upstream relay entirely.

## see also

- [inductive-proof/](./inductive-proof/): deep dive into the algorithm, relay integration, SDK affordances
- [firehose](./firehose.md): event stream basics, consuming events
- [data](./data.md): repos, records, collections
- [identity](./identity.md): DIDs, handles, key resolution