sync verification (sync 1.1)

how relays and downstream services verify firehose commits without trusting the source.

source: atproto sync spec, repository spec, event stream spec, cryptography spec

the inductive proof chain

the core idea: instead of re-fetching the full repo to verify each change, you verify that each commit is a valid transition from the previous state. this only requires tracking two values per DID:

  • rev — the TID of the last verified commit
  • data — the MST root CID of the last verified commit

base case

establish ground truth by fetching the full repo (via getRepo CAR export), verifying the MST structure, and verifying the commit signature. now you know the repo state is correct at this point.

inductive step

for each subsequent #commit from the firehose:

  1. check chain continuity — the event's since must match your stored rev, and prevData must match your stored data
  2. verify signature — re-encode the commit without sig, SHA-256 hash, verify ECDSA
  3. MST inversion — apply the ops in reverse against the partial MST from the CAR blocks. if the ops are complete, the resulting root CID must equal prevData
  4. update state — store the new rev and data

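if step 1 or 3 fails, the chain is broken → mark the repo as desynchronized and re-fetch. a signature failure in step 2 is not a chain break: refresh the identity and retry once, then drop the event if it still fails.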
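
a minimal sketch of the inductive step, assuming per-DID state lives in a plain dict. `RepoState`, `CommitEvent`, `apply_commit`, and the two verifier callbacks are hypothetical names for illustration, not from the spec or any SDK; the signature and inversion checks are passed in as functions because they need real crypto and MST code. returning "reject" (rather than desynchronizing) on a bad signature is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class RepoState:
    rev: str                # TID of the last verified commit
    data: str               # MST root CID of the last verified commit
    desynced: bool = False


@dataclass
class CommitEvent:
    did: str
    rev: str
    data: str                 # MST root CID from the signed commit
    since: Optional[str]      # unsigned chain link to the previous rev
    prev_data: Optional[str]  # unsigned chain link to the previous root


def apply_commit(
    states: Dict[str, RepoState],
    evt: CommitEvent,
    verify_signature: Callable[[CommitEvent], bool],  # hypothetical callback
    verify_inversion: Callable[[CommitEvent], bool],  # hypothetical callback
) -> str:
    """Return "ok", "reject", or "resync" for one #commit event."""
    state = states.get(evt.did)
    if state is None or state.desynced:
        return "resync"  # no base case yet, or chain already broken
    # step 1: chain continuity
    if evt.since != state.rev or evt.prev_data != state.data:
        state.desynced = True
        return "resync"
    # step 2: signature (assumption: a bad signature drops the event only)
    if not verify_signature(evt):
        return "reject"
    # step 3: MST inversion must reproduce prev_data
    if not verify_inversion(evt):
        state.desynced = True
        return "resync"
    # step 4: advance the two tracked values
    state.rev, state.data = evt.rev, evt.data
    return "ok"
```

only `rev` and `data` are advanced, and only after every check passes, so a rejected event never moves the chain forward.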

commit event fields that matter

the signed commit object (in the CAR) has:

| field | notes |
|---|---|
| did | account DID |
| version | always 3 (v1 dead, v2 legacy-compatible) |
| data | CID of MST root; this is what the proof chain tracks |
| rev | TID, must increase monotonically |
| prev | virtually always null in v3; vestigial from v2 |
| sig | ECDSA signature over all other fields encoded as DAG-CBOR |

the firehose #commit event adds:

| field | notes |
|---|---|
| since | rev of the preceding commit (chain link) |
| prevData | MST root CID of the preceding commit (chain link) |
| blocks | CAR slice: only changed blocks, max 2 MB |
| ops | up to 200 record operations |

each repoOp has:

| field | notes |
|---|---|
| action | create, update, or delete |
| path | <collection>/<rkey> |
| cid | new record CID (null for deletes) |
| prev | previous record CID (for updates/deletes; required for inversion) |

important: since/prevData are unsigned; they're not in the signed commit object. prevData is still verifiable: inverting the ops against the CAR blocks must reproduce it as the root CID. since is checked against the rev stored from the previous verified commit.

MST inversion (the "trick")

from the spec:

the trick to this process is record operation inversion. #commit messages contain both a repo diff (CAR slice), and an array of record operations. the operations can be applied in reverse against a copy of the partial repo tree contained in the diff blocks. if the list of operations is complete, the root of the tree should be exactly that of the previous commit object of the repository.

concretely, for each op applied in reverse:

  • create → delete that key from the new tree, verify removed CID matches op.cid
  • update → put op.prev back, verify displaced CID matches op.cid
  • delete → re-insert op.prev, verify key didn't already exist

the CAR contains a partial MST — only the nodes that changed. unchanged subtrees become "stubs" (just their CID). after inversion, the root is computed bottom-up: stubs contribute their known CIDs, loaded nodes are serialized and hashed. if it matches prevData, the transition is proven valid.
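
the three inversion rules above can be sketched with a toy model. big assumption: the tree here is a flat dict of path → record CID and the "root" is a SHA-256 over its sorted entries. a real MST is a tree of nodes with partial loading and stubs; `RepoOp`, `toy_root`, `invert_ops`, and `verify_transition` are hypothetical names, and only the per-op checks mirror the text.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class RepoOp:
    action: str          # "create", "update", or "delete"
    path: str            # <collection>/<rkey>
    cid: Optional[str]   # new record CID (None for deletes)
    prev: Optional[str]  # previous record CID (updates/deletes)


class InversionError(Exception):
    pass


def toy_root(tree: Dict[str, str]) -> str:
    """Stand-in for the MST root CID: hash of the sorted key/CID pairs."""
    h = hashlib.sha256()
    for path in sorted(tree):
        h.update(f"{path}={tree[path]}\n".encode())
    return h.hexdigest()


def invert_ops(new_tree: Dict[str, str], ops: List[RepoOp]) -> Dict[str, str]:
    """Apply ops in reverse to a copy of the new tree, checking each step."""
    tree = dict(new_tree)
    for op in reversed(ops):
        if op.action == "create":
            # undo a create: the key must exist and hold op.cid
            if tree.pop(op.path, None) != op.cid:
                raise InversionError(f"create mismatch at {op.path}")
        elif op.action == "update":
            # undo an update: the displaced value must be op.cid
            if op.prev is None or tree.get(op.path) != op.cid:
                raise InversionError(f"update mismatch at {op.path}")
            tree[op.path] = op.prev
        elif op.action == "delete":
            # undo a delete: the key must not already exist
            if op.prev is None or op.path in tree:
                raise InversionError(f"delete mismatch at {op.path}")
            tree[op.path] = op.prev
        else:
            raise InversionError(f"unknown action {op.action!r}")
    return tree


def verify_transition(new_tree: Dict[str, str], ops: List[RepoOp],
                      prev_data: str) -> bool:
    """True if inverting ops on new_tree reproduces the previous root."""
    try:
        return toy_root(invert_ops(new_tree, ops)) == prev_data
    except InversionError:
        return False
```

a missing op leaves an extra or stale key in the inverted tree, so the recomputed root no longer matches `prev_data`; an op that does not fit the tree fails earlier, at its per-op check.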

what fails if data is tampered with

| tampering | detection |
|---|---|
| modified record content | CAR block hash ≠ CID |
| forged commit (wrong DID, rev, data) | signature verification fails |
| wrong MST structure | inverted root ≠ prevData |
| extra/missing operations | inverted root ≠ prevData, or inversion mismatch |
| op claims to create X but X isn't in tree | deleteReturn returns null |
| op touches unchanged subtree not in CAR | stub error (partial tree) |
| high-S signature malleability | low-S check rejects it |
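
the first row (block hash ≠ CID) is the cheapest check to illustrate. a minimal sketch, assuming binary CIDv1 with a SHA-256 multihash and either the dag-cbor (0x71) or raw (0x55) codec, which is what atproto repos use. `block_matches_cid` is a hypothetical helper; it expects the bare binary CID as found in a CAR section, without the 0x00 prefix used inside DAG-CBOR links.

```python
import hashlib

DAG_CBOR = 0x71
RAW = 0x55


def block_matches_cid(cid: bytes, block: bytes) -> bool:
    """Check that block hashes to the digest embedded in a binary CIDv1.

    Layout assumed: 0x01 (version), codec, 0x12 (sha2-256), 0x20 (32-byte
    digest length), then the 32-byte digest.
    """
    if len(cid) != 36:
        return False
    if cid[0] != 0x01 or cid[1] not in (DAG_CBOR, RAW):
        return False
    if cid[2:4] != b"\x12\x20":
        return False
    return hashlib.sha256(block).digest() == cid[4:]
```

running this over every block in the CAR slice before touching the MST means tampered record bytes are caught before any inversion work starts.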

chain break → resync

when the chain breaks (mismatched since/prevData, or a #sync event):

  1. mark the repo as desynchronized
  2. queue incoming events for this DID (don't drop them)
  3. fetch the full CAR — from the upstream relay first (not the PDS) to avoid thundering herd
  4. verify and reconcile state
  5. replay queued events

from the spec:

if many services attempt to re-synchronize a repository at the same time, the upstream PDS host may be overwhelmed with a 'thundering herd' of requests. to mitigate this, receiving services should first attempt to fetch the repo CAR file from their direct upstream (often a relay instance).
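
the five resync steps can be sketched as a small buffer around an ordered list of fetchers. everything here is hypothetical (`Resyncer`, the fetcher callables, the `reconcile` and `replay` callbacks); the only points taken from the text are that events are queued rather than dropped, and that the direct upstream is tried before the PDS.

```python
from collections import defaultdict, deque
from typing import Any, Callable, Deque, Dict, List, Set


class Resyncer:
    def __init__(self, fetchers: List[Callable[[str], bytes]]):
        # ordered sources: direct upstream (e.g. a relay) first, PDS last
        self.fetchers = fetchers
        self.desynced: Set[str] = set()
        self.queued: Dict[str, Deque[Any]] = defaultdict(deque)

    def on_chain_break(self, did: str) -> None:
        """Step 1: mark the repo as desynchronized."""
        self.desynced.add(did)

    def on_event(self, did: str, event: Any) -> bool:
        """Step 2: queue events for desynced DIDs. Returns True if queued."""
        if did in self.desynced:
            self.queued[did].append(event)
            return True
        return False

    def resync(self, did: str,
               reconcile: Callable[[str, bytes], None],
               replay: Callable[[str, Any], None]) -> bool:
        """Steps 3-5: fetch the full CAR, reconcile, replay the queue."""
        for fetch in self.fetchers:
            try:
                car = fetch(did)
            except OSError:
                continue  # this source failed; fall through to the next
            reconcile(did, car)  # caller verifies the CAR and resets rev/data
            self.desynced.discard(did)
            for event in self.queued.pop(did, ()):
                replay(did, event)
            return True
        return False  # still desynced; events keep queueing
```

a real implementation would also bound the queue and handle events that arrive while the replay is running; both are left out here.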

#sync events

sent when repo state has been reset or is ambiguous (e.g. account reactivation after data corruption). contains only the commit block, not the full repo.

from the spec:

note that the repository contents are not included in the sync event: the blocks field only contains the repo commit object. downstream services would need to fetch the full repo CAR file to re-synchronize.

#account events

| field | notes |
|---|---|
| active | whether the repo can be redistributed |
| status | takendown, suspended, deleted, deactivated, desynchronized, throttled |

the spec is clear: non-active accounts' content should not be redistributed. this means listReposByCollection should filter by active status.

when an account status is non-active, the content that hosts should not redistribute includes: repository exports (CAR files), repo records, transformed records ('views', embeds, etc), blobs, transformed blobs (thumbnails, etc)

account events are hop-by-hop — they describe status at the emitting service, not globally.
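
a minimal sketch of filtering by active status at query time, assuming each stored account is a dict carrying the `active` flag from the most recent #account event seen at this hop. `redistributable` and the record shape are hypothetical.

```python
from typing import Any, Dict, List


def redistributable(accounts: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only accounts whose last known #account event had active=True.

    Accounts with no recorded flag are excluded here; that default is an
    assumption of this sketch, not something the spec text above states.
    """
    return [a for a in accounts if a.get("active") is True]
```

because the flag is hop-by-hop, this uses whatever status the local service last recorded rather than asking any other host.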

validation checklist (from spec)

what a relay should do for each #commit:

  1. verify commit signature (refresh identity on initial failure)
  2. verify event fields match the signed commit in blocks
  3. verify blocks against ops and prevData via MST inversion
  4. check since against stored rev — mismatch → out-of-sync
  5. check prevData against stored data — mismatch → out-of-sync
  6. ignore events with rev ≤ stored_rev
  7. reject events with future rev (beyond clock drift window)
  8. ignore events for non-active accounts
  9. do NOT validate records against lexicons (relay-specific)
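
items 6 and 7 both depend on reading the rev as a TID. a minimal sketch, assuming the standard TID layout (13 base32-sortable characters encoding a 64-bit integer: top bit 0, 53 bits of microseconds since the unix epoch, 10 bits of clock id). `tid_micros`, `classify_rev`, and the 5-minute default drift window are hypothetical choices for illustration.

```python
from typing import Optional

TID_ALPHABET = "234567abcdefghijklmnopqrstuvwxyz"  # base32-sortable


def tid_micros(tid: str) -> int:
    """Decode the microsecond timestamp from a 13-character TID."""
    if len(tid) != 13:
        raise ValueError("TID must be 13 characters")
    n = 0
    for ch in tid:
        n = (n << 5) | TID_ALPHABET.index(ch)  # ValueError on bad chars
    return n >> 10  # drop the 10-bit clock identifier


def classify_rev(rev: str, stored_rev: Optional[str], now_micros: int,
                 drift_micros: int = 5 * 60 * 1_000_000) -> str:
    """Return "ignore", "reject", or "process" for an incoming rev."""
    # TIDs sort lexicographically, so plain string comparison is enough
    if stored_rev is not None and rev <= stored_rev:
        return "ignore"   # item 6: stale or replayed event
    if tid_micros(rev) > now_micros + drift_micros:
        return "reject"   # item 7: rev is too far in the future
    return "process"
```

in practice `now_micros` would come from something like `time.time_ns() // 1000`; it is a parameter here so the check stays deterministic.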

cryptographic details

  • two curves: P-256 (secp256r1) and secp256k1 — implementations must support both
  • low-S normalization required for both curves
  • signing: DAG-CBOR encode unsigned commit → SHA-256 hash (binary) → ECDSA sign
  • the CID of a signed commit uses the signed DAG-CBOR encoding (codec 0x71)
  • public keys: compressed 33-byte points, multicodec-prefixed (0x80 0x24 for P-256, 0xe7 0x01 for secp256k1), then multibase-encoded (z + base58btc)
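
two of these details need no crypto library to illustrate: the low-S rule and the multicodec key prefix. a minimal sketch, assuming signatures arrive as 64-byte compact `r || s` and that the key bytes have already been multibase-decoded. `is_low_s` and `curve_of_key` are hypothetical helpers; the curve orders are the standard published constants.

```python
# group orders (n) of the two supported curves
P256_N = 0xFFFFFFFF00000000FFFFFFFFFFFFFFFFBCE6FAADA7179E84F3B9CAC2FC632551
K256_N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141

ORDERS = {"p256": P256_N, "k256": K256_N}


def is_low_s(sig: bytes, curve: str) -> bool:
    """True if the s half of a compact signature is in the lower half-order."""
    if len(sig) != 64:
        raise ValueError("expected 64-byte compact signature (r || s)")
    s = int.from_bytes(sig[32:], "big")
    return 1 <= s <= ORDERS[curve] // 2


def curve_of_key(raw: bytes) -> str:
    """Identify the curve from multicodec-prefixed compressed key bytes."""
    if len(raw) != 35:  # 2-byte varint prefix + 33-byte compressed point
        raise ValueError("expected 35 bytes")
    prefix = raw[:2]
    if prefix == b"\x80\x24":
        return "p256"
    if prefix == b"\xe7\x01":
        return "k256"
    raise ValueError("unsupported key type")
```

a high-S signature is otherwise a valid signature for the same message (s and n − s both verify), which is why the check has to run separately from ECDSA verification itself.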

cursor semantics

sequence numbers are per-service, per-endpoint. reconnection rules:

  • no cursor → start from current position
  • cursor in rollback window → replay from that point
  • cursor too old → info message, then replay entire rollback window
  • cursor in the future → FutureCursor error, close connection

a relay should track both "last received seq" (for reconnection) and "high water mark" (for persistence after processing completes).
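
the four reconnection rules and the two tracked positions can be sketched as below. `plan_reconnect`, `CursorTracker`, and the returned labels are hypothetical; treating the rollback window as the inclusive range `oldest_seq..newest_seq` is a simplifying assumption, and `CursorTracker` assumes events are processed in order.

```python
from typing import Optional


def plan_reconnect(cursor: Optional[int], oldest_seq: int,
                   newest_seq: int) -> str:
    """Classify a client cursor against the server's rollback window."""
    if cursor is None:
        return "live"                      # start from current position
    if cursor > newest_seq:
        return "future_cursor_error"       # FutureCursor, close connection
    if cursor < oldest_seq:
        return "info_then_replay_window"   # info message, then full window
    return "replay_from_cursor"


class CursorTracker:
    """Tracks what was received separately from what was fully processed."""

    def __init__(self) -> None:
        self.last_received: Optional[int] = None  # for in-memory reconnects
        self.high_water: Optional[int] = None     # safe to persist

    def on_received(self, seq: int) -> None:
        self.last_received = seq

    def on_processed(self, seq: int) -> None:
        if self.high_water is None or seq > self.high_water:
            self.high_water = seq
```

persisting only `high_water` means a crash between receive and process replays the unprocessed events on restart instead of skipping them.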

implementations

| | zat/zlay | lightrail (fig) | collectiondir (indigo) |
|---|---|---|---|
| signature verification | yes (P-256 + secp256k1) | not yet (resolver ready) | no |
| MST inversion | yes (verifyCommitDiff) | not yet (MST parsing exists) | no |
| per-DID ordering | caller responsibility | CommitDispatcher enforces | n/a |
| prev chain tracking | yes (postgres, CAS upsert) | RepoPrev storage built | no |
| chain continuity checks | yes (log-only, metrics) | not yet | no |
| account status | dual status (local + upstream) | tracked, not filtered at query | append-only (no removal) |
| resync on discontinuity | not yet | architecture ready | n/a |

zat has the cryptographic and structural verification. as of march 2026, zlay runs chain continuity detection in observation mode: it logs breaks and counts them via prometheus but does not yet enforce. verifyCommitDiff is wired but behind a config flag; production uses verifyCommitCar (signature-only). see inductive-proof/relay-integration.md for details.

lightrail has the operational scheduling and recovery. collectiondir trusts the upstream relay entirely.

see also

  • inductive-proof/ — deep dive: algorithm, relay integration, SDK affordances
  • firehose — event stream basics, consuming events
  • data — repos, records, collections
  • identity — DIDs, handles, key resolution