
relay integration

how zlay uses the sync 1.1 APIs from zat, as of march 2026. zlay is ~4 days old.

current state

zlay is on zat v0.2.10. the sync 1.1 verification is wired but deployed in observation mode — chain breaks are logged and counted, not enforced.

the pipeline

subscriber (reader thread)
  → header decode, cursor tracking
  → submit raw frame to thread pool

frame_worker (pool worker)
  → CBOR decode payload
  → rev clock check (reject future timestamps beyond 5min skew)
  → chain continuity check (log-only):
      since vs stored rev
      prevData vs stored data_cid
  → dispatch to validator

validator
  → DID cache lookup (miss → queue background resolve, pass frame through unverified)
  → verifyCommitCar(blocks, public_key, {verify_mst: false})
    OR verifyCommitDiff(blocks, ops, prev_data, public_key)  [behind config flag]
  → return (data_cid, commit_rev)

event_log
  → persist frame to disk
  → conditional upsert: UPDATE ... WHERE rev < new_rev
  → broadcast to consumers

what's working

chain continuity detection (frame_worker.zig, subscriber.zig):

  • compares incoming since against stored rev
  • compares incoming prevData CID against stored data_cid
  • increments relay_chain_breaks_total prometheus counter
  • log-only — commits still flow through
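the check above can be sketched in a few lines. this is an illustrative python stand-in, not zlay's zig code: RepoState, Commit, ChainChecker and the enforce switch are hypothetical names, and treating a missing since/prevData as a break is an assumption.

```python
import logging
from dataclasses import dataclass
from typing import Optional

log = logging.getLogger("relay")


@dataclass
class RepoState:
    rev: str       # last verified commit TID
    data_cid: str  # last verified MST root, encoded as a string


@dataclass
class Commit:
    rev: str
    since: Optional[str]      # rev the sender claims to build on
    prev_data: Optional[str]  # MST root the sender claims to build on


class ChainChecker:
    def __init__(self, enforce: bool = False):
        self.enforce = enforce       # hypothetical switch; log-only when False
        self.chain_breaks_total = 0  # stand-in for the prometheus counter

    def check(self, stored: Optional[RepoState], commit: Commit) -> bool:
        """Return True if the commit should keep flowing through the pipeline."""
        if stored is None:
            return True  # first commit seen for this repo: nothing to compare
        # Assumption: a missing since/prev_data counts as a mismatch.
        broken = commit.since != stored.rev or commit.prev_data != stored.data_cid
        if not broken:
            return True
        self.chain_breaks_total += 1
        log.warning("chain break: since=%s stored_rev=%s", commit.since, stored.rev)
        return not self.enforce  # observation mode lets the commit through
```

with enforce left at False, a break only bumps the counter and logs, which matches the observation mode described above.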

conditional state upsert (event_log.zig):

INSERT INTO account_repo (uid, rev, commit_data_cid)
VALUES ($1, $2, $3)
ON CONFLICT (uid) DO UPDATE
  SET rev = EXCLUDED.rev, commit_data_cid = EXCLUDED.commit_data_cid
  WHERE account_repo.rev < EXCLUDED.rev

this prevents concurrent workers from rolling back state: a write carrying an older (or equal) rev fails the WHERE clause and becomes a no-op. the call returns whether the update actually happened.
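a small way to see the behaviour is to run the same statement against an in-memory SQLite database. this is a sketch under assumptions: SQLite stands in for the production database (the $1 placeholders suggest Postgres, which is a guess), upsert_repo_state is a hypothetical helper, and the affected-row count is used as the "did it apply" signal.

```python
import sqlite3


def upsert_repo_state(conn, uid, rev, data_cid):
    """Run the conditional upsert; return True if a row was inserted or advanced."""
    cur = conn.execute(
        """
        INSERT INTO account_repo (uid, rev, commit_data_cid)
        VALUES (?, ?, ?)
        ON CONFLICT (uid) DO UPDATE
          SET rev = excluded.rev, commit_data_cid = excluded.commit_data_cid
          WHERE account_repo.rev < excluded.rev
        """,
        (uid, rev, data_cid),
    )
    # A stale write touches no row, so the affected-row count is 0.
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account_repo ("
    "uid INTEGER PRIMARY KEY, rev TEXT, commit_data_cid TEXT)"
)

upsert_repo_state(conn, 1, "3kbbbbbbbbbbb", "cid-b")  # first write for uid 1
upsert_repo_state(conn, 1, "3kaaaaaaaaaaa", "cid-a")  # older rev: no-op
```

the text comparison on rev works because TIDs are designed to sort lexicographically in time order.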

extractOps fix (validator.zig):

  • previously looked for separate collection/rkey fields (wrong)
  • now reads path field and splits on / (matches firehose wire format)
  • validates both halves: NSID for collection, rkey for record key
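the path handling can be sketched as follows. this is an illustrative python version, not the code in validator.zig: split_op_path is a hypothetical name, and both patterns are simplified (the atproto specs add length limits and other rules that are left out here).

```python
import re

# Simplified patterns for illustration only; the real NSID and record-key
# rules are stricter (segment and total length limits, for example).
NSID_RE = re.compile(
    r"[a-zA-Z][a-zA-Z0-9-]*(\.[a-zA-Z0-9][a-zA-Z0-9-]*)+\.[a-zA-Z][a-zA-Z0-9]*"
)
RKEY_RE = re.compile(r"[A-Za-z0-9._:~-]{1,512}")


def split_op_path(path: str) -> tuple[str, str]:
    """Split a firehose op path into (collection, rkey), validating both halves."""
    parts = path.split("/")
    if len(parts) != 2:
        raise ValueError(f"expected collection/rkey, got {path!r}")
    collection, rkey = parts
    if not NSID_RE.fullmatch(collection):
        raise ValueError(f"collection is not a valid NSID: {collection!r}")
    if rkey in (".", "..") or not RKEY_RE.fullmatch(rkey):
        raise ValueError(f"invalid record key: {rkey!r}")
    return collection, rkey
```

for example, split_op_path("app.bsky.feed.post/3kabcdefghijk") yields the collection and record key as a pair, while a path with no slash or more than one slash raises ValueError.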

future rev rejection (frame_worker.zig):

  • parses incoming rev as TID, extracts microsecond timestamp
  • compares against wall clock + configurable skew (default 5 minutes)
  • rejects commits claiming to be from the future
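a sketch of the TID check, in python rather than zig. it relies on the TID layout from the atproto spec: 13 base32-sortable characters encoding a 64-bit integer whose low 10 bits are a clock identifier and whose next 53 bits are microseconds since the unix epoch. the function names are hypothetical, encode_tid exists only to build example values, and the sketch does not check that the top bit is zero.

```python
TID_ALPHABET = "234567abcdefghijklmnopqrstuvwxyz"  # base32-sortable
DEFAULT_SKEW_US = 5 * 60 * 1_000_000               # 5 minutes in microseconds


def tid_timestamp_us(tid: str) -> int:
    """Extract the microsecond timestamp from a TID string."""
    if len(tid) != 13:
        raise ValueError("TID must be 13 characters")
    n = 0
    for ch in tid:
        n = n * 32 + TID_ALPHABET.index(ch)  # ValueError on a bad character
    return n >> 10  # drop the 10-bit clock identifier


def is_future_rev(rev: str, now_us: int, skew_us: int = DEFAULT_SKEW_US) -> bool:
    """True if the rev claims a time later than now plus the allowed skew."""
    return tid_timestamp_us(rev) > now_us + skew_us


def encode_tid(timestamp_us: int, clock_id: int = 0) -> str:
    """Helper for examples and tests: build a TID from its parts."""
    n = (timestamp_us << 10) | clock_id
    chars = []
    for _ in range(13):
        chars.append(TID_ALPHABET[n & 31])
        n >>= 5
    return "".join(reversed(chars))
```

a caller would pass the wall clock as now_us, for example time.time_ns() // 1000.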

what's not yet enabled

full diff verification (verifyCommitDiff):

  • wired in validator.zig but behind config.verify_commit_diff flag
  • disabled in production — currently all commits go through verifyCommitCar (signature-only, MST verification disabled)
  • the observation mode lets operators measure chain break rates before strict enforcement

resync on chain break:

  • breaks are detected and logged but no recovery action is taken
  • the spec says: mark desynchronized, queue events, fetch full CAR, reconcile, replay
  • this is a significant operational feature with its own risks (e.g. a thundering herd of full-CAR fetches when many repos break at once)
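the recovery steps listed above could be shaped roughly like this. nothing here exists in zlay: RepoResync and its methods are hypothetical, fetching the full CAR is left to the caller, events are reduced to (rev, data_cid) pairs, and replay only filters by rev instead of re-running full verification. it also does nothing about rate limiting, so the thundering herd problem is untouched.

```python
from collections import deque
from enum import Enum


class SyncState(Enum):
    SYNCED = "synced"
    DESYNCHRONIZED = "desynchronized"


class RepoResync:
    """Per-repo bookkeeping for: mark desynchronized, queue, reconcile, replay."""

    def __init__(self, rev: str, data_cid: str):
        self.state = SyncState.SYNCED
        self.rev = rev
        self.data_cid = data_cid
        self.pending = deque()  # events held back while desynchronized

    def handle(self, event, chain_ok: bool) -> bool:
        """Apply an event if in sync; otherwise queue it. Returns True if applied."""
        if self.state is SyncState.DESYNCHRONIZED or not chain_ok:
            self.state = SyncState.DESYNCHRONIZED
            self.pending.append(event)
            return False
        self.rev, self.data_cid = event
        return True

    def reconcile(self, fetched_rev: str, fetched_data_cid: str) -> list:
        """Adopt state from a full CAR fetch, then replay newer queued events."""
        self.rev, self.data_cid = fetched_rev, fetched_data_cid
        replayed = []
        while self.pending:
            rev, data_cid = self.pending.popleft()
            if rev > self.rev:  # TIDs sort lexicographically by time
                self.rev, self.data_cid = rev, data_cid
                replayed.append((rev, data_cid))
        self.state = SyncState.SYNCED
        return replayed
```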

the optimistic validation pattern

zlay's approach to DID resolution creates a trust window:

  1. first commit from a DID → cache miss → broadcast immediately, resolve key in background
  2. subsequent commits → cache hit → verify signature
  3. verification failure → evict cache, re-resolve, skip this frame

this is a deliberate trade-off: a brief trust window in exchange for throughput. the window is bounded by resolver thread count and resolution latency (~200ms per DID).
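the three cases can be sketched as a small class. this is an illustrative python model, not zlay's validator: OptimisticValidator is a hypothetical name, verify is a caller-supplied stand-in for the real signature check, and the resolve queue is a plain deque that background resolver threads would drain (duplicates are not filtered here).

```python
from collections import deque


class OptimisticValidator:
    def __init__(self, verify):
        self.verify = verify          # (frame, public_key) -> bool, supplied by caller
        self.key_cache = {}           # DID -> resolved public key
        self.resolve_queue = deque()  # DIDs waiting for background resolution

    def on_commit(self, did: str, frame: bytes) -> bool:
        """Return True if the frame should be broadcast."""
        key = self.key_cache.get(did)
        if key is None:
            # Cache miss: open the trust window, resolve in the background.
            self.resolve_queue.append(did)
            return True
        if self.verify(frame, key):
            return True
        # Verification failed: the cached key may be stale, so evict it,
        # queue a fresh resolve, and skip this frame.
        del self.key_cache[did]
        self.resolve_queue.append(did)
        return False

    def on_resolved(self, did: str, key: bytes) -> None:
        """Called by a resolver thread once the signing key is known."""
        self.key_cache[did] = key
```

the trust window for a DID lasts from its first on_commit until on_resolved lands, which is why it scales with resolver thread count and resolution latency.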

state requirements

per-DID state for chain verification is minimal:

field            type  purpose
uid              u64   internal ID (from DID mapping cache)
rev              text  last verified commit TID
commit_data_cid  text  last verified MST root, multibase-encoded

this maps directly to what the spec requires: track rev and data per repo.

questions for SDK design

observations from watching zlay integrate:

  1. extractOps was wrong for months — the SDK provides MstOperation but the firehose wire format uses a different field layout (path vs collection+rkey, cid vs value). should the SDK provide a firehose-aware operation parser?

  2. chain continuity is caller responsibility — every consumer needs to track (rev, data_cid) and compare against incoming (since, prevData). this is boilerplate with subtle ordering requirements. could the SDK help?

  3. the observation-then-enforcement pattern — zlay chose log-only first. this is sensible for any consumer. does the SDK's error-based API support this well, or does it force binary accept/reject?

  4. multibase encoding of CIDs — zlay encodes CIDs as multibase base32lower for storage/comparison. this is a common need. the SDK has multibase.encode but the pattern of "extract CID from verify result, encode for storage" is repeated.
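for item 4, the encoding step itself is small. a python sketch, assuming the CID is already available as raw bytes: multibase base32lower is the prefix character "b" followed by lowercase RFC 4648 base32 without padding. cid_to_base32lower is a hypothetical helper name and is not the SDK's multibase.encode.

```python
import base64


def cid_to_base32lower(cid_bytes: bytes) -> str:
    """Encode raw CID bytes as a multibase base32lower string."""
    body = base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")
    return "b" + body  # "b" is the multibase prefix for base32lower


# A CIDv1 with the dag-cbor codec and a sha2-256 digest starts with the
# bytes 0x01 0x71 0x12 0x20; the 32 zero bytes stand in for a real digest.
example = cid_to_base32lower(bytes([0x01, 0x71, 0x12, 0x20]) + bytes(32))
```

storing the encoded string means the prevData comparison in the chain check becomes a plain string equality test against commit_data_cid.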