about things

the inversion algorithm#

how verifyCommitDiff proves a commit is a valid state transition.

the claim being verified#

given:

  • a signed commit with MST root data_cid (the "after" state)
  • a CAR containing the commit block + changed MST nodes + new records
  • a list of operations (create/update/delete with paths and CIDs)
  • a previous MST root prev_data (the "before" state, from the preceding commit)

prove: the operations, applied to prev_data, produce data_cid.

the trick: instead of applying ops forward (which would require the full previous tree), invert them on the new tree and check if you get back to the old root.

why inversion instead of forward application#

forward application would require:

  1. the full previous MST (potentially millions of entries)
  2. applying each op to produce the new tree
  3. comparing the new root against data_cid

inversion requires:

  1. only the partial MST from the CAR (changed nodes only)
  2. reversing each op on the new tree
  3. comparing the inverted root against prev_data

the CAR already contains the new partial tree. unchanged subtrees become stubs — their CIDs are known but their contents aren't loaded. this is the key insight: you only need blocks for the parts that changed.

step by step#

source: repo_verifier.zig:387-469

1. parse CAR, extract commit#

loadCommitFromCAR(blocks) → { commit, unsigned_bytes, repo_car }

the commit is CBOR with fields: did, version, data (MST root CID), rev, prev, sig. the unsigned bytes are re-encoded without sig for signature verification.

2. verify signature#

verify(unsigned_bytes, commit.sig, public_key) → ok | error

proves the commit came from the account's signing key. this is independent of MST — it proves authorship.

3. load partial MST from CAR#

Mst.loadFromBlocks(repo_car, commit.data_cid) → partial tree

source: mst.zig:459-480

walks from the root block, loading nodes whose blocks exist in the CAR. blocks not found become ChildRef.stub — a CID placeholder. the result is a tree where:

  • changed paths have fully loaded nodes
  • unchanged subtrees are stubs (just their hash)

4. copy tree for inversion#

tree.copy() → inverted

deep clone. stubs stay as stubs. the original tree is preserved.

5. normalize operations#

normalizeOps(msg_ops) → sorted_ops

source: mst.zig:773-800

  • sort: deletions first, then by path lexicographically
  • reject duplicates (same path twice → DuplicatePath)

deletions first matters: if you invert a create before a delete on related paths, you might touch a node that should already be gone. ordering ensures deterministic application.

6. invert each operation#

source: mst.zig:804-820

for each op in sorted order:

forward op inverse action verification
create(path, cid) deleteReturn(path) removed value must equal cid
update(path, new_cid, prev_cid) putReturn(path, prev_cid) displaced value must equal new_cid
delete(path, prev_cid) putReturn(path, prev_cid) path must not already exist (displaced == null)

each step is self-checking. if any assertion fails → InversionMismatch.

7. compute inverted root#

inverted.rootCid() → cid

traverses the (now-inverted) tree bottom-up. loaded nodes are serialized as DAG-CBOR and SHA-256 hashed. stubs contribute their known CIDs directly — this is safe because we trust the previous state.

if inversion needs to descend into a stub (operation touches an unchanged subtree), it fails with PartialTree — the CAR didn't include enough context.

8. compare#

inverted_root == prev_data → valid transition

if they match, the operations fully explain the state change from the old root to the new root. no hidden mutations, no missing ops, no reordering.

what stubs mean for the proof#

stubs are the mechanism that makes partial verification possible:

ChildRef = union(enum) {
    none,           — no child (leaf boundary)
    node: *Node,    — loaded from CAR, fully traversable
    stub: cbor.Cid, — CID known, content trusted
}

when computing rootCid():

  • node → serialize, hash, produce CID
  • stub → use the CID as-is (it's trusted from previous verification)
  • none → encoded as null in CBOR

the proof chain holds because each stub was once a verified node in a previous commit. by induction, every stub's CID is correct.

error taxonomy#

error meaning response
SignatureVerificationFailed wrong key or tampered commit reject, re-resolve DID
PrevDataMismatch ops don't explain the transition chain broken, re-sync
InversionMismatch individual op inconsistent with tree state reject commit
PartialTree CAR missing blocks needed for inversion incomplete data
DuplicatePath same path appears in multiple ops malformed commit
InvalidMstNode CBOR structure of MST node is wrong corrupted data
InvalidCommit commit object malformed or wrong version corrupted data

the first three are the interesting ones — they distinguish between identity fraud (signature), history fraud (prevData), and operation fraud (inversion).