# the inversion algorithm how `verifyCommitDiff` proves a commit is a valid state transition. ## the claim being verified given: - a signed commit with MST root `data_cid` (the "after" state) - a CAR containing the commit block + changed MST nodes + new records - a list of operations (create/update/delete with paths and CIDs) - a previous MST root `prev_data` (the "before" state, from the preceding commit) prove: the operations, applied to `prev_data`, produce `data_cid`. the trick: instead of *applying* ops forward (which would require the full previous tree), *invert* them on the new tree and check if you get back to the old root. ## why inversion instead of forward application forward application would require: 1. the full previous MST (potentially millions of entries) 2. applying each op to produce the new tree 3. comparing the new root against `data_cid` inversion requires: 1. only the partial MST from the CAR (changed nodes only) 2. reversing each op on the new tree 3. comparing the inverted root against `prev_data` the CAR already contains the new partial tree. unchanged subtrees become stubs — their CIDs are known but their contents aren't loaded. this is the key insight: you only need blocks for the parts that changed. ## step by step source: `repo_verifier.zig:387-469` ### 1. parse CAR, extract commit ``` loadCommitFromCAR(blocks) → { commit, unsigned_bytes, repo_car } ``` the commit is CBOR with fields: `did`, `version`, `data` (MST root CID), `rev`, `prev`, `sig`. the unsigned bytes are re-encoded without `sig` for signature verification. ### 2. verify signature ``` verify(unsigned_bytes, commit.sig, public_key) → ok | error ``` proves the commit came from the account's signing key. this is independent of MST — it proves authorship. ### 3. load partial MST from CAR ``` Mst.loadFromBlocks(repo_car, commit.data_cid) → partial tree ``` source: `mst.zig:459-480` walks from the root block, loading nodes whose blocks exist in the CAR. blocks not found become `ChildRef.stub` — a CID placeholder. the result is a tree where: - changed paths have fully loaded nodes - unchanged subtrees are stubs (just their hash) ### 4. copy tree for inversion ``` tree.copy() → inverted ``` deep clone. stubs stay as stubs. the original tree is preserved. ### 5. normalize operations ``` normalizeOps(msg_ops) → sorted_ops ``` source: `mst.zig:773-800` - sort: **deletions first**, then by path lexicographically - reject duplicates (same path twice → `DuplicatePath`) deletions first matters: if you invert a create before a delete on related paths, you might touch a node that should already be gone. ordering ensures deterministic application. ### 6. invert each operation source: `mst.zig:804-820` for each op in sorted order: | forward op | inverse action | verification | |---|---|---| | **create**(path, cid) | `deleteReturn(path)` | removed value must equal `cid` | | **update**(path, new_cid, prev_cid) | `putReturn(path, prev_cid)` | displaced value must equal `new_cid` | | **delete**(path, prev_cid) | `putReturn(path, prev_cid)` | path must not already exist (displaced == null) | each step is self-checking. if any assertion fails → `InversionMismatch`. ### 7. compute inverted root ``` inverted.rootCid() → cid ``` traverses the (now-inverted) tree bottom-up. loaded nodes are serialized as DAG-CBOR and SHA-256 hashed. stubs contribute their known CIDs directly — this is safe because we trust the previous state. if inversion needs to descend into a stub (operation touches an unchanged subtree), it fails with `PartialTree` — the CAR didn't include enough context. ### 8. compare ``` inverted_root == prev_data → valid transition ``` if they match, the operations *fully explain* the state change from the old root to the new root. no hidden mutations, no missing ops, no reordering. ## what stubs mean for the proof stubs are the mechanism that makes partial verification possible: ``` ChildRef = union(enum) { none, — no child (leaf boundary) node: *Node, — loaded from CAR, fully traversable stub: cbor.Cid, — CID known, content trusted } ``` when computing `rootCid()`: - `node` → serialize, hash, produce CID - `stub` → use the CID as-is (it's trusted from previous verification) - `none` → encoded as null in CBOR the proof chain holds because each stub was once a verified node in a previous commit. by induction, every stub's CID is correct. ## error taxonomy | error | meaning | response | |---|---|---| | `SignatureVerificationFailed` | wrong key or tampered commit | reject, re-resolve DID | | `PrevDataMismatch` | ops don't explain the transition | chain broken, re-sync | | `InversionMismatch` | individual op inconsistent with tree state | reject commit | | `PartialTree` | CAR missing blocks needed for inversion | incomplete data | | `DuplicatePath` | same path appears in multiple ops | malformed commit | | `InvalidMstNode` | CBOR structure of MST node is wrong | corrupted data | | `InvalidCommit` | commit object malformed or wrong version | corrupted data | the first three are the interesting ones — they distinguish between identity fraud (signature), history fraud (prevData), and operation fraud (inversion).