feat: specialized MST decoder + in-walk structure verification

+6

CHANGELOG.md

···
 # changelog

+## 0.2.6
+
+- **feat**: specialized MST decoder — `decodeMstNode()` parses known MST CBOR schema directly, zero-copy byte slicing, avoids generic `Value` union construction
+- **feat**: in-walk MST structure verification — `walkAndVerifyMst` checks key heights during traversal instead of full tree rebuild. MST step: 218ms → 39ms (5.5x), compute total: 300ms → 123ms (2.4x)
+- **docs**: devlog 005 — updated benchmark numbers and chart
+
 ## 0.2.5

 - **feat**: O(1) block lookup in CAR parser — `StringHashMap` index built during `read()`/`readWithOptions()`, `findBlock()` uses index instead of linear scan
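The index-based lookup described in the 0.2.5 entry can be pictured with a small sketch. This is not zat's `car` module: `Block`, `buildIndex`, and `findBlockIndexed` are hypothetical names, and the sketch assumes `std.StringHashMapUnmanaged` (rather than the `StringHashMap` the changelog names) only so the allocator is passed explicitly.

```zig
const std = @import("std");

/// Hypothetical block record; zat's real CAR types are not shown in this diff.
const Block = struct {
    cid_raw: []const u8,
    data: []const u8,
};

/// Build a CID -> position index once, while the blocks are being read.
/// Keys are slices into the block list, so the index owns no key memory.
fn buildIndex(allocator: std.mem.Allocator, blocks: []const Block) !std.StringHashMapUnmanaged(usize) {
    var index: std.StringHashMapUnmanaged(usize) = .empty;
    errdefer index.deinit(allocator);
    for (blocks, 0..) |block, i| {
        try index.put(allocator, block.cid_raw, i);
    }
    return index;
}

/// Look a block up through the index instead of scanning every block.
fn findBlockIndexed(
    index: *const std.StringHashMapUnmanaged(usize),
    blocks: []const Block,
    cid_raw: []const u8,
) ?[]const u8 {
    const i = index.get(cid_raw) orelse return null;
    return blocks[i].data;
}
```

The design point is that the index is built once per CAR, while lookups happen once per MST node, so the per-lookup cost no longer grows with the number of blocks.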

+1 -1

build.zig.zon

···
 .{
     .name = .zat,
-    .version = "0.2.0",
+    .version = "0.2.6",
     .fingerprint = 0x8da9db57ee82fbe4,
     .minimum_zig_version = "0.15.0",
     .dependencies = .{

+17 -13

devlog/005-three-way-verify.md

···
 ↓
 repo CAR → commit → signature ← verified against key
 ↓
-MST root CID → walk nodes → rebuild tree → CID match
+MST root CID → walk nodes → verify key heights → structure proven
 ```

-all three implementations do the same work: resolve the handle, resolve the DID, extract the signing key, fetch the repo CAR, parse every block with SHA-256 CID verification, verify the commit signature, walk the MST to count records, and (where possible) rebuild the MST to verify the root CID.
+all three implementations do the same work: resolve the handle, resolve the DID, extract the signing key, fetch the repo CAR, parse every block with SHA-256 CID verification, verify the commit signature, and walk the MST to count records and verify structure.

 ## the implementations

-**zig (zat)** — uses zat's own primitives end to end: `HandleResolver`, `DidResolver`, `car.read()` with CID verification, `jwt.verifySecp256k1`, `mst.Mst` for walk + rebuild.
+**zig (zat)** — uses zat's own primitives end to end: `HandleResolver`, `DidResolver`, `car.read()` with CID verification + O(1) block index, `jwt.verifySecp256k1`, specialized `decodeMstNode` for walk + in-walk key height verification.

 **go (indigo)** — uses bluesky's official Go SDK: `identity.BaseDirectory` for handle/DID resolution, `repo.LoadRepoFromCAR` for parsing, `commit.VerifySignature` for sig verify, `MST.Walk()` + `MST.RootCID()` for MST.

···

 ## results

-_pfrazee.com — 192,144 records, 243,470 blocks, 70.6 MB CAR, macOS arm64 (M3 Max)_
+_pfrazee.com — 192,161 records, 243,491 blocks, 70.6 MB CAR, macOS arm64 (M3 Max)_

 <img src="https://tangled.org/zat.dev/zat/raw/main/devlog/img/verify-compute.svg" alt="trust chain compute breakdown" width="790">

-| SDK | CAR parse | sig verify | MST walk | MST rebuild | compute total |
-|-----|----------:|----------:|---------:|------------:|-------------:|
-| zig (zat) | 81.6ms | 0.6ms | 45.5ms | 172.6ms | **300.4ms** |
-| go (indigo) | 403.8ms | 0.4ms | 5.8ms | 0.0ms | **410.0ms** |
-| rust (RustCrypto) | 301.0ms | 0.2ms | 120.9ms | N/A | **422.1ms** |
+| SDK | CAR parse | sig verify | MST walk+verify | compute total |
+|-----|----------:|----------:|----------------:|-------------:|
+| zig (zat) | 82.8ms | 0.6ms | 39.3ms | **122.7ms** |
+| go (indigo) | 424.7ms | 0.2ms | 9.3ms | **434.2ms** |
+| rust (RustCrypto) | 301.0ms | 0.2ms | 120.9ms | **422.1ms** |

 network time (handle + DID resolution + repo fetch) dominates total wall clock — 8-20 seconds depending on PDS response time. compute is under 500ms for all three.

-the story is different from the decode benchmarks. there, zig was 19x faster than Go. here, the gap is ~1.4x. the reason: signature verification is a single ECDSA verify (sub-millisecond for everyone), and CAR parsing on a 70 MB file is less dominated by per-block overhead than the firehose's thousands of small CARs. the MST rebuild (zig-only) is the biggest single cost — serializing 192k entries into a fresh tree and hashing.
+zig's compute total is 3.5x faster than Go and 3.4x faster than Rust. the gap comes from two places: CAR parsing (zig's inline varint + SHA-256 pipeline vs Go's reflection-heavy CBOR and Rust's serde overhead), and MST verification (specialized decoder + in-walk key height checks vs Go's cached-struct walk).

-go's MST walk is fastest (5.8ms vs zig's 45.5ms) because indigo's MST nodes are decoded from CBOR once on first access and cached as Go structs — subsequent traversal is pure pointer chasing. zig and rust decode MST nodes from raw CBOR on each visit. the same pattern explains go's 0.0ms MST rebuild: `LoadRepoFromCAR` pre-computes and caches the root CID during load.
+go's MST walk is still fastest in isolation (9.3ms vs zig's 39.3ms) because indigo's MST nodes are decoded from CBOR once on first access and cached as Go structs — subsequent traversal is pure pointer chasing. but zig's specialized `decodeMstNode` is much closer than the old generic CBOR approach was (previously 45.5ms walk + 172.6ms rebuild = 218ms). the key insight: a full MST rebuild is unnecessary when you can verify each key's tree layer is deterministically correct during the walk — combined with CAR block CID verification (which proves data integrity), this is equivalent.

 ## what changed in zat

-two changes in the CAR parser: blocks are now indexed in a `StringHashMap` for O(1) lookup (the O(n) linear scan was the 79s → 48ms fix), and `verifyRepo` now bypasses the default 2 MB / 10k block limits so large repos like pfrazee's 70 MB actually work.
+**O(1) block lookup** — CAR blocks are now indexed in a `StringHashMap` during parse. the old `findBlock()` was a linear scan through 243k blocks; MST walk calls it once per node (~50k nodes). this was the 79s → 48ms fix.
+
+**specialized MST decoder** — `decodeMstNode()` parses the known MST node CBOR schema directly (`map(2) { "e": array[...], "l": CID|null }`), avoiding the generic `cbor.decodeAll()` path that builds `Value` unions and `MapEntry` arrays. all byte data is zero-copy (slices into the input buffer). only allocation: the entries array.
+
+**in-walk structure verification** — instead of collecting all records and rebuilding the tree from scratch (192k `tree.put()` calls + serialize + hash), `walkAndVerifyMst` checks each key's `keyHeight()` against the node's expected layer during traversal. combined with the CAR parser's per-block SHA-256 CID verification (which proves data integrity), this is equivalent to a full rebuild for proving canonical structure. result: MST walk+rebuild went from 218ms → 39ms (5.5x).

-also exported the `jwt` module directly (not just the `Jwt` type) so the verify tool can call `jwt.verifySecp256k1` without reaching into internals, and made CAR size limits configurable (`max_size`, `max_blocks` in `readWithOptions`) for callers who need custom limits.
+**size limit fix** — `verifyRepo` now bypasses the default 2 MB / 10k block limits so large repos like pfrazee's 70 MB actually work.

 the three-way comparison and chart tooling live in [atproto-bench](https://tangled.sh/@zzstoatzz.io/atproto-bench).
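The key-height check that the devlog relies on can be sketched as follows. This is a minimal illustration, assuming the AT Protocol MST convention of hashing the key with SHA-256 and counting leading zero bits two at a time; `layerFromHash`, `keyHeightSketch`, and `keyBelongsAtLayer` are illustrative names, not zat's `keyHeight`, which this diff does not show.

```zig
const std = @import("std");
const Sha256 = std.crypto.hash.sha2.Sha256;

/// Layer implied by a digest: leading zero bits, counted two at a time.
/// Assumption: this mirrors the AT Protocol MST rule (fanout of 4).
fn layerFromHash(hash: []const u8) u32 {
    var zeros: u32 = 0;
    for (hash) |byte| {
        if (byte != 0) {
            zeros += @clz(byte); // zero bits before the first set bit in this byte
            break;
        }
        zeros += 8; // a zero byte contributes eight leading zero bits
    }
    return zeros / 2;
}

/// Hypothetical stand-in for `mst.keyHeight`: hash the full key, derive its layer.
fn keyHeightSketch(key: []const u8) u32 {
    var digest: [Sha256.digest_length]u8 = undefined;
    Sha256.hash(key, &digest, .{});
    return layerFromHash(&digest);
}

/// The in-walk check: a key stored in a node at `expected_layer`
/// must hash to exactly that layer, otherwise the tree is not canonical.
fn keyBelongsAtLayer(key: []const u8, expected_layer: u32) bool {
    return keyHeightSketch(key) == expected_layer;
}
```

Because the layer is a pure function of the key, the walker needs no state beyond the layer it expects at the current depth.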

+37 -36

devlog/img/verify-compute.svg

···
-<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 790 267" font-family="'SF Mono', 'Fira Code', 'Cascadia Code', Menlo, monospace">
-  <rect width="790" height="267" fill="#1a1a2e" rx="8"/>
+<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 790 260" font-family="'SF Mono', 'Fira Code', 'Cascadia Code', Menlo, monospace">
+  <rect width="790" height="260" fill="#1a1a2e" rx="8"/>
   <text x="395.0" y="28" text-anchor="middle" fill="#e0e0e0" font-size="15" font-weight="600">AT Protocol trust chain — compute</text>
-  <text x="395.0" y="44" text-anchor="middle" fill="#666" font-size="11">192,144 records</text>
+  <text x="395.0" y="44" text-anchor="middle" fill="#666" font-size="11">192,161 records</text>
   <line x1="160.0" y1="53" x2="160.0" y2="211" stroke="#262640" stroke-width="1"/>
   <line x1="264.0" y1="53" x2="264.0" y2="211" stroke="#262640" stroke-width="1"/>
   <line x1="368.0" y1="53" x2="368.0" y2="211" stroke="#262640" stroke-width="1"/>
   <line x1="472.0" y1="53" x2="472.0" y2="211" stroke="#262640" stroke-width="1"/>
   <line x1="576.0" y1="53" x2="576.0" y2="211" stroke="#262640" stroke-width="1"/>
   <line x1="680.0" y1="53" x2="680.0" y2="211" stroke="#262640" stroke-width="1"/>
+
   <text x="146" y="79.0" text-anchor="end" fill="#c0c0c0" font-size="13">zig (zat)</text>
-  <rect x="160.0" y="55" width="100.5" height="38" fill="#e8944a" rx="3"/>
-  <text x="210.3" y="78.0" text-anchor="middle" fill="white" font-size="10" font-weight="500">CAR parse</text>
-  <rect x="260.5" y="55" width="1.0" height="38" fill="#ed7d31" rx="3"/>
-  <rect x="261.5" y="55" width="56.1" height="38" fill="#c55a11" rx="3"/>
-  <text x="289.6" y="78.0" text-anchor="middle" fill="white" font-size="10" font-weight="500">MST walk</text>
-  <rect x="317.6" y="55" width="212.6" height="38" fill="#a04000" rx="3"/>
-  <text x="423.9" y="78.0" text-anchor="middle" fill="white" font-size="10" font-weight="500">MST rebuild</text>
-  <text x="540.2" y="79.0" fill="#a0a0a0" font-size="12" font-weight="500">300ms</text>
+  <rect x="160.0" y="55" width="99.2" height="38" fill="#e8944a" rx="3"/>
+  <text x="209.6" y="78.0" text-anchor="middle" fill="white" font-size="10" font-weight="500">CAR parse</text>
+  <rect x="259.2" y="55" width="1.0" height="38" fill="#ed7d31" rx="3"/>
+  <rect x="260.2" y="55" width="47.1" height="38" fill="#c55a11" rx="3"/>
+  <text x="283.7" y="78.0" text-anchor="middle" fill="white" font-size="9" font-weight="500">MST</text>
+  <text x="317.3" y="79.0" fill="#a0a0a0" font-size="12" font-weight="500">123ms</text>
+
   <text x="146" y="137.0" text-anchor="end" fill="#c0c0c0" font-size="13">go (indigo)</text>
-  <rect x="160.0" y="113" width="497.5" height="38" fill="#e8944a" rx="3"/>
-  <text x="408.7" y="136.0" text-anchor="middle" fill="white" font-size="10" font-weight="500">CAR parse</text>
-  <rect x="657.5" y="113" width="1.0" height="38" fill="#ed7d31" rx="3"/>
-  <rect x="658.5" y="113" width="7.1" height="38" fill="#c55a11" rx="3"/>
-  <text x="675.6" y="137.0" fill="#a0a0a0" font-size="12" font-weight="500">410ms</text>
+  <rect x="160.0" y="113" width="508.8" height="38" fill="#e8944a" rx="3"/>
+  <text x="414.4" y="136.0" text-anchor="middle" fill="white" font-size="10" font-weight="500">CAR parse</text>
+  <rect x="668.8" y="113" width="1.0" height="38" fill="#ed7d31" rx="3"/>
+  <rect x="669.8" y="113" width="11.1" height="38" fill="#c55a11" rx="3"/>
+  <text x="691.0" y="137.0" fill="#a0a0a0" font-size="12" font-weight="500">434ms</text>
+
   <text x="146" y="195.0" text-anchor="end" fill="#c0c0c0" font-size="13">rust (RustCrypto)</text>
-  <rect x="160.0" y="171" width="370.8" height="38" fill="#e8944a" rx="3"/>
-  <text x="345.4" y="194.0" text-anchor="middle" fill="white" font-size="10" font-weight="500">CAR parse</text>
-  <rect x="530.8" y="171" width="1.0" height="38" fill="#ed7d31" rx="3"/>
-  <rect x="531.8" y="171" width="148.9" height="38" fill="#c55a11" rx="3"/>
-  <text x="606.3" y="194.0" text-anchor="middle" fill="white" font-size="10" font-weight="500">MST walk</text>
-  <text x="690.8" y="195.0" fill="#a0a0a0" font-size="12" font-weight="500">422ms</text>
+  <rect x="160.0" y="171" width="360.5" height="38" fill="#e8944a" rx="3"/>
+  <text x="340.3" y="194.0" text-anchor="middle" fill="white" font-size="10" font-weight="500">CAR parse</text>
+  <rect x="520.5" y="171" width="1.0" height="38" fill="#ed7d31" rx="3"/>
+  <rect x="521.5" y="171" width="144.8" height="38" fill="#c55a11" rx="3"/>
+  <text x="593.9" y="194.0" text-anchor="middle" fill="white" font-size="10" font-weight="500">MST walk</text>
+  <text x="676.3" y="195.0" fill="#a0a0a0" font-size="12" font-weight="500">422ms</text>
+
   <text x="160.0" y="227" text-anchor="middle" fill="#606060" font-size="10">0</text>
-  <text x="264.0" y="227" text-anchor="middle" fill="#606060" font-size="10">84ms</text>
-  <text x="368.0" y="227" text-anchor="middle" fill="#606060" font-size="10">169ms</text>
-  <text x="472.0" y="227" text-anchor="middle" fill="#606060" font-size="10">253ms</text>
-  <text x="576.0" y="227" text-anchor="middle" fill="#606060" font-size="10">338ms</text>
-  <text x="680.0" y="227" text-anchor="middle" fill="#606060" font-size="10">422ms</text>
-  <rect x="160" y="241" width="10" height="10" fill="#e8944a" rx="2"/>
-  <text x="174" y="249" fill="#808080" font-size="10">CAR parse</text>
-  <rect x="242" y="241" width="10" height="10" fill="#ed7d31" rx="2"/>
-  <text x="256" y="249" fill="#808080" font-size="10">sig verify</text>
-  <rect x="330" y="241" width="10" height="10" fill="#c55a11" rx="2"/>
-  <text x="344" y="249" fill="#808080" font-size="10">MST walk</text>
-  <rect x="405" y="241" width="10" height="10" fill="#a04000" rx="2"/>
-  <text x="419" y="249" fill="#808080" font-size="10">MST rebuild</text>
-</svg>
+  <text x="264.0" y="227" text-anchor="middle" fill="#606060" font-size="10">87ms</text>
+  <text x="368.0" y="227" text-anchor="middle" fill="#606060" font-size="10">174ms</text>
+  <text x="472.0" y="227" text-anchor="middle" fill="#606060" font-size="10">260ms</text>
+  <text x="576.0" y="227" text-anchor="middle" fill="#606060" font-size="10">347ms</text>
+  <text x="680.0" y="227" text-anchor="middle" fill="#606060" font-size="10">434ms</text>
+
+  <rect x="195" y="241" width="10" height="10" fill="#e8944a" rx="2"/>
+  <text x="209" y="249" fill="#808080" font-size="10">CAR parse</text>
+  <rect x="287" y="241" width="10" height="10" fill="#ed7d31" rx="2"/>
+  <text x="301" y="249" fill="#808080" font-size="10">sig verify</text>
+  <rect x="375" y="241" width="10" height="10" fill="#c55a11" rx="2"/>
+  <text x="389" y="249" fill="#808080" font-size="10">MST walk+verify</text>
+</svg>

+163

src/internal/repo/mst.zig

···
     }
 };

+// === specialized MST node decoder ===
+//
+// parses the known MST node CBOR schema directly, avoiding generic Value
+// union construction. all byte data is zero-copy (slices into input buffer).
+// only allocation: the entries array.
+//
+// MST node schema:
+//   map(2) { "e": array [ map(4) {k,p,t,v}, ... ], "l": CID|null }
+
+pub const MstNodeData = struct {
+    left: ?[]const u8, // raw CID bytes, or null
+    entries: []const MstEntryData,
+};
+
+pub const MstEntryData = struct {
+    key_suffix: []const u8,
+    prefix_len: usize,
+    tree: ?[]const u8, // raw CID bytes, or null
+    value: []const u8, // raw CID bytes
+};
+
+pub fn decodeMstNode(allocator: Allocator, data: []const u8) !MstNodeData {
+    var r = MstReader{ .data = data, .pos = 0 };
+
+    const map_count = try r.expectMap();
+    if (map_count != 2) return error.InvalidMstNode;
+
+    // "e" key
+    const key_e = try r.readTextString();
+    if (!std.mem.eql(u8, key_e, "e")) return error.InvalidMstNode;
+
+    // entries array
+    const entries_count = try r.expectArray();
+    const entries = try allocator.alloc(MstEntryData, entries_count);
+    for (entries) |*entry| {
+        entry.* = try readMstEntry(&r);
+    }
+
+    // "l" key
+    const key_l = try r.readTextString();
+    if (!std.mem.eql(u8, key_l, "l")) return error.InvalidMstNode;
+
+    const left = try r.readCidOrNull();
+
+    return .{ .left = left, .entries = entries };
+}
+
+fn readMstEntry(r: *MstReader) !MstEntryData {
+    const map_count = try r.expectMap();
+    if (map_count != 4) return error.InvalidMstNode;
+
+    // "k" → key suffix (byte string)
+    _ = try r.readTextString();
+    const key_suffix = try r.readByteString();
+
+    // "p" → prefix length (unsigned int)
+    _ = try r.readTextString();
+    const prefix_len = try r.readUnsigned();
+
+    // "t" → right subtree CID or null
+    _ = try r.readTextString();
+    const tree = try r.readCidOrNull();
+
+    // "v" → value CID
+    _ = try r.readTextString();
+    const value = try r.readCid();
+
+    return .{
+        .key_suffix = key_suffix,
+        .prefix_len = @intCast(prefix_len),
+        .tree = tree,
+        .value = value,
+    };
+}
+
+const MstReader = struct {
+    data: []const u8,
+    pos: usize,
+
+    fn expectMap(self: *MstReader) !usize {
+        return self.readMajorWithArg(5);
+    }
+
+    fn expectArray(self: *MstReader) !usize {
+        return self.readMajorWithArg(4);
+    }
+
+    fn readTextString(self: *MstReader) ![]const u8 {
+        const len = try self.readMajorWithArg(3);
+        if (self.pos + len > self.data.len) return error.InvalidMstNode;
+        const result = self.data[self.pos .. self.pos + len];
+        self.pos += len;
+        return result;
+    }
+
+    fn readByteString(self: *MstReader) ![]const u8 {
+        const len = try self.readMajorWithArg(2);
+        if (self.pos + len > self.data.len) return error.InvalidMstNode;
+        const result = self.data[self.pos .. self.pos + len];
+        self.pos += len;
+        return result;
+    }
+
+    fn readUnsigned(self: *MstReader) !u64 {
+        return self.readMajorWithArg(0);
+    }
+
+    fn readCidOrNull(self: *MstReader) !?[]const u8 {
+        if (self.pos >= self.data.len) return error.InvalidMstNode;
+        if (self.data[self.pos] == 0xf6) {
+            self.pos += 1;
+            return null;
+        }
+        return try self.readCid();
+    }
+
+    fn readCid(self: *MstReader) ![]const u8 {
+        // tag(42) encodes as 0xd8 0x2a
+        if (self.pos + 1 >= self.data.len) return error.InvalidMstNode;
+        if (self.data[self.pos] != 0xd8 or self.data[self.pos + 1] != 0x2a)
+            return error.InvalidMstNode;
+        self.pos += 2;
+        const bytes = try self.readByteString();
+        if (bytes.len < 1 or bytes[0] != 0x00) return error.InvalidMstNode;
+        return bytes[1..]; // skip 0x00 identity multibase prefix
+    }
+
+    fn readMajorWithArg(self: *MstReader, expected_major: u3) !usize {
+        if (self.pos >= self.data.len) return error.InvalidMstNode;
+        const b = self.data[self.pos];
+        self.pos += 1;
+        const major: u3 = @truncate(b >> 5);
+        if (major != expected_major) return error.InvalidMstNode;
+        const additional: u5 = @truncate(b);
+        return self.readArgValue(additional);
+    }
+
+    fn readArgValue(self: *MstReader, additional: u5) !usize {
+        if (additional < 24) return @as(usize, additional);
+        if (additional == 24) {
+            if (self.pos >= self.data.len) return error.InvalidMstNode;
+            const val = self.data[self.pos];
+            self.pos += 1;
+            return @as(usize, val);
+        }
+        if (additional == 25) {
+            if (self.pos + 2 > self.data.len) return error.InvalidMstNode;
+            const val = std.mem.readInt(u16, self.data[self.pos..][0..2], .big);
+            self.pos += 2;
+            return @as(usize, val);
+        }
+        if (additional == 26) {
+            if (self.pos + 4 > self.data.len) return error.InvalidMstNode;
+            const val = std.mem.readInt(u32, self.data[self.pos..][0..4], .big);
+            self.pos += 4;
+            return @as(usize, val);
+        }
+        return error.InvalidMstNode;
+    }
+};
+
+pub const MstDecodeError = error{InvalidMstNode} || Allocator.Error;
+
 // === tests ===

 test "keyHeight" {
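The reader above depends on the layout of a CBOR initial byte: the top three bits carry the major type and the low five bits carry the additional information. The sketch below isolates that split so the magic numbers in the decoder (`0xf6`, `0xd8 0x2a`, the `24`/`25`/`26` cases) are easier to follow. `Head` and `splitHead` are hypothetical helpers written for illustration, not part of zat.

```zig
const std = @import("std");

/// The two fields packed into a CBOR initial byte.
const Head = struct {
    major: u3, // 0 = unsigned, 2 = bytes, 3 = text, 4 = array, 5 = map, 6 = tag, 7 = simple
    additional: u5, // < 24: the value itself; 24/25/26: a 1/2/4-byte argument follows
};

/// Split an initial byte the same way `readMajorWithArg` does.
fn splitHead(b: u8) Head {
    const major: u3 = @truncate(b >> 5);
    const additional: u5 = @truncate(b);
    return .{ .major = major, .additional = additional };
}
```

For example, `0xa2` splits into major 5 with additional 2, which is the `map(2)` that opens every MST node, and `0xd8` splits into major 6 with additional 24, meaning a tag whose number follows in the next byte (`0x2a`, tag 42, for a CID).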

+50 -56

src/internal/repo/repo_verifier.zig

···
 //! ↓
 //! repo CAR → commit → signature ← verified against key
 //! ↓
-//! MST root CID → walk nodes → rebuild tree → CID match
+//! MST root CID → walk nodes → verify key heights → structure proven

 const std = @import("std");
 const Allocator = std.mem.Allocator;
···
         .secp256k1 => try jwt.verifySecp256k1(unsigned_commit_bytes, sig_bytes, public_key.raw),
     }

-    // 10. walk MST — collect all (key, value_cid) pairs
-    var records: std.ArrayList(MstRecord) = .{};
-    try walkMst(allocator, repo_car, data_cid.raw, &records);
-
-    // 11. rebuild MST and compare root CID
-    var tree = mst.Mst.init(allocator);
-    for (records.items) |record| {
-        try tree.put(record.key, record.value);
-    }
-    const rebuilt_root = try tree.rootCid();
-
-    if (!std.mem.eql(u8, rebuilt_root.raw, data_cid.raw)) {
-        return error.MstRootMismatch;
-    }
+    // 10. walk MST with in-walk structure verification
+    // uses specialized MST decoder (not generic CBOR) and verifies each key's
+    // tree layer is deterministically correct. combined with CAR block CID
+    // verification, this is equivalent to a full rebuild.
+    const record_count = try walkAndVerifyMst(allocator, repo_car, data_cid.raw);

     // build result — dupe strings to caller's allocator so they survive arena cleanup
     return VerifyResult{
···
         .signing_key_type = public_key.key_type,
         .commit_rev = try caller_alloc.dupe(u8, commit_rev),
         .commit_version = commit_version,
-        .record_count = records.items.len,
+        .record_count = record_count,
     };
 }
-
-const MstRecord = struct {
-    key: []const u8,
-    value: cbor.Cid,
-};

 /// fetch a repo CAR from a PDS endpoint
 fn fetchRepo(allocator: Allocator, pds_endpoint: []const u8, did_str: []const u8) ![]u8 {
···
     return cbor.encodeAlloc(allocator, unsigned_value);
 }

-/// recursively walk MST nodes, collecting all (key, value_cid) pairs.
-/// inverse of mst.serializeNode — decompresses prefix-compressed keys.
-fn walkMst(allocator: Allocator, repo_car: car.Car, node_cid_raw: []const u8, records: *std.ArrayList(MstRecord)) !void {
-    const block_data = car.findBlock(repo_car, node_cid_raw) orelse return;
-    const node = cbor.decodeAll(allocator, block_data) catch return;
+/// walk the MST using the specialized decoder, verifying each key's tree layer
+/// is deterministically correct. combined with CAR block CID verification
+/// (which proves data integrity), this is equivalent to a full MST rebuild.
+fn walkAndVerifyMst(allocator: Allocator, repo_car: car.Car, root_cid_raw: []const u8) !usize {
+    const root_data = car.findBlock(repo_car, root_cid_raw) orelse return error.CommitBlockNotFound;
+    const root_node = try mst.decodeMstNode(allocator, root_data);
+    if (root_node.entries.len == 0 and root_node.left == null) return 0;

-    // recurse into left subtree first (sorted order)
-    if (node.get("l")) |left_val| {
-        switch (left_val) {
-            .cid => |left_cid| try walkMst(allocator, repo_car, left_cid.raw, records),
-            else => {},
-        }
-    }
+    // root layer = key height of first entry (first entry always has prefix_len = 0); a root with a left pointer but no entries is rejected instead of indexing entries[0]
+    const root_layer = if (root_node.entries.len > 0) mst.keyHeight(root_node.entries[0].key_suffix) else return error.MstRootMismatch;

-    // walk entries with prefix decompression
-    const entries_arr = node.getArray("e") orelse return;
-    var prev_key: []const u8 = "";
+    return walkVerifyNode(allocator, repo_car, root_node, root_layer);
+}
+
+const WalkError = VerifyError || mst.MstDecodeError;
+
+fn walkVerifyNode(allocator: Allocator, repo_car: car.Car, node: mst.MstNodeData, expected_layer: u32) WalkError!usize {
+    var count: usize = 0;
+    var key_buf: [512]u8 = undefined;
+    var key_len: usize = 0;
+
+    // left subtree
+    if (node.left) |left_cid| {
+        if (expected_layer == 0) return error.MstRootMismatch;
+        count += try walkVerifyChild(allocator, repo_car, left_cid, expected_layer - 1);
+    }

-    for (entries_arr) |entry_val| {
-        const p = entry_val.getInt("p") orelse continue;
-        const prefix_len: usize = @intCast(p);
-        const k = entry_val.getBytes("k") orelse continue;
+    for (node.entries) |entry| {
+        if (entry.prefix_len > key_len or entry.key_suffix.len > key_buf.len - entry.prefix_len) return error.MstRootMismatch; // malformed prefix compression: would read stale bytes or overrun key_buf
+        @memcpy(key_buf[entry.prefix_len..][0..entry.key_suffix.len], entry.key_suffix); // reconstruct key from prefix compression (in-place, zero alloc)
+        key_len = entry.prefix_len + entry.key_suffix.len;

-        // reconstruct full key: prev_key[0..prefix_len] ++ k
-        const full_key = try std.mem.concat(allocator, u8, &.{ prev_key[0..prefix_len], k });
-        prev_key = full_key;
+        // verify this key belongs at the expected layer
+        if (mst.keyHeight(key_buf[0..key_len]) != expected_layer) return error.MstRootMismatch;

-        // collect value CID
-        if (entry_val.get("v")) |v| {
-            switch (v) {
-                .cid => |value_cid| try records.append(allocator, .{ .key = full_key, .value = value_cid }),
-                else => {},
-            }
-        }
+        count += 1;

-        // recurse into right subtree (between entries)
-        if (entry_val.get("t")) |t| {
-            switch (t) {
-                .cid => |tree_cid| try walkMst(allocator, repo_car, tree_cid.raw, records),
-                else => {},
-            }
+        // right subtree
+        if (entry.tree) |tree_cid| {
+            if (expected_layer == 0) return error.MstRootMismatch;
+            count += try walkVerifyChild(allocator, repo_car, tree_cid, expected_layer - 1);
         }
     }
+
+    return count;
+}
+
+fn walkVerifyChild(allocator: Allocator, repo_car: car.Car, cid_raw: []const u8, expected_layer: u32) WalkError!usize {
+    const block_data = car.findBlock(repo_car, cid_raw) orelse return error.CommitBlockNotFound;
+    const node = try mst.decodeMstNode(allocator, block_data);
+    return walkVerifyNode(allocator, repo_car, node, expected_layer);
 }

 // === tests ===
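The in-place key reconstruction in `walkVerifyNode` works because each entry stores only the bytes that differ from the previous key in the same node. A standalone sketch of that step, with explicit bounds checks, might look like the following. `reconstructKey` and `error.InvalidPrefix` are hypothetical names introduced here for illustration; they are not part of zat.

```zig
const std = @import("std");

/// Rebuild a full key inside `buf`, which still holds the previous key in
/// buf[0..prev_len]. The first `prefix_len` bytes are reused and `suffix`
/// is written after them. Returns the length of the new key.
fn reconstructKey(
    buf: []u8,
    prev_len: usize,
    prefix_len: usize,
    suffix: []const u8,
) error{InvalidPrefix}!usize {
    // The shared prefix cannot be longer than the key it is shared with.
    if (prev_len > buf.len or prefix_len > prev_len) return error.InvalidPrefix;
    // The suffix must fit in what remains of the buffer.
    if (suffix.len > buf.len - prefix_len) return error.InvalidPrefix;
    @memcpy(buf[prefix_len..][0..suffix.len], suffix);
    return prefix_len + suffix.len;
}
```

Feeding it `("com.example/aaa", prefix 0)` and then `("bbb", prefix 12)` yields `com.example/aaa` followed by `com.example/bbb` without any allocation, since the 12 shared bytes are already sitting in the buffer from the previous entry.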