feat: add go-raw impl, CID verification in zig, two-tier results

+2

.gitignore

··· 4 4 5 5 # rust 6 6 rust/target/ 7 + rust-raw/target/ 7 8 8 9 # go 9 10 go/atproto-bench 11 + go-raw/atproto-bench-go-raw 10 12 11 13 # python 12 14 python/.venv/

+49 -17

README.md

··· 2 2 3 3 SDK-level firehose benchmarks for AT Protocol. 4 4 5 - decodes a corpus of real firehose frames using each language's AT Protocol SDK. 6 - same corpus, normalized work — every SDK does: header → commit → CAR → decode every block as DAG-CBOR. 5 + decodes a corpus of real firehose frames. same corpus, verified work parity via block counts and error counts. 7 6 8 7 ## what this measures 9 8 ··· 19 18 20 19 _3,298 firehose frames (16.2 MB), 5 measured passes, macOS arm64 (M3 Max)_ 21 20 21 + ### production-correct (with CID hash verification) 22 + 23 + only two SDKs verify CID hashes (SHA-256 per block): zat and indigo. this is the correct behavior for untrusted network data — it proves block content matches the content identifier. 24 + 22 25 | SDK | frames/sec (median) | MB/s | blocks/frame | errors | 23 26 |-----|--------:|-----:|-----:|-----:| 24 - | zig ([zat](https://tangled.sh/@zzstoatzz.io/zat), arena reuse) | 628,091 | 3,044.8 | 9.98 | 0 | 25 - | zig (zat, alloc per frame) | 559,825 | 2,662.0 | 9.98 | 0 | 27 + | zig ([zat](https://tangled.sh/@zzstoatzz.io/zat), arena reuse) | 311,428 | 1,482.8 | 9.98 | 0 | 28 + | go ([indigo](https://github.com/bluesky-social/indigo)) | 15,560 | 75.3 | 9.98 | 0 | 29 + 30 + ### decode-only (no CID hash verification) 31 + 32 + the remaining SDKs skip CID verification. these numbers show decode throughput in isolation — useful for comparing SDK architecture, but not directly comparable to the verified numbers above. 33 + 34 + | SDK | frames/sec (median) | MB/s | blocks/frame | errors | 35 + |-----|--------:|-----:|-----:|-----:| 36 + | zig (zat, arena reuse) | 630,543 | 3,094.7 | 9.98 | 0 | 37 + | zig (zat, alloc per frame) | 525,906 | 2,552.0 | 9.98 | 0 | 26 38 | rust (raw, arena reuse) | 244,113 | 1,171.0 | 9.98 | 0 | 27 39 | rust (raw, alloc per frame) | 186,962 | 919.4 | 9.98 | 0 | 28 40 | rust ([jacquard](https://github.com/rsform/jacquard)) | 47,881 | 238.9 | 9.98 | 0 | 41 + | go (raw, fxamacker/cbor) | 41,398 | 200.7 | 9.98 | 0 | 29 42 | python ([atproto](https://github.com/MarshalX/atproto)) | 29,675 | 146.1 | 9.98 | 0 | 30 - | go ([indigo](https://github.com/bluesky-social/indigo)) | 11,548 | 58.0 | 9.98 | 0 | 43 + | go (indigo) | 15,560 | 75.3 | 9.98 | 0 | 44 + 45 + note: indigo appears in both tables. its number is the same because it always verifies — there is no option to disable it in go-car v1. 31 46 32 47 run-to-run variance is ~30-40%. compare ratios within a single `just bench` run, not across runs. 33 48 49 + ## CID verification 50 + 51 + a CID (Content IDentifier) contains a hash digest of the block's content. verifying it means SHA-256 hashing each block and comparing against the digest in the CID. this proves the block wasn't corrupted or tampered with in transit. 52 + 53 + | SDK | verifies CID hashes? | notes | 54 + |-----|---------------------|-------| 55 + | zig (zat) | yes (v0.2.1+) | `car.read()` verifies by default; `readWithOptions(.{ .verify_block_hashes = false })` to skip | 56 + | go (indigo, go-car v1) | yes (always) | no option to disable in v1 | 57 + | rust (jacquard, iroh-car) | no | not implemented | 58 + | rust (raw) | no | not implemented | 59 + | go (raw) | no | not implemented | 60 + | python (libipld) | no | not implemented | 61 + 34 62 ## what each SDK does 35 63 36 64 every SDK takes the same raw binary frame and decodes all the way through to per-block DAG-CBOR: 37 65 38 66 | SDK | decode path | 39 67 |-----|-------------| 40 - | zig | `cbor.decode` header → `cbor.decodeAll` payload → `car.read` → `cbor.decodeAll` per block | 68 + | zig | `cbor.decode` header → `cbor.decodeAll` payload → `car.read` (+ SHA-256 verify) → `cbor.decodeAll` per block | 41 69 | rust (raw) | `minicbor::Decoder` header → payload → hand-rolled sync CAR → `minicbor` + `bumpalo` per block | 42 70 | rust (jacquard) | `SubscribeReposMessage::decode_framed` → typed `Commit`, `jacquard_repo::car::parse_car_bytes` → blocks, `serde_ipld_dagcbor` per block | 43 - | go | `evt.Deserialize` → typed `RepoCommit` via code-gen CBOR → `car.NewBlockReader` → `cbornode.DecodeInto` per block | 71 + | go (raw) | `fxamacker/cbor` struct decode → hand-rolled sync CAR → `fxamacker/cbor` Unmarshal per block | 72 + | go (indigo) | `evt.Deserialize` → typed `RepoCommit` via code-gen CBOR → `car.NewBlockReader` (+ SHA-256 verify) → `cbornode.DecodeInto` per block | 44 73 | python | `Frame.from_bytes` + `parse_subscribe_repos_message` → `CAR.from_bytes` (libipld decodes all blocks internally) | 45 74 46 75 ## fairness notes 47 76 48 - - **zig** and **rust (raw)** both use arena allocation + zero-copy string/byte decoding. the "alloc per frame" variants are the fair cross-language comparison; "arena reuse" shows the production pattern. zig uses its own arena allocator, rust uses bumpalo 49 - - **zig** returns zero-copy slices into the input buffer; rust (raw) does the same via minicbor's borrowed decoder. both avoid copying string/byte data. the remaining ~2.5x gap between zig and rust (raw) is likely due to Value type size (zig's 24-byte union vs rust's larger enum), arena implementation differences, and CBOR parser codegen 50 - - **rust (jacquard)** is the real AT Protocol SDK that rust developers use. it pays for serde-based owned deserialization (`String`, `BTreeMap`), async CAR parsing (tokio poll/wake per block via iroh-car), and per-object heap allocation. the "raw" variant shows what's possible in rust with the same architectural choices as zat 51 - - **go** indigo — bluesky's own production relay — is the slowest despite using code-generated CBOR unmarshal (no reflection). the cost is GC pressure: every string, byte slice, and block is a heap allocation that the garbage collector has to sweep. at ~10 blocks/frame, that's a lot of short-lived objects per decode 77 + - **CID verification**: only zat and indigo verify block hashes. this is ~2x overhead for zat (311k vs 630k fps). the decode-only table exists for architectural comparison, but the production-correct table is the one that matters for real-world use 78 + - **zig** and **rust (raw)** both use arena allocation + zero-copy string/byte decoding. the "alloc per frame" variants are the fair cross-language comparison; "arena reuse" shows the production pattern 79 + - **rust (jacquard)** is the real AT Protocol SDK that rust developers use. it pays for serde-based owned deserialization (`String`, `BTreeMap`), async CAR parsing (tokio poll/wake per block via iroh-car), and per-object heap allocation 80 + - **go (raw)** uses fxamacker/cbor (no reflection for known struct types), a hand-rolled sync CAR parser (no CID hash verification), and no indigo dependency. GC pressure remains the fundamental constraint — Go's experimental arena package (`GOEXPERIMENT=arenas`) is on hold and not recommended for production 81 + - **go (indigo)** — bluesky's own production relay — uses code-generated CBOR unmarshal (no reflection at the frame level) but pays for go-car's per-block CID hash verification and cbornode's reflection-based DAG-CBOR decode 52 82 - **python** is faster than jacquard despite being "Python" — its hot path is `libipld` (Rust via PyO3), which does the entire CAR parse + per-block DAG-CBOR decode in one synchronous C-extension call 53 - - **error handling**: all SDKs use infallible decode functions that never abort on failure — errors are counted and the frame is skipped. this means a corrupted frame doesn't invalidate the entire benchmark run 54 - - **capture coupling**: the corpus capture tool uses zat's CBOR decoder for the commit-with-ops header peek. this is standard CBOR parsing (not zat's typed firehose decoder), but it does mean frames that zat's CBOR decoder rejects won't appear in the corpus. in practice, CBOR header parsing is the least likely step to diverge across implementations 83 + - **error handling**: all SDKs use infallible decode functions that never abort on failure — errors are counted and the frame is skipped 84 + - **capture coupling**: the corpus capture tool uses zat's CBOR decoder for the commit-with-ops header peek. this is standard CBOR parsing (not zat's typed firehose decoder), but it does mean frames that zat's CBOR decoder rejects won't appear in the corpus 55 85 56 86 ## corpus format 57 87 ··· 64 94 ... 65 95 ``` 66 96 67 - frames are captured from ~10 seconds of live firehose traffic, filtered to commits with ops using a minimal CBOR header peek (zat's CBOR decoder parses just the header map and checks `t == "#commit"` + non-empty `ops` array). this is standard CBOR, not zat's typed firehose decoder, but see fairness notes for the coupling caveat. 97 + frames are captured from ~10 seconds of live firehose traffic, filtered to commits with ops using a minimal CBOR header peek. 68 98 69 99 ## when this matters 70 100 71 - **for live firehose consumption: usually no.** at ~500-1000 events/sec (full bluesky network), any of these SDKs decode fast enough. the bottleneck is network I/O, database writes, and business logic. 101 + **for live firehose consumption: usually no.** at ~500-1000 events/sec (full bluesky network), any of these SDKs decode fast enough. 72 102 73 103 **where it matters:** 74 104 - **backfill / replay** — processing months of historical data. decode throughput determines catch-up speed. ··· 79 109 80 110 | lang | SDK | version | CBOR engine | CAR engine | 81 111 |------|-----|---------|-------------|------------| 82 - | zig | [zat](https://tangled.sh/@zzstoatzz.io/zat) | 0.2.0 | hand-rolled | hand-rolled | 112 + | zig | [zat](https://tangled.sh/@zzstoatzz.io/zat) | 0.2.1 | hand-rolled | hand-rolled (+ SHA-256 CID verify) | 83 113 | rust | raw (minicbor + bumpalo) | — | [minicbor](https://crates.io/crates/minicbor) (zero-copy) | hand-rolled (sync) | 84 114 | rust | [jacquard](https://github.com/rsform/jacquard) | 0.9 | [ciborium](https://crates.io/crates/ciborium) (header) + [serde_ipld_dagcbor](https://crates.io/crates/serde_ipld_dagcbor) (body) | [iroh-car](https://crates.io/crates/iroh-car) (async) | 85 - | go | [indigo](https://github.com/bluesky-social/indigo) | latest | [cbor-gen](https://github.com/whyrusleeping/cbor-gen) (code-generated) | [go-car/v2](https://github.com/ipld/go-car) | 115 + | go | raw (fxamacker/cbor) | — | [fxamacker/cbor](https://github.com/fxamacker/cbor) | hand-rolled (sync, no CID verify) | 116 + | go | [indigo](https://github.com/bluesky-social/indigo) | latest | [cbor-gen](https://github.com/whyrusleeping/cbor-gen) (code-generated) | [go-car/v2](https://github.com/ipld/go-car) (+ SHA-256 CID verify) | 86 117 | python | [atproto](https://github.com/MarshalX/atproto) | 0.0.65 | [libipld](https://github.com/MarshalX/atproto) (Rust via PyO3) | libipld | 87 118 88 119 ## usage ··· 97 128 98 129 - `just capture` connects to the live firehose for ~10 seconds, filters for commits with ops via CBOR header peek (uses zat's CBOR decoder — see fairness notes), writes a length-prefixed corpus 99 130 - each benchmark decodes every frame fully: header → payload → CAR → decode every block as DAG-CBOR 131 + - zat and indigo additionally SHA-256 verify every block CID 100 132 - 2 warmup passes, 5 measured passes over the full corpus 101 133 - zig builds with `-Doptimize=ReleaseFast`, rust with `opt-level=3 lto=true` 102 134 - go and python use their standard release toolchains

+8

go-raw/go.mod

··· 1 + module atproto-bench-go-raw 2 + 3 + go 1.25.1 4 + 5 + require ( 6 + github.com/fxamacker/cbor/v2 v2.9.0 // indirect 7 + github.com/x448/float16 v0.8.4 // indirect 8 + )

+4

go-raw/go.sum

··· 1 + github.com/fxamacker/cbor/v2 v2.9.0 h1:NpKPmjDBgUfBms6tr6JZkTHtfFGcMKsw3eGcmD/sapM= 2 + github.com/fxamacker/cbor/v2 v2.9.0/go.mod h1:vM4b+DJCtHn+zz7h3FFp/hDAI9WNWCsZj23V5ytsSxQ= 3 + github.com/x448/float16 v0.8.4 h1:qLwI1I70+NjRFUR3zs1JPUCgaCXSh3SW62uAKT1mSBM= 4 + github.com/x448/float16 v0.8.4/go.mod h1:14CWIYCyZA/cWjXOioeEpHeN/83MdbZDRQHoFcYsOfg=

+334

go-raw/main.go

··· 1 + // atproto firehose benchmarks — go (raw, fxamacker/cbor) 2 + // 3 + // best-effort go implementation: fxamacker/cbor (no reflection for known types), 4 + // hand-rolled sync CAR parser (no CID hash verification), no indigo dependency. 5 + // shows what go can do with better library choices. 6 + package main 7 + 8 + import ( 9 + "bytes" 10 + "encoding/binary" 11 + "fmt" 12 + "os" 13 + "path/filepath" 14 + "sort" 15 + "time" 16 + 17 + "github.com/fxamacker/cbor/v2" 18 + ) 19 + 20 + const ( 21 + warmupPasses = 2 22 + measuredPasses = 5 23 + fixturesDir = "../fixtures" 24 + ) 25 + 26 + // frame header: {op: int, t: string} 27 + type frameHeader struct { 28 + Op int `cbor:"op"` 29 + T string `cbor:"t"` 30 + } 31 + 32 + // commit payload: we only need the blocks field, fxamacker/cbor skips unknown fields 33 + type commitPayload struct { 34 + Blocks []byte `cbor:"blocks"` 35 + } 36 + 37 + type corpusInfo struct { 38 + frames [][]byte 39 + totalBytes int 40 + minFrame int 41 + maxFrame int 42 + } 43 + 44 + type decodeResult struct { 45 + blocks int 46 + errors int 47 + } 48 + 49 + type passResult struct { 50 + frames int 51 + blocks int 52 + errors int 53 + elapsed time.Duration 54 + } 55 + 56 + // fxamacker/cbor decode mode optimized for DAG-CBOR 57 + var decMode cbor.DecMode 58 + 59 + func init() { 60 + var err error 61 + decMode, err = cbor.DecOptions{ 62 + DupMapKey: cbor.DupMapKeyQuiet, 63 + IndefLength: cbor.IndefLengthForbidden, 64 + }.DecMode() 65 + if err != nil { 66 + panic(err) 67 + } 68 + } 69 + 70 + func main() { 71 + fmt.Println("\n=== go (raw) benchmarks ===") 72 + fmt.Println() 73 + 74 + corpus, err := loadCorpus("firehose-frames.bin") 75 + if err != nil { 76 + fmt.Printf("firehose-frames.bin: SKIP (%v)\n", err) 77 + return 78 + } 79 + 80 + fmt.Printf("corpus: %d frames, %d bytes total\n", len(corpus.frames), corpus.totalBytes) 81 + fmt.Printf(" frame sizes: %d..%d bytes\n", corpus.minFrame, corpus.maxFrame) 82 + fmt.Printf(" passes: %d warmup, %d measured\n\n", warmupPasses, measuredPasses) 83 + 84 + // verify first frame 85 + if len(corpus.frames) > 0 { 86 + result := decodeFull(corpus.frames[0]) 87 + fmt.Printf("first frame: blocks=%d errors=%d\n", result.blocks, result.errors) 88 + } 89 + fmt.Println() 90 + 91 + benchDecode(corpus) 92 + 93 + fmt.Println() 94 + } 95 + 96 + // decodeFull decodes a firehose frame: header → payload → CAR → decode every block. 97 + // never returns error — counts failures and continues. 98 + func decodeFull(data []byte) *decodeResult { 99 + result := &decodeResult{} 100 + 101 + // 1. decode frame header + payload using fxamacker/cbor 102 + dec := decMode.NewDecoder(bytes.NewReader(data)) 103 + 104 + var hdr frameHeader 105 + if err := dec.Decode(&hdr); err != nil { 106 + result.errors++ 107 + return result 108 + } 109 + if hdr.T != "#commit" { 110 + return result 111 + } 112 + 113 + var payload commitPayload 114 + if err := dec.Decode(&payload); err != nil { 115 + result.errors++ 116 + return result 117 + } 118 + if len(payload.Blocks) == 0 { 119 + return result 120 + } 121 + 122 + // 2. parse CAR (hand-rolled, sync, no CID hash verification) 123 + carData := payload.Blocks 124 + 125 + // skip CAR header 126 + headerLen, n := binary.Uvarint(carData) 127 + if n <= 0 { 128 + result.errors++ 129 + return result 130 + } 131 + carData = carData[n:] 132 + if uint64(len(carData)) < headerLen { 133 + result.errors++ 134 + return result 135 + } 136 + carData = carData[headerLen:] 137 + 138 + // iterate blocks 139 + for len(carData) > 0 { 140 + sectionLen, n := binary.Uvarint(carData) 141 + if n <= 0 { 142 + result.errors++ 143 + break 144 + } 145 + carData = carData[n:] 146 + if uint64(len(carData)) < sectionLen { 147 + result.errors++ 148 + break 149 + } 150 + section := carData[:sectionLen] 151 + carData = carData[sectionLen:] 152 + 153 + // skip CID to find block data 154 + cidLen, err := skipCid(section) 155 + if err != nil { 156 + result.errors++ 157 + continue 158 + } 159 + blockData := section[cidLen:] 160 + 161 + // 3. decode block as DAG-CBOR 162 + var v interface{} 163 + if err := decMode.Unmarshal(blockData, &v); err != nil { 164 + result.errors++ 165 + continue 166 + } 167 + result.blocks++ 168 + } 169 + 170 + return result 171 + } 172 + 173 + // skipCid returns the byte length of a CID at the start of data. 174 + func skipCid(data []byte) (int, error) { 175 + if len(data) == 0 { 176 + return 0, fmt.Errorf("empty CID") 177 + } 178 + 179 + // CIDv0: multihash directly (0x12 = sha256, 0x20 = 32 byte digest = 34 total) 180 + if data[0] == 0x12 { 181 + if len(data) < 34 { 182 + return 0, fmt.Errorf("truncated CIDv0") 183 + } 184 + return 34, nil 185 + } 186 + 187 + // CIDv1: version varint + codec varint + multihash(code varint + digest_len varint + digest) 188 + pos := 0 189 + 190 + // version 191 + _, n := binary.Uvarint(data[pos:]) 192 + if n <= 0 { 193 + return 0, fmt.Errorf("bad version varint") 194 + } 195 + pos += n 196 + 197 + // codec 198 + _, n = binary.Uvarint(data[pos:]) 199 + if n <= 0 { 200 + return 0, fmt.Errorf("bad codec varint") 201 + } 202 + pos += n 203 + 204 + // multihash code 205 + _, n = binary.Uvarint(data[pos:]) 206 + if n <= 0 { 207 + return 0, fmt.Errorf("bad mh code varint") 208 + } 209 + pos += n 210 + 211 + // multihash digest length 212 + digestLen, n := binary.Uvarint(data[pos:]) 213 + if n <= 0 { 214 + return 0, fmt.Errorf("bad mh digest len varint") 215 + } 216 + pos += n 217 + pos += int(digestLen) 218 + 219 + if pos > len(data) { 220 + return 0, fmt.Errorf("truncated multihash") 221 + } 222 + return pos, nil 223 + } 224 + 225 + // --- benchmark harness --- 226 + 227 + func benchDecode(corpus *corpusInfo) { 228 + for i := 0; i < warmupPasses; i++ { 229 + for _, frame := range corpus.frames { 230 + _ = decodeFull(frame) 231 + } 232 + } 233 + 234 + passResults := make([]passResult, measuredPasses) 235 + 236 + for i := 0; i < measuredPasses; i++ { 237 + var passBlocks, passErrors int 238 + start := time.Now() 239 + for _, frame := range corpus.frames { 240 + result := decodeFull(frame) 241 + passBlocks += result.blocks 242 + passErrors += result.errors 243 + } 244 + elapsed := time.Since(start) 245 + passResults[i] = passResult{ 246 + frames: len(corpus.frames), 247 + blocks: passBlocks, 248 + errors: passErrors, 249 + elapsed: elapsed, 250 + } 251 + } 252 + 253 + reportResult("decode", corpus, passResults) 254 + } 255 + 256 + func reportResult(name string, corpus *corpusInfo, passResults []passResult) { 257 + fpsValues := make([]float64, len(passResults)) 258 + totalFrames := 0 259 + totalBlocks := 0 260 + totalErrors := 0 261 + var totalElapsed float64 262 + 263 + for i, r := range passResults { 264 + elapsedS := r.elapsed.Seconds() 265 + fpsValues[i] = float64(r.frames) / elapsedS 266 + totalFrames += r.frames 267 + totalBlocks += r.blocks 268 + totalErrors += r.errors 269 + totalElapsed += elapsedS 270 + } 271 + 272 + sort.Float64s(fpsValues) 273 + 274 + totalBytes := float64(corpus.totalBytes) * float64(measuredPasses) 275 + throughputMb := totalBytes / (1024 * 1024) / totalElapsed 276 + blocksPerFrame := float64(totalBlocks) / float64(totalFrames) 277 + 278 + minFps := fpsValues[0] 279 + medianFps := fpsValues[measuredPasses/2] 280 + maxFps := fpsValues[measuredPasses-1] 281 + 282 + fmt.Printf("%-14s %10.0f frames/sec %8.1f MB/s blocks=%d (%.2f/frame) errors=%d\n", 283 + name, medianFps, throughputMb, totalBlocks, blocksPerFrame, totalErrors) 284 + fmt.Printf("%-14s variance: min=%.0f median=%.0f max=%.0f frames/sec\n", 285 + "", minFps, medianFps, maxFps) 286 + } 287 + 288 + func loadCorpus(name string) (*corpusInfo, error) { 289 + path := filepath.Join(fixturesDir, name) 290 + data, err := os.ReadFile(path) 291 + if err != nil { 292 + fmt.Printf("cannot open %s: %v\n", path, err) 293 + fmt.Println("run `just capture` first to generate fixtures") 294 + return nil, err 295 + } 296 + 297 + if len(data) < 4 { 298 + return nil, fmt.Errorf("corpus file too small") 299 + } 300 + 301 + frameCount := int(binary.BigEndian.Uint32(data[0:4])) 302 + frames := make([][]byte, 0, frameCount) 303 + pos := 4 304 + totalBytes := 0 305 + minFrame := int(^uint(0) >> 1) 306 + maxFrame := 0 307 + 308 + for i := 0; i < frameCount; i++ { 309 + if pos+4 > len(data) { 310 + return nil, fmt.Errorf("truncated corpus") 311 + } 312 + frameLen := int(binary.BigEndian.Uint32(data[pos : pos+4])) 313 + pos += 4 314 + if pos+frameLen > len(data) { 315 + return nil, fmt.Errorf("truncated corpus") 316 + } 317 + frames = append(frames, data[pos:pos+frameLen]) 318 + pos += frameLen 319 + totalBytes += frameLen 320 + if frameLen < minFrame { 321 + minFrame = frameLen 322 + } 323 + if frameLen > maxFrame { 324 + maxFrame = frameLen 325 + } 326 + } 327 + 328 + return &corpusInfo{ 329 + frames: frames, 330 + totalBytes: totalBytes, 331 + minFrame: minFrame, 332 + maxFrame: maxFrame, 333 + }, nil 334 + }

+5

justfile

··· 21 21 @echo "--------------------------------------------" 22 22 cd python && uv run bench.py 23 23 @echo "--------------------------------------------" 24 + cd go-raw && go run -buildvcs=false . 25 + @echo "--------------------------------------------" 24 26 cd go && go run . 25 27 @echo "============================================" 26 28 ··· 36 38 37 39 bench-go: _ensure-fixtures 38 40 cd go && go run . 41 + 42 + bench-go-raw: _ensure-fixtures 43 + cd go-raw && go run -buildvcs=false . 39 44 40 45 bench-python: _ensure-fixtures 41 46 cd python && uv run bench.py

+2 -2

zig/build.zig.zon

··· 5 5 .minimum_zig_version = "0.15.0", 6 6 .dependencies = .{ 7 7 .zat = .{ 8 - .url = "https://tangled.sh/zat.dev/zat/archive/v0.1.9", 9 - .hash = "zat-0.1.9-5PuC7tL5AwAgHHJXdOHTCy373NtwQW7cE2nfB7rq4yx_", 8 + .url = "https://tangled.sh/zat.dev/zat/archive/v0.2.1", 9 + .hash = "zat-0.2.0-5PuC7igtBAC6Pl01Kyqn0c-1iBFuwYAlxj9Nvf3515Su", 10 10 }, 11 11 }, 12 12 .paths = .{

+27 -14

zig/src/bench.zig

··· 45 45 { 46 46 var arena = std.heap.ArenaAllocator.init(allocator); 47 47 defer arena.deinit(); 48 - const result = decodeFullFrame(arena.allocator(), corpus.frames[0]); 48 + const result = decodeFullFrame(arena.allocator(), corpus.frames[0], .{}); 49 49 std.debug.print("first frame: blocks={d} errors={d}\n\n", .{ 50 50 result.blocks, result.errors, 51 51 }); 52 52 } 53 53 54 - benchDecodeFrame(allocator, corpus) catch |err| { 54 + // with CID hash verification (production-correct, comparable to go/indigo) 55 + benchDecodeFrame(allocator, corpus, .{ .verify_cid_hashes = true }) catch |err| { 56 + std.debug.print("decode+verify (reuse): SKIP ({s})\n", .{@errorName(err)}); 57 + }; 58 + 59 + benchDecodeFrame(allocator, corpus, .{ .verify_cid_hashes = false }) catch |err| { 55 60 std.debug.print("decode (reuse): SKIP ({s})\n", .{@errorName(err)}); 56 61 }; 57 62 58 - benchDecodeFrameAlloc(allocator, corpus) catch |err| { 63 + benchDecodeFrameAlloc(allocator, corpus, .{ .verify_cid_hashes = false }) catch |err| { 59 64 std.debug.print("decode (alloc): SKIP ({s})\n", .{@errorName(err)}); 60 65 }; 61 66 62 67 std.debug.print("\n", .{}); 63 68 } 64 69 70 + const DecodeOptions = struct { 71 + verify_cid_hashes: bool = true, 72 + }; 73 + 65 74 /// full decode of one frame: header → payload → CAR → decode every block. 66 75 /// returns block count and error count — does not abort on per-block failures. 67 - fn decodeFullFrame(allocator: Allocator, data: []const u8) struct { blocks: usize, errors: usize } { 76 + fn decodeFullFrame(allocator: Allocator, data: []const u8, options: DecodeOptions) struct { blocks: usize, errors: usize } { 68 77 const cbor = zat.cbor; 69 78 const car = zat.car; 70 79 ··· 82 91 return .{ .blocks = 0, .errors = 1 }; 83 92 }; 84 93 85 - // 3. parse CAR from blocks field 94 + // 3. parse CAR from blocks field (with or without CID hash verification) 86 95 const blocks_bytes = payload.getBytes("blocks") orelse { 87 96 return .{ .blocks = 0, .errors = 0 }; 88 97 }; 89 - const parsed_car = car.read(allocator, blocks_bytes) catch { 98 + const parsed_car = car.readWithOptions(allocator, blocks_bytes, .{ 99 + .verify_block_hashes = options.verify_cid_hashes, 100 + }) catch { 90 101 return .{ .blocks = 0, .errors = 1 }; 91 102 }; 92 103 ··· 103 114 } 104 115 105 116 /// arena reuse: production allocation pattern — one arena, reset per frame. 106 - fn benchDecodeFrame(allocator: Allocator, corpus: CorpusInfo) !void { 117 + fn benchDecodeFrame(allocator: Allocator, corpus: CorpusInfo, options: DecodeOptions) !void { 118 + const label = if (options.verify_cid_hashes) "decode+v (reuse)" else "decode (reuse)"; 107 119 var arena = std.heap.ArenaAllocator.init(allocator); 108 120 defer arena.deinit(); 109 121 110 122 for (0..warmup_passes) |_| { 111 123 for (corpus.frames) |frame| { 112 124 _ = arena.reset(.retain_capacity); 113 - _ = decodeFullFrame(arena.allocator(), frame); 125 + _ = decodeFullFrame(arena.allocator(), frame, options); 114 126 } 115 127 } 116 128 ··· 124 136 var timer = try std.time.Timer.start(); 125 137 for (corpus.frames) |frame| { 126 138 _ = arena.reset(.retain_capacity); 127 - const result = decodeFullFrame(arena.allocator(), frame); 139 + const result = decodeFullFrame(arena.allocator(), frame, options); 128 140 pass_blocks += result.blocks; 129 141 pass_errors += result.errors; 130 142 } ··· 139 151 total_errors += pass_errors; 140 152 } 141 153 142 - reportResult("decode (reuse)", corpus, &pass_results, total_blocks, total_errors); 154 + reportResult(label, corpus, &pass_results, total_blocks, total_errors); 143 155 } 144 156 145 157 /// arena per-frame: fair cross-language comparison — fresh alloc/free per frame. 146 - fn benchDecodeFrameAlloc(allocator: Allocator, corpus: CorpusInfo) !void { 158 + fn benchDecodeFrameAlloc(allocator: Allocator, corpus: CorpusInfo, options: DecodeOptions) !void { 159 + const label = if (options.verify_cid_hashes) "decode+v (alloc)" else "decode (alloc)"; 147 160 for (0..warmup_passes) |_| { 148 161 for (corpus.frames) |frame| { 149 162 var arena = std.heap.ArenaAllocator.init(allocator); 150 163 defer arena.deinit(); 151 - _ = decodeFullFrame(arena.allocator(), frame); 164 + _ = decodeFullFrame(arena.allocator(), frame, options); 152 165 } 153 166 } 154 167 ··· 163 176 for (corpus.frames) |frame| { 164 177 var arena = std.heap.ArenaAllocator.init(allocator); 165 178 defer arena.deinit(); 166 - const result = decodeFullFrame(arena.allocator(), frame); 179 + const result = decodeFullFrame(arena.allocator(), frame, options); 167 180 pass_blocks += result.blocks; 168 181 pass_errors += result.errors; 169 182 } ··· 178 191 total_errors += pass_errors; 179 192 } 180 193 181 - reportResult("decode (alloc)", corpus, &pass_results, total_blocks, total_errors); 194 + reportResult(label, corpus, &pass_results, total_blocks, total_errors); 182 195 } 183 196 184 197 fn reportResult(