docs: devlog 004 — the sig-verify saga

atproto utils for zig zat.dev

atproto sdk zig

k256 5×52-bit field, Fermat scalar inversion, three-way benchmark
with blacksky's rsky stack. changelog backfill for 0.2.1 and 0.2.2.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

zzstoatzz.io 2 weeks ago 1d622ebc 453c331d

publish-docs.yml

success 12s

+97

2 changed files

expand all

CHANGELOG.md

devlog

004-sig-verify.md

+14

CHANGELOG.md

··· 1 # changelog 2 3 ## 0.2.0 4 5 - **feat**: end-to-end repo verification — `verifyRepo(allocator, identifier)` exercises the full AT Protocol trust chain: handle → DID → DID document → signing key → fetch repo CAR → verify commit signature → walk MST → rebuild tree → CID match

··· 1 # changelog 2 3 + ## 0.2.3 4 + 5 + - **docs**: devlog 004 — the sig-verify saga (k256 5×52-bit field, Fermat scalar inversion, three-way bench with rsky) 6 + - changelog backfill for 0.2.1 and 0.2.2 7 + 8 + ## 0.2.2 9 + 10 + - **feat**: CAR parser enforces size limits — 2MB max on blocks field, max block count. matches indigo's limits for production parity. 11 + 12 + ## 0.2.1 13 + 14 + - **feat**: CID hash verification in CAR parser — `car.read()` SHA-256 hashes each block and compares against the CID digest. proves block content wasn't corrupted or tampered with. `readWithOptions(.{ .verify_block_hashes = false })` to skip for trusted local data. 15 + - **fix**: remove pfrazee.com from default test suite (network-dependent) 16 + 17 ## 0.2.0 18 19 - **feat**: end-to-end repo verification — `verifyRepo(allocator, identifier)` exercises the full AT Protocol trust chain: handle → DID → DID document → signing key → fetch repo CAR → verify commit signature → walk MST → rebuild tree → CID match

+83

devlog/004-sig-verify.md

···

··· 1 + # the sig-verify saga 2 + 3 + the previous devlogs covered decode throughput — zat's CBOR/CAR pipeline vs Go's indigo. this one is about the other half of a relay's hot path: ECDSA signature verification. 4 + 5 + every commit on the AT Protocol network is signed. a relay verifying incoming commits needs to: decode the signed commit, strip the `sig` field, re-encode as deterministic DAG-CBOR, SHA-256 hash the result, then verify the ECDSA signature against the account's public key. the corpus is ~100% secp256k1 (the network migrated from P-256 early on). 6 + 7 + ## starting point 8 + 9 + zig's stdlib includes `std.crypto.ecc.Secp256k1` — projective coordinates, constant-time scalar arithmetic, no endomorphism optimization. first attempt: just call `mulPublic` and check. 10 + 11 + ~2,800 verifies/sec. Go's indigo (using [decred/dcrd](https://github.com/decred/dcrd/tree/master/dcrec/secp256k1)) was doing ~15,000. a 5.4x gap. 12 + 13 + ## k256 — three optimizations 14 + 15 + we built [k256](https://tangled.sh/@zzstoatzz.io/k256), an optimized secp256k1 verifier on top of zig's stdlib scalar and field primitives. 16 + 17 + **1. endomorphism (GLV decomposition)** 18 + 19 + secp256k1 has an efficient endomorphism: multiplying the x-coordinate by a constant beta is equivalent to multiplying the point by lambda (mod n). this lets you decompose a scalar multiplication `k*P` into `k1*P + k2*beta(P)` where k1 and k2 are half-width (~128-bit) scalars. half the doublings for the dominant operation. 20 + 21 + **2. 4-way Shamir (interleaved multi-scalar multiplication)** 22 + 23 + ECDSA verification computes `u1*G + u2*Q` where G is the generator and Q is the public key. with endomorphism applied to both, that's four half-width scalar multiplications. instead of computing them separately and adding, Shamir's trick interleaves them in a single pass through the bits — one shared doubling chain, adding precomputed combinations at each step. 24 + 25 + **3. projective comparison** 26 + 27 + the final check in ECDSA is whether the x-coordinate of the computed point equals `r` (mod n). naive: convert to affine (expensive field inversion), compare. better: compare in projective coordinates — `r * Z == X (mod p)`. this avoids the inversion entirely, which matters when it's on the hot path of every verify. 28 + 29 + important subtlety: zig's stdlib uses projective coordinates (x = X/Z), not Jacobian (x = X/Z²). got this wrong initially by assuming Jacobian — reading the stdlib source (`std.crypto.ecc.Secp256k1`) caught it. 30 + 31 + these three changes brought us to ~9,800 v/s (k256 v0.0.2). Go was still at ~15,000. 32 + 33 + ## 5x52-bit field rewrite 34 + 35 + profiling showed field multiplication was the bottleneck. zig's stdlib represents secp256k1 field elements as 10 limbs of 26 bits each. multiplication requires 100 partial products (10×10). libsecp256k1 (Bitcoin Core's C library) uses 5 limbs of 52 bits — only 25 partial products, leveraging u128 multiply-accumulate. 36 + 37 + ported the `field_5x52_int128_impl.h` routines to zig. the representation change cuts multiply cost by ~4x (25 vs 100 products), and the wider limbs mean fewer carries and less bookkeeping. 38 + 39 + this got us to ~15,800 v/s — roughly even with Go (k256 v0.0.3). 40 + 41 + ## Fermat scalar inversion 42 + 43 + the remaining gap was in scalar inversion. zig's stdlib uses the divstep algorithm (binary extended Euclidean) — 769 iterations of conditional moves in a serialized dependency chain. constant-time, designed for signing safety where the scalar is secret. 44 + 45 + for verification, all inputs are public. Fermat's little theorem gives us `s^{-1} = s^{n-2} mod n` via an addition chain: 253 squarings + 40 multiplications. ported the chain from [secp256k1-voi](https://gitlab.com/yawning/secp256k1-voi) (generated by [addchain](https://github.com/mmcloughlin/addchain) v0.4.0). 46 + 47 + this brought preparsed-key to ~19,100 v/s (k256 v0.0.4). 48 + 49 + ## adding blacksky's rsky to the bench 50 + 51 + before writing any of this up, we wanted to check: what if someone else is doing this better? 52 + 53 + [rsky](https://github.com/blacksky-algorithms/rsky) is blacksky's full AT Protocol implementation in Rust — the most complete outside of bluesky's own Go stack. their relay uses RustCrypto's [k256](https://crates.io/crates/k256) (pure Rust, complete addition formulas, GLV endomorphism) for secp256k1, and ciborium + serde_ipld_dagcbor + rs-car-sync for the decode pipeline. 54 + 55 + we added both decode and sig-verify benchmarks using their exact crate stack. 56 + 57 + ### results (single run, M3 Max) 58 + 59 + **decode with CID verification (frames/sec)** 60 + 61 + | SDK | frames/sec | MB/s | 62 + |---|---:|---:| 63 + | zig (zat) | 290,461 | 1,409 | 64 + | rust (rsky stack) | 38,905 | 187 | 65 + | go (indigo) | 15,074 | 73 | 66 + 67 + **sig-verify — preparsed-key (verifies/sec)** 68 + 69 + | SDK | verifies/sec | 70 + |---|---:| 71 + | rust (rsky stack) | 20,748 | 72 + | zig (k256) | 19,110 | 73 + | go (dcrd) | 17,934 | 74 + 75 + RustCrypto's k256 has a slight edge on pure crypto (~9% over zig). the full pipeline gap is larger because serde_ipld_dagcbor is efficient at the CBOR re-encoding step. on decode, zig is 7.5x faster than rsky and 19x faster than Go. 76 + 77 + the overall picture: all three SDKs are competitive on signature verification — it's table stakes. decode throughput is where the architectural differences (zero-copy CBOR, arena allocation, hand-rolled CAR parsing) actually compound. 78 + 79 + ## what's in this release 80 + 81 + - **k256 v0.0.4**: 5×52-bit field, Fermat scalar inversion, zig zen cleanup (Fe26→Fe, AffinePoint26→AffinePoint, removed redundant constants) 82 + - **zat v0.2.3**: changelog backfill for 0.2.1 (CID verification) and 0.2.2 (CAR size limits), this devlog 83 + - **atproto-bench**: rsky stack added for both decode and sig-verify, three-way comparison