atproto-bench#

SDK-level benchmarks for AT Protocol relay infrastructure.

measures the two CPU-bound bottlenecks a relay hits on every incoming commit: decode (CBOR + CAR + DAG-CBOR + CID verification) and signature verification (ECDSA). the same corpora are used across all SDKs, with work parity verified via block counts, error counts, and entry counts.

what this measures#

each benchmark does the same work per frame:

  1. decode CBOR frame header
  2. decode CBOR payload (typed commit)
  3. parse CAR from the blocks field
  4. decode every CAR block as DAG-CBOR
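step 1 can be sketched with the stdlib alone. this toy decoder (names are illustrative, not from any of the SDKs) peeks the "op" field out of a flat header map like {"op": 1, "t": "#commit"}, handling only the int and text values that appear there:

```python
import struct

def read_cbor_head(buf: bytes, pos: int):
    """decode one CBOR initial byte plus its length argument."""
    major = buf[pos] >> 5       # major type (0=uint, 3=text, 5=map, ...)
    info = buf[pos] & 0x1F      # additional info
    pos += 1
    if info < 24:               # argument embedded in the initial byte
        return major, info, pos
    fmt = {24: "B", 25: ">H", 26: ">I", 27: ">Q"}[info]
    (arg,) = struct.unpack_from(fmt, buf, pos)
    return major, arg, pos + struct.calcsize(fmt)

def peek_frame_op(frame: bytes) -> int:
    """return the 'op' value from a firehose frame header map."""
    major, npairs, pos = read_cbor_head(frame, 0)
    assert major == 5, "expected a CBOR map"
    for _ in range(npairs):
        major, klen, pos = read_cbor_head(frame, pos)
        assert major == 3, "expected a text-string key"
        key = frame[pos:pos + klen].decode()
        pos += klen
        major, val, pos = read_cbor_head(frame, pos)
        if major == 3:          # text value: skip its bytes
            pos += val
        elif key == "op":       # small unsigned int
            return val
    raise ValueError("no 'op' field in header")
```

a real decoder also has to handle negative ints (error frames carry op = -1) and nested values; this only covers the happy path the benchmark filters for.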

block counts, error counts, and variance (min/median/max) are reported so you can verify parity across SDKs.

results#

3,298 firehose frames (16.2 MB), 5 measured passes, macOS arm64 (M3 Max)

production-correct (with CID hash verification)#

three SDKs verify CID hashes (SHA-256 per block): zat, rsky, and indigo. this is the correct behavior for untrusted network data — it proves block content matches the content identifier.

| SDK | frames/sec (median) | MB/s | blocks/frame | errors |
|---|---|---|---|---|
| zig (zat, arena reuse) | 290,461 | 1,408.5 | 9.98 | 0 |
| rust (rsky stack) | 38,905 | 186.5 | 9.98 | 0 |
| go (indigo) | 15,074 | 73.3 | 9.98 | 0 |

decode-only (no CID hash verification)#

the remaining SDKs skip CID verification. these numbers show decode throughput in isolation — useful for comparing SDK architecture, but not directly comparable to the verified numbers above.

| SDK | frames/sec (median) | MB/s | blocks/frame | errors |
|---|---|---|---|---|
| zig (zat, arena reuse) | 529,424 | 2,638.0 | 9.98 | 0 |
| zig (zat, alloc per frame) | 521,925 | 2,529.7 | 9.98 | 0 |
| rust (raw, arena reuse) | 226,146 | 1,097.4 | 9.98 | 0 |
| rust (raw, alloc per frame) | 200,763 | 1,012.9 | 9.98 | 0 |
| rust (jacquard) | 56,523 | 275.8 | 9.98 | 0 |
| go (raw, fxamacker/cbor) | 40,137 | 187.0 | 9.98 | 0 |
| python (atproto) | 33,842 | 163.0 | 9.98 | 0 |
| go (indigo) | 15,587 | 75.6 | 9.98 | 0 |

note: indigo appears in both tables. its number is the same because it always verifies — there is no option to disable it in go-car v1.

run-to-run variance is ~30-40%. compare ratios within a single just bench run, not across runs.

CID verification#

a CID (Content IDentifier) contains a hash digest of the block's content. verifying it means SHA-256 hashing each block and comparing against the digest in the CID. this proves the block wasn't corrupted or tampered with in transit.
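the check itself is small. a stdlib-only sketch (illustrative names) that parses the raw binary CIDv1 layout (varint version, codec, multihash code, digest length, digest) and recomputes the hash:

```python
import hashlib

def read_uvarint(buf: bytes, pos: int):
    """decode an unsigned LEB128 varint, as used in CID/multihash prefixes."""
    shift = result = 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, pos
        shift += 7

def verify_block(cid: bytes, block: bytes) -> bool:
    """SHA-256 the block and compare against the digest inside a raw CIDv1."""
    version, pos = read_uvarint(cid, 0)
    codec, pos = read_uvarint(cid, pos)       # 0x71 = dag-cbor
    hash_code, pos = read_uvarint(cid, pos)   # 0x12 = sha2-256
    digest_len, pos = read_uvarint(cid, pos)
    digest = cid[pos:pos + digest_len]
    assert version == 1 and hash_code == 0x12
    return hashlib.sha256(block).digest() == digest
```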

| SDK | verifies CID hashes? | notes |
|---|---|---|
| zig (zat) | yes (v0.2.1+) | car.read() verifies by default; readWithOptions(.{ .verify_block_hashes = false }) to skip |
| rust (rsky, rs-car-sync) | yes | CarReader::new(&mut cursor, true) — second arg enables verification |
| go (indigo, go-car v1) | yes (always) | no option to disable in v1 |
| rust (jacquard, iroh-car) | no | not implemented |
| rust (raw) | no | not implemented |
| go (raw) | no | not implemented |
| python (libipld) | no | not implemented |

what each SDK does#

every SDK takes the same raw binary frame and decodes all the way through to per-block DAG-CBOR:

| SDK | decode path |
|---|---|
| zig | cbor.decode header → cbor.decodeAll payload → car.read (+ SHA-256 verify) → cbor.decodeAll per block |
| rust (rsky stack) | ciborium header → serde_ipld_dagcbor payload → rs-car-sync CAR (+ SHA-256 verify) → serde_ipld_dagcbor per block |
| rust (raw) | minicbor::Decoder header → payload → hand-rolled sync CAR → minicbor + bumpalo per block |
| rust (jacquard) | SubscribeReposMessage::decode_framed → typed Commit, jacquard_repo::car::parse_car_bytes → blocks, serde_ipld_dagcbor per block |
| go (raw) | fxamacker/cbor struct decode → hand-rolled sync CAR → fxamacker/cbor Unmarshal per block |
| go (indigo) | evt.Deserialize → typed RepoCommit via code-gen CBOR → car.NewBlockReader (+ SHA-256 verify) → cbornode.DecodeInto per block |
| python | Frame.from_bytes + parse_subscribe_repos_message → CAR.from_bytes (libipld decodes all blocks internally) |

correctness parity#

we traced the full decode path of every SDK to verify that no SDK is winning by skipping correctness work.

what zat, rsky, and indigo all do per frame:

  • decode full CBOR payload (all commit fields — repo, rev, ops, timestamp, etc.)
  • parse CAR header and all block sections
  • parse CID structure (version, codec, multihash) for each block
  • SHA-256 hash each block and compare against CID digest
  • decode every block as DAG-CBOR

what zat and indigo do that isn't obvious:

  • enforce size limits (2MB max on blocks field, max block count) — zat matches indigo's limits as of v0.2.2

what none of them do:

  • DAG-CBOR deterministic encoding validation (sorted keys, minimal integers) — indigo's refmt doesn't check this either
  • signature verification — separate from decode, not measured here
  • MST validation — separate from decode, not measured here

there are no correctness differences between the verified decode paths. the performance gaps are entirely implementation cost.

where the ~15x comes from#

we traced indigo's decode path at the instruction level. the cost compounds from several architectural differences:

| factor | indigo | zat | approx cost |
|---|---|---|---|
| CBOR decode | refmt: token pump → reflection → reflect.SetMapIndex per entry | hand-written, direct dispatch | ~3-4x |
| string/byte handling | Go string heap allocation per value (repo, rev, path, action, per-block keys) | zero-copy slices into input buffer | ~2-3x |
| memory management | per-object GC'd heap allocation; every map, array, int is boxed | arena allocator, 24-byte Value union | ~2-3x |
| CAR block reads | make([]byte, section_len) + copy per block; CID parsed twice (once to read, once to verify) | reads directly from input slice; CID parsed once | ~1.5x |
| blocks field | make([]uint8, len) + io.ReadFull copies entire CAR payload | slices into input buffer | ~1.2x |

these factors multiply. refmt's reflection overhead × per-value heap allocation × GC pressure × byte copying = ~15x on this workload.

note: indigo's cbor-gen (code-generated unmarshal for the commit struct) is fast — the bottleneck is cbornode.DecodeInto (refmt/reflection) for the per-block DAG-CBOR decode, which runs ~10 times per frame.

fairness notes#

  • CID verification: zat, rsky, and indigo all verify block hashes. this is ~2x overhead for zat (290k vs 595k fps). the decode-only table exists for architectural comparison, but the production-correct table is the one that matters for real-world use
  • rust (rsky stack) — by Rudy Fraser (BlackSky) — is a full AT Protocol implementation in Rust: PDS, relay, feed generator, labeler, plus library crates (rsky-repo, rsky-crypto, rsky-identity). our decode bench uses the same crates as rsky's relay: ciborium for CBOR header, serde_ipld_dagcbor for DAG-CBOR body/blocks, rs-car-sync for CAR with CID verification, and RustCrypto k256/p256 for signatures
  • zig and rust (raw) both use arena allocation + zero-copy string/byte decoding. the "alloc per frame" variants are the fair cross-language comparison; "arena reuse" shows the production pattern
  • rust (jacquard) — by @nonbinary.computer — is a client-focused AT Protocol SDK for Rust with zero-copy deserialization, generated API bindings, MST/CAR/identity support, and OAuth. it pays for serde-based owned deserialization (String, BTreeMap), async CAR parsing (tokio poll/wake per block via iroh-car), and per-object heap allocation
  • go (raw) uses fxamacker/cbor (no reflection for known struct types), a hand-rolled sync CAR parser (no CID hash verification), and no indigo dependency. GC pressure remains the fundamental constraint — Go's experimental arena package (GOEXPERIMENT=arenas) is on hold and not recommended for production
  • go (indigo) — bluesky's own production relay — uses code-generated CBOR unmarshal (no reflection at the frame level) but pays for go-car's per-block CID hash verification and cbornode's reflection-based DAG-CBOR decode via the unmaintained refmt library
  • python is faster than jacquard despite being "Python" — its hot path is libipld (Rust via PyO3), which does the entire CAR parse + per-block DAG-CBOR decode in one synchronous C-extension call
  • error handling: decode failures never abort the run in any SDK — errors are counted and the frame is skipped
  • capture coupling: the corpus capture tool uses zat's CBOR decoder for the commit-with-ops header peek. this is standard CBOR parsing (not zat's typed firehose decoder), but it does mean frames that zat's CBOR decoder rejects won't appear in the corpus

signature verification#

a separate benchmark measures the full signature verification pipeline that relays perform on every incoming commit. this is CPU-bound ECDSA work that compounds with scale (~500-1000 verifies/sec on the live network, much higher during backfill).

what it measures#

per entry, all three SDKs do identical work:

  1. CBOR decode the signed commit (has sig field)
  2. strip the sig field, re-encode as unsigned CBOR (deterministic DAG-CBOR)
  3. SHA-256 hash the unsigned bytes
  4. ECDSA verify the hash against the signature using the account's public key
  5. dispatch by curve type: P-256 or secp256k1

all three enforce low-S normalization on both curves.
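the low-S rule is a one-line arithmetic check against the curve order. a sketch for secp256k1 (the same rule applies with P-256's order; normalize_s is an illustrative name):

```python
# secp256k1 group order n (a well-known constant)
N_SECP256K1 = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141

def normalize_s(s: int, n: int = N_SECP256K1) -> int:
    """map a signature's s component to its low-S form.

    for any valid (r, s), (r, n - s) also verifies; enforcing
    s <= n/2 (by rejecting or normalizing high-S values) closes
    that malleability hole."""
    return n - s if s > n // 2 else s
```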

three tiers#

  • full pipeline: CBOR decode → strip sig → re-encode → SHA-256 → ECDSA verify (what a relay actually does)
  • crypto-only: SHA-256 → ECDSA verify with pre-computed unsigned bytes (isolates crypto cost from CBOR overhead)
  • preparsed-key: SHA-256 → ECDSA verify with pre-parsed public keys (isolates pure ECDSA math from SEC1 decompression)

results#

3,072 signed commits (all secp256k1), 5 measured passes, macOS arm64 (M3 Max)

| SDK | variant | verifies/sec (median) | entries | P-256 | secp256k1 | errors |
|---|---|---|---|---|---|---|
| rust (rsky stack) | full pipeline | 18,974 | 3,072 | 0 | 3,072 | 0 |
| rust (rsky stack) | crypto-only | 19,310 | 3,072 | 0 | 3,072 | 0 |
| rust (rsky stack) | preparsed-key | 20,631 | 3,072 | 0 | 3,072 | 0 |
| zig (zat + k256) | full pipeline | 15,385 | 3,072 | 0 | 3,072 | 0 |
| zig (zat + k256) | crypto-only | 16,338 | 3,072 | 0 | 3,072 | 0 |
| zig (zat + k256) | preparsed-key | 19,148 | 3,072 | 0 | 3,072 | 0 |
| go (indigo) | full pipeline | 14,768 | 3,072 | 0 | 3,072 | 0 |
| go (indigo) | crypto-only | 15,399 | 3,072 | 0 | 3,072 | 0 |
| go (indigo) | preparsed-key | 18,227 | 3,072 | 0 | 3,072 | 0 |

all three are competitive. all use optimized secp256k1 with the GLV endomorphism — RustCrypto's k256 uses complete addition formulas in pure Rust, zat's k256 ports libsecp256k1's field arithmetic, and indigo uses decred/dcrd.

the crypto-only vs full-pipeline numbers being nearly identical confirms ECDSA is the bottleneck, not CBOR re-encoding overhead. the preparsed-key tier shows key parsing is a small but measurable cost — relevant for relay implementations that cache public keys per-DID.

sig-verify corpus format#

```
[u32 BE entry_count]
per entry:
  [u8 curve_type]                         // 0 = P-256, 1 = secp256k1
  [u16 BE signed_len][signed_bytes...]    // signed commit CBOR (with sig field)
  [u16 BE pubkey_len][pubkey_bytes...]    // compressed public key (33 bytes)
```

captured by connecting to the firehose, extracting signed commit blocks from CAR data, and resolving each DID via PLC directory to get the signing key. entries that fail verification are dropped during capture.
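a reader for this format is a few lines of struct unpacking. a stdlib-only sketch (read_sig_corpus is an illustrative name):

```python
import struct

def read_sig_corpus(data: bytes):
    """parse the sig-verify corpus: u32 BE entry count, then per entry
    a curve byte, a u16 BE length-prefixed signed commit, and a
    u16 BE length-prefixed compressed public key."""
    (count,) = struct.unpack_from(">I", data, 0)
    pos = 4
    entries = []
    for _ in range(count):
        curve = data[pos]
        pos += 1
        (slen,) = struct.unpack_from(">H", data, pos)
        pos += 2
        signed = data[pos:pos + slen]
        pos += slen
        (plen,) = struct.unpack_from(">H", data, pos)
        pos += 2
        pubkey = data[pos:pos + plen]
        pos += plen
        entries.append((curve, signed, pubkey))
    return entries
```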

corpus format#

the fixture file (fixtures/firehose-frames.bin) uses a simple length-prefixed binary format:

```
[u32 BE frame_count]
[u32 BE frame_1_len][frame_1 bytes]
[u32 BE frame_2_len][frame_2 bytes]
...
```

frames are captured from ~10 seconds of live firehose traffic, filtered to commits with ops using a minimal CBOR header peek.
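a stdlib-only sketch of a reader for this layout (read_frames is an illustrative name):

```python
import struct

def read_frames(data: bytes):
    """parse the frame corpus: u32 BE frame count, then each frame
    as a u32 BE length prefix followed by the raw frame bytes."""
    (count,) = struct.unpack_from(">I", data, 0)
    pos = 4
    frames = []
    for _ in range(count):
        (flen,) = struct.unpack_from(">I", data, pos)
        pos += 4
        frames.append(data[pos:pos + flen])
        pos += flen
    return frames
```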

when this matters#

for live firehose consumption: usually no. at ~500-1000 events/sec (full bluesky network), any of these SDKs handle the load.

where it matters:

  • backfill / replay — processing months of historical data. decode + verify throughput determines catch-up speed.
  • relays at scale — routing events to many downstream consumers. every microsecond of decode + verify overhead compounds across fan-out.
  • memory — smaller value types mean less memory per in-flight frame.

the overall picture: decode throughput varies ~19x across the production-correct SDKs (dominated by CBOR/memory architecture choices). signature verification is competitive across all three — zig, rust, and go all land within ~40% of each other using optimized secp256k1 libraries. for relay workloads, decode is the differentiator; sig-verify is table stakes.

SDKs tested#

| lang | SDK | version | CBOR engine | CAR engine |
|---|---|---|---|---|
| zig | zat | v0.2.2 + k256 v0.0.4 | hand-rolled | hand-rolled (+ SHA-256 CID verify, size limits) |
| rust | rsky stack | | ciborium (header) + serde_ipld_dagcbor (body) | rs-car-sync (+ SHA-256 CID verify) |
| rust | raw (minicbor + bumpalo) | | minicbor (zero-copy) | hand-rolled (sync) |
| rust | jacquard | 0.9 | ciborium (header) + serde_ipld_dagcbor (body) | iroh-car (async) |
| go | raw (fxamacker/cbor) | | fxamacker/cbor | hand-rolled (sync, no CID verify) |
| go | indigo | latest | cbor-gen (code-generated) | go-car/v2 (+ SHA-256 CID verify) |
| python | atproto | 0.0.65 | libipld (Rust via PyO3) | libipld |

trust chain verification#

a third benchmark measures the full end-to-end AT Protocol trust chain: given a handle, resolve identity, fetch the repo, and cryptographically verify everything. this exercises the complete pipeline that any independent verifier needs.

what it measures#

per handle, all three SDKs do the same work:

  1. resolve handle → DID (HTTP well-known or DNS TXT)
  2. resolve DID → DID document → extract signing key + PDS endpoint
  3. fetch full repo CAR from PDS (com.atproto.sync.getRepo)
  4. parse CAR with CID verification (SHA-256 per block)
  5. ECDSA verify the commit signature against the signing key
  6. walk the MST to count all records
  7. rebuild MST and verify root CID matches commit (zig and go only)
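steps 1 and 2 reduce to two lookups. the shapes of both handle-resolution paths can be sketched as pure string helpers, with the actual HTTP/DNS calls omitted (helper names are illustrative):

```python
def did_candidates(handle: str) -> tuple[str, str]:
    """step 1: the two handle-resolution paths defined by atproto —
    an HTTPS GET of the well-known document, or a DNS TXT lookup
    on the _atproto subdomain."""
    return (
        f"https://{handle}/.well-known/atproto-did",
        f"_atproto.{handle}",
    )

def did_from_txt(value: str):
    """the TXT record value looks like 'did=did:plc:...'."""
    return value[4:] if value.startswith("did=") else None
```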

results#

pfrazee.com — 192,144 records, 243,470 blocks, 70.6 MB CAR, macOS arm64 (M3 Max)

trust chain compute breakdown

| SDK | CAR parse | sig verify | MST walk | MST rebuild | compute total | network |
|---|---|---|---|---|---|---|
| zig (zat) | 81.6ms | 0.6ms | 45.5ms | 172.6ms | 300.4ms | 9.8s |
| go (indigo) | 403.8ms | 0.4ms | 5.8ms | 0.0ms | 410.0ms | 20.8s |
| rust (rsky stack) | 301.0ms | 0.2ms | 120.9ms | N/A | 422.1ms | 8.7s |

the rust verify uses the same low-level crates that rsky (Rudy Fraser / BlackSky) uses internally (k256, p256, serde_ipld_dagcbor, sha2) but not rsky's higher-level abstractions (rsky-repo, rsky-identity, rsky-crypto). no Rust equivalent of indigo's all-in-one LoadRepoFromCAR exists yet — rsky provides the library crates (MST, CAR, identity, crypto) but the end-to-end verify pipeline is assembled manually. jacquard also has MST/CAR/identity support and could serve as the basis for a higher-level verify path.

compute time is dominated by CAR parsing (SHA-256 verification of 243k blocks) and MST operations. signature verification is sub-millisecond for all three — single ECDSA verify is trivial compared to the bulk data work.

zig's CAR parse advantage carries over from the decode benchmarks (arena allocation + zero-copy). go's MST walk is fastest because indigo's MST.Walk() operates on an already-loaded in-memory tree — nodes are decoded from CBOR once on first access and cached as Go structs, so walking is pure pointer chasing. zig and rust decode MST nodes from raw CBOR on each visit. rust skips MST rebuild (no equivalent crate exists in either rsky or jacquard yet) but the trust chain is still fully proven by signature + walk.

network time varies by run (PDS response time, TLS handshake, geographic distance). compare compute columns, not totals.

```
just verify pfrazee.com          # run all three
just chart pfrazee.com           # run + generate SVG charts
```

usage#

```
# decode benchmarks
just capture       # capture ~10s of firehose traffic
just bench         # run all decode benchmarks
just bench-zig     # run a single language
just bench-rust-rsky

# sig verify benchmarks
just capture-sigs  # capture signed commits + resolve public keys (~10s + DID resolution)
just bench-sigs    # run all sig verify benchmarks (zig + rust + go)
just bench-sigs-zig
just bench-sigs-rust
just bench-sigs-go

# trust chain verification
just verify pfrazee.com    # run all three implementations
just chart pfrazee.com     # run + generate SVG charts to docs/
```

methodology#

  • just capture connects to the live firehose for ~10 seconds, filters for commits with ops via CBOR header peek (uses zat's CBOR decoder — see fairness notes), writes a length-prefixed corpus
  • each benchmark decodes every frame fully: header → payload → CAR → decode every block as DAG-CBOR
  • zat, rsky, and indigo additionally SHA-256 verify every block CID
  • 2 warmup passes, 5 measured passes over the full corpus
  • zig builds with -Doptimize=ReleaseFast, rust with opt-level=3 lto=true
  • go and python use their standard release toolchains
  • reported numbers: median frames/sec across passes, plus min/max for variance. block counts and error counts verify work parity across SDKs
  • run-to-run variance is significant (~30-40% between separate invocations due to system load, thermal state, etc.). ratios between SDKs should be compared within a single just bench run, not across runs