atproto-bench#

SDK-level benchmarks for AT Protocol relay infrastructure.

measures the two CPU-bound bottlenecks a relay hits on every incoming commit: decode (CBOR + CAR + DAG-CBOR + CID verification) and signature verification (ECDSA). the same corpora are used across all SDKs, with work parity verified via block counts, error counts, and entry counts.

what this measures#

each benchmark does the same work per frame:

  1. decode CBOR frame header
  2. decode CBOR payload (typed commit)
  3. parse CAR from the blocks field
  4. decode every CAR block as DAG-CBOR
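step 1 can be sketched with the stdlib alone. this toy decoder (names are illustrative, not from any of the SDKs) peeks the "op" field out of a flat header map like {"op": 1, "t": "#commit"}, handling only the int and text values that appear there:

```python
import struct

def read_cbor_head(buf: bytes, pos: int):
    """decode one CBOR initial byte plus its length argument."""
    major = buf[pos] >> 5       # major type (0=uint, 3=text, 5=map, ...)
    info = buf[pos] & 0x1F      # additional info
    pos += 1
    if info < 24:               # argument embedded in the initial byte
        return major, info, pos
    fmt = {24: "B", 25: ">H", 26: ">I", 27: ">Q"}[info]
    (arg,) = struct.unpack_from(fmt, buf, pos)
    return major, arg, pos + struct.calcsize(fmt)

def peek_frame_op(frame: bytes) -> int:
    """return the 'op' value from a firehose frame header map."""
    major, npairs, pos = read_cbor_head(frame, 0)
    assert major == 5, "expected a CBOR map"
    for _ in range(npairs):
        major, klen, pos = read_cbor_head(frame, pos)
        assert major == 3, "expected a text-string key"
        key = frame[pos:pos + klen].decode()
        pos += klen
        major, val, pos = read_cbor_head(frame, pos)
        if major == 3:          # text value: skip its bytes
            pos += val
        elif key == "op":       # small unsigned int
            return val
    raise ValueError("no 'op' field in header")
```

a real decoder also has to handle negative ints (error frames carry op = -1) and nested values; this only covers the happy path the benchmark filters for.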

block counts, error counts, and variance (min/median/max) are reported so you can verify parity across SDKs.

results#

3,298 firehose frames (16.2 MB), 5 measured passes, macOS arm64 (M3 Max)

production-correct (with CID hash verification)#

three SDKs verify CID hashes (SHA-256 per block): zat, rsky, and indigo. this is the correct behavior for untrusted network data — it proves block content matches the content identifier.

| SDK | frames/sec (median) | MB/s | blocks/frame | errors |
|---|---|---|---|---|
| zig (zat, arena reuse) | 290,461 | 1,408.5 | 9.98 | 0 |
| rust (rsky stack) | 38,905 | 186.5 | 9.98 | 0 |
| go (indigo) | 15,074 | 73.3 | 9.98 | 0 |

decode-only (no CID hash verification)#

the remaining SDKs skip CID verification. these numbers show decode throughput in isolation — useful for comparing SDK architecture, but not directly comparable to the verified numbers above.

| SDK | frames/sec (median) | MB/s | blocks/frame | errors |
|---|---|---|---|---|
| zig (zat, arena reuse) | 529,424 | 2,638.0 | 9.98 | 0 |
| zig (zat, alloc per frame) | 521,925 | 2,529.7 | 9.98 | 0 |
| rust (raw, arena reuse) | 226,146 | 1,097.4 | 9.98 | 0 |
| rust (raw, alloc per frame) | 200,763 | 1,012.9 | 9.98 | 0 |
| rust (jacquard) | 56,523 | 275.8 | 9.98 | 0 |
| go (raw, fxamacker/cbor) | 40,137 | 187.0 | 9.98 | 0 |
| python (atproto) | 33,842 | 163.0 | 9.98 | 0 |
| go (indigo) | 15,587 | 75.6 | 9.98 | 0 |

note: indigo appears in both tables. its number is the same because it always verifies — there is no option to disable it in go-car v1.

run-to-run variance is ~30-40%. compare ratios within a single just bench run, not across runs.

CID verification#

a CID (Content IDentifier) contains a hash digest of the block's content. verifying it means SHA-256 hashing each block and comparing against the digest in the CID. this proves the block wasn't corrupted or tampered with in transit.
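the check itself is small. a stdlib-only sketch (illustrative names) that parses the raw binary CIDv1 layout (varint version, codec, multihash code, digest length, digest) and recomputes the hash:

```python
import hashlib

def read_uvarint(buf: bytes, pos: int):
    """decode an unsigned LEB128 varint, as used in CID/multihash prefixes."""
    shift = result = 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, pos
        shift += 7

def verify_block(cid: bytes, block: bytes) -> bool:
    """SHA-256 the block and compare against the digest inside a raw CIDv1."""
    version, pos = read_uvarint(cid, 0)
    codec, pos = read_uvarint(cid, pos)       # 0x71 = dag-cbor
    hash_code, pos = read_uvarint(cid, pos)   # 0x12 = sha2-256
    digest_len, pos = read_uvarint(cid, pos)
    digest = cid[pos:pos + digest_len]
    assert version == 1 and hash_code == 0x12
    return hashlib.sha256(block).digest() == digest
```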

| SDK | verifies CID hashes? | notes |
|---|---|---|
| zig (zat) | yes (v0.2.1+) | car.read() verifies by default; readWithOptions(.{ .verify_block_hashes = false }) to skip |
| rust (rsky, rs-car-sync) | yes | CarReader::new(&mut cursor, true) — second arg enables verification |
| go (indigo, go-car v1) | yes (always) | no option to disable in v1 |
| rust (jacquard, iroh-car) | no | not implemented |
| rust (raw) | no | not implemented |
| go (raw) | no | not implemented |
| python (libipld) | no | not implemented |

what each SDK does#

every SDK takes the same raw binary frame and decodes all the way through to per-block DAG-CBOR:

| SDK | decode path |
|---|---|
| zig | cbor.decode header → cbor.decodeAll payload → car.read (+ SHA-256 verify) → cbor.decodeAll per block |
| rust (rsky stack) | ciborium header → serde_ipld_dagcbor payload → rs-car-sync CAR (+ SHA-256 verify) → serde_ipld_dagcbor per block |
| rust (raw) | minicbor::Decoder header → payload → hand-rolled sync CAR → minicbor + bumpalo per block |
| rust (jacquard) | SubscribeReposMessage::decode_framed → typed Commit, jacquard_repo::car::parse_car_bytes → blocks, serde_ipld_dagcbor per block |
| go (raw) | fxamacker/cbor struct decode → hand-rolled sync CAR → fxamacker/cbor Unmarshal per block |
| go (indigo) | evt.Deserialize → typed RepoCommit via code-gen CBOR → car.NewBlockReader (+ SHA-256 verify) → cbornode.DecodeInto per block |
| python | Frame.from_bytes + parse_subscribe_repos_message → CAR.from_bytes (libipld decodes all blocks internally) |

correctness parity#

we traced the full decode path of every SDK to verify that no SDK is winning by skipping correctness work.

what zat, rsky, and indigo all do per frame:

  • decode full CBOR payload (all commit fields — repo, rev, ops, timestamp, etc.)
  • parse CAR header and all block sections
  • parse CID structure (version, codec, multihash) for each block
  • SHA-256 hash each block and compare against CID digest
  • decode every block as DAG-CBOR

what zat and indigo do that isn't obvious:

  • enforce size limits (2MB max on blocks field, max block count) — zat matches indigo's limits as of v0.2.2

what none of them do:

  • DAG-CBOR deterministic encoding validation (sorted keys, minimal integers) — indigo's refmt doesn't check this either
  • signature verification — separate from decode, not measured here
  • MST validation — separate from decode, not measured here

there are no correctness differences between the verified decode paths. the performance gaps are entirely implementation cost.

where the ~15x comes from#

we traced indigo's decode path at the instruction level. the cost compounds from several architectural differences:

| factor | indigo | zat | approx cost |
|---|---|---|---|
| CBOR decode | refmt: token pump → reflection → reflect.SetMapIndex per entry | hand-written, direct dispatch | ~3-4x |
| string/byte handling | Go string heap allocation per value (repo, rev, path, action, per-block keys) | zero-copy slices into input buffer | ~2-3x |
| memory management | per-object GC'd heap allocation; every map, array, int is boxed | arena allocator, 24-byte Value union | ~2-3x |
| CAR block reads | make([]byte, section_len) + copy per block; CID parsed twice (once to read, once to verify) | reads directly from input slice; CID parsed once | ~1.5x |
| blocks field | make([]uint8, len) + io.ReadFull copies entire CAR payload | slices into input buffer | ~1.2x |

these factors multiply. refmt's reflection overhead × per-value heap allocation × GC pressure × byte copying = ~15x on this workload.

note: indigo's cbor-gen (code-generated unmarshal for the commit struct) is fast — the bottleneck is cbornode.DecodeInto (refmt/reflection) for the per-block DAG-CBOR decode, which runs ~10 times per frame.

fairness notes#

  • CID verification: zat, rsky, and indigo all verify block hashes. this is ~2x overhead for zat (290k vs 595k fps). the decode-only table exists for architectural comparison, but the production-correct table is the one that matters for real-world use
  • rust (rsky stack) — by Rudy Fraser (BlackSky) — is a full AT Protocol implementation in Rust: PDS, relay, feed generator, labeler, plus library crates (rsky-repo, rsky-crypto, rsky-identity). our decode bench uses the same crates as rsky's relay: ciborium for CBOR header, serde_ipld_dagcbor for DAG-CBOR body/blocks, rs-car-sync for CAR with CID verification, and RustCrypto k256/p256 for signatures
  • zig and rust (raw) both use arena allocation + zero-copy string/byte decoding. the "alloc per frame" variants are the fair cross-language comparison; "arena reuse" shows the production pattern
  • rust (jacquard) — by @nonbinary.computer — is a client-focused AT Protocol SDK for Rust with zero-copy deserialization, generated API bindings, MST/CAR/identity support, and OAuth. it pays for serde-based owned deserialization (String, BTreeMap), async CAR parsing (tokio poll/wake per block via iroh-car), and per-object heap allocation
  • go (raw) uses fxamacker/cbor (no reflection for known struct types), a hand-rolled sync CAR parser (no CID hash verification), and no indigo dependency. GC pressure remains the fundamental constraint — Go's experimental arena package (GOEXPERIMENT=arenas) is on hold and not recommended for production
  • go (indigo) — bluesky's own production relay — uses code-generated CBOR unmarshal (no reflection at the frame level) but pays for go-car's per-block CID hash verification and cbornode's reflection-based DAG-CBOR decode via the unmaintained refmt library
  • python is faster than jacquard despite being "Python" — its hot path is libipld (Rust via PyO3), which does the entire CAR parse + per-block DAG-CBOR decode in one synchronous C-extension call
  • error handling: decode failures never abort the run in any SDK — errors are counted and the frame is skipped
  • capture coupling: the corpus capture tool uses zat's CBOR decoder for the commit-with-ops header peek. this is standard CBOR parsing (not zat's typed firehose decoder), but it does mean frames that zat's CBOR decoder rejects won't appear in the corpus

signature verification#

a separate benchmark measures the full signature verification pipeline that relays perform on every incoming commit. this is CPU-bound ECDSA work that compounds with scale (~500-1000 verifies/sec on the live network, much higher during backfill).

what it measures#

per entry, all three SDKs do identical work:

  1. CBOR decode the signed commit (has sig field)
  2. strip the sig field, re-encode as unsigned CBOR (deterministic DAG-CBOR)
  3. SHA-256 hash the unsigned bytes
  4. ECDSA verify the hash against the signature using the account's public key
  5. dispatch by curve type: P-256 or secp256k1

all three enforce low-S normalization on both curves.
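the low-S rule is a one-line arithmetic check against the curve order. a sketch for secp256k1 (the same rule applies with P-256's order; normalize_s is an illustrative name):

```python
# secp256k1 group order n (a well-known constant)
N_SECP256K1 = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141

def normalize_s(s: int, n: int = N_SECP256K1) -> int:
    """map a signature's s component to its low-S form.

    for any valid (r, s), (r, n - s) also verifies; enforcing
    s <= n/2 (by rejecting or normalizing high-S values) closes
    that malleability hole."""
    return n - s if s > n // 2 else s
```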

three tiers#

  • full pipeline: CBOR decode → strip sig → re-encode → SHA-256 → ECDSA verify (what a relay actually does)
  • crypto-only: SHA-256 → ECDSA verify with pre-computed unsigned bytes (isolates crypto cost from CBOR overhead)
  • preparsed-key: SHA-256 → ECDSA verify with pre-parsed public keys (isolates pure ECDSA math from SEC1 decompression)

results#

3,072 signed commits (all secp256k1), 5 measured passes, macOS arm64 (M3 Max)

| SDK | variant | verifies/sec (median) | entries | P-256 | secp256k1 | errors |
|---|---|---|---|---|---|---|
| rust (rsky stack) | full pipeline | 18,974 | 3,072 | 0 | 3,072 | 0 |
| rust (rsky stack) | crypto-only | 19,310 | 3,072 | 0 | 3,072 | 0 |
| rust (rsky stack) | preparsed-key | 20,631 | 3,072 | 0 | 3,072 | 0 |
| zig (zat + k256) | full pipeline | 15,385 | 3,072 | 0 | 3,072 | 0 |
| zig (zat + k256) | crypto-only | 16,338 | 3,072 | 0 | 3,072 | 0 |
| zig (zat + k256) | preparsed-key | 19,148 | 3,072 | 0 | 3,072 | 0 |
| go (indigo) | full pipeline | 14,768 | 3,072 | 0 | 3,072 | 0 |
| go (indigo) | crypto-only | 15,399 | 3,072 | 0 | 3,072 | 0 |
| go (indigo) | preparsed-key | 18,227 | 3,072 | 0 | 3,072 | 0 |

all three are competitive. all use optimized secp256k1 with the GLV endomorphism — RustCrypto's k256 uses complete addition formulas in pure Rust, zat's k256 ports libsecp256k1's field arithmetic, and indigo uses decred/dcrd.

the crypto-only vs full-pipeline numbers being nearly identical confirms ECDSA is the bottleneck, not CBOR re-encoding overhead. the preparsed-key tier shows key parsing is a small but measurable cost — relevant for relay implementations that cache public keys per-DID.

sig-verify corpus format#

```
[u32 BE entry_count]
per entry:
  [u8 curve_type]                         // 0 = P-256, 1 = secp256k1
  [u16 BE signed_len][signed_bytes...]    // signed commit CBOR (with sig field)
  [u16 BE pubkey_len][pubkey_bytes...]    // compressed public key (33 bytes)
```

captured by connecting to the firehose, extracting signed commit blocks from CAR data, and resolving each DID via PLC directory to get the signing key. entries that fail verification are dropped during capture.
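a reader for this format is a few lines of struct unpacking. a stdlib-only sketch (read_sig_corpus is an illustrative name):

```python
import struct

def read_sig_corpus(data: bytes):
    """parse the sig-verify corpus: u32 BE entry count, then per entry
    a curve byte, a u16 BE length-prefixed signed commit, and a
    u16 BE length-prefixed compressed public key."""
    (count,) = struct.unpack_from(">I", data, 0)
    pos = 4
    entries = []
    for _ in range(count):
        curve = data[pos]
        pos += 1
        (slen,) = struct.unpack_from(">H", data, pos)
        pos += 2
        signed = data[pos:pos + slen]
        pos += slen
        (plen,) = struct.unpack_from(">H", data, pos)
        pos += 2
        pubkey = data[pos:pos + plen]
        pos += plen
        entries.append((curve, signed, pubkey))
    return entries
```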

corpus format#

the fixture file (fixtures/firehose-frames.bin) uses a simple length-prefixed binary format:

```
[u32 BE frame_count]
[u32 BE frame_1_len][frame_1 bytes]
[u32 BE frame_2_len][frame_2 bytes]
...
```

frames are captured from ~10 seconds of live firehose traffic, filtered to commits with ops using a minimal CBOR header peek.
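a stdlib-only sketch of a reader for this layout (read_frames is an illustrative name):

```python
import struct

def read_frames(data: bytes):
    """parse the frame corpus: u32 BE frame count, then each frame
    as a u32 BE length prefix followed by the raw frame bytes."""
    (count,) = struct.unpack_from(">I", data, 0)
    pos = 4
    frames = []
    for _ in range(count):
        (flen,) = struct.unpack_from(">I", data, pos)
        pos += 4
        frames.append(data[pos:pos + flen])
        pos += flen
    return frames
```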

when this matters#

for live firehose consumption: usually no. at ~500-1000 events/sec (full bluesky network), any of these SDKs handle the load.

where it matters:

  • backfill / replay — processing months of historical data. decode + verify throughput determines catch-up speed.
  • relays at scale — routing events to many downstream consumers. every microsecond of decode + verify overhead compounds across fan-out.
  • memory — smaller value types mean less memory per in-flight frame.

the overall picture: decode throughput varies ~19x across the production-correct SDKs (dominated by CBOR/memory architecture choices). signature verification is competitive across all three — zig, rust, and go all land within ~40% of each other using optimized secp256k1 libraries. for relay workloads, decode is the differentiator; sig-verify is table stakes.

SDKs tested#

| lang | SDK | version | CBOR engine | CAR engine |
|---|---|---|---|---|
| zig | zat | v0.2.2 + k256 v0.0.4 | hand-rolled | hand-rolled (+ SHA-256 CID verify, size limits) |
| rust | rsky stack | | ciborium (header) + serde_ipld_dagcbor (body) | rs-car-sync (+ SHA-256 CID verify) |
| rust | raw (minicbor + bumpalo) | | minicbor (zero-copy) | hand-rolled (sync) |
| rust | jacquard | 0.9 | ciborium (header) + serde_ipld_dagcbor (body) | iroh-car (async) |
| go | raw (fxamacker/cbor) | | fxamacker/cbor | hand-rolled (sync, no CID verify) |
| go | indigo | latest | cbor-gen (code-generated) | go-car/v2 (+ SHA-256 CID verify) |
| python | atproto | 0.0.65 | libipld (Rust via PyO3) | libipld |

trust chain verification#

a third benchmark measures the full end-to-end AT Protocol trust chain: given a handle, resolve identity, fetch the repo, and cryptographically verify everything. this exercises the complete pipeline that any independent verifier needs.

what it measures#

per handle, all three SDKs do the same work:

  1. resolve handle → DID (HTTP well-known or DNS TXT)
  2. resolve DID → DID document → extract signing key + PDS endpoint
  3. fetch full repo CAR from PDS (com.atproto.sync.getRepo)
  4. parse CAR with CID verification (SHA-256 per block)
  5. ECDSA verify the commit signature against the signing key
  6. walk the MST to count all records
  7. rebuild MST and verify root CID matches commit (zig and go only)
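steps 1 and 2 reduce to two lookups. the shapes of both handle-resolution paths can be sketched as pure string helpers, with the actual HTTP/DNS calls omitted (helper names are illustrative):

```python
def did_candidates(handle: str) -> tuple[str, str]:
    """step 1: the two handle-resolution paths defined by atproto —
    an HTTPS GET of the well-known document, or a DNS TXT lookup
    on the _atproto subdomain."""
    return (
        f"https://{handle}/.well-known/atproto-did",
        f"_atproto.{handle}",
    )

def did_from_txt(value: str):
    """the TXT record value looks like 'did=did:plc:...'."""
    return value[4:] if value.startswith("did=") else None
```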

results#

pfrazee.com — 192,144 records, 243,470 blocks, 70.6 MB CAR, macOS arm64 (M3 Max)

trust chain compute breakdown

| SDK | CAR parse | sig verify | MST walk | MST rebuild | compute total | network |
|---|---|---|---|---|---|---|
| zig (zat) | 81.6ms | 0.6ms | 45.5ms | 172.6ms | 300.4ms | 9.8s |
| go (indigo) | 403.8ms | 0.4ms | 5.8ms | 0.0ms | 410.0ms | 20.8s |
| rust (rsky stack) | 301.0ms | 0.2ms | 120.9ms | N/A | 422.1ms | 8.7s |

the rust verify uses the same low-level crates that rsky (Rudy Fraser / BlackSky) uses internally (k256, p256, serde_ipld_dagcbor, sha2) but not rsky's higher-level abstractions (rsky-repo, rsky-identity, rsky-crypto). no Rust equivalent of indigo's all-in-one LoadRepoFromCAR exists yet — rsky provides the library crates (MST, CAR, identity, crypto) but the end-to-end verify pipeline is assembled manually. jacquard also has MST/CAR/identity support and could serve as the basis for a higher-level verify path.

compute time is dominated by CAR parsing (SHA-256 verification of 243k blocks) and MST operations. signature verification is sub-millisecond for all three — single ECDSA verify is trivial compared to the bulk data work.

zig's CAR parse advantage carries over from the decode benchmarks (arena allocation + zero-copy). go's MST walk is fastest because indigo's MST.Walk() operates on an already-loaded in-memory tree — nodes are decoded from CBOR once on first access and cached as Go structs, so walking is pure pointer chasing. zig and rust decode MST nodes from raw CBOR on each visit. rust skips MST rebuild (no equivalent crate exists in either rsky or jacquard yet) but the trust chain is still fully proven by signature + walk.

network time varies by run (PDS response time, TLS handshake, geographic distance). compare compute columns, not totals.

```
just verify pfrazee.com          # run all three
just chart pfrazee.com           # run + generate SVG charts
```

usage#

```
# decode benchmarks
just capture       # capture ~10s of firehose traffic
just bench         # run all decode benchmarks
just bench-zig     # run a single language
just bench-rust-rsky

# sig verify benchmarks
just capture-sigs  # capture signed commits + resolve public keys (~10s + DID resolution)
just bench-sigs    # run all sig verify benchmarks (zig + rust + go)
just bench-sigs-zig
just bench-sigs-rust
just bench-sigs-go

# trust chain verification
just verify pfrazee.com    # run all three implementations
just chart pfrazee.com     # run + generate SVG charts to docs/
```

methodology#

  • just capture connects to the live firehose for ~10 seconds, filters for commits with ops via CBOR header peek (uses zat's CBOR decoder — see fairness notes), writes a length-prefixed corpus
  • each benchmark decodes every frame fully: header → payload → CAR → decode every block as DAG-CBOR
  • zat, rsky, and indigo additionally SHA-256 verify every block CID
  • 2 warmup passes, 5 measured passes over the full corpus
  • zig builds with -Doptimize=ReleaseFast, rust with opt-level=3 lto=true
  • go and python use their standard release toolchains
  • reported numbers: median frames/sec across passes, plus min/max for variance. block counts and error counts verify work parity across SDKs
  • run-to-run variance is significant (~30-40% between separate invocations due to system load, thermal state, etc.). ratios between SDKs should be compared within a single just bench run, not across runs