atproto relay implementation in zig zlay.waow.tech

experiments#

active experiments on the deployed relay. each entry tracks what changed, why, how to verify, and how to revert.


exp-001: SmpAllocator instead of glibc malloc (2026-03-06)#

hypothesis: zlay's linear RSS growth (~290 MiB/hour) is caused by glibc malloc fragmentation under cross-thread alloc/free patterns. ~2750 subscriber threads allocate frame data that's freed by 16 worker threads. glibc's per-thread arenas (even with MALLOC_ARENA_MAX=2) don't return these cross-thread freed pages to the OS.
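
the shape of the suspect pattern, as an illustrative zig sketch (FrameQueue, frame_len, and process are hypothetical names, not zlay source):

// subscriber threads allocate each frame and hand it off; a worker thread
// frees it later — so frees land in a different thread's arena than the
// matching allocations.
fn subscriberLoop(alloc: std.mem.Allocator, queue: *FrameQueue) !void {
    while (true) {
        const frame = try alloc.alloc(u8, frame_len); // allocated on this thread…
        try queue.push(frame);
    }
}

fn workerLoop(alloc: std.mem.Allocator, queue: *FrameQueue) void {
    while (queue.pop()) |frame| {
        process(frame);
        alloc.free(frame); // …freed on another thread
    }
}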

what changed:

  • src/main.zig: std.heap.c_allocator → std.heap.smp_allocator
  • src/main.zig: removed malloc_trim(0) from GC loop (SmpAllocator doesn't use glibc heap)
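
the swap itself is a one-line change; a sketch (the surrounding declaration is illustrative, not the exact zlay source):

// before:
// const allocator = std.heap.c_allocator;

// after — std.heap.smp_allocator ships with zig since 0.14:
const allocator = std.heap.smp_allocator;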

why SmpAllocator:

  • zig's built-in multi-threaded allocator (since 0.14)
  • uses mmap/munmap directly — no glibc malloc involvement
  • thread-local freelists with cross-thread reclamation (exactly our problem)
  • zero new dependencies

evidence supporting this:

  • rsky (Rust relay) uses mimalloc globally and doesn't have this problem
  • indigo (Go relay) uses Go GC which has no per-thread arena fragmentation
  • page_allocator experiment (per-frame arenas only) didn't help — leak is in cross-thread c_allocator paths
  • malloc_trim(0) didn't help — only trims main glibc arena
  • mallinfo() was misleading — only reports main arena, not per-thread arenas

verification:

  1. build succeeds
  2. deploy, pod starts, /_health returns 200
  3. firehose streams, listReposByCollection works
  4. watch grafana over 12-24 hours:
    • relay_process_rss_bytes should plateau (not climb linearly)
    • relay_malloc_arena_bytes should be near-zero (glibc no longer in use)
  5. if RSS stabilizes under ~1.5 GiB after caches fill, experiment succeeded
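
for reference, a hedged sketch of how an RSS gauge like relay_process_rss_bytes can be sampled on linux (function name is illustrative; std.heap.pageSize() is the zig 0.14 spelling):

const std = @import("std");

fn readRssBytes() !u64 {
    // /proc/self/statm: "size resident shared text lib data dt", in pages;
    // field 2 (resident) is what RSS dashboards track.
    var buf: [128]u8 = undefined;
    const contents = try std.fs.cwd().readFile("/proc/self/statm", &buf);
    var it = std.mem.tokenizeScalar(u8, contents, ' ');
    _ = it.next(); // skip total program size
    const resident_pages = try std.fmt.parseInt(u64, it.next().?, 10);
    return resident_pages * std.heap.pageSize();
}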

revert:

// src/main.zig — change allocator back:
const allocator = std.heap.c_allocator;

// src/main.zig — restore malloc_trim in gcLoop:
_ = malloc_h.malloc_trim(0);
log.info("gc: malloc_trim complete", .{});

result: FAILED — RSS grew at ~670 MiB/hour (worse than c_allocator's ~290 MiB/hour). this disproves glibc fragmentation as the root cause. the leak is genuine — memory is allocated and never freed. reverted to c_allocator.

status: reverted (2026-03-07)


exp-002: GPA leak detection (2026-03-07)#

goal: identify exactly which allocations are leaking by using zig's GeneralPurposeAllocator as a wrapper. GPA tracks every alloc/free and reports unfreed allocations with stack traces on clean shutdown.

what changed:

  • build.zig: added -Duse_gpa=true build option
  • src/main.zig: conditional GPA wrapper — when enabled, all allocations go through GPA backed by c_allocator. on SIGTERM, after all components deinit, GPA reports leaks.
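
a sketch of the wiring (the general shape for zig 0.14; exact names in zlay's build.zig and main.zig may differ):

// build.zig — expose the flag as a comptime build option:
const use_gpa = b.option(bool, "use_gpa", "wrap allocator in GPA for leak detection") orelse false;
const opts = b.addOptions();
opts.addOption(bool, "use_gpa", use_gpa);
exe.root_module.addOptions("build_options", opts);

// src/main.zig — conditional wrapper; use_gpa is comptime-known, so the
// release build pays zero cost when the flag is off:
const std = @import("std");
const build_options = @import("build_options");

pub fn main() !void {
    var gpa: std.heap.GeneralPurposeAllocator(.{}) = .{
        .backing_allocator = std.heap.c_allocator,
    };
    defer if (build_options.use_gpa) {
        _ = gpa.deinit(); // reports every unfreed allocation with its stack trace
    };
    const allocator = if (build_options.use_gpa) gpa.allocator() else std.heap.c_allocator;
    _ = allocator; // hand to components as usual
}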

how to use:

# build with GPA enabled (on the server):
just zlay publish-remote ReleaseSafe --gpa
# or manually:
zig build -Doptimize=ReleaseSafe -Duse_gpa=true -Dtarget=x86_64-linux-gnu

# let it run for 10-30 minutes, then:
kubectl exec -n zlay deploy/zlay -- kill -TERM 1

# read the leak report:
kubectl logs -n zlay deploy/zlay --previous | grep -A5 "GPA"

performance impact: GPA adds a mutex + metadata tracking per alloc/free. expect ~2-5x slower throughput. this is a diagnostic build, not for production.

what to look for in output:

  • GPA logs to stderr on deinit. each leaked allocation shows the stack trace of where it was allocated.
  • look for the most frequently repeated stack traces — those are the hot leak sites.

revert: just rebuild without -Duse_gpa=true (default is false, zero overhead).

deployment attempt (2026-03-07):

  • GPA's per-allocation metadata tracking consumed memory ~55x faster than the base leak (~16 GiB/hour vs ~290 MiB/hour). at ~700 frames/sec × ~37 allocs/frame = ~26K tracked allocations/sec, the metadata itself dominates.
  • caused severe sawtooth pattern: ~7-8 OOM kills in ~3 hours (8 GiB limit)
  • first pod: logs lost when kubectl delete pod was used (should have used kubectl scale --replicas=0)
  • second pod: RocksDB lock file stale after first crash, had to clear manually
  • reverted to normal ReleaseSafe build after ~4 hours (relay was submitted for testing)

learnings for next attempt:

  • need to reduce incoming load (fewer PDS hosts) to slow memory growth enough that GPA overhead doesn't OOM
  • or increase memory limit temporarily (e.g. 16 GiB) for the diagnostic window
  • use kubectl scale deployment/zlay -n zlay --replicas=0 to preserve logs (not kubectl delete pod)
  • container lacks kill binary — need an admin endpoint or install procps in the image
  • consider adding /admin/shutdown HTTP endpoint to trigger graceful shutdown without kill
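
a sketch of such an endpoint against zig 0.14's std.http.Server (port, path, and the shutdown-flag wiring are assumptions, not zlay's actual code):

const std = @import("std");

fn adminLoop(shutdown_requested: *std.atomic.Value(bool)) !void {
    const addr = try std.net.Address.parseIp("127.0.0.1", 8081);
    var listener = try addr.listen(.{ .reuse_address = true });
    defer listener.deinit();

    while (!shutdown_requested.load(.acquire)) {
        const conn = try listener.accept();
        defer conn.stream.close();

        var buf: [4096]u8 = undefined;
        var http = std.http.Server.init(conn, &buf);
        var req = http.receiveHead() catch continue;

        if (std.mem.eql(u8, req.head.target, "/admin/shutdown")) {
            shutdown_requested.store(true, .release);
            try req.respond("shutting down\n", .{});
        } else {
            try req.respond("not found\n", .{ .status = .not_found });
        }
    }
}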

status: paused — code merged (compiled out by default), needs better deployment strategy