# experiments
active experiments on the deployed relay. each entry tracks what changed, why, how to verify, and how to revert.
## exp-001: SmpAllocator instead of glibc malloc (2026-03-06)
hypothesis: zlay's linear RSS growth (~290 MiB/hour) is caused by glibc malloc fragmentation under cross-thread alloc/free patterns. ~2750 subscriber threads allocate frame data that's freed by 16 worker threads. glibc's per-thread arenas (even with MALLOC_ARENA_MAX=2) don't return these cross-thread freed pages to the OS.
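for scale, a quick back-of-envelope on how long the base leak takes to exhaust the pod's 8 GiB limit, assuming a ~1.5 GiB steady-state cache footprint (the success criterion used below):

```python
limit_mib = 8 * 1024           # pod memory limit (8 GiB)
baseline_mib = 1.5 * 1024      # assumed steady-state cache footprint (~1.5 GiB)
growth_mib_per_hour = 290      # observed leak rate

# headroom above the cache baseline, divided by the leak rate
hours_to_oom = (limit_mib - baseline_mib) / growth_mib_per_hour
print(round(hours_to_oom, 1))  # ~23 hours between restarts at this rate
```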
what changed:
- `src/main.zig`: `std.heap.c_allocator` → `std.heap.smp_allocator`
- `src/main.zig`: removed `malloc_trim(0)` from GC loop (SmpAllocator doesn't use glibc heap)
why SmpAllocator:
- zig's built-in multi-threaded allocator (since 0.14)
- uses mmap/munmap directly — no glibc malloc involvement
- thread-local freelists with cross-thread reclamation (exactly our problem)
- zero new dependencies
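the cross-thread reclamation idea can be sketched as a toy model: each thread owns a freelist, frees from other threads land in the owner's remote queue, and the owner drains that queue on its next alloc. this is an illustration of the technique only, not SmpAllocator's actual implementation:

```python
from collections import defaultdict, deque

class ToyThreadLocalAllocator:
    """Toy model of thread-local freelists with cross-thread
    reclamation. NOT SmpAllocator's real code."""
    def __init__(self):
        self.freelist = defaultdict(deque)  # owner thread -> reusable blocks
        self.remote = defaultdict(deque)    # owner thread -> blocks freed by others
        self.owner_of = {}
        self.next_block = 0                 # stands in for "ask the OS for more"

    def alloc(self, thread):
        # drain cross-thread frees first -- the reclamation step that
        # glibc per-thread arenas effectively skip (exp-001's suspicion)
        while self.remote[thread]:
            self.freelist[thread].append(self.remote[thread].popleft())
        if self.freelist[thread]:
            block = self.freelist[thread].popleft()
        else:
            block = self.next_block
            self.next_block += 1
        self.owner_of[block] = thread
        return block

    def free(self, thread, block):
        owner = self.owner_of[block]
        if owner == thread:
            self.freelist[owner].append(block)
        else:
            self.remote[owner].append(block)  # hand back to the owning thread

a = ToyThreadLocalAllocator()
b1 = a.alloc("subscriber")  # subscriber thread allocates a frame
a.free("worker", b1)        # worker thread frees it (cross-thread)
b2 = a.alloc("subscriber")  # subscriber reuses the block instead of growing
print(b1 == b2)             # True: no new memory requested
```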
evidence supporting this:
- rsky (Rust relay) uses mimalloc globally and doesn't have this problem
- indigo (Go relay) uses Go GC which has no per-thread arena fragmentation
- page_allocator experiment (per-frame arenas only) didn't help — leak is in cross-thread c_allocator paths
- malloc_trim(0) didn't help — only trims main glibc arena
- mallinfo() was misleading — only reports main arena, not per-thread arenas
verification:
- build succeeds
- deploy, pod starts, `/_health` returns 200
- firehose streams, `listReposByCollection` works
- watch grafana over 12-24 hours:
  - `relay_process_rss_bytes` should plateau (not climb linearly)
  - `relay_malloc_arena_bytes` should be near-zero (glibc no longer in use)
- if RSS stabilizes under ~1.5 GiB after caches fill, experiment succeeded
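to make the "plateau vs linear climb" call concrete, a small sketch (hypothetical sample data) that fits a least-squares slope to sampled RSS values; anything near the leak rate means climb, near zero means plateau:

```python
def rss_slope_mib_per_hour(samples):
    """Ordinary least-squares slope of (hour, rss_mib) samples."""
    n = len(samples)
    sx = sum(t for t, _ in samples)
    sy = sum(r for _, r in samples)
    sxx = sum(t * t for t, _ in samples)
    sxy = sum(t * r for t, r in samples)
    return (n * sxy - sx * sy) / (n * sxx - sx * sx)

# hypothetical samples, one per hour over a 12-hour window:
leaking = [(h, 900 + 290 * h) for h in range(13)]      # the ~290 MiB/hour climb
plateau = [(h, 1400 + (h % 3) * 5) for h in range(13)]  # flat after caches fill

print(round(rss_slope_mib_per_hour(leaking)))  # 290
print(round(rss_slope_mib_per_hour(plateau)))  # near 0
```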
revert:
```zig
// src/main.zig — change allocator back:
const allocator = std.heap.c_allocator;

// src/main.zig — restore malloc_trim in gcLoop:
_ = malloc_h.malloc_trim(0);
log.info("gc: malloc_trim complete", .{});
```
result: FAILED — RSS grew at ~670 MiB/hour (worse than c_allocator's ~290 MiB/hour). this disproves glibc fragmentation as the root cause. the leak is genuine — memory is allocated and never freed. reverted to c_allocator.
status: reverted (2026-03-07)
## exp-002: GPA leak detection (2026-03-07)
goal: identify exactly which allocations are leaking by using zig's GeneralPurposeAllocator as a wrapper. GPA tracks every alloc/free and reports unfreed allocations with stack traces on clean shutdown.
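for intuition, python's stdlib ships the same technique as `tracemalloc`: record a stack trace per live allocation, then snapshot and group the survivors by allocation site. loose analogy only, not zig's GPA:

```python
import tracemalloc

tracemalloc.start(10)  # keep up to 10 stack frames per allocation

leaked = []
def handle_frame():
    # stand-in for a leaked frame buffer: allocated, never freed
    leaked.append(bytearray(4096))

for _ in range(100):
    handle_frame()

snapshot = tracemalloc.take_snapshot()
# group surviving allocations by allocation site, biggest first
for stat in snapshot.statistics("traceback")[:3]:
    print(stat.count, "blocks,", stat.size, "bytes")
    for line in stat.traceback.format():
        print(line)
```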
what changed:
- `build.zig`: added `-Duse_gpa=true` build option
- `src/main.zig`: conditional GPA wrapper — when enabled, all allocations go through GPA backed by c_allocator. on SIGTERM, after all components deinit, GPA reports leaks.
how to use:
```sh
# build with GPA enabled (on the server):
just zlay publish-remote ReleaseSafe --gpa

# or manually:
zig build -Doptimize=ReleaseSafe -Duse_gpa=true -Dtarget=x86_64-linux-gnu

# let it run for 10-30 minutes, then:
kubectl exec -n zlay deploy/zlay -- kill -TERM 1

# read the leak report:
kubectl logs -n zlay deploy/zlay --previous | grep -A5 "GPA"
```
performance impact: GPA adds a mutex + metadata tracking per alloc/free. expect ~2-5x slower throughput. this is a diagnostic build, not for production.
what to look for in output:
- GPA logs to stderr on deinit. each leaked allocation shows the stack trace of where it was allocated.
- look for the most frequently repeated stack traces — those are the hot leak sites.
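a sketch for the triage step: group leak entries by stack trace, ignoring addresses and sizes, so repeated allocation sites bubble to the top. the report format here is made up for illustration; check it against the actual GPA stderr output and adjust the splitting/normalization:

```python
import re
from collections import Counter

def hot_leak_sites(report: str, top: int = 3):
    """Count leak entries per normalized stack trace."""
    sites = Counter()
    # assumption: one blank line between leak entries (verify on real output)
    for entry in report.split("\n\n"):
        if "leaked" not in entry:
            continue
        # strip addresses and numbers so identical sites collapse together
        normalized = re.sub(r"0x[0-9a-fA-F]+|\b\d+\b", "N", entry)
        sites[normalized] += 1
    return sites.most_common(top)

# fabricated report shape, for illustration only:
fake_report = "\n\n".join(
    ["memory leaked: 0xdeadbeef len 4096\n  frame.zig:120 readFrame\n  main.zig:88 loop"] * 5
    + ["memory leaked: 0xcafebabe len 64\n  index.zig:42 insert"] * 2
)
for site, count in hot_leak_sites(fake_report):
    print(count, "x")
    print(site)
```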
revert: just rebuild without `-Duse_gpa=true` (default is false, zero overhead).
deployment attempt (2026-03-07):
- GPA's per-allocation metadata tracking consumed memory ~55x faster than the base leak (~16 GiB/hour vs ~290 MiB/hour). at ~700 frames/sec × ~37 allocs/frame = ~26K tracked allocations/sec, the metadata itself dominates.
- caused severe sawtooth pattern: ~7-8 OOM kills in ~3 hours (8 GiB limit)
- first pod: logs lost when `kubectl delete pod` was used (should have used `kubectl scale --replicas=0`)
- second pod: RocksDB lock file stale after first crash, had to clear manually
- reverted to normal ReleaseSafe build after ~4 hours (relay was submitted for testing)
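checking those numbers:

```python
# frames/sec × allocs/frame, as measured above
allocs_per_sec = 700 * 37
print(allocs_per_sec)  # 25900, i.e. ~26K tracked allocations/sec

# GPA-build growth vs base leak, both in MiB/hour
overhead_ratio = (16 * 1024) / 290
print(round(overhead_ratio))  # 56, in line with the ~55x above
```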
learnings for next attempt:
- need to reduce incoming load (fewer PDS hosts) to slow memory growth enough that GPA overhead doesn't OOM
- or increase memory limit temporarily (e.g. 16 GiB) for the diagnostic window
- use `kubectl scale deployment/zlay -n zlay --replicas=0` to preserve logs (not `kubectl delete pod`)
- container lacks `kill` binary — need an admin endpoint or install procps in the image
- consider adding `/admin/shutdown` HTTP endpoint to trigger graceful shutdown without `kill`
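sizing the next diagnostic window, under the (unverified) assumption that GPA-build memory growth scales roughly linearly with ingest load:

```python
def window_hours(limit_gib, growth_gib_per_hour, load_factor=1.0):
    """Hours until OOM, assuming growth scales ~linearly with load."""
    return limit_gib / (growth_gib_per_hour * load_factor)

print(round(window_hours(8, 16), 2))         # 0.5: current limit, full load
print(round(window_hours(16, 16), 2))        # 1.0: limit bumped to 16 GiB
print(round(window_hours(16, 16, 0.25), 2))  # 4.0: 16 GiB + quarter of the PDS hosts
```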
status: paused — code merged (compiled out by default), needs better deployment strategy