fix: use thread-local storage for trace context
concurrent requests on different thread pool workers were sharing
the same trace_id because current_trace_id was process-global.
now each thread has its own trace context, stored in threadlocal variables:
- tl_trace_id: current trace ID for this thread
- tl_active_span_count: span nesting depth for this thread
each HTTP request now gets its own trace_id, even when several
requests are handled concurrently on different workers.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>