# coral
real-time semantic percolation from the Bluesky firehose.
## what it does
extracts named entities (people, organizations, places, events) from Bluesky posts and tracks how they cluster. entities mentioned in the same post are linked by an edge. when a real-world event spans multiple topics, their clusters merge: discourse percolates into a unified conversation.
## how it works
- NER bridge consumes the turbostream firehose, runs spaCy NER, extracts entities
- labeler integration drops spam before it hits the graph (via Hailey's labeler)
- entity graph tracks co-occurrences (two entities in the same post = an edge) and computes clusters via union-find
- pheromone edges - edge weights decay exponentially and are reinforced on each repeated co-occurrence (inspired by ant colony optimization)
- surprise trending - entities ranked by statistical surprise relative to their baseline (a z-score-like measure), not by raw counts
- frontend visualizes entity activity, cluster structure, and firehose health
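a minimal python sketch of pheromone edges, assuming edges are keyed by sorted entity pairs and decay is applied lazily whenever an edge is read or reinforced. `PheromoneGraph`, `half_life_s` and `deposit` are hypothetical names, and the 10-minute default half-life is illustrative only; the real backend is written in zig and may differ.

```python
import math
from itertools import combinations


class PheromoneGraph:
    """Hypothetical sketch: co-occurrence edges with exponentially decaying weights."""

    def __init__(self, half_life_s: float = 600.0):
        # decay constant chosen so a weight halves every half_life_s seconds
        self.decay = math.log(2) / half_life_s
        # (entity_a, entity_b) with a < b  ->  (weight at last update, time of last update)
        self.edges: dict[tuple[str, str], tuple[float, float]] = {}

    def weight(self, a: str, b: str, now: float) -> float:
        """Current weight of edge a-b, decayed from its last update up to `now`."""
        key = (a, b) if a < b else (b, a)
        if key not in self.edges:
            return 0.0
        w, t = self.edges[key]
        return w * math.exp(-self.decay * (now - t))

    def observe_post(self, entities: list[str], now: float, deposit: float = 1.0) -> None:
        """Reinforce every pair of distinct entities mentioned in the same post."""
        for a, b in combinations(sorted(set(entities)), 2):
            # evaporate the old weight first, then deposit new "pheromone"
            self.edges[(a, b)] = (self.weight(a, b, now) + deposit, now)
```

with lazy decay nothing has to sweep the whole graph on a timer: an edge's weight is only recomputed when it is touched. edges whose weight falls below some cutoff could then be pruned before clustering.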
## theoretical background
the system draws from several sources:
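percolation theory - we use the Newman-Ziff algorithm (occupy edges one at a time, merge clusters with union-find, track the largest cluster incrementally) for efficient cluster detection. on idealized lattices, percolation has a sharp phase transition at a critical occupation probability p_c; for site percolation on the 2D square lattice, p_c ≈ 0.593. our graph isn't a lattice, so we calibrate the threshold empirically.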
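the core of the Newman-Ziff approach, adding edges one at a time while keeping the largest cluster size up to date, can be sketched with a union-find. this is a hypothetical python illustration (`ClusterTracker` and its methods are not names from the codebase); the 0.5 default mirrors the placeholder percolation threshold in the design decisions table.

```python
class ClusterTracker:
    """Hypothetical sketch: union-find that tracks the largest cluster as edges arrive."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}
        self.size: dict[str, int] = {}  # only meaningful for root nodes
        self.largest = 0

    def add(self, x: str) -> None:
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
            self.largest = max(self.largest, 1)

    def find(self, x: str) -> str:
        # path halving keeps trees shallow
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        """Occupy edge a-b, merging the two clusters if they differ."""
        self.add(a)
        self.add(b)
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra  # union by size: attach the smaller tree under the larger
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        self.largest = max(self.largest, self.size[ra])

    def largest_fraction(self) -> float:
        n = len(self.parent)
        return self.largest / n if n else 0.0

    def percolating(self, threshold: float = 0.5) -> bool:
        # 0.5 mirrors the placeholder threshold; not a calibrated value
        return self.largest_fraction() > threshold
```

union-find only merges; it cannot split a cluster when an edge decays away. one option is to rebuild the tracker periodically, adding edges in descending weight order: reading `largest_fraction()` after each union then gives the largest-cluster fraction as a function of the weight cutoff in a single pass, which is the Newman-Ziff trick.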
heterogeneous activity - Xie et al. 2021 showed that real social networks percolate at ~1/10th of the threshold predicted by uniform-activity theory, because user activity is heterogeneous. following this insight, we plan to weight mentions by user activity rate (currently off; see design decisions).
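since user weighting is not implemented yet, the following is only a hypothetical sketch of what "power users count more" could look like. the formula (1 + log1p of the user's post rate relative to a baseline, capped at `max_weight`) is an illustrative assumption, not the weighting derived by Xie et al.; `ActivityWeighter` and its parameters are invented names.

```python
import math


class ActivityWeighter:
    """Hypothetical sketch of activity-weighted mentions (the feature is currently off)."""

    def __init__(self, baseline_per_hour: float = 1.0, max_weight: float = 5.0) -> None:
        self.baseline = baseline_per_hour
        self.max_weight = max_weight
        self.counts: dict[str, int] = {}
        self.first_seen: dict[str, float] = {}

    def record_post(self, did: str, now: float) -> None:
        self.counts[did] = self.counts.get(did, 0) + 1
        self.first_seen.setdefault(did, now)

    def rate_per_hour(self, did: str, now: float) -> float:
        # use at least a one-hour window so brand-new accounts don't get huge rates
        elapsed_h = max((now - self.first_seen.get(did, now)) / 3600.0, 1.0)
        return self.counts.get(did, 0) / elapsed_h

    def weight(self, did: str, now: float) -> float:
        """Mention weight in [1, max_weight]; more active users count more."""
        r = self.rate_per_hour(did, now) / self.baseline
        # assumed formula: log scaling keeps a handful of bots from dominating
        return min(self.max_weight, 1.0 + math.log1p(r))
```

the log and the cap are there so that a single extremely active account cannot outweigh many ordinary ones; whether that matches the heterogeneity effect described by Xie et al. would need checking against the data.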
NER for topic detection - inspired by Hailey's trending topics. rather than embedding raw post text (too noisy), we extract structured entities to reduce the surface area.
ATProto labeler system - spam filtering via com.atproto.label. we subscribe to Hailey's labeler stream and drop posts from accounts labeled as spam before NER processing.
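a minimal sketch of the labeler gate, assuming label dicts shaped like `com.atproto.label.defs#label` (`uri`, `val`, optional `neg`) where account-level labels carry the account DID as `uri`. the label value `"spam"` is an assumption about Hailey's labeler, and `SpamGate` is a hypothetical name.

```python
class SpamGate:
    """Hypothetical sketch: drop posts from accounts labeled as spam before NER."""

    def __init__(self, spam_values: frozenset[str] = frozenset({"spam"})) -> None:
        self.spam_values = spam_values  # assumed label value(s); check the labeler
        self.spam_dids: set[str] = set()

    def apply_label(self, label: dict) -> None:
        uri = label.get("uri", "")
        # ignore record-level labels (at:// URIs) and non-spam values
        if not uri.startswith("did:") or label.get("val") not in self.spam_values:
            return
        if label.get("neg", False):
            self.spam_dids.discard(uri)  # a negation label retracts an earlier label
        else:
            self.spam_dids.add(uri)

    def allow(self, author_did: str) -> bool:
        """True if the post should continue on to NER."""
        return author_did not in self.spam_dids
```

the sketch keeps labels in memory only and ignores label expiry and signature verification; a real consumer of the label stream would likely persist its cursor and the spam set across restarts.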
## design decisions
these are documented as arbitrary choices to be revisited:
| decision | choice | why |
|---|---|---|
| edge definition | same-post co-occurrence | simplest, captures "discussed together" |
| edge weights | pheromone decay (configurable half-life) | ant colony inspired, recent co-occurrences matter more |
| activity threshold | 0.01 mentions/sec (~3 per 5 min) | rate normalizes across quiet/busy periods |
| trending metric | surprise vs baseline (UI), trend ratio (backend) | anomaly detection, not popularity contest |
| percolation threshold | largest cluster size / active entity count > 50% | placeholder, needs empirical calibration |
| entity position | hash(text) → (x, y) | deterministic, stable, no semantic meaning yet |
| user weighting | planned (currently off) | power users count more (Xie 2021) |
see docs/02-semantic-percolation-plan.md for full rationale.
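the activity threshold and trending metrics can be illustrated together. this hypothetical python sketch uses a Poisson z-like score, (observed - expected) / sqrt(expected), as one possible "surprise vs baseline" measure, plus the plain trend ratio. the 5-minute window, the function names, and the idea that the baseline is a per-entity mention rate are assumptions, not the project's code. note that 0.01 mentions/sec × 300 s = 3 mentions, matching the "~3 per 5 min" in the table.

```python
import math

ACTIVITY_THRESHOLD = 0.01  # mentions/sec, from the design decisions table
WINDOW_S = 300.0           # hypothetical 5-minute scoring window


def surprise(count: int, baseline_rate: float, window_s: float = WINDOW_S) -> float:
    """Poisson z-like score: standard deviations above the expected count."""
    expected = max(baseline_rate * window_s, 1.0)  # floor avoids dividing by ~0
    return (count - expected) / math.sqrt(expected)


def trend_ratio(count: int, baseline_rate: float, window_s: float = WINDOW_S) -> float:
    """Current mention rate relative to the baseline rate."""
    return (count / window_s) / max(baseline_rate, 1e-9)


def rank_trending(counts: dict[str, int], baselines: dict[str, float],
                  window_s: float = WINDOW_S) -> list[str]:
    """Entities above the activity threshold, most surprising first."""
    active = [e for e, c in counts.items() if c / window_s >= ACTIVITY_THRESHOLD]
    return sorted(active,
                  key=lambda e: surprise(counts[e], baselines.get(e, 0.0), window_s),
                  reverse=True)
```

with these definitions an entity jumping from 3 expected mentions to 30 outranks one holding steady at 300, which is the "anomaly detection, not popularity contest" behaviour the table describes.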
## stack
- ner (python): turbostream consumer + spaCy NER + labeler gate → POST to backend
- backend (zig): entity graph + websocket server + SQLite persistence
- site: static html/css/js on cloudflare pages
## run locally

run each command in its own terminal, starting from the repo root:

```sh
cd backend && zig build run          # backend (entity graph + websocket)
cd ner && uv run coral-bridge        # NER bridge (turbostream → spaCy → backend)
cd site && npx wrangler pages dev .  # frontend
```
## deploy

run from the repo root; the subshells return you there after each step:

```sh
(cd backend && fly deploy)
(cd ner && fly deploy)
(cd site && npx wrangler pages deploy . --project-name coral)
```
## future work
ideas being explored (not commitments):
- semantic positioning - currently entities hash to arbitrary grid positions. could use embeddings to place semantically similar entities near each other, making the 2D layout a meaningful projection of topic space. unclear whether to embed entity names, representative posts, or cluster centroids.
- temporal co-activity edges - entities that spike together might be related even without same-post co-occurrence. "earthquake" and "LA" could both trend during an event without always appearing together.
- percolation calibration - the 50% threshold is arbitrary. need to correlate cluster merges with real-world events to understand what "discourse unification" actually looks like in the data.
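one naive way to prototype temporal co-activity edges is to bin each entity's mentions over time and link pairs whose series are strongly correlated. this hypothetical sketch uses Pearson correlation with an arbitrary 0.8 cutoff; `co_activity_edges` and its parameters are invented for illustration.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def co_activity_edges(series: dict[str, list[float]],
                      min_corr: float = 0.8) -> list[tuple[str, str, float]]:
    """Link entities whose per-bin mention counts rise and fall together."""
    names = sorted(series)
    edges = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(series[a], series[b])
            if r >= min_corr:
                edges.append((a, b, r))
    return edges
```

raw counts share platform-wide rhythms (time of day, overall firehose volume), which inflate correlations, so some normalization such as dividing each series by its baseline would probably be needed first. the all-pairs loop is also quadratic, so it would only be practical over the currently active entities.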
## references
- Newman & Ziff, Efficient Monte Carlo algorithm and high-precision results for percolation, Phys. Rev. Lett. 85 (2000)
- Xie et al., Detecting and Modelling Real Percolation and Phase Transitions of Information on Social Media, Nature Human Behaviour (2021)
- Hailey, Bluesky Trending Topics - NER approach for topic detection
- Stauffer & Aharony, Introduction to Percolation Theory - theoretical foundations
- ATProto Labels - moderation architecture