pub search#
search ATProto publishing platforms (leaflet, pckt, offprint, greengale, and others using standard.site).
live: pub-search.waow.tech
formerly "leaflet-search" - generalized to support multiple publishing platforms
how it works#
- tap syncs content from ATProto firehose (signals on
site.standard.document, filterspub.leaflet.*+site.standard.*) - backend indexes content into SQLite FTS5 via Turso, serves search API
- site static frontend on Cloudflare Pages
MCP server#
search is also exposed as an MCP server for AI agents like Claude Code:
claude mcp add-json pub-search '{"type": "http", "url": "https://pub-search-by-zzstoatzz.fastmcp.app/mcp"}'
see mcp/README.md for local setup and usage details.
api#
GET /search?q=<query>&tag=<tag>&platform=<platform>&since=<date> # full-text search
GET /similar?uri=<at-uri> # find similar documents
GET /tags # list all tags with counts
GET /popular # popular search queries
GET /stats # counts + request latency (p50/p95)
GET /health # health check
search returns three entity types: article (document in a publication), looseleaf (standalone document), publication (newsletter itself). each result includes a platform field (leaflet, pckt, offprint, greengale, or other). tag and platform filtering apply to documents only.
ranking: results use hybrid BM25 + recency scoring. text relevance is primary, but recent documents get a boost (~1 point per 30 days). the since parameter filters to documents created after the given ISO date (e.g., since=2025-01-01).
/similar uses Voyage AI embeddings with brute-force cosine similarity (~0.15s for 3500 docs).
configuration#
the backend is fully configurable via environment variables:
| variable | default | description |
|---|---|---|
APP_NAME |
leaflet-search |
name shown in startup logs |
DASHBOARD_URL |
https://pub-search.waow.tech/dashboard.html |
redirect target for /dashboard |
TAP_HOST |
leaflet-search-tap.fly.dev |
tap websocket host |
TAP_PORT |
443 |
tap websocket port |
PORT |
3000 |
HTTP server port |
TURSO_URL |
- | Turso database URL (required) |
TURSO_TOKEN |
- | Turso auth token (required) |
VOYAGE_API_KEY |
- | Voyage AI API key (for embeddings) |
the backend indexes multiple ATProto platforms - currently pub.leaflet.* and site.standard.* collections. platform is stored per-document and returned in search results.
stack#
- Fly.io hosts Zig search API and content indexing
- Turso cloud SQLite with Voyage AI vector support
- tap syncs content from ATProto firehose
- Cloudflare Pages static frontend
embeddings#
documents are embedded using Voyage AI's voyage-3-lite model (512 dimensions). the backend automatically generates embeddings for new documents via a background worker - no manual backfill needed.
note: we use brute-force cosine similarity instead of a vector index. Turso's DiskANN index has ~60s write latency per row, making it impractical for incremental updates. brute-force on 3500 vectors runs in ~0.15s which is fine for this scale.