pub search#
search ATProto publishing platforms (leaflet, pckt, offprint, greengale, and others using standard.site).
live: pub-search.waow.tech
formerly "leaflet-search" - generalized to support multiple publishing platforms
how it works#
- tap syncs content from ATProto firehose (
pub.leaflet.*,site.standard.*,com.whtwnd.*) - backend indexes content into SQLite FTS5 via Turso, serves search API with keyword, semantic, and hybrid modes
- site static frontend on Cloudflare Pages
- mcp server for AI agents (Claude Code, etc.)
MCP server#
search is also exposed as an MCP server for AI agents like Claude Code:
claude mcp add-json pub-search '{"type": "http", "url": "https://pub-search-by-zzstoatzz.fastmcp.app/mcp"}'
see mcp/README.md for local setup and usage details.
api#
GET /search?q=<query>&mode=keyword|semantic|hybrid&platform=<platform>&tag=<tag>&since=<date>&format=v2
GET /similar?uri=<at-uri>&format=v2
GET /tags
GET /popular
GET /stats
GET /health
search returns three entity types: article (document in a publication), looseleaf (standalone document), publication (newsletter itself). each result includes a platform field (leaflet, pckt, offprint, greengale, whitewind, or other). use format=v2 for a wrapped response with total, offset, and results fields.
modes: keyword (default) uses FTS5 with BM25 + recency scoring. semantic uses voyage embeddings + turbopuffer ANN. hybrid merges both via reciprocal rank fusion.
ranking: keyword results use hybrid BM25 + recency scoring. text relevance is primary, but recent documents get a boost (~1 point per 30 days). the since parameter filters to documents created after the given ISO date (e.g., since=2025-01-01).
/similar uses Voyage AI embeddings with turbopuffer ANN search.
configuration#
the backend is fully configurable via environment variables:
| variable | default | description |
|---|---|---|
APP_NAME |
leaflet-search |
name shown in startup logs |
DASHBOARD_URL |
https://pub-search.waow.tech/dashboard.html |
redirect target for /dashboard |
TAP_HOST |
leaflet-search-tap.fly.dev |
tap websocket host |
TAP_PORT |
443 |
tap websocket port |
PORT |
3000 |
HTTP server port |
TURSO_URL |
- | Turso database URL (required) |
TURSO_TOKEN |
- | Turso auth token (required) |
VOYAGE_API_KEY |
- | Voyage AI API key (for embeddings) |
the backend indexes multiple ATProto platforms - currently pub.leaflet.* and site.standard.* collections. platform is stored per-document and returned in search results.
stack#
- Fly.io hosts Zig search API and content indexing
- Turso cloud SQLite (source of truth) + local read replica (FTS queries)
- turbopuffer ANN vector search
- Voyage AI embeddings (voyage-4-lite, 1024 dims)
- tap syncs content from ATProto firehose
- Cloudflare Pages static frontend
embeddings#
documents are embedded using Voyage AI's voyage-4-lite model (1024 dimensions). the backend automatically generates embeddings for new documents via a background worker — no manual backfill needed. similarity search uses turbopuffer's ANN index for fast nearest-neighbor queries across ~25k documents.