# pub search

by [@zzstoatzz.io](https://bsky.app/profile/zzstoatzz.io)

search ATProto publishing platforms ([leaflet](https://leaflet.pub), [pckt](https://pckt.blog), [offprint](https://offprint.app), [greengale](https://greengale.app), and others using [standard.site](https://standard.site)).

**live:** [pub-search.waow.tech](https://pub-search.waow.tech)

> formerly "leaflet-search" - generalized to support multiple publishing platforms

## how it works

1. **[tap](https://github.com/bluesky-social/indigo/tree/main/cmd/tap)** syncs content from ATProto firehose (`pub.leaflet.*`, `site.standard.*`, `com.whtwnd.*`)
2. **backend** indexes content into SQLite FTS5 via [Turso](https://turso.tech), serves search API with keyword, semantic, and hybrid modes
3. **site** static frontend on Cloudflare Pages
4. **mcp** server for AI agents (Claude Code, etc.)

## MCP server

search is also exposed as an MCP server for AI agents like Claude Code:

```bash
claude mcp add-json pub-search '{"type": "http", "url": "https://pub-search-by-zzstoatzz.fastmcp.app/mcp"}'
```

see [mcp/README.md](mcp/README.md) for local setup and usage details.

## api

```
GET /search?q=<query>&mode=keyword|semantic|hybrid&platform=<platform>&tag=<tag>&since=<date>&author=<did|handle>&format=v2
GET /similar?uri=<at-uri>&format=v2
GET /tags
GET /popular
GET /stats
GET /health
```

search returns three entity types: `article` (a document in a publication), `looseleaf` (a standalone document), and `publication` (the newsletter itself). each result includes a `platform` field (leaflet, pckt, offprint, greengale, whitewind, or other). use `format=v2` for a wrapped response with `total`, `hasMore`, and `results` fields.
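
a minimal sketch of building a `/search` request and reading a `format=v2` response. the base URL assumes a backend running locally on the default `PORT` of 3000; `build_search_url` and the sample payload are hypothetical, with only the field names `total`, `hasMore`, `results`, and `platform` taken from the description above.

```python
import json
from urllib.parse import urlencode

# assumption: a local backend on the default PORT (3000)
BASE_URL = "http://localhost:3000"


def build_search_url(query, mode="keyword", **filters):
    """Build a /search URL; filters may be platform, tag, since, or author."""
    params = {"q": query, "mode": mode, "format": "v2"}
    # skip filters that were not provided so the URL stays minimal
    params.update({k: v for k, v in filters.items() if v is not None})
    return f"{BASE_URL}/search?{urlencode(params)}"


url = build_search_url("zig", mode="hybrid", platform="leaflet", since="2025-01-01")

# hypothetical v2 payload; real results carry more fields than shown here
payload = json.loads('{"total": 1, "hasMore": false, "results": [{"platform": "leaflet"}]}')
platforms = [result["platform"] for result in payload["results"]]
```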

**modes**: `keyword` (default) uses FTS5 with BM25 + recency scoring. `semantic` uses [Voyage AI](https://voyageai.com) embeddings + [turbopuffer](https://turbopuffer.com) ANN. `hybrid` merges the keyword and semantic result lists via reciprocal rank fusion.
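
reciprocal rank fusion scores each document by summing `1 / (k + rank)` over every list it appears in. the sketch below illustrates the technique in general, not the backend's actual code; the constant `k=60` is a commonly used default and is an assumption here, as are the function name and the sample URIs.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document ids into one list, best first."""
    scores = defaultdict(float)
    for ranking in rankings:
        # ranks are 1-based: the top hit contributes 1 / (k + 1)
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # highest fused score first; ties broken by id for determinism
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# hypothetical result lists from the keyword and semantic modes
keyword_hits = ["at://a", "at://b", "at://c"]
semantic_hits = ["at://b", "at://d", "at://a"]
fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
```

documents that appear in both lists (`at://a`, `at://b`) rise above those found by only one mode, which is the point of fusing ranks rather than raw scores.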

**ranking**: keyword results combine BM25 with recency scoring. text relevance is primary, but recent documents get a boost (~1 point per 30 days). the `since` parameter filters to documents created after the given ISO date (e.g., `since=2025-01-01`).
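
one possible reading of the recency boost, sketched under stated assumptions: relevance is treated as a higher-is-better number, and a document gives up roughly one point for every 30 days of age. the exact formula the backend uses is not given here, so `recency_adjusted_score` and its inputs are hypothetical.

```python
from datetime import datetime, timezone

DAYS_PER_POINT = 30.0  # from "~1 point per 30 days"


def recency_adjusted_score(relevance, created_at, now):
    """Lower a higher-is-better relevance score by ~1 point per 30 days of age."""
    age_days = max((now - created_at).total_seconds() / 86400.0, 0.0)
    return relevance - age_days / DAYS_PER_POINT


now = datetime(2025, 3, 2, tzinfo=timezone.utc)
# an older document with stronger text relevance (60 days old)
older = recency_adjusted_score(5.0, datetime(2025, 1, 1, tzinfo=timezone.utc), now)
# a fresh document with slightly weaker text relevance (published today)
newer = recency_adjusted_score(4.0, now, now)
```

under these assumptions the fresh document overtakes the older one once the relevance gap is smaller than the age penalty.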

`/similar` uses [Voyage AI](https://voyageai.com) embeddings with [turbopuffer](https://turbopuffer.com) ANN search.

## configuration

the backend is fully configurable via environment variables:

| variable | default | description |
|----------|---------|-------------|
| `APP_NAME` | `leaflet-search` | name shown in startup logs |
| `DASHBOARD_URL` | `https://pub-search.waow.tech/dashboard.html` | redirect target for `/dashboard` |
| `TAP_HOST` | `leaflet-search-tap.fly.dev` | tap websocket host |
| `TAP_PORT` | `443` | tap websocket port |
| `PORT` | `3000` | HTTP server port |
| `TURSO_URL` | - | Turso database URL (required) |
| `TURSO_TOKEN` | - | Turso auth token (required) |
| `VOYAGE_API_KEY` | - | Voyage AI API key (for embeddings) |

the backend indexes multiple ATProto platforms - currently the `pub.leaflet.*`, `site.standard.*`, and `com.whtwnd.*` collections. platform is stored per-document and returned in search results.
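
the environment variables in the table above can be read with defaults and required-value checks along these lines. this is an illustrative sketch, not the backend's own code (the backend is written in Zig); the `Config` class and `load_config` function are hypothetical, while the variable names and defaults come from the table.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Config:
    app_name: str
    dashboard_url: str
    tap_host: str
    tap_port: int
    port: int
    turso_url: str
    turso_token: str
    voyage_api_key: Optional[str]


def load_config(env=os.environ):
    """Read settings from a mapping of environment variables."""
    # TURSO_URL and TURSO_TOKEN have no default and are marked required
    missing = [name for name in ("TURSO_URL", "TURSO_TOKEN") if not env.get(name)]
    if missing:
        raise RuntimeError("missing required variables: " + ", ".join(missing))
    return Config(
        app_name=env.get("APP_NAME", "leaflet-search"),
        dashboard_url=env.get("DASHBOARD_URL", "https://pub-search.waow.tech/dashboard.html"),
        tap_host=env.get("TAP_HOST", "leaflet-search-tap.fly.dev"),
        tap_port=int(env.get("TAP_PORT", "443")),
        port=int(env.get("PORT", "3000")),
        turso_url=env["TURSO_URL"],
        turso_token=env["TURSO_TOKEN"],
        voyage_api_key=env.get("VOYAGE_API_KEY"),
    )


# placeholder values for illustration only
config = load_config({"TURSO_URL": "libsql://example", "TURSO_TOKEN": "token"})
```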

## [stack](https://bsky.app/profile/zzstoatzz.io/post/3mbij5ip4ws2a)

- [Fly.io](https://fly.io) hosts [Zig](https://ziglang.org) search API and content indexing
- [Turso](https://turso.tech) cloud SQLite (source of truth) + local read replica (FTS queries)
- [turbopuffer](https://turbopuffer.com) ANN vector search
- [Voyage AI](https://voyageai.com) embeddings (voyage-4-lite, 1024 dims)
- [tap](https://github.com/bluesky-social/indigo/tree/main/cmd/tap) syncs content from ATProto firehose
- [Cloudflare Pages](https://pages.cloudflare.com) static frontend

## embeddings

documents are embedded using Voyage AI's `voyage-4-lite` model (1024 dimensions). the backend automatically generates embeddings for new documents via a background worker, so no manual backfill is needed. similarity search uses turbopuffer's ANN index for fast nearest-neighbor queries.
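
to make the nearest-neighbor idea concrete, here is an exact brute-force version of the lookup that an ANN index approximates at scale. it does not call the turbopuffer or Voyage AI APIs; cosine similarity is an assumed metric, and the 3-dimensional vectors and URIs are toy stand-ins for the 1024-dimensional embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest(query, index, top_k=2):
    """Return the top_k document ids most similar to the query vector."""
    ranked = sorted(index, key=lambda uri: cosine_similarity(query, index[uri]), reverse=True)
    return ranked[:top_k]


# hypothetical toy index: document id -> embedding
index = {
    "at://doc/1": [1.0, 0.0, 0.0],
    "at://doc/2": [0.9, 0.1, 0.0],
    "at://doc/3": [0.0, 1.0, 0.0],
}
neighbors = nearest([1.0, 0.0, 0.0], index)
```

the brute-force scan compares the query against every vector; an ANN index trades a small amount of accuracy to avoid that full scan.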