

pub search

by @zzstoatzz.io

search ATProto publishing platforms (leaflet, pckt, offprint, greengale, and others using standard.site).

live: pub-search.waow.tech

formerly "leaflet-search" - generalized to support multiple publishing platforms

how it works

  1. tap: syncs content from the ATProto firehose (pub.leaflet.*, site.standard.*, com.whtwnd.*)
  2. backend: indexes content into SQLite FTS5 via Turso and serves the search API with keyword, semantic, and hybrid modes
  3. site: static frontend on Cloudflare Pages
  4. mcp: server for AI agents (Claude Code, etc.)
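
as a rough illustration of the first step, the sketch below filters firehose events down to the collection patterns listed above. it is written in Python for readability and is not the tap's actual code; the event shape (a dict with a "collection" key) and the example collection names are assumptions.

```python
from fnmatch import fnmatchcase

# Collection patterns taken from the list above.
INDEXED_PATTERNS = ("pub.leaflet.*", "site.standard.*", "com.whtwnd.*")


def is_indexed(collection: str) -> bool:
    """Return True if a record's collection NSID matches an indexed pattern."""
    return any(fnmatchcase(collection, pattern) for pattern in INDEXED_PATTERNS)


def filter_events(events):
    """Keep only events whose collection is indexed.

    `events` is a hypothetical iterable of dicts with a "collection" key;
    the real tap event shape may differ.
    """
    return [event for event in events if is_indexed(event["collection"])]
```

for example, `is_indexed("pub.leaflet.document")` is true while `is_indexed("app.bsky.feed.post")` is false, so unrelated records never reach the indexer.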

MCP server

search is also exposed as an MCP server for AI agents like Claude Code:

claude mcp add-json pub-search '{"type": "http", "url": "https://pub-search-by-zzstoatzz.fastmcp.app/mcp"}'

see mcp/README.md for local setup and usage details.

api

GET /search?q=<query>&mode=keyword|semantic|hybrid&platform=<platform>&tag=<tag>&since=<date>&author=<did|handle>&format=v2
GET /similar?uri=<at-uri>&format=v2
GET /tags
GET /popular
GET /stats
GET /health

search returns three entity types: article (document in a publication), looseleaf (standalone document), publication (newsletter itself). each result includes a platform field (leaflet, pckt, offprint, greengale, whitewind, or other). use format=v2 for a wrapped response with total, hasMore, and results fields.
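
a minimal client-side sketch of the /search endpoint, assuming Python and only the standard library. the base URL is a placeholder (this README does not state where the API is hosted), and the helper names are hypothetical. only the query parameters and the total, hasMore, and results fields come from the description above.

```python
import json
from urllib.parse import urlencode

# Hypothetical base URL; the README does not state where the API is hosted.
API_BASE = "https://api.example.invalid"


def build_search_url(query, mode="keyword", base=API_BASE, **filters):
    """Build a /search URL that requests the wrapped format=v2 response."""
    if mode not in ("keyword", "semantic", "hybrid"):
        raise ValueError(f"unknown mode: {mode}")
    params = {"q": query, "mode": mode}
    # Optional filters from the endpoint list: platform, tag, since, author.
    params.update({k: v for k, v in filters.items() if v is not None})
    params["format"] = "v2"
    return f"{base}/search?{urlencode(params)}"


def parse_v2(body):
    """Extract (total, hasMore, results) from a format=v2 JSON body."""
    data = json.loads(body)
    return data["total"], data["hasMore"], data["results"]
```

calling `build_search_url("zig atproto", mode="hybrid", platform="leaflet")` yields a URL that can be fetched with any HTTP client; `parse_v2` then unpacks the wrapped response.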

modes: keyword (default) uses FTS5 with BM25 + recency scoring. semantic uses Voyage AI embeddings + turbopuffer ANN. hybrid merges the keyword and semantic result lists via reciprocal rank fusion.
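
reciprocal rank fusion can be sketched in a few lines. this is the textbook formulation in Python, not the backend's Zig implementation; the constant k=60 is a commonly used default and an assumption here, since the README does not say which value the backend uses.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists of document IDs into one fused ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a common default, assumed here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, highest first; ties broken by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
```

given keyword results `["a", "b", "c"]` and semantic results `["b", "d", "a"]`, the fused order is `["b", "a", "d", "c"]`: documents that appear in both lists rise above those found by only one mode.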

ranking: keyword results combine BM25 text relevance with a recency score. text relevance is primary, but recent documents get a boost (~1 point per 30 days). the since parameter filters to documents created after the given ISO date (e.g., since=2025-01-01).
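
one way to read the recency boost, sketched in Python. the exact formula is not given in this README, so the sketch assumes a relevance score where higher is better and subtracts about one point per 30 days of age; the function names, the document shape, and the inclusive treatment of the since date are all assumptions.

```python
from datetime import date


def keyword_score(relevance, created, today, points_per_30_days=1.0):
    """Combine text relevance with a recency penalty.

    Assumes higher relevance is better and that a document loses about
    one point for every 30 days of age. Both are illustrative assumptions.
    """
    age_days = max((today - created).days, 0)
    return relevance - points_per_30_days * age_days / 30.0


def filter_since(docs, since):
    """Keep documents created on or after the ISO date `since`.

    `docs` is a hypothetical list of dicts with a "created" date;
    whether the real cutoff is inclusive is an assumption.
    """
    cutoff = date.fromisoformat(since)
    return [doc for doc in docs if doc["created"] >= cutoff]
```

under these assumptions, a 60-day-old document with relevance 10.0 scores 8.0, so a slightly less relevant but brand-new document can outrank it.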

/similar uses Voyage AI embeddings with turbopuffer ANN search.

configuration

the backend is fully configurable via environment variables:

| variable | default | description |
|---|---|---|
| APP_NAME | leaflet-search | name shown in startup logs |
| DASHBOARD_URL | https://pub-search.waow.tech/dashboard.html | redirect target for /dashboard |
| TAP_HOST | leaflet-search-tap.fly.dev | tap websocket host |
| TAP_PORT | 443 | tap websocket port |
| PORT | 3000 | HTTP server port |
| TURSO_URL | - | Turso database URL (required) |
| TURSO_TOKEN | - | Turso auth token (required) |
| VOYAGE_API_KEY | - | Voyage AI API key (for embeddings) |
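
the variables above could be resolved along these lines. this is a Python illustration of the defaults and required checks only; the backend itself is written in Zig, and `load_config` is a hypothetical helper, not part of the project.

```python
import os

# Defaults copied from the variables above; None marks "no default".
DEFAULTS = {
    "APP_NAME": "leaflet-search",
    "DASHBOARD_URL": "https://pub-search.waow.tech/dashboard.html",
    "TAP_HOST": "leaflet-search-tap.fly.dev",
    "TAP_PORT": "443",
    "PORT": "3000",
    "TURSO_URL": None,
    "TURSO_TOKEN": None,
    "VOYAGE_API_KEY": None,
}
REQUIRED = ("TURSO_URL", "TURSO_TOKEN")


def load_config(env=None):
    """Resolve settings from the environment, falling back to defaults."""
    env = os.environ if env is None else env
    config = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    missing = [name for name in REQUIRED if not config[name]]
    if missing:
        raise RuntimeError("missing required variables: " + ", ".join(missing))
    # Ports arrive as strings; convert them once here.
    config["TAP_PORT"] = int(config["TAP_PORT"])
    config["PORT"] = int(config["PORT"])
    return config
```

failing fast on TURSO_URL and TURSO_TOKEN mirrors the "(required)" notes above, while VOYAGE_API_KEY stays optional because it is only needed for embeddings.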

the backend indexes multiple ATProto platforms - currently the pub.leaflet.*, site.standard.*, and com.whtwnd.* collections. platform is stored per-document and returned in search results.

stack

  • Fly.io: hosts the Zig search API and content indexing
  • Turso: cloud SQLite (source of truth) + local read replica (FTS queries)
  • turbopuffer: ANN vector search
  • Voyage AI: embeddings (voyage-4-lite, 1024 dims)
  • tap: syncs content from the ATProto firehose
  • Cloudflare Pages: static frontend

embeddings

documents are embedded using Voyage AI's voyage-4-lite model (1024 dimensions). a background worker in the backend generates embeddings for new documents automatically, so no manual backfill is needed. similarity search uses turbopuffer's ANN index for fast nearest-neighbor queries.
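
to make the nearest-neighbor idea concrete, here is a small Python sketch that ranks stored vectors by cosine similarity to a query vector. it is an exact brute-force search, not turbopuffer's approximate index, and the choice of cosine similarity is an assumption (the README does not name the distance metric). the tiny 3-dimensional vectors stand in for the real 1024-dimensional embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(query_vec, vectors, top_k=3):
    """Return the top_k (uri, similarity) pairs, highest similarity first.

    `vectors` is a hypothetical mapping of document URI to embedding.
    """
    scored = [(uri, cosine_similarity(query_vec, vec)) for uri, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

an ANN index returns roughly the same neighbors without comparing the query against every stored vector, which is what keeps /similar fast as the corpus grows.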