search for standard sites pub-search.waow.tech


pub search

by @zzstoatzz.io

search ATProto publishing platforms (leaflet, pckt, offprint, greengale, and others using standard.site).

live: pub-search.waow.tech

formerly "leaflet-search" - generalized to support multiple publishing platforms

how it works

  1. tap: syncs content from the ATProto firehose (signals on site.standard.document, filters pub.leaflet.* + site.standard.*)
  2. backend: indexes content into SQLite FTS5 via Turso and serves the search API
  3. site: static frontend on Cloudflare Pages
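
as a rough illustration of the collection filter in step 1, here is a minimal Python sketch of wildcard matching on record collection names. it is not the tap's actual code: the function name `is_indexed_collection` and the sample NSIDs are assumptions made for the example, and only the two patterns come from the list above.

```python
from fnmatch import fnmatchcase

# wildcard patterns copied from the list above
COLLECTION_FILTERS = ("pub.leaflet.*", "site.standard.*")


def is_indexed_collection(nsid: str) -> bool:
    """Return True when a collection NSID matches one of the filter patterns."""
    return any(fnmatchcase(nsid, pattern) for pattern in COLLECTION_FILTERS)


if __name__ == "__main__":
    # sample NSIDs are illustrative only
    for nsid in ("site.standard.document", "pub.leaflet.document", "app.bsky.feed.post"):
        print(nsid, is_indexed_collection(nsid))
```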

MCP server

search is also exposed as an MCP server for AI agents like Claude Code:

claude mcp add-json pub-search '{"type": "http", "url": "https://pub-search-by-zzstoatzz.fastmcp.app/mcp"}'

see mcp/README.md for local setup and usage details.

api

GET /search?q=<query>&tag=<tag>&platform=<platform>&since=<date>  # full-text search
GET /similar?uri=<at-uri>                                          # find similar documents
GET /tags                                                          # list all tags with counts
GET /popular                                                       # popular search queries
GET /stats                                                         # counts + request latency (p50/p95)
GET /health                                                        # health check

search returns three entity types: article (document in a publication), looseleaf (standalone document), publication (newsletter itself). each result includes a platform field (leaflet, pckt, offprint, greengale, or other). tag and platform filtering apply to documents only.
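
a small sketch of building a /search request URL from the parameters listed above. the helper name `build_search_url` and the host `https://backend.example` are placeholders (the backend's public host is not stated here); only the path and the q, tag, platform, and since parameters come from the endpoint list.

```python
from typing import Optional
from urllib.parse import urlencode


def build_search_url(
    base_url: str,
    query: str,
    tag: Optional[str] = None,
    platform: Optional[str] = None,
    since: Optional[str] = None,
) -> str:
    """Build a GET /search URL, leaving out any filter that was not provided."""
    params = {"q": query, "tag": tag, "platform": platform, "since": since}
    params = {key: value for key, value in params.items() if value is not None}
    return f"{base_url}/search?{urlencode(params)}"


if __name__ == "__main__":
    # "https://backend.example" is a hypothetical host, not the real backend
    print(build_search_url("https://backend.example", "zig atproto",
                           platform="leaflet", since="2025-01-01"))
```

since tag and platform filtering apply to documents only, a query that sets either one should not be expected to return publication results.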

ranking: results use hybrid BM25 + recency scoring. text relevance is primary, but recent documents get a boost (~1 point per 30 days). the since parameter filters to documents created after the given ISO date (e.g., since=2025-01-01).
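
the exact scoring formula is not given here, so the following is only one plausible reading of "~1 point per 30 days": a document loses one point of score for every 30 days of age. the function names, the dict shape, and the inclusive treatment of `since` are all assumptions. `relevance` is assumed to be a higher-is-better number (SQLite's own bm25() returns lower-is-better values, so it would need to be flipped first).

```python
from datetime import date
from typing import Optional


def recency_adjusted_score(relevance: float, created: date, today: date,
                           points_per_30_days: float = 1.0) -> float:
    """Subtract an age penalty so newer documents outrank equally relevant older ones."""
    age_days = max((today - created).days, 0)
    return relevance - points_per_30_days * age_days / 30.0


def rank(docs: list, today: date, since: Optional[str] = None) -> list:
    """Apply the optional ISO-date `since` filter, then sort by adjusted score."""
    cutoff = date.fromisoformat(since) if since else None
    # assumption: a document created exactly on the `since` date is kept
    kept = [d for d in docs if cutoff is None or d["created"] >= cutoff]
    return sorted(
        kept,
        key=lambda d: recency_adjusted_score(d["relevance"], d["created"], today),
        reverse=True,
    )


if __name__ == "__main__":
    docs = [
        {"uri": "old-but-relevant", "relevance": 5.0, "created": date(2024, 12, 1)},
        {"uri": "recent", "relevance": 4.0, "created": date(2025, 2, 14)},
    ]
    for doc in rank(docs, today=date(2025, 3, 1)):
        print(doc["uri"])
```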

/similar uses Voyage AI embeddings with brute-force cosine similarity (~0.15s for 3500 docs).

configuration

the backend is fully configurable via environment variables:

| variable | default | description |
|---|---|---|
| APP_NAME | leaflet-search | name shown in startup logs |
| DASHBOARD_URL | https://pub-search.waow.tech/dashboard.html | redirect target for /dashboard |
| TAP_HOST | leaflet-search-tap.fly.dev | tap websocket host |
| TAP_PORT | 443 | tap websocket port |
| PORT | 3000 | HTTP server port |
| TURSO_URL | - | Turso database URL (required) |
| TURSO_TOKEN | - | Turso auth token (required) |
| VOYAGE_API_KEY | - | Voyage AI API key (for embeddings) |

the backend indexes multiple ATProto platforms - currently pub.leaflet.* and site.standard.* collections. platform is stored per-document and returned in search results.
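
to make the defaults and required variables from the configuration table concrete, here is a hedged Python sketch of how they could be resolved. it is an illustration only, not the backend's own startup code: `load_config` and the returned dict shape are hypothetical, while the variable names and default values are taken from the table.

```python
import os
from typing import Mapping

# defaults copied from the configuration table
DEFAULTS = {
    "APP_NAME": "leaflet-search",
    "DASHBOARD_URL": "https://pub-search.waow.tech/dashboard.html",
    "TAP_HOST": "leaflet-search-tap.fly.dev",
    "TAP_PORT": "443",
    "PORT": "3000",
}
REQUIRED = ("TURSO_URL", "TURSO_TOKEN")


def load_config(env: Mapping[str, str] = os.environ) -> dict:
    """Resolve settings from the environment, failing fast on missing required ones."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError("missing required variables: " + ", ".join(missing))
    config = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    config.update({name: env[name] for name in REQUIRED})
    # optional: only needed for embeddings
    config["VOYAGE_API_KEY"] = env.get("VOYAGE_API_KEY")
    # ports arrive as strings; convert them once here
    config["TAP_PORT"] = int(config["TAP_PORT"])
    config["PORT"] = int(config["PORT"])
    return config


if __name__ == "__main__":
    # placeholder values, not real credentials
    print(load_config({"TURSO_URL": "libsql://example", "TURSO_TOKEN": "placeholder"}))
```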

stack

embeddings

documents are embedded using Voyage AI's voyage-3-lite model (512 dimensions). the backend automatically generates embeddings for new documents via a background worker - no manual backfill needed.

note: we use brute-force cosine similarity instead of a vector index. Turso's DiskANN index has ~60s write latency per row, making it impractical for incremental updates. brute-force over 3500 vectors runs in ~0.15s, which is fine at this scale.
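
for reference, brute-force cosine similarity amounts to scoring the query vector against every stored vector and sorting. the sketch below is plain Python with toy 3-dimensional vectors rather than real 512-dimension embeddings; the names `cosine_similarity` and `most_similar` and the sample URIs are made up for the example and are not taken from the backend.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query_uri: str, embeddings: dict, limit: int = 5) -> list:
    """Score every other document against the query and return the top `limit` as (uri, score)."""
    query = embeddings[query_uri]
    scored = [
        (cosine_similarity(query, vector), uri)
        for uri, vector in embeddings.items()
        if uri != query_uri
    ]
    scored.sort(reverse=True)
    return [(uri, score) for score, uri in scored[:limit]]


if __name__ == "__main__":
    # toy vectors and hypothetical URIs
    toy = {
        "at://example/a": [1.0, 0.0, 0.0],
        "at://example/b": [0.9, 0.1, 0.0],
        "at://example/c": [0.0, 1.0, 0.0],
    }
    for uri, score in most_similar("at://example/a", toy):
        print(uri, round(score, 3))
```

the cost grows linearly with the number of documents, which is why this approach suits a corpus of a few thousand vectors and would need revisiting at much larger scale.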