# search syntax

a reference for the query syntax at [pub-search.waow.tech](https://pub-search.waow.tech).

## basics

terms are OR'd together: a query matches documents containing *any* of the words. the last word gets prefix matching for a type-ahead feel.

| you type | what runs | why |
|----------|-----------|-----|
| `cat dog` | `cat OR dog*` | matches docs with "cat" or "dog" (or "dogs", "dogma", etc.) |
| `crypto` | `crypto*` | prefix match: finds "crypto", "cryptocurrency", etc. |

## quoted phrases

wrap words in double quotes for exact phrase matching: FTS5 requires the words to appear adjacent and in order.

| you type | what runs |
|----------|-----------|
| `"machine learning"` | `"machine learning"` |
| `python "machine learning" tutorial` | `python OR "machine learning" OR tutorial*` |
| `"exact phrase" python` | `"exact phrase" OR python*` |

the last token only gets a prefix `*` if it's a bare word; phrases are never prefix-expanded.

unclosed quotes are treated as phrases: `"hello world` → `"hello world"`.

## filters

beyond the query text, you can filter results by:

- **author**: type `@handle` in the search box (e.g., `@zat.dev up`). quote to search literally: `"@zat.dev"`.
- **platform**: leaflet, pckt, offprint, greengale, whitewind, other
- **tag**: click any tag in the results to filter by it
- **date**: today, this week, this month, this year

filters combine with the search query: for example, searching `@zat.dev up` returns only zat.dev's posts matching "up".

## search modes

three modes are available via the toggle below the search box:

- **keyword** (default): SQLite FTS5 full-text search with BM25 ranking + recency boost. fastest (~9ms).
- **semantic**: vector similarity via Voyage AI embeddings + turbopuffer. finds conceptually similar content even without shared words (~345ms).
- **hybrid**: runs both keyword and semantic in parallel, merges via reciprocal rank fusion. best quality, slightly slower (~360ms).

## ranking

keyword results are ranked by `BM25 + recency`:
- BM25 scores term frequency and document length (standard IR ranking)
- recency adds a small boost for newer documents: `rank + (days_old / 30)`. assuming `rank` is FTS5's score, where lower sorts first, the added term pushes older documents down.

## tokenization

the FTS5 unicode61 tokenizer treats any non-alphanumeric character as a separator. this means:
- `crypto-casino` → matches "crypto" and "casino" separately
- `don't` → matches "don" and "t"
- `foo.bar` → matches "foo" and "bar"
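
the splitting rule above can be approximated in a few lines of Python. this is only a sketch, not the site's code: `unicode61_like_tokens` is a hypothetical helper, it uses Python's `str.isalnum()` as a stand-in for the tokenizer's own character classes, and it skips the diacritic removal that unicode61 applies by default.

```python
def unicode61_like_tokens(text: str) -> list[str]:
    """Approximate unicode61: split on non-alphanumerics, lowercase the rest.

    Hypothetical helper for illustration; the real tokenizer runs inside
    SQLite and also strips diacritics by default, which this sketch skips.
    """
    tokens: list[str] = []
    current: list[str] = []
    for ch in text:
        if ch.isalnum():
            # Alphanumeric characters extend the current token.
            current.append(ch.lower())
        elif current:
            # Any other character ends the token in progress.
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


for sample in ("crypto-casino", "don't", "foo.bar"):
    print(sample, "->", unicode61_like_tokens(sample))
```

running it on the three examples yields `crypto` and `casino`, `don` and `t`, and `foo` and `bar`, which is why a search for `casino` alone can surface a post that only ever wrote `crypto-casino`.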