find-bufo.com
# find-bufo

hybrid semantic + keyword search for the bufo zone

**live at: [find-bufo.fly.dev](https://find-bufo.fly.dev/)**

## overview

a one-page application for searching through all the bufos from [bufo.zone](https://bufo.zone/) using hybrid search that combines:
- **semantic search** via multimodal embeddings (understands meaning and visual content)
- **keyword search** via BM25 full-text search (finds exact filename matches)

## architecture

- **backend**: rust (actix-web)
- **frontend**: vanilla html/css/js
- **embeddings**: voyage ai voyage-multimodal-3
- **vector store**: turbopuffer
- **deployment**: fly.io

## setup

1. install dependencies:
   - rust toolchain
   - python 3.11+ with uv

2. copy environment variables:
   ```bash
   cp .env.example .env
   ```

3. set your api keys in `.env`:
   - `VOYAGE_API_TOKEN` - for generating embeddings
   - `TURBOPUFFER_API_KEY` - for vector storage

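the server presumably reads these two keys from its environment at startup. a minimal sketch of a fail-fast loader using only `std::env` follows; `Config` and `load_config` are hypothetical names, not taken from the find-bufo source. the lookup is passed in as a closure so the logic can be exercised without touching real environment variables.

```rust
use std::env;

/// API keys the app needs (variable names from the setup steps above).
/// Hypothetical struct, for illustration only.
#[derive(Debug)]
struct Config {
    voyage_api_token: String,
    turbopuffer_api_key: String,
}

/// Build a `Config` from any key lookup, failing fast on the first variable
/// that is missing or blank. Taking a closure keeps the logic testable.
fn load_config<F>(lookup: F) -> Result<Config, String>
where
    F: Fn(&str) -> Option<String>,
{
    let get = |name: &str| {
        lookup(name)
            .filter(|value| !value.trim().is_empty())
            .ok_or_else(|| format!("missing required env var {name}"))
    };
    Ok(Config {
        voyage_api_token: get("VOYAGE_API_TOKEN")?,
        turbopuffer_api_key: get("TURBOPUFFER_API_KEY")?,
    })
}

fn main() {
    // Read from the process environment; loading `.env` into it is not shown.
    match load_config(|name| env::var(name).ok()) {
        // Never print the key values themselves.
        Ok(_) => println!("config loaded"),
        Err(err) => eprintln!("{err}"),
    }
}
```

on fly.io, values set with `fly secrets set` are exposed to the app as environment variables, so the same lookup would work in production; locally, something has to load `.env` into the environment first.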
## ingestion

to populate the vector store with bufos:

```bash
just re-index
```

this will:
1. scrape all bufos from bufo.zone
2. download them to `data/bufos/`
3. generate embeddings for each image with `input_type="document"`
4. upload to turbopuffer

## development

run the server locally:

```bash
cargo run
```

the app will be available at `http://localhost:8080`

## deployment

deploy to fly.io:

```bash
fly launch # first time
fly secrets set VOYAGE_API_TOKEN=your_token
fly secrets set TURBOPUFFER_API_KEY=your_key
just deploy
```

## usage

1. open the app
2. enter a search query describing the bufo you want
3. see the top matching bufos with hybrid similarity scores
4. click any bufo to open it in a new tab

### api parameters

the search API supports these parameters:
- `query`: search text (required)
- `top_k`: number of results (default: 10)
- `alpha`: fusion weight (default: 0.7)
  - `1.0` = pure semantic (best for conceptual queries like "happy", "apocalyptic")
  - `0.7` = default (balances semantic understanding with exact matches)
  - `0.5` = balanced (equal weight to both signals)
  - `0.0` = pure keyword (best for exact filename searches)

example: `/api/search?query=jumping&top_k=5&alpha=0.5`

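to make the defaults concrete, here is a minimal stdlib-only sketch of parsing these parameters. `SearchParams` and `parse_search_params` are hypothetical names; in the real actix-web backend an extractor such as `web::Query` would normally do this work. rejecting an `alpha` outside `[0, 1]` is an assumption, not documented behavior of the endpoint.

```rust
/// Search parameters, with the defaults documented above.
#[derive(Debug, PartialEq)]
struct SearchParams {
    query: String,
    top_k: usize,
    alpha: f64,
}

/// Parse a raw query string such as `query=jumping&top_k=5&alpha=0.5`.
/// Simplification: only `+` is decoded as a space; full percent-decoding
/// (`%20`, ...) is assumed to be handled by the web framework.
fn parse_search_params(raw: &str) -> Result<SearchParams, String> {
    let mut query: Option<String> = None;
    let mut top_k: usize = 10; // documented default
    let mut alpha: f64 = 0.7; // documented default
    for pair in raw.split('&').filter(|p| !p.is_empty()) {
        let (key, value) = pair.split_once('=').unwrap_or((pair, ""));
        match key {
            "query" => query = Some(value.replace('+', " ")),
            "top_k" => {
                top_k = value.parse().map_err(|_| format!("invalid top_k: {value}"))?
            }
            "alpha" => {
                alpha = value.parse().map_err(|_| format!("invalid alpha: {value}"))?
            }
            _ => {} // unknown parameters are ignored
        }
    }
    let query = query
        .filter(|q| !q.trim().is_empty())
        .ok_or("query is required")?;
    // Assumption: reject weights outside [0, 1] (this also rejects NaN).
    if !(0.0..=1.0).contains(&alpha) {
        return Err(format!("alpha must be between 0 and 1, got {alpha}"));
    }
    Ok(SearchParams { query, top_k, alpha })
}

fn main() {
    match parse_search_params("query=jumping&top_k=5&alpha=0.5") {
        Ok(params) => println!("{params:?}"),
        Err(err) => eprintln!("bad request: {err}"),
    }
}
```

omitting `top_k` and `alpha` (e.g. `query=happy+bufo`) falls back to 10 and 0.7, while a request without `query` is an error.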
## how it works

### ingestion
all bufo images are processed through early fusion multimodal embeddings:
1. filename text extracted (e.g., "bufo-jumping-on-bed" → "bufo jumping on bed")
2. combined with the image content in a single embedding request
3. voyage-multimodal-3 creates 1024-dim vectors capturing both text and visual features
4. uploaded to turbopuffer with a BM25-enabled `name` field for keyword search

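step 1 can be sketched as a small pure function. this assumes the file extension is dropped and that `-` (and, as an extra assumption, `_`) separate words; `filename_to_text` is a hypothetical name, and the real ingestion code may normalize names differently.

```rust
use std::path::Path;

/// Turn a bufo filename into the text half of the embedding input,
/// e.g. "bufo-jumping-on-bed.png" -> "bufo jumping on bed".
fn filename_to_text(filename: &str) -> String {
    // Drop any directory and extension; fall back to the raw input.
    let stem = Path::new(filename)
        .file_stem()
        .and_then(|s| s.to_str())
        .unwrap_or(filename);
    // Split on separators and drop empty pieces from doubled separators.
    stem.split(|c: char| c == '-' || c == '_')
        .filter(|word| !word.is_empty())
        .collect::<Vec<_>>()
        .join(" ")
}

fn main() {
    println!("{}", filename_to_text("bufo-jumping-on-bed.png"));
}
```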
### search
1. **semantic branch**: query embedded using voyage-multimodal-3 with `input_type="query"`
2. **keyword branch**: BM25 full-text search against bufo names
3. **fusion**: weighted combination using the `alpha` parameter
   - `score = α * semantic + (1-α) * keyword`
   - both scores normalized to 0-1 range before fusion
4. **ranking**: results sorted by fused score, top_k returned

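the fusion and ranking steps can be sketched as follows. the README says both scores are normalized to 0-1 but not how, so this sketch assumes per-query min-max normalization, treats a bufo missing from one branch as scoring 0 there, and expects semantic scores where higher means more similar (a store that returns distances would need converting first, e.g. `1 - cosine_distance`). `min_max`, `hybrid_rank` and `to_map` are hypothetical names.

```rust
use std::collections::HashMap;

/// Min-max normalize one branch's scores into [0, 1] (assumed method).
fn min_max(scores: &HashMap<String, f64>) -> HashMap<String, f64> {
    let min = scores.values().copied().fold(f64::INFINITY, f64::min);
    let max = scores.values().copied().fold(f64::NEG_INFINITY, f64::max);
    let range = max - min;
    scores
        .iter()
        .map(|(id, &s)| {
            // If every score in the branch is equal, give them all 1.0.
            let norm = if range > 0.0 { (s - min) / range } else { 1.0 };
            (id.clone(), norm)
        })
        .collect()
}

/// score = alpha * semantic + (1 - alpha) * keyword, then keep the top_k.
/// A bufo absent from one branch contributes 0 from that branch.
fn hybrid_rank(
    semantic: &HashMap<String, f64>,
    keyword: &HashMap<String, f64>,
    alpha: f64,
    top_k: usize,
) -> Vec<(String, f64)> {
    let mut fused: HashMap<String, f64> = HashMap::new();
    for (id, s) in min_max(semantic) {
        *fused.entry(id).or_insert(0.0) += alpha * s;
    }
    for (id, k) in min_max(keyword) {
        *fused.entry(id).or_insert(0.0) += (1.0 - alpha) * k;
    }
    let mut ranked: Vec<(String, f64)> = fused.into_iter().collect();
    // Highest fused score first; ties broken by id for deterministic output.
    ranked.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    ranked.truncate(top_k);
    ranked
}

/// Convenience for building score maps from literals.
fn to_map(pairs: &[(&str, f64)]) -> HashMap<String, f64> {
    pairs.iter().map(|&(id, s)| (id.to_string(), s)).collect()
}

fn main() {
    // Illustrative, made-up scores for a query like "happy".
    let semantic = to_map(&[("bufo-is-happy", 0.81), ("bufo-party", 0.78), ("bufo-sad", 0.40)]);
    let keyword = to_map(&[("bufo-is-happy", 7.2)]);
    for (id, score) in hybrid_rank(&semantic, &keyword, 0.7, 10) {
        println!("{id}: {score:.3}");
    }
}
```

with `alpha = 1.0` only the semantic branch contributes and with `alpha = 0.0` only the keyword branch does, matching the pure-semantic and pure-keyword settings listed under api parameters.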
### why hybrid?
- semantic alone: can under-rank exact filename matches (e.g., "happy" might not put "bufo-is-happy" near the top)
- keyword alone: no semantic understanding (e.g., "happy" won't find "excited" or "smiling")
- hybrid: catches exact name matches while still surfacing conceptually related bufos