# find-bufo

hybrid semantic + keyword search for the bufo zone

**live at: [find-bufo.fly.dev](https://find-bufo.fly.dev/)**

## overview

a one-page application for searching through all the bufos from [bufo.zone](https://bufo.zone/) using hybrid search that combines:
- **semantic search** via multimodal embeddings (understands meaning and visual content)
- **keyword search** via BM25 full-text search (finds exact filename matches)

## architecture

- **backend**: rust (actix-web)
- **frontend**: vanilla html/css/js
- **embeddings**: voyage ai voyage-multimodal-3
- **vector store**: turbopuffer
- **deployment**: fly.io

## setup

1. install dependencies:
   - rust toolchain
   - python 3.11+ with uv

2. copy environment variables:
   ```bash
   cp .env.example .env
   ```

3. set your api keys in `.env`:
   - `VOYAGE_API_TOKEN` - for generating embeddings
   - `TURBOPUFFER_API_KEY` - for vector storage

## ingestion

to populate the vector store with bufos:

```bash
just re-index
```

this will:
1. scrape all bufos from bufo.zone
2. download them to `data/bufos/`
3. generate embeddings for each image with `input_type="document"`
4. upload to turbopuffer

## development

run the server locally:

```bash
cargo run
```

the app will be available at `http://localhost:8080`

## deployment

deploy to fly.io:

```bash
fly launch  # first time
fly secrets set VOYAGE_API_TOKEN=your_token
fly secrets set TURBOPUFFER_API_KEY=your_key
just deploy
```

## usage

1. open the app
2. enter a search query describing the bufo you want
3. see the top matching bufos with hybrid similarity scores
4. click any bufo to open it in a new tab

### api parameters

the search API supports these parameters:
- `query`: search text (required)
- `top_k`: number of results (default: 10)
- `alpha`: fusion weight (default: 0.7)
  - `1.0` = pure semantic (best for conceptual queries like "happy", "apocalyptic")
  - `0.7` = default (balances semantic understanding with exact matches)
  - `0.5` = balanced (equal weight to both signals)
  - `0.0` = pure keyword (best for exact filename searches)

example: `/api/search?query=jumping&top_k=5&alpha=0.5`

## how it works

### ingestion
all bufo images are processed through early fusion multimodal embeddings:
1. filename text extracted (e.g., "bufo-jumping-on-bed" → "bufo jumping on bed")
2. combined with image content in single embedding request
3. voyage-multimodal-3 creates 1024-dim vectors capturing both text and visual features
4. uploaded to turbopuffer with BM25-enabled `name` field for keyword search

### search
1. **semantic branch**: query embedded using voyage-multimodal-3 with `input_type="query"`
2. **keyword branch**: BM25 full-text search against bufo names
3. **fusion**: weighted combination using `alpha` parameter
   - `score = α * semantic + (1-α) * keyword`
   - both scores normalized to 0-1 range before fusion
4. **ranking**: results sorted by fused score, top_k returned

### why hybrid?
- semantic alone: misses exact filename matches (e.g., "happy" might not find "bufo-is-happy")
- keyword alone: no semantic understanding (e.g., "happy" won't find "excited" or "smiling")
- hybrid: gets the best of both worlds
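
to make the fusion step from the search section concrete, here is a minimal sketch of `score = α * semantic + (1-α) * keyword` with both branches normalized to 0-1 first. it is written in python for brevity and is not the backend's actual rust code. the helper names (`min_max`, `fuse`), the use of min-max scaling as the normalization, the handling of a constant-score branch, and the example scores are all assumptions made for illustration.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Scale raw scores into the 0-1 range (min-max scaling is an assumption)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # Assumption: when every score in a branch is equal, treat each as a full match.
        return {name: 1.0 for name in scores}
    return {name: (value - lo) / (hi - lo) for name, value in scores.items()}


def fuse(
    semantic: dict[str, float],
    keyword: dict[str, float],
    alpha: float = 0.7,
    top_k: int = 10,
) -> list[tuple[str, float]]:
    """Combine two score maps as alpha * semantic + (1 - alpha) * keyword."""
    sem = min_max(semantic)
    kw = min_max(keyword)
    # A bufo missing from one branch contributes 0 for that branch.
    names = set(sem) | set(kw)
    fused = {
        name: alpha * sem.get(name, 0.0) + (1 - alpha) * kw.get(name, 0.0)
        for name in names
    }
    # Highest fused score first; break ties by name so the order is deterministic.
    ranked = sorted(fused.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


# Hypothetical raw scores for the query "happy" (higher is better in both branches).
semantic_scores = {"bufo-excited": 0.82, "bufo-smiling": 0.78, "bufo-is-happy": 0.60}
keyword_scores = {"bufo-is-happy": 4.2}

for alpha in (1.0, 0.7, 0.0):
    print(alpha, fuse(semantic_scores, keyword_scores, alpha=alpha, top_k=3))
```

with these made-up scores, `alpha=1.0` ranks "bufo-is-happy" last because only the semantic branch counts, while `alpha=0.0` ranks it first because only the exact name match counts. values in between trade the two signals off against each other, which is the behavior the `alpha` parameter exposes.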