mood search#
semantic search over tracks using CLAP (Contrastive Language-Audio Pretraining) embeddings. users describe a mood in natural language and get matching tracks ranked by audio similarity.
architecture#
```
user types "dark ambient techno"
          │
          ▼
  Modal CLAP service              turbopuffer
  (embed_text)          ──►       (vector ANN query)
          │                               │
          ▼                               ▼
  512-dim text embedding          top-k similar tracks
                                          │
                                          ▼
                                  hydrate from postgres
                                  (Track + Artist join)
```
how search works#
keyword and semantic search fire in parallel from the frontend:
- keyword (150ms debounce, min 2 chars) — standard pg_trgm fuzzy match against track titles, artist names, albums, tags, playlists
- semantic (500ms debounce, min 3 chars, feature-flagged) — CLAP text embedding → turbopuffer ANN query → hydrate from postgres
keyword results appear instantly. semantic results append when ready. client-side deduplication removes any semantic tracks that already appeared in keyword results.
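the merge step can be sketched as follows (a minimal Python sketch of the client-side logic described above; the real implementation lives in `frontend/src/lib/search.svelte.ts`, and the function name here is hypothetical):

```python
def merge_results(keyword, semantic):
    """Append semantic results after keyword results, dropping any
    semantic track whose id already appeared in the keyword set.

    Sketch only: each result is assumed to carry a unique "id" key.
    """
    seen = {r["id"] for r in keyword}
    return keyword + [r for r in semantic if r["id"] not in seen]
```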
components#
| component | service | purpose |
|---|---|---|
| CLAP model | Modal (plyr-clap) | generates 512-dim embeddings from audio or text |
| vector store | turbopuffer | stores/queries embeddings with cosine distance |
| search endpoint | FastAPI (`/search/semantic`) | orchestrates embed → query → hydrate |
| backfill script | `scripts/backfill_embeddings.py` | indexes existing tracks |
services#
Modal CLAP service#
source: `services/clap/app.py`
hosts the laion/clap-htsat-unfused model on Modal with two endpoints:
- `POST /embed_audio` — base64-encoded audio → 512-dim embedding
- `POST /embed_text` — text description → 512-dim embedding
container specs: 2 CPU, 4GB RAM, 5-minute idle timeout (scales to zero).
model choice: we use `clap-htsat-unfused`, not `larger_clap_music`. the "larger" music variant has a broken text projection layer (zero biases, near-zero weights) that collapses all text to the same embedding. `clap-htsat-unfused` produces properly discriminative text embeddings (cosine similarity range -0.01 to 0.50 across different queries).
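a quick way to sanity-check a text projection is to embed a few unrelated queries and confirm their pairwise cosine similarities are not all ≈ 1.0 (a collapsed projection maps every query to nearly the same vector). a minimal dependency-free cosine helper for that check:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# a collapsed projection: every query pair scores ~1.0
# a healthy one spreads queries out (roughly -0.01 to 0.50 per the note above)
```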
deploy:
```shell
uvx modal deploy services/clap/app.py
```
after deploy, Modal prints the endpoint URLs. set them as Fly secrets:
```shell
fly secrets set \
  MODAL_ENABLED=true \
  MODAL_EMBED_AUDIO_URL=https://<workspace>--plyr-clap-clapservice-embed-audio.modal.run \
  MODAL_EMBED_TEXT_URL=https://<workspace>--plyr-clap-clapservice-embed-text.modal.run \
  -a <fly-app-name>
```
test:
```shell
curl -X POST https://<workspace>--plyr-clap-clapservice-embed-text.modal.run \
  -H "Content-Type: application/json" \
  -d '{"text": "dark ambient techno"}'
# → {"embedding": [...], "dimensions": 512}
```
first call is slow (~30s) due to cold start + model download. subsequent calls within 5 minutes are fast.
turbopuffer#
vector database for storing and querying track embeddings.
env vars:
```shell
TURBOPUFFER_ENABLED=true
TURBOPUFFER_API_KEY=tpuf_xxxxx
TURBOPUFFER_NAMESPACE=plyr-tracks  # prod: plyr-tracks, staging: plyr-tracks-stg
```
namespace naming: the backfill script writes to whatever `TURBOPUFFER_NAMESPACE` is set to. production uses `plyr-tracks` (not `plyr-tracks-prd`). if you change this, you must re-backfill.
each vector stores:
- `id`: track_id
- `vector`: 512-dim CLAP embedding
- attributes: `title`, `artist_handle`, `artist_did`
the namespace is created automatically on first write. querying a nonexistent namespace returns empty results (not an error).
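an upserted record then looks roughly like this (a sketch of the data shape listed above; the exact upsert call depends on the turbopuffer SDK version, so only the record structure is shown):

```python
def make_record(track, embedding):
    """Shape a track row into the vector record described above (sketch)."""
    assert len(embedding) == 512, "CLAP embeddings are 512-dim"
    return {
        "id": track["id"],                     # track_id
        "vector": embedding,                   # 512-dim CLAP embedding
        "attributes": {
            "title": track["title"],
            "artist_handle": track["artist_handle"],
            "artist_did": track["artist_did"],
        },
    }
```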
indexing#
automatic (new uploads)#
when a track is uploaded and both Modal + turbopuffer are enabled, embedding generation is automatically scheduled as a docket background task. see `backend/src/backend/_internal/tasks/ml.py` → `generate_embedding`.
backfill (existing tracks)#
```shell
# dry run — shows eligible tracks
uv run scripts/backfill_embeddings.py --dry-run

# index first 5 tracks
uv run scripts/backfill_embeddings.py --limit 5

# full backfill with 10 concurrent workers (default)
uv run scripts/backfill_embeddings.py

# faster with more concurrency (Modal auto-scales containers)
uv run scripts/backfill_embeddings.py --concurrency 20
```
the script downloads audio from R2, generates CLAP embeddings via Modal, and upserts to turbopuffer. uses asyncio concurrency with a semaphore — Modal spins up multiple containers automatically.
requires env vars: `DATABASE_URL`, `R2_PUBLIC_BUCKET_URL`, `MODAL_*`, `TURBOPUFFER_*`.
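the semaphore pattern can be sketched as follows (hypothetical function names; the real script also handles the R2 download, batching, and error handling):

```python
import asyncio

async def embed_track(track_id):
    # placeholder: the real script downloads audio from R2 and
    # POSTs it to the Modal /embed_audio endpoint
    return [0.0] * 512

async def upsert_vector(track_id, embedding):
    # placeholder: the real script upserts to turbopuffer
    pass

async def backfill(track_ids, concurrency=10):
    """Bound the number of in-flight embedding jobs with a semaphore;
    Modal auto-scales containers to match the concurrency."""
    sem = asyncio.Semaphore(concurrency)

    async def index_one(track_id):
        async with sem:  # at most `concurrency` concurrent Modal calls
            embedding = await embed_track(track_id)
            await upsert_vector(track_id, embedding)

    await asyncio.gather(*(index_one(t) for t in track_ids))
```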
important: if you switch CLAP models, you must re-backfill all tracks. text and audio embeddings must be in the same latent space.
feature flag#
mood search is gated behind the vibe-search per-user feature flag. users with the flag get semantic results appended below keyword results in the search modal. without the flag, only keyword search runs.
cost#
| service | backfill (575 tracks) | ongoing per upload | monthly storage |
|---|---|---|---|
| Modal (CLAP inference) | ~$0.35 | < $0.001 | $0 (scales to zero) |
| turbopuffer (vectors) | — | — | < $0.01 |
| R2 (audio download) | $0 (free egress) | $0 | — |
environment variables#
| variable | purpose | default |
|---|---|---|
| `MODAL_ENABLED` | enable CLAP embedding service | `false` |
| `MODAL_EMBED_AUDIO_URL` | Modal endpoint for audio embeddings | — |
| `MODAL_EMBED_TEXT_URL` | Modal endpoint for text embeddings | — |
| `MODAL_TIMEOUT_SECONDS` | embedding request timeout | `120` |
| `TURBOPUFFER_ENABLED` | enable vector storage | `false` |
| `TURBOPUFFER_API_KEY` | turbopuffer API key | — |
| `TURBOPUFFER_REGION` | turbopuffer API region | `api` |
| `TURBOPUFFER_NAMESPACE` | vector namespace | `plyr-tracks` |
API#
GET /search/semantic?q=<query>&limit=10#
no auth required. returns tracks ranked by cosine similarity to the query text.
requirements: Modal + turbopuffer enabled, query 3-200 chars, limit 1-50.
response:
```json
{
  "results": [
    {
      "type": "track",
      "id": 123,
      "title": "Ambient Waves",
      "artist_handle": "artist.bsky.social",
      "artist_display_name": "Artist Name",
      "image_url": "https://...",
      "similarity": 0.483
    }
  ],
  "query": "dark ambient techno",
  "available": true
}
```
`available: false` when Modal/turbopuffer are disabled or embedding fails. empty results with `available: true` means no tracks matched.
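the endpoint's orchestration reduces to embed → query → hydrate with graceful degradation. a hedged sketch (the three client functions are hypothetical stand-ins for the real Modal, turbopuffer, and postgres clients):

```python
import asyncio

# hypothetical stand-ins for the real clients
async def clap_embed_text(text):
    return [0.0] * 512                          # Modal /embed_text

async def tpuf_query(embedding, top_k):
    return [{"id": 123, "similarity": 0.483}]   # turbopuffer ANN

async def hydrate_tracks(track_ids):
    return [{"type": "track", "id": i} for i in track_ids]  # postgres join

async def semantic_search(q, limit=10):
    """embed → query → hydrate, returning available: false on failure."""
    try:
        embedding = await clap_embed_text(q)
        hits = await tpuf_query(embedding, top_k=limit)
    except Exception:
        return {"results": [], "query": q, "available": False}
    results = await hydrate_tracks([h["id"] for h in hits])
    return {"results": results, "query": q, "available": True}
```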
GET /tracks/{track_id}/recommended-tags?limit=5#
no auth required. recommends genre tags for a track based on ML classification via effnet-discogs on Replicate.
results are cached in track.extra["genre_predictions"]. if no predictions are stored and Replicate is enabled, classification runs on-demand. excludes tags the track already has.
response:
```json
{
  "track_id": 76,
  "tags": [
    {"name": "Techno", "score": 0.87},
    {"name": "Electronic", "score": 0.72}
  ],
  "available": true
}
```
`available: false` when Replicate is disabled. empty tags with `available: true` means classification returned no results or the track has no R2 URL.
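the filtering described above (rank cached predictions, drop tags the track already has, cap at the limit) can be sketched as follows, assuming `genre_predictions` is cached as a tag → score mapping (shape assumed for this sketch):

```python
def recommend_tags(predictions, existing_tags, limit=5):
    """Rank cached genre predictions by score, excluding tags the
    track already has (case-insensitive), capped at `limit`."""
    existing = {t.lower() for t in existing_tags}
    ranked = sorted(predictions.items(), key=lambda kv: kv[1], reverse=True)
    return [
        {"name": name, "score": score}
        for name, score in ranked
        if name.lower() not in existing
    ][:limit]
```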
key files#
- `services/clap/app.py` — Modal CLAP service
- `backend/src/backend/_internal/clients/clap.py` — Modal HTTP client
- `backend/src/backend/_internal/clients/tpuf.py` — turbopuffer client
- `backend/src/backend/api/search.py` — `/search/semantic` endpoint
- `backend/src/backend/_internal/tasks/ml.py` — `generate_embedding` task
- `scripts/backfill_embeddings.py` — batch indexing script
- `frontend/src/lib/components/SearchModal.svelte` — search UI (mood badge, progressive results)
- `frontend/src/lib/search.svelte.ts` — parallel keyword + semantic dispatch