audio streaming app plyr.fm

genre classification

ML-based genre classification for tracks using the effnet-discogs model on Replicate.

architecture

track uploaded (or on-demand request)
        |
        v
  Replicate API
  (effnet-discogs, CPU, ~2s)
        |
        v
  genre predictions JSON
  (top N labels + confidence scores)
        |
        v
  stored in track.extra["genre_predictions"]

how it works

  1. audio URL is sent to Replicate's effnet-discogs model
  2. model returns genre labels from the Discogs taxonomy with confidence scores
  3. raw labels use the Genre---Subgenre format (e.g., Electronic---Ambient). we split each label into separate lowercase tags (electronic, ambient) and deduplicate them; when a tag appears more than once, its highest confidence score is kept
  4. predictions are stored in track.extra["genre_predictions"] as JSON
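a minimal sketch of step 3, assuming the model's raw output has already been parsed into (label, confidence) pairs. the input shape and the name normalize_predictions are hypothetical, not taken from ml.py:

```python
def normalize_predictions(raw: list[tuple[str, float]]) -> list[dict]:
    """Split Discogs 'Genre---Subgenre' labels into lowercase tags.

    Hypothetical helper: when a tag appears under several labels,
    only its highest confidence is kept.
    """
    best: dict[str, float] = {}
    for label, confidence in raw:
        for part in label.split("---"):
            tag = part.strip().lower()
            if not tag:
                continue
            # keep the highest confidence seen for this tag
            if confidence > best.get(tag, float("-inf")):
                best[tag] = confidence
    # highest confidence first; sorted() is stable, so ties keep first-seen order
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return [{"name": name, "confidence": conf} for name, conf in ranked]
```

fed Electronic---Ambient (0.1999), Electronic---Experimental (0.1673) and Electronic---Synth-pop (0.122), this yields electronic, ambient, experimental and synth-pop in that order, the same shape as the storage example further down.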

when classification runs

  • on upload: if REPLICATE_ENABLED=true, the classify_genres task is scheduled as a docket background task once the upload completes
  • on demand: GET /tracks/{track_id}/recommended-tags classifies the track on the fly if it has no stored predictions
  • backfill: scripts/backfill_genres.py processes existing tracks

API

GET /tracks/{track_id}/recommended-tags?limit=5

no auth required. returns genre predictions for a track, excluding tags the track already has.

response:

{
  "track_id": 668,
  "tags": [
    {"name": "audiobook non-music", "score": 0.2129},
    {"name": "spoken word non-music", "score": 0.1817},
    {"name": "monolog non-music", "score": 0.1227}
  ],
  "available": true
}

  • available is false when Replicate is disabled
  • an empty tags list with available: true means either the track has no R2 URL or classification returned no results
  • score is the model's confidence, from 0 to 1
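a sketch of how this response could be assembled from stored predictions. the function signature, the case-insensitive match against existing tags, and returning an empty tags list when Replicate is disabled are all assumptions; the rename from the stored confidence key to the response's score key follows the two examples in this page:

```python
from __future__ import annotations


def recommended_tags(
    track_id: int,
    predictions: list[dict] | None,
    existing_tags: set[str],
    limit: int = 5,
    replicate_enabled: bool = True,
) -> dict:
    """Hypothetical response builder for /tracks/{track_id}/recommended-tags."""
    if not replicate_enabled:
        # assumption: tags is an empty list when the feature is off
        return {"track_id": track_id, "tags": [], "available": False}
    existing = {tag.lower() for tag in existing_tags}
    tags = [
        # stored predictions use "confidence"; the response calls it "score"
        {"name": p["name"], "score": p["confidence"]}
        for p in (predictions or [])
        if p["name"].lower() not in existing  # skip tags the track already has
    ]
    return {"track_id": track_id, "tags": tags[:limit], "available": True}
```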

auto-tag at upload

users can check "auto-tag with recommended genres" on the upload form. when enabled:

  1. auto_tag: true is stored in track.extra during upload
  2. classify_genres background task runs as usual
  3. after storing predictions, the task checks for the auto_tag flag
  4. applies the top genre tags using a ratio-to-top filter: tags scoring at least 50% of the top score are kept, up to a maximum of 5
  5. cleans up the auto_tag flag from track.extra

auto-tags are additive with manual tags — if the user also typed tags, both appear on the track.
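the ratio-to-top filter from step 4 and the additive merge with manual tags can be sketched as follows. select_auto_tags and merge_tags are illustrative names, and skipping an auto-tag that duplicates a manual tag (case-insensitively) is an assumption:

```python
def select_auto_tags(
    predictions: list[dict], ratio: float = 0.5, max_tags: int = 5
) -> list[str]:
    """Keep tags scoring >= ratio * top score, capped at max_tags."""
    if not predictions:
        return []
    ranked = sorted(predictions, key=lambda p: p["confidence"], reverse=True)
    top = ranked[0]["confidence"]
    if top <= 0:
        return []  # no meaningful signal to apply
    threshold = ratio * top
    return [p["name"] for p in ranked if p["confidence"] >= threshold][:max_tags]


def merge_tags(manual: list[str], auto: list[str]) -> list[str]:
    """Append auto-tags after manual ones, skipping case-insensitive duplicates."""
    merged = list(manual)
    seen = {tag.lower() for tag in manual}
    for tag in auto:
        if tag.lower() not in seen:
            merged.append(tag)
            seen.add(tag.lower())
    return merged
```

with the predictions shown under storage format, the top score is 0.1999, so the threshold is roughly 0.1; all four tags clear it and all four would be applied.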

key files: backend/src/backend/api/tracks/uploads.py (form param + UploadContext), backend/src/backend/_internal/tasks/ml.py (apply logic in classify_genres)

auditing

scripts/ml_audit.py reports which tracks and artists have been processed by genre classification (and other ML features). useful for privacy/ToS auditing.

cd backend && uv run python ../scripts/ml_audit.py --verbose

storage format

predictions are stored in track.extra["genre_predictions"]:

[
  {"name": "electronic", "confidence": 0.1999},
  {"name": "ambient", "confidence": 0.1999},
  {"name": "experimental", "confidence": 0.1673},
  {"name": "synth-pop", "confidence": 0.122}
]

once classified, the predictions are cached — subsequent API requests read from the database without calling Replicate again.

cache invalidation

predictions are keyed by genre_predictions_file_id (stored alongside predictions in track.extra). when a track's audio file is replaced, the file_id changes and the cached predictions are discarded — the next request triggers reclassification.

{
  "genre_predictions": [...],
  "genre_predictions_file_id": "a1b2c3d4e5f67890"
}
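a minimal sketch of this cache check, simplified to synchronous code with a classify callable standing in for the Replicate call. get_predictions is a hypothetical name; only the two track.extra keys come from this page:

```python
from typing import Callable


def get_predictions(
    extra: dict, current_file_id: str, classify: Callable[[], list[dict]]
) -> list[dict]:
    """Return cached predictions if they match the current audio file, else reclassify."""
    cached = extra.get("genre_predictions")
    if cached is not None and extra.get("genre_predictions_file_id") == current_file_id:
        return cached  # cache hit: no Replicate call
    # missing or stale (audio file replaced): classify and overwrite the cache
    predictions = classify()
    extra["genre_predictions"] = predictions
    extra["genre_predictions_file_id"] = current_file_id
    return predictions
```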

backfill

# dry run — shows eligible tracks (missing genre_predictions in extra)
uv run scripts/backfill_genres.py --dry-run

# classify first 5 tracks
uv run scripts/backfill_genres.py --limit 5

# full backfill with custom concurrency
uv run scripts/backfill_genres.py --concurrency 10

requires env vars: DATABASE_URL, REPLICATE_ENABLED=true, REPLICATE_API_TOKEN.
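a sketch of what a concurrency-limited backfill loop might look like, using asyncio.Semaphore to cap in-flight Replicate calls at the --concurrency value. the track-record shape, eligible_tracks, backfill, and the default of 5 are assumptions, not the script's actual code:

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable

# hypothetical: classify(track_id) calls Replicate and returns predictions
Classify = Callable[[int], Awaitable[list]]


def eligible_tracks(tracks: list[dict], limit: int | None = None) -> list[int]:
    """IDs of tracks whose extra has no genre_predictions (what --dry-run lists)."""
    ids = [t["id"] for t in tracks if "genre_predictions" not in (t.get("extra") or {})]
    return ids if limit is None else ids[:limit]


async def backfill(
    track_ids: list[int], classify: Classify, concurrency: int = 5
) -> dict[int, list]:
    """Classify every track, with at most `concurrency` requests in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def run_one(track_id: int) -> tuple[int, list]:
        async with semaphore:  # blocks while `concurrency` calls are running
            return track_id, await classify(track_id)

    pairs = await asyncio.gather(*(run_one(t) for t in track_ids))
    return dict(pairs)
```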

environment variables

variable                    purpose                          default
REPLICATE_ENABLED           enable genre classification      false
REPLICATE_API_TOKEN         Replicate API token
REPLICATE_TOP_N             number of predictions to keep    10
REPLICATE_TIMEOUT_SECONDS   request timeout (seconds)        120
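a stdlib stand-in for reading these variables with the defaults above. the project's ReplicateSettings in config.py may be built differently (for example with pydantic-settings); the class name ReplicateConfig and the accepted truthy spellings for REPLICATE_ENABLED are assumptions:

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ReplicateConfig:
    """Hypothetical stand-in for ReplicateSettings, mirroring the table's defaults."""

    enabled: bool = False
    api_token: str | None = None
    top_n: int = 10
    timeout_seconds: float = 120.0

    @classmethod
    def from_env(cls, env: Mapping[str, str] | None = None) -> ReplicateConfig:
        env = os.environ if env is None else env
        return cls(
            enabled=env.get("REPLICATE_ENABLED", "false").strip().lower() in {"1", "true", "yes"},
            api_token=env.get("REPLICATE_API_TOKEN") or None,  # no default token
            top_n=int(env.get("REPLICATE_TOP_N", "10")),
            timeout_seconds=float(env.get("REPLICATE_TIMEOUT_SECONDS", "120")),
        )
```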

cost

effnet-discogs runs on CPU at $0.00019/run ($0.11 per 575 tracks). Replicate scales to zero when idle.
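the cost figure is simple multiplication, assuming one run per classified track (a replaced audio file costs another run, because it invalidates the cached predictions). Decimal keeps the sub-cent price free of float rounding noise:

```python
from decimal import Decimal

# per-run price quoted above for effnet-discogs on CPU
COST_PER_RUN_USD = Decimal("0.00019")


def classification_cost(runs: int) -> Decimal:
    """Estimated Replicate spend in USD, assuming one run per classified track."""
    return COST_PER_RUN_USD * runs


print(round(classification_cost(575), 2))  # 575 tracks: 0.10925 -> 0.11
```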

model details

  • model: mtg/effnet-discogs (EfficientNet trained on Discogs)
  • taxonomy: Discogs genre/subgenre labels (~400 categories)
  • inference: CPU, ~2s per track
  • SDK note: the replicate Python SDK is incompatible with Python 3.14 because it depends on pydantic v1. we call the Replicate HTTP API directly with httpx, sending the Prefer: wait header to get synchronous predictions.
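a sketch of the request such an httpx client might send, kept to the stdlib so it runs without network access. the https://api.replicate.com/v1/predictions endpoint, Bearer auth, the Prefer: wait header and the prediction status values follow Replicate's public HTTP API; the input field name audio and the version argument are assumptions about the effnet-discogs model, not taken from replicate.py:

```python
from __future__ import annotations

REPLICATE_PREDICTIONS_URL = "https://api.replicate.com/v1/predictions"


def build_prediction_request(
    audio_url: str, version: str, api_token: str
) -> tuple[str, dict[str, str], dict]:
    """Return (url, headers, json_body) for a synchronous Replicate prediction."""
    headers = {
        "Authorization": f"Bearer {api_token}",
        "Content-Type": "application/json",
        # ask Replicate to hold the connection until the prediction finishes
        "Prefer": "wait",
    }
    # assumption: the model takes the track's audio URL as an input named "audio"
    body = {"version": version, "input": {"audio": audio_url}}
    return REPLICATE_PREDICTIONS_URL, headers, body


def extract_output(prediction: dict) -> object | None:
    """Pull the output from a prediction response, or None if it is still running."""
    status = prediction.get("status")
    if status == "succeeded":
        return prediction.get("output")
    if status in ("failed", "canceled"):
        raise RuntimeError(f"prediction {status}: {prediction.get('error')}")
    # "starting" or "processing": the wait window ended before the model finished
    return None
```

sending it with httpx would look like httpx.post(url, headers=headers, json=body, timeout=120) followed by extract_output(response.json()). Prefer: wait only holds the connection for a bounded time, so a prediction can come back still starting or processing; a client would then poll the URL in the response's urls.get field.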

frontend UX

when editing a track on the portal page, the edit modal fetches recommended tags and displays them as clickable dashed-border chips below the tag input. clicking a chip adds the tag. a Svelte $derived value reactively hides suggestions that match manually typed tags.

  • loading state: WaveLoading size="sm" with a "suggested" label
  • silent failures: the section is hidden if Replicate is disabled, the fetch fails, or there are no predictions
  • implemented in: frontend/src/routes/portal/+page.svelte

key files

  • backend/src/backend/_internal/clients/replicate.py — Replicate HTTP client
  • backend/src/backend/_internal/tasks/ml.py — classify_genres task (+ auto-tag logic)
  • backend/src/backend/api/tracks/uploads.py — auto_tag form param and UploadContext
  • backend/src/backend/api/tracks/tags.py — recommended-tags endpoint
  • backend/src/backend/config.py — ReplicateSettings
  • scripts/backfill_genres.py — batch classification script
  • scripts/ml_audit.py — ML processing audit script
  • frontend/src/routes/upload/+page.svelte — auto-tag checkbox on upload form
  • frontend/src/routes/portal/+page.svelte — suggested tags UI in edit modal