The Lab
1. Methodology
Telling AI-generated music from human-performed music is not a solved problem in 2026. Rather than pretending otherwise, we split the task into three pillars and tell the user which one produced their verdict:
1. Platform identification
Given a URL from an AI-only platform (Suno, Udio, ElevenLabs, AIVA, Soundraw, Mubert, Boomy, Loudly, Beatoven), the app returns an instant 100%-certain verdict. Those platforms only host AI output, so no model is needed (a sketch of this check follows the list below).
2. Audio-signal model
For anything without a platform URL — Apple Music links and mic scans — audio is sent to our Hugging Face Space. Today that runs an off-the-shelf speech-trained classifier; the v2 replacement below is in training.
3. Stem-level analysis
For ambiguous verdicts we’ll separate audio into vocals / drums / bass / other and run the detector on each stem. Research-only today; see the stem-separation hypothesis note below.
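A minimal sketch of how the platform check in pillar 1 might look. The domain list, helper name, and return shape are illustrative assumptions, not the app's actual code.

```python
from urllib.parse import urlparse

# Domains that only host AI-generated output (illustrative; real hostnames may differ).
AI_ONLY_PLATFORMS = {
    "suno.com", "udio.com", "elevenlabs.io", "aiva.ai", "soundraw.io",
    "mubert.com", "boomy.com", "loudly.com", "beatoven.ai",
}

def platform_verdict(url: str):
    """Return an instant AI verdict for AI-only platform links, else None."""
    host = (urlparse(url).hostname or "").removeprefix("www.")
    if host in AI_ONLY_PLATFORMS or any(host.endswith("." + d) for d in AI_ONLY_PLATFORMS):
        return {"verdict": "AI", "confidence": 1.0, "source": "platform"}
    return None  # fall through to the audio-signal model
```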
2. Model versions
Every call to the detector goes through one of these:
v1 · production (current)
Off-the-shelf mo-thecreator/Deepfake-audio-detection, a Wav2Vec2 base fine-tuned on speech-deepfake datasets.
Limit we’re honest about: this model was trained to distinguish TTS / voice-cloned speech from real speech. On music, it skews heavily toward the “AI” class — Kanye tracks, Queen tracks, anything with vocals over live instruments tends to score around 0.7. That’s flagged in the app on every experimental result.
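For reference, a minimal sketch of what the v1 path amounts to: scoring a clip with that checkpoint through the standard transformers audio-classification pipeline. The label names shown in the comment are assumptions; the model card defines the real ones.

```python
from transformers import pipeline

# Off-the-shelf speech-deepfake classifier used by v1.
clf = pipeline("audio-classification", model="mo-thecreator/Deepfake-audio-detection")

def v1_score(wav_path: str) -> dict:
    """Score one clip; returns per-label scores, e.g. {"fake": 0.71, "real": 0.29}."""
    return {r["label"]: r["score"] for r in clf(wav_path)}
```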
v2 · in training
Our own music-native detector, fine-tuned from MERT-v1-95M — a self-supervised music encoder pre-trained on 160 000 hours of music.
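A minimal sketch of the shape v2 is taking: the MERT encoder with a small classification head on top, fine-tuned on labelled clips. The pooling, head, and label order here are illustrative assumptions, not a settled training recipe.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, Wav2Vec2FeatureExtractor

MERT_ID = "m-a-p/MERT-v1-95M"  # self-supervised music encoder, expects 24 kHz audio

class MusicAIDetector(nn.Module):
    def __init__(self, num_labels: int = 2):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(MERT_ID, trust_remote_code=True)
        self.head = nn.Linear(self.encoder.config.hidden_size, num_labels)

    def forward(self, input_values: torch.Tensor) -> torch.Tensor:
        hidden = self.encoder(input_values).last_hidden_state  # (batch, time, dim)
        return self.head(hidden.mean(dim=1))                   # logits: [human, ai]

# The matching feature extractor handles resampling and normalisation of raw audio.
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(MERT_ID, trust_remote_code=True)
```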
v3 · planned
Stem separation (via Spleeter or Open-Unmix, CPU-only for free HF Spaces) plus the v2 detector run on each stem. On the roadmap for after v2 ships, once we see where it is weakest.
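A minimal sketch of how that could be wired, assuming Spleeter's 4-stem model and whatever detector is current at the time; the function name and output layout are illustrative.

```python
from pathlib import Path
from spleeter.separator import Separator

def per_stem_scores(clip_path: str, detector, out_dir: str = "stems") -> dict:
    """Split a clip into vocals/drums/bass/other and score each stem separately."""
    Separator("spleeter:4stems").separate_to_file(clip_path, out_dir)
    stem_dir = Path(out_dir) / Path(clip_path).stem  # Spleeter writes one folder per input
    return {wav.stem: detector(str(wav)) for wav in sorted(stem_dir.glob("*.wav"))}
```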
3. Where we are right now
What that 0.003 means: across a 7-clip benchmark (Queen, Coldplay, Michael Jackson, Ed Sheeran, The Weeknd, Eagles, Suno-generated track) the mean wav2vec2 score on AI clips was 1.000 and on real clips was 0.997 — a difference of 0.003. The model isn’t discriminating; it is simply saying “AI” to every piece of music with high confidence. Replacing this is the single most valuable thing we can do, and it’s the focus of the v2 work.
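For clarity, the separation figure is just the gap between the mean AI-score on AI-labelled clips and on human-labelled clips. A trivial sketch (the inputs are placeholders, not the benchmark's raw scores):

```python
def separation(ai_scores: list[float], real_scores: list[float]) -> float:
    """Mean AI-probability on AI clips minus mean AI-probability on real clips."""
    return sum(ai_scores) / len(ai_scores) - sum(real_scores) / len(real_scores)

# v1 on the 7-clip benchmark: means of 1.000 (AI clips) and 0.997 (real clips) -> 0.003.
```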
4. Lab notes
Reverse-chronological. No cadence commitment; notes go up whenever there is a change worth recording.
Refactored AdConfig so debug builds always serve Google’s canonical test creatives and release builds serve real production IDs, switched automatically via kReleaseMode. Also registered test-device hashes to protect the AdMob account from self-click invalid traffic when dogfooding release builds.
TestFlight surfaced how hard we were hammering the backend: 18-track albums were firing 18 sequential detector calls, tripping the 5/min rate limit and cascading into “null” errors. Now we auto-scan the first 3 tracks and mark the rest as “not scanned” with a clear explainer banner. This also saves the user’s daily free-scan quota.
Renamed the app to “revAIl”: the “AI” sits literally inside the name. That is more honest about the research-in-progress positioning than “CheckAI”, which implied a finished consumer product we don’t yet have. Logo, splash, store listing, and this website moved over in one pass.
Ran a 7-clip benchmark against two popular Hugging Face deepfake models — both returned the same ~1.0 score on Suno-generated audio and on Queen’s Bohemian Rhapsody. Separation 0.000. This is why we committed to training our own music-native classifier rather than picking a different existing model.
Pasted Suno / Udio / ElevenLabs links now return an instant 100% verdict without touching the audio model or the backend. These platforms only host AI content, so the audio-path inference was waste. Speeds up verdicts and preserves the user’s daily scan quota for ambiguous cases.
5. Open questions we’re thinking about
- Is per-stem analysis worth the CPU? Spleeter runs around 10 seconds per clip on the HF free tier. Architecture sketched in backend/ but not wired up until v2 proves itself.
- How do we stay ahead of new AI generators? A model trained on Suno / Udio / ElevenLabs output today will be outdated the moment a new generator lands with a different artefact signature. Looking at continuous retraining loops and user-submitted fixtures.
- Is there a signature-based approach we’re missing? Some AI generators embed metadata or inaudible watermarks. We skipped these because they’re easy to strip, but a hybrid (watermark + audio model) may deserve revisiting.
6. How to reach us
If you’re researching this problem space or have a corpus of labelled AI-music clips we could partner on: msquaregiza@gmail.com.
RevailLab