SAM-Audio: Audio Source Separation Model
What is SAM-Audio?
SAM-Audio is a foundation AI model from Meta designed for audio source separation: isolating a target sound (like “drums”, “speech”, or “siren”) from a mixed recording. Instead of separating only fixed stems, SAM-Audio aims to segment any sound you describe, making it useful for modern audio editing pipelines, sound event detection, and multimedia analysis.
On Segmind, you provide an audio input and a sound description prompt. The model returns an isolated track containing the requested source, enabling workflows like “extract vocals from a song”, “remove background noise”, or “pull out footsteps from a scene”.
Key Features
- Prompted sound isolation using natural language text (e.g., “keyboard typing”, “female narration”).
- Fine-grained separation for complex mixes (music, ambience, dialogue + effects).
- Developer-friendly inputs: audio via URL or Base64, with selectable output format.
- Quality tuning with reranking to improve the best-candidate separation.
- Strong fit for automated pipelines (moderation, indexing, annotation, post-production).
Best Use Cases
- Audio editing & post-production: isolate dialogue, ambience, SFX, instruments.
- Content creation: remixing, stem-like extraction, cleaner voiceovers.
- Sound event detection: extract target events before classification or labeling.
- Multimedia & video analysis: separate scene sounds for search and retrieval.
- Accessibility: enhance speech tracks for transcription and captioning.
Prompt Tips and Output Quality
- Be specific: “snare drum hits” often separates better than “drums”.
- Include context: “crowd cheering in a stadium” vs. “cheering”.
- If multiple similar sources exist, add qualifiers: “lead vocal”, “background chatter”.
- Use `output_format: wav` for highest fidelity; `mp3` for smaller files.
- Increase `reranking_candidates` (1–8) when the separation is close but imperfect; higher values typically improve selection at the cost of more computation.
Core parameters
- `audio` (required): URL or Base64 string for the input audio.
- `description` (required): the sound to isolate.
- `output_format` (optional): `wav` or `mp3` (default `wav`).
- `reranking_candidates` (optional, advanced): candidate count (default `4`).
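As a minimal sketch of how these parameters fit together, the helper below encodes a local file as Base64 and assembles a request payload with the fields listed above. The endpoint URL, auth header name, and `requests` usage in the trailing comment are assumptions for illustration only; check Segmind's API documentation for the exact call.

```python
import base64

def build_sam_audio_payload(audio_path: str, description: str,
                            output_format: str = "wav",
                            reranking_candidates: int = 4) -> dict:
    """Assemble a SAM-Audio request payload from a local audio file.

    Field names mirror the core parameters above; the wrapper itself is a
    hypothetical convenience helper, not part of any official SDK.
    """
    if output_format not in ("wav", "mp3"):
        raise ValueError("output_format must be 'wav' or 'mp3'")
    if not 1 <= reranking_candidates <= 8:
        raise ValueError("reranking_candidates must be between 1 and 8")
    with open(audio_path, "rb") as f:
        audio_b64 = base64.b64encode(f.read()).decode("ascii")
    return {
        "audio": audio_b64,                          # Base64 input (a URL string also works)
        "description": description,                  # the sound to isolate
        "output_format": output_format,              # wav for fidelity, mp3 for size
        "reranking_candidates": reranking_candidates,
    }

# Sending the payload (hypothetical endpoint shown; requires the `requests`
# package and your API key):
# import requests
# resp = requests.post("https://api.segmind.com/v1/sam-audio",
#                      headers={"x-api-key": "YOUR_API_KEY"},
#                      json=build_sam_audio_payload("mix.wav", "lead vocal"))
```

The validation mirrors the documented ranges, so a bad `output_format` or an out-of-range `reranking_candidates` fails fast on the client rather than in the API call.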
FAQs
Is SAM-Audio open-source?
Meta publishes research assets for SAM-Audio, but licensing and usage terms may vary by distribution. Check the upstream repository/terms for your deployment scenario.
How is SAM-Audio different from stem splitters (vocals/drums/bass)?
It’s prompt-driven: you can target any described sound, not only fixed music stems.
What should I put in `description` for best results?
Use a concise noun phrase plus qualifiers (instrument, source, environment), e.g., “male speech in a car”, “dog barking”, “hi-hat pattern”.
Should I choose WAV or MP3 output?
Choose WAV for editing and evaluation; choose MP3 for lightweight previews and distribution.
What does `reranking_candidates` do?
It controls how many separation candidates are generated and reranked; increasing it can improve the final isolated track when prompts are ambiguous.