Sam Audio Large

Isolate any described sound from mixed audio tracks.

~11.78s inference time · ~$0.062 per run

Inputs

  • audio — Audio file URL or Base64 string to separate into components.
  • description — The sound to isolate, e.g., "drums" or "vocals".
  • output_format — Output audio format: WAV for quality, MP3 for smaller size.


SAM-Audio: Audio Source Separation Model

What is SAM-Audio?

SAM-Audio is a foundation AI model from Meta designed for audio source separation: isolating a target sound (like “drums”, “speech”, or “siren”) from a mixed recording. Instead of separating only fixed stems, SAM-Audio aims to segment any sound you describe, making it useful for modern audio editing pipelines, sound event detection, and multimedia analysis.

On Segmind, you provide an audio input and a sound description prompt. The model returns an isolated track containing the requested source, enabling workflows like “extract vocals from a song”, “remove background noise”, or “pull out footsteps from a scene”.
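The request flow described above can be sketched in Python. The endpoint URL, header name, and response handling here are illustrative assumptions; consult the Segmind API reference for the exact values.

```python
import json

# NOTE: the endpoint path and API-key header below are assumptions for
# illustration, not confirmed values from the Segmind documentation.
API_URL = "https://api.segmind.com/v1/sam-audio-large"  # assumed endpoint

def build_request(audio: str, description: str,
                  output_format: str = "wav",
                  reranking_candidates: int = 4) -> dict:
    """Assemble the JSON body for a SAM-Audio separation request."""
    return {
        "audio": audio,                  # URL or Base64-encoded audio
        "description": description,      # natural-language sound prompt
        "output_format": output_format,  # "wav" or "mp3"
        "reranking_candidates": reranking_candidates,
    }

payload = build_request("https://example.com/mix.mp3", "lead vocal")
print(json.dumps(payload))

# To actually send the request (requires an API key):
# import requests
# resp = requests.post(API_URL, json=payload,
#                      headers={"x-api-key": "YOUR_API_KEY"})
# with open("isolated.wav", "wb") as f:
#     f.write(resp.content)
```

The same payload shape works whether `audio` carries a URL or a Base64 string, so a single helper covers both input styles.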

Key Features

  • Prompted sound isolation using natural language text (e.g., “keyboard typing”, “female narration”).
  • Fine-grained separation for complex mixes (music, ambience, dialogue + effects).
  • Developer-friendly inputs: audio via URL or Base64, with selectable output format.
  • Quality tuning via reranking to select the best candidate separation.
  • Strong fit for automated pipelines (moderation, indexing, annotation, post-production).

Best Use Cases

  • Audio editing & post-production: isolate dialogue, ambience, SFX, instruments.
  • Content creation: remixing, stem-like extraction, cleaner voiceovers.
  • Sound event detection: extract target events before classification or labeling.
  • Multimedia & video analysis: separate scene sounds for search and retrieval.
  • Accessibility: enhance speech tracks for transcription and captioning.

Prompt Tips and Output Quality

  • Be specific: “snare drum hits” often separates better than “drums”.
  • Include context: “crowd cheering in a stadium” vs. “cheering”.
  • If multiple similar sources exist, add qualifiers: “lead vocal”, “background chatter”.
  • Use output_format: wav for highest fidelity; mp3 for smaller files.
  • Increase reranking_candidates (1–8) when the separation is close but imperfect; higher values typically improve selection at the cost of more computation.

Core Parameters

  • audio (required): URL/Base64 for the input audio.
  • description (required): the sound to isolate.
  • output_format (optional): wav or mp3 (default wav).
  • reranking_candidates (optional, advanced): candidate count (default 4).
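A minimal sketch for preparing these parameters — Base64-encoding local audio for the `audio` field and keeping `reranking_candidates` inside its documented 1–8 range. The helper names are illustrative, not part of any official SDK.

```python
import base64

def encode_audio_bytes(data: bytes) -> str:
    """Base64-encode raw audio bytes for the `audio` field."""
    return base64.b64encode(data).decode("ascii")

def clamp_candidates(n: int) -> int:
    """Keep `reranking_candidates` within the documented 1-8 range."""
    return max(1, min(8, n))

# A tiny placeholder byte string standing in for real audio file contents.
sample = b"RIFF....WAVEfmt "
encoded = encode_audio_bytes(sample)
print(encoded[:16], clamp_candidates(10))
```

For real use, read the file with `open(path, "rb").read()` and pass the result to `encode_audio_bytes`; clamping avoids sending an out-of-range candidate count to the API.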

FAQs

Is SAM-Audio open-source?
Meta publishes research assets for SAM-Audio, but licensing and usage terms may vary by distribution. Check the upstream repository/terms for your deployment scenario.

How is SAM-Audio different from stem splitters (vocals/drums/bass)?
It’s prompt-driven: you can target any described sound, not only fixed music stems.

What should I put in description for best results?
Use a concise noun phrase plus qualifiers (instrument, source, environment), e.g., “male speech in a car”, “dog barking”, “hi-hat pattern”.

Should I choose WAV or MP3 output?
Choose WAV for editing and evaluation; choose MP3 for lightweight previews and distribution.

What does reranking_candidates do?
It controls how many separation candidates are generated and reranked; increasing it can improve the final isolated track when prompts are ambiguous.