One key. Every media model.

The AI Gateway
for Media Models.

One API for image, video, and audio across every major lab. Automatic failover, BYOK, list-price billing. No markup.

segmind-gateway
curl -X POST "https://api.segmind.com/v1/nano-banana-pro" \
  -H "Authorization: Bearer $SEGMIND_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "an editorial portrait, golden hour"
  }'
One endpoint. Swap nano-banana-pro for any of 500+ media models.
Routing · 500+ models · 40+ providers
Seedream 5.0
Veo 3.1
Kling 3.0
Flux Pro
GPT Image 1.5
ElevenLabs v3
Runway Gen-4
Seedance 2.0

Segmind Gateway · Live

Why a Gateway

Three things you stop building.
The day you switch.

One key, every media model

Stop wrangling 40+ providers.

One API key, one billing relationship, one set of credentials. Image, video, audio, voice. Every major lab behind a single endpoint with consistent request and response shapes.

Automatic failover

When a provider goes dark, your app doesn’t.

Configure a fallback chain across providers. If a primary model is rate-limited, regionally degraded, or returning errors, the gateway retries on the next model, without you touching code.
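The fallback behaviour above can be sketched as a simple loop. This is an illustration of the routing idea, not the gateway's implementation; `call_model` is a hypothetical stand-in for a provider call:

```python
# Sketch of fallback routing: try each model in the configured chain until
# one succeeds. All names here are illustrative, not the gateway's API.
class ProviderError(Exception):
    """Raised when a provider is rate-limited, degraded, or erroring."""

def generate_with_fallback(prompt, chain, call_model):
    """Try each model in the chain; return the first successful result."""
    last_err = None
    for model in chain:
        try:
            return call_model(model, prompt)
        except ProviderError as err:
            last_err = err  # this hop failed; move on to the next model
    raise last_err

# Simulated providers: the primary is "down", the first fallback succeeds.
def fake_call(model, prompt):
    if model == "veo-3.1":
        raise ProviderError("429 Too Many Requests")
    return {"model": model, "output_url": f"https://cdn.example.com/{model}.mp4"}

result = generate_with_fallback("a drone shot", ["veo-3.1", "kling-3.0-pro"], fake_call)
```

In production this loop runs on the gateway's side, per request; your app only ever sees the final result.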

No platform fees

Pay exactly what providers charge.

No gateway tax, no per-request surcharge, no inflated token rates. Per-generation cost is itemized in the dashboard, down to the second of video.

Media-Native Infrastructure

The things LLM gateways
weren’t built to do.

Async jobs, built in

Long generations are first-class.

Five-minute video generations don’t fit the LLM token-streaming model. The gateway treats every job as durable: poll, webhook, or stream progress updates. No request timeouts.

Job · gw_8f2a1c0e · running · 4m 12s
queued · 0.4s
generating · veo-3.1 · 3m 48s
uploading · S3
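On the client side, polling a durable job can be as simple as the loop below. `get_job` is a hypothetical helper wrapping the gateway's job-status endpoint, simulated here so the sketch is self-contained and runnable:

```python
import time

# Minimal polling loop for a durable job. `get_job(job_id)` is assumed to
# return a dict like {"status": ..., "output_url": ...}; this is a sketch,
# not the SDK's actual interface.
def wait_for_job(job_id, get_job, interval=2.0, timeout=600):
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        job = get_job(job_id)
        if job["status"] in ("succeeded", "failed"):
            return job  # terminal state reached
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} still running after {timeout}s")

# Simulated job that reaches a terminal state on the third poll.
states = iter(["queued", "generating", "succeeded"])
job = wait_for_job(
    "gw_8f2a1c0e",
    lambda _id: {"status": next(states), "output_url": ".../video.mp4"},
    interval=0,
)
```

For fire-and-forget workloads, skip the loop entirely and use a webhook instead.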
S3-backed delivery

Outputs land in object storage.

Generated images and videos are uploaded to durable storage with signed URLs. No 30 MB base64 blobs in your response payloads.

output_url:
https://cdn.segmind.com/generations/gw_8f2a1c0e/video.mp4
Webhooks

Fire-and-forget jobs.

Pass a callback URL. Signed delivery on completion, automatic retry with backoff.

POST your-app.com/hooks → 200 OK
{
  "event": "job.succeeded",
  "id": "gw_8f2a1c0e",
  "output_url": ".../video.mp4"
}
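Verifying a signed delivery is a standard HMAC check. The signing scheme below (hex-encoded HMAC-SHA256 over the raw body) is an assumption for illustration; consult the gateway docs for the actual header name and format:

```python
import hashlib
import hmac
import json

# Assumed scheme: the webhook carries a hex HMAC-SHA256 signature of the
# raw request body, keyed by a shared secret. Illustrative only.
def verify_signature(secret: bytes, body: bytes, signature_hex: str) -> bool:
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking timing information
    return hmac.compare_digest(expected, signature_hex)

secret = b"whsec_example"  # placeholder secret, not a real credential
body = json.dumps({"event": "job.succeeded", "id": "gw_8f2a1c0e"}).encode()
sig = hmac.new(secret, body, hashlib.sha256).hexdigest()
ok = verify_signature(secret, body, sig)
```

Always verify against the raw bytes you received, before parsing the JSON, so re-serialization differences can't break the check.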
Per-generation cost

Every generation, priced.

Itemized down to the second of video. See what each model actually costs you.

5s · 1080p · last 24h
Seedance 2.0
$0.36
Veo 3.1
$1.20
Kling 3.0 Pro
$0.54
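Per-second pricing makes cost a straight multiplication. The per-second rates below are back-computed from the 5-second examples above, for illustration only, not a price list:

```python
# Illustrative per-second rates derived from the 5s clip prices shown above.
RATE_PER_SECOND = {
    "seedance-2.0": 0.072,   # $0.36 / 5s
    "veo-3.1": 0.24,         # $1.20 / 5s
    "kling-3.0-pro": 0.108,  # $0.54 / 5s
}

def generation_cost(model: str, seconds: float) -> float:
    """Cost of one generation, rounded to cents."""
    return round(RATE_PER_SECOND[model] * seconds, 2)

cost = generation_cost("veo-3.1", 5)  # a 5-second 1080p clip
```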
BYOK at 0%

Bring your own keys.

Plug in existing provider keys. We route, observe, and bill at zero markup.

Connected providers · 0% markup

OpenAI      sk-•••a7e2
Google      AIza•••82f
Runway      run-•••3d1
ElevenLabs  el-•••9ce

Rate-limit smoothing

No more provider 429s.

The gateway queues, throttles, and re-routes around provider rate limits. Your app sees a steady-state API, not provider edge cases.

Steady 60 RPM, smoothed across 3 providers
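The smoothing idea is essentially a token bucket: admit a steady rate, queue or re-route everything else so bursts never reach a provider as 429s. A minimal sketch, not the gateway's actual scheduler:

```python
# Token-bucket sketch of rate-limit smoothing. Parameters are illustrative.
class TokenBucket:
    def __init__(self, rate_per_min: float, capacity: float):
        self.rate = rate_per_min / 60.0  # tokens refilled per second
        self.capacity = capacity         # max burst the bucket absorbs
        self.tokens = capacity
        self.last = 0.0

    def allow(self, now: float) -> bool:
        # Refill based on elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # caller should queue or re-route this request

bucket = TokenBucket(rate_per_min=60, capacity=5)
# A burst of 10 requests at t=0: only the 5 buffered tokens are admitted;
# the rest wait for refill instead of hitting the provider's limiter.
admitted = sum(bucket.allow(0.0) for _ in range(10))
```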

Observability

Every generation, fully traced.

Input, provider, and output, with cost, latency, and model version pinned. What LLM gateways pioneered for text, extended for media: prompt, seed, output URL, p95 latency.

Trace · gw_8f2a1c0e · 8.4s · $1.20
input: "cinematic shot, low golden light, 35mm…"
provider: veo-3.1
output: video.mp4

How Routing Works

Provider outages
are someone else’s problem.

STEP 01

Primary

Your call hits the model you specified. In the happy path, that’s where it ends.

STEP 02

Fallback chain

If the primary is rate-limited, regionally degraded, or returning errors, we retry across your configured fallback chain (e.g., Veo → Kling → Runway).

STEP 03

BYOK or credits

Each hop uses your own provider key if you’ve plugged one in. Otherwise we use Segmind credits at the same list price.

Example config

{
  "model": "veo-3.1",
  "fallback": ["kling-3.0-pro", "runway-gen-4"],
  "byok": { "google": "gky_…", "runway": "rwk_…" }
}
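Step 3 can be read straight off that config: each hop resolves to your provider key if you've supplied one, otherwise to Segmind credits at list price. A sketch with placeholder keys (the truncated keys in the config above are elided on purpose) and an assumed model-to-provider mapping:

```python
# Illustrative model -> provider mapping; the real catalog is larger.
MODEL_PROVIDER = {
    "veo-3.1": "google",
    "kling-3.0-pro": "kling",
    "runway-gen-4": "runway",
}

def billing_for_hop(model: str, byok: dict) -> str:
    """Resolve one fallback hop to a BYOK key or Segmind credits."""
    provider = MODEL_PROVIDER[model]
    key = byok.get(provider)
    return f"byok:{key}" if key else "segmind-credits"

config = {
    "model": "veo-3.1",
    "fallback": ["kling-3.0-pro", "runway-gen-4"],
    "byok": {"google": "gky_example", "runway": "rwk_example"},  # placeholders
}
chain = [config["model"], *config["fallback"]]
plans = [billing_for_hop(m, config["byok"]) for m in chain]
# veo uses the Google key, Kling falls back to credits, Runway uses its key
```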

Observability

See every generation.
Cost, latency, output, in one place.

Gateway Dashboard · last 30 days
Generations

2.9M

+12.5% vs last month

Avg Latency

14.2s

all models

Fastest Model

0.8s

embedding · avg

By model

Model               Generations    Share    Avg Latency
nano-banana         2,155,951      74.1%    11.8s
faceswap-v5         57,038         2.0%     8.3s
segmind-vega        52,553         1.8%     2.2s
face-to-many        37,302         1.3%     9.1s
seedance-1.5-pro    36,927         1.3%     77.9s

Filter by model, route, fallback hop, customer-id. Export to CSV or stream to your warehouse.

Positioning

Built for media.
Not just text.

Generic LLM gateways are great for chat. The shape of a media workload (multi-minute generations, multi-gigabyte outputs, per-second billing) is a different problem.

Capability                            Segmind (media)         Generic LLM gateway
Long-running generations (1–5 min)    Async job, durable      Times out
Output delivery                       Signed S3 URLs          Inline tokens
Per-second video pricing              Native, itemized        Token-based only
Webhook callbacks                     Job-level events        Streaming chunks
Image / video / audio in one API      Yes, native             Text-first
Token streaming for chat              Not applicable          First-class
Prompt caching for chat               Not applicable          First-class
Provider failover                     Across labs and models  Across models
BYOK at 0% markup                     Yes                     Varies

If your workload is chat, an LLM gateway is the right choice. If you’re shipping image, video, or audio at scale, you’ll save a lot of glue code by starting here.

Who builds on Segmind

One gateway.
Every team shipping AI media.

From two-person startups to platforms serving millions. Same API, different scale.

STARTUPS

Ship the feature, not the plumbing.

One API, every frontier model across image, video, and audio. Skip the integration tax, the cold starts, the per-lab contracts. Go from idea to production in an afternoon, and let the gateway handle scale when you get there.

See the developer story

Models

500+

Time to first call

< 5 min

Billing

Pay as you go

Pricing

Pay exactly what providers charge. No platform fees.

Provider list price
No platform fees
No request fees
Pay as you go

FAQ

Frequently Asked Questions

Which languages and SDKs are supported?

Native SDKs for Node.js and Python, plus a clean HTTP API that works from cURL, Go, Rust, or anywhere else. The request shape is consistent across image, video, audio, and voice models.

Do you support long-running generations?

Yes. Short generations return inline. Long generations (videos over a few seconds, batch jobs, anything past a request timeout) run as durable jobs you can poll or receive via webhook. Progress events are streamed where the underlying model supports it.

How does BYOK work?

Plug your provider keys (OpenAI, Google, Runway, ElevenLabs, BFL, and others) into the gateway. We route requests using your keys at zero markup. You keep your provider relationship and any negotiated rates, and we add routing, failover, and observability on top.

Which regions does the gateway run in?

The gateway runs in multiple regions (US, EU, APAC). You can pin requests to a region, prefer the lowest-latency region, or let the gateway pick based on provider availability.

How do rate limits work?

Plan-based rate limits sit in front of the gateway. Provider-side rate limits are smoothed: when a provider returns 429, the gateway retries on your fallback chain rather than passing the error through.

Where are generated outputs stored?

By default, outputs are stored on Segmind-managed S3 with signed URLs and configurable retention. You can also ship outputs directly to your own bucket via a per-key delivery destination.

What do request logs include?

Per-request logs with the model used, fallback hops, provider response, cost, latency, and the signed output URL. Filter by model, customer-id, or hop. Export to CSV or stream to a warehouse.

Is there an uptime SLA?

Pro and Business plans include best-effort uptime. Scale and Enterprise plans include a contractual uptime SLA and a path to dedicated capacity for the highest-traffic accounts.

Available now on Segmind

One API key.
Every media model.

Start building in under five minutes. No contract, list-price billing, pay as you go.

Segmind is an AI infrastructure platform with 500+ media models from 40+ providers.
Built for image, video, audio, and voice workloads at production scale.