One key. Every media model.

The AI Gateway
for Media Models.

One API for image, video, and audio across every major lab. Automatic failover, BYOK, list-price billing. No markup.

segmind-gateway
curl -X POST "https://api.segmind.com/v1/nano-banana-pro" \
  -H "Authorization: Bearer $SEGMIND_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "an editorial portrait, golden hour"
  }'
One endpoint. Swap nano-banana-pro for any of 500+ media models.
Routing · 500+ models · 40+ providers
Seedream 5.0
Veo 3.1
Kling 3.0
Flux Pro
GPT Image 1.5
ElevenLabs v3
Runway Gen-4
Seedance 2.0

Segmind Gateway · Live

Why a Gateway

Three things you stop building.
The day you switch.

One key, every media model

Stop wrangling 40+ providers.

One API key, one billing relationship, one set of credentials. Image, video, audio, voice. Every major lab behind a single endpoint with consistent request and response shapes.

Automatic failover

When a provider goes dark, your app doesn’t.

Configure a fallback chain across providers. If a primary model is rate-limited, regionally degraded, or returning errors, the gateway retries on the next model, without you touching code.
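The fallback behaviour above can be sketched as a simple loop. This is an illustration of the routing idea, not the gateway's implementation; `call_model` is a hypothetical stand-in for a provider call:

```python
# Sketch of fallback routing: try each model in the configured chain until
# one succeeds. All names here are illustrative, not the gateway's API.
class ProviderError(Exception):
    """Raised when a provider is rate-limited, degraded, or erroring."""

def generate_with_fallback(prompt, chain, call_model):
    """Try each model in the chain; return the first successful result."""
    last_err = None
    for model in chain:
        try:
            return call_model(model, prompt)
        except ProviderError as err:
            last_err = err  # this hop failed; move on to the next model
    raise last_err

# Simulated providers: the primary is "down", the first fallback succeeds.
def fake_call(model, prompt):
    if model == "veo-3.1":
        raise ProviderError("429 Too Many Requests")
    return {"model": model, "output_url": f"https://cdn.example.com/{model}.mp4"}

result = generate_with_fallback("a drone shot", ["veo-3.1", "kling-3.0-pro"], fake_call)
```

In production this loop runs on the gateway's side, per request; your app only ever sees the final result.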

No platform fees

Pay exactly what providers charge.

No gateway tax, no per-request surcharge, no inflated token rates. Per-generation cost is itemized in the dashboard, down to the second of video.

Media-Native Infrastructure

The things LLM gateways
weren’t built to do.

Async jobs, built in

Long generations are first-class.

Five-minute video generations don’t fit the LLM token-streaming model. The gateway treats every job as durable: poll, webhook, or stream progress updates. No request timeouts.

Job · gw_8f2a1c0e · running · 4m 12s
queued · 0.4s
generating · veo-3.1 · 3m 48s
uploading · S3
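On the client side, polling a durable job can be as simple as the loop below. `get_job` is a hypothetical helper wrapping the gateway's job-status endpoint, simulated here so the sketch is self-contained and runnable:

```python
import time

# Minimal polling loop for a durable job. `get_job(job_id)` is assumed to
# return a dict like {"status": ..., "output_url": ...}; this is a sketch,
# not the SDK's actual interface.
def wait_for_job(job_id, get_job, interval=2.0, timeout=600):
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        job = get_job(job_id)
        if job["status"] in ("succeeded", "failed"):
            return job  # terminal state reached
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} still running after {timeout}s")

# Simulated job that reaches a terminal state on the third poll.
states = iter(["queued", "generating", "succeeded"])
job = wait_for_job(
    "gw_8f2a1c0e",
    lambda _id: {"status": next(states), "output_url": ".../video.mp4"},
    interval=0,
)
```

For fire-and-forget workloads, skip the loop entirely and use a webhook instead.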
S3-backed delivery

Outputs land in object storage.

Generated images and videos are uploaded to durable storage with signed URLs. No 30 MB base64 blobs in your response payloads.

output_url:
https://cdn.segmind.com/generations/gw_8f2a1c0e/video.mp4
Webhooks

Fire-and-forget jobs.

Pass a callback URL. Signed delivery on completion, automatic retry with backoff.

POST your-app.com/hooks → 200 OK
{
  "event": "job.succeeded",
  "id": "gw_8f2a1c0e",
  "output_url": ".../video.mp4"
}
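Verifying a signed delivery is a standard HMAC check. The signing scheme below (hex-encoded HMAC-SHA256 over the raw body) is an assumption for illustration; consult the gateway docs for the actual header name and format:

```python
import hashlib
import hmac
import json

# Assumed scheme: the webhook carries a hex HMAC-SHA256 signature of the
# raw request body, keyed by a shared secret. Illustrative only.
def verify_signature(secret: bytes, body: bytes, signature_hex: str) -> bool:
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking timing information
    return hmac.compare_digest(expected, signature_hex)

secret = b"whsec_example"  # placeholder secret, not a real credential
body = json.dumps({"event": "job.succeeded", "id": "gw_8f2a1c0e"}).encode()
sig = hmac.new(secret, body, hashlib.sha256).hexdigest()
ok = verify_signature(secret, body, sig)
```

Always verify against the raw bytes you received, before parsing the JSON, so re-serialization differences can't break the check.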
Per-generation cost

Every generation, priced.

Itemized down to the second of video. See what each model actually costs you.

5s · 1080p · last 24h
Seedance 2.0
$0.36
Veo 3.1
$1.20
Kling 3.0 Pro
$0.54
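Per-second pricing makes cost a straight multiplication. The per-second rates below are back-computed from the 5-second examples above, for illustration only, not a price list:

```python
# Illustrative per-second rates derived from the 5s clip prices shown above.
RATE_PER_SECOND = {
    "seedance-2.0": 0.072,   # $0.36 / 5s
    "veo-3.1": 0.24,         # $1.20 / 5s
    "kling-3.0-pro": 0.108,  # $0.54 / 5s
}

def generation_cost(model: str, seconds: float) -> float:
    """Cost of one generation, rounded to cents."""
    return round(RATE_PER_SECOND[model] * seconds, 2)

cost = generation_cost("veo-3.1", 5)  # a 5-second 1080p clip
```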
BYOK at 0%

Bring your own keys.

Plug in existing provider keys. We route, observe, and bill at zero markup.

Connected providers · 0% markup

OpenAI      sk-•••a7e2
Google      AIza•••82f
Runway      run-•••3d1
ElevenLabs  el-•••9ce

Rate-limit smoothing

No more provider 429s.

The gateway queues, throttles, and re-routes around provider rate limits. Your app sees a steady-state API, not provider edge cases.

Steady 60 RPM, smoothed across 3 providers
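The smoothing idea is essentially a token bucket: admit a steady rate, queue or re-route everything else so bursts never reach a provider as 429s. A minimal sketch, not the gateway's actual scheduler:

```python
# Token-bucket sketch of rate-limit smoothing. Parameters are illustrative.
class TokenBucket:
    def __init__(self, rate_per_min: float, capacity: float):
        self.rate = rate_per_min / 60.0  # tokens refilled per second
        self.capacity = capacity         # max burst the bucket absorbs
        self.tokens = capacity
        self.last = 0.0

    def allow(self, now: float) -> bool:
        # Refill based on elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # caller should queue or re-route this request

bucket = TokenBucket(rate_per_min=60, capacity=5)
# A burst of 10 requests at t=0: only the 5 buffered tokens are admitted;
# the rest wait for refill instead of hitting the provider's limiter.
admitted = sum(bucket.allow(0.0) for _ in range(10))
```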

Observability

Every generation, fully traced.

Input, provider, and output, with cost, latency, and model version pinned. What LLM gateways pioneered for text, extended for media: prompt, seed, output URL, p95 latency.

Trace · gw_8f2a1c0e · 8.4s · $1.20
input: "cinematic shot, low golden light, 35mm…"
provider: veo-3.1
output: video.mp4

How Routing Works

Provider outages
are someone else’s problem.

STEP 01

Primary

Your call hits the model you specified. In the happy path, that’s where it ends.

STEP 02

Fallback chain

If the primary is rate-limited, regionally degraded, or returning errors, we retry across your configured fallback chain (e.g., Veo → Kling → Runway).

STEP 03

BYOK or credits

Each hop uses your own provider key if you’ve plugged one in. Otherwise we use Segmind credits at the same list price.

Example config

{
  "model": "veo-3.1",
  "fallback": ["kling-3.0-pro", "runway-gen-4"],
  "byok": { "google": "gky_…", "runway": "rwk_…" }
}
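Step 3 can be read straight off that config: each hop resolves to your provider key if you've supplied one, otherwise to Segmind credits at list price. A sketch with placeholder keys (the truncated keys in the config above are elided on purpose) and an assumed model-to-provider mapping:

```python
# Illustrative model -> provider mapping; the real catalog is larger.
MODEL_PROVIDER = {
    "veo-3.1": "google",
    "kling-3.0-pro": "kling",
    "runway-gen-4": "runway",
}

def billing_for_hop(model: str, byok: dict) -> str:
    """Resolve one fallback hop to a BYOK key or Segmind credits."""
    provider = MODEL_PROVIDER[model]
    key = byok.get(provider)
    return f"byok:{key}" if key else "segmind-credits"

config = {
    "model": "veo-3.1",
    "fallback": ["kling-3.0-pro", "runway-gen-4"],
    "byok": {"google": "gky_example", "runway": "rwk_example"},  # placeholders
}
chain = [config["model"], *config["fallback"]]
plans = [billing_for_hop(m, config["byok"]) for m in chain]
# veo uses the Google key, Kling falls back to credits, Runway uses its key
```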

Observability

See every generation.
Cost, latency, output, in one place.

Gateway Dashboard · last 30 days
Generations

2.9M

+12.5% vs last month

Avg Latency

14.2s

all models

Fastest Model

0.8s

embedding · avg

By model

Model               Generations    Share    Avg Latency
nano-banana         2,155,951      74.1%    11.8s
faceswap-v5         57,038         2.0%     8.3s
segmind-vega        52,553         1.8%     2.2s
face-to-many        37,302         1.3%     9.1s
seedance-1.5-pro    36,927         1.3%     77.9s

Filter by model, route, fallback hop, customer-id. Export to CSV or stream to your warehouse.

Positioning

Built for media.
Not just text.

Generic LLM gateways are great for chat. The shape of a media workload (multi-minute generations, multi-gigabyte outputs, per-second billing) is a different problem.

Capability                            Segmind (media)         Generic LLM gateway
Long-running generations (1–5 min)    Async job, durable      Times out
Output delivery                       Signed S3 URLs          Inline tokens
Per-second video pricing              Native, itemized        Token-based only
Webhook callbacks                     Job-level events        Streaming chunks
Image / video / audio in one API      Yes, native             Text-first
Token streaming for chat              Not applicable          First-class
Prompt caching for chat               Not applicable          First-class
Provider failover                     Across labs and models  Across models
BYOK at 0% markup                     Yes                     Varies

If your workload is chat, an LLM gateway is the right choice. If you’re shipping image, video, or audio at scale, you’ll save a lot of glue code by starting here.

Who builds on Segmind

One gateway.
Every team shipping AI media.

From two-person startups to platforms serving millions. Same API, different scale.

STARTUPS

Ship the feature, not the plumbing.

One API, every frontier model across image, video, and audio. Skip the integration tax, the cold starts, the per-lab contracts. Go from idea to production in an afternoon, and let the gateway handle scale when you get there.

See the developer story

Models

500+

Time to first call

< 5 min

Billing

Pay as you go

Pricing

Pay exactly what providers charge. No platform fees.

Provider list price
No platform fees
No request fees
Pay as you go

FAQ

Frequently Asked Questions

Which languages and SDKs are supported?

Native SDKs for Node.js and Python, plus a clean HTTP API that works from cURL, Go, Rust, or anywhere else. The request shape is consistent across image, video, audio, and voice models.

Do you support long-running generations?

Yes. Short generations return inline. Long generations (videos over a few seconds, batch jobs, anything past a request timeout) run as durable jobs you can poll or receive via webhook. Progress events are streamed where the underlying model supports it.

How does BYOK work?

Plug your provider keys (OpenAI, Google, Runway, ElevenLabs, BFL, and others) into the gateway. We route requests using your keys at zero markup. You keep your provider relationship and any negotiated rates, and we add routing, failover, and observability on top.

Which regions does the gateway run in?

The gateway runs in multiple regions (US, EU, APAC). You can pin requests to a region, prefer the lowest-latency region, or let the gateway pick based on provider availability.

How do rate limits work?

Plan-based rate limits sit in front of the gateway. Provider-side rate limits are smoothed: when a provider returns 429, the gateway retries on your fallback chain rather than passing the error through.

Where are generated outputs stored?

By default, outputs are stored on Segmind-managed S3 with signed URLs and configurable retention. You can also ship outputs directly to your own bucket via a per-key delivery destination.

What do request logs include?

Per-request logs with the model used, fallback hops, provider response, cost, latency, and the signed output URL. Filter by model, customer-id, or hop. Export to CSV or stream to a warehouse.

Is there an uptime SLA?

Pro and Business plans include best-effort uptime. Scale and Enterprise plans include a contractual uptime SLA and a path to dedicated capacity for the highest-traffic accounts.

Available now on Segmind

One API key.
Every media model.

Start building in under five minutes. No contract, list-price billing, pay as you go.

Segmind is an AI infrastructure platform with 500+ media models from 40+ providers.
Built for image, video, audio, and voice workloads at production scale.