The AI Gateway
for Media Models.
One API for image, video, and audio across every major lab. Automatic failover, BYOK, list-price billing. No markup.
curl -X POST "https://api.segmind.com/v1/nano-banana-pro" \
  -H "Authorization: Bearer $SEGMIND_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "an editorial portrait, golden hour"
  }'
The Catalog
500+ media models.
One endpoint.
Every major lab. Every modality. Image, video, audio, voice. Drop in a new model by changing one string.
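As a sketch of that one-string change, here is the hero request above pointed at a different model by swapping the slug in the path (the slug matches the routing example further down):

# Same request shape as above; only the model slug changes.
curl -X POST "https://api.segmind.com/v1/veo-3.1" \
  -H "Authorization: Bearer $SEGMIND_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "prompt": "an editorial portrait, golden hour" }'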
Why a Gateway
Three things you stop building.
The day you switch.
One key, every media model
Stop wrangling 40+ providers.
One API key, one billing relationship, one set of credentials. Image, video, audio, voice. Every major lab behind a single endpoint with consistent request and response shapes.
Automatic failover
When a provider goes dark, your app doesn’t.
Configure a fallback chain across providers. If a primary model is rate-limited, regionally degraded, or returning errors, the gateway retries on the next model, without you touching code.
No platform fees
Pay exactly what providers charge.
No gateway tax, no per-request surcharge, no inflated token rates. Per-generation cost is itemized in the dashboard, down to the second of video.
Media-Native Infrastructure
The things LLM gateways
weren’t built to do.
Long generations are first-class.
Five-minute video generations don’t fit the LLM token-streaming model. The gateway treats every job as durable: poll, webhook, or stream progress updates. No request timeouts.
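A minimal polling sketch, assuming a job-status endpoint; the /v1/jobs path and the response fields shown are illustrative, not documented here:

# Poll a durable job until it completes (hypothetical /v1/jobs path).
curl "https://api.segmind.com/v1/jobs/gw_8f2a1c0e" \
  -H "Authorization: Bearer $SEGMIND_API_KEY"
# Illustrative response while running: { "status": "running", "progress": 0.42 }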
Outputs land in object storage.
Generated images and videos are uploaded to durable storage with signed URLs. No 30 MB base64 blobs in your response payloads.
https://cdn.segmind.com/
Fire-and-forget jobs.
Pass a callback URL. Signed delivery on completion, automatic retry with backoff.
{ "event": "job.succeeded", "id": "gw_8f2a1c0e", "output_url": ".../video.mp4" }
Every generation, priced.
Itemized down to the second of video. See what each model actually costs you.
Bring your own keys.
Plug in existing provider keys. We route, observe, and bill at zero markup.
sk-•••a7e2
AIza•••82f
run-•••3d1
el-•••9ce
No more provider 429s.
The gateway queues, throttles, and re-routes around provider rate limits. Your app sees a steady-state API, not provider edge cases.
Steady 60 RPM, smoothed across 3 providers
Every generation, fully traced.
Input, provider, and output, with cost, latency, and model version pinned. Everything LLM gateways pioneered, extended for media: prompt, seed, output URL, p95 latency.
How Routing Works
Provider outages
are someone else’s problem.
Primary
Your call hits the model you specified. In the happy path, that’s where it ends.
Fallback chain
If the primary is rate-limited, regionally degraded, or returning errors, we retry across your configured fallback chain (e.g., Veo → Kling → Runway).
BYOK or credits
Each hop uses your own provider key if you’ve plugged one in. Otherwise we use Segmind credits at the same list price.
Example config
{
  "model": "veo-3.1",
  "fallback": ["kling-3.0-pro", "runway-gen-4"],
  "byok": { "google": "gky_…", "runway": "rwk_…" }
}
Observability
See every generation.
Cost, latency, output, in one place.
[Dashboard snapshot: 2.9M (+12.5% vs last month) · 14.2s avg, all models · 0.8s avg, embedding · by-model breakdown]
Filter by model, route, fallback hop, customer-id. Export to CSV or stream to your warehouse.
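Pulling the same view programmatically might look like the sketch below; the /v1/logs endpoint and its query parameters are hypothetical, shown only to mirror the filters above:

# Hypothetical logs query: filter by model and fallback hop, export CSV.
curl "https://api.segmind.com/v1/logs?model=veo-3.1&hop=1&format=csv" \
  -H "Authorization: Bearer $SEGMIND_API_KEY"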
Positioning
Built for media.
Not just text.
Generic LLM gateways are great for chat. The shape of a media workload (multi-minute generations, multi-gigabyte outputs, per-second billing) is a different problem.
If your workload is chat, an LLM gateway is the right choice. If you’re shipping image, video, or audio at scale, you’ll save a lot of glue code by starting here.
Who builds on Segmind
One gateway.
Every team shipping AI media.
From two-person startups to platforms serving millions. Same API, different scale.
Ship the feature, not the plumbing.
One API, every frontier model across image, video, and audio. Skip the integration tax, the cold starts, the per-lab contracts. Go from idea to production in an afternoon, and let the gateway handle scale when you get there.
See the developer story
Models
500+
Time to first call
< 5 min
Billing
Pay as you go
Pricing
Pay exactly what providers charge. No platform fees.
FAQ
Frequently Asked Questions
Which languages and SDKs are supported?
Native SDKs for Node.js and Python, plus a clean HTTP API that works from cURL, Go, Rust, or anywhere else. The request shape is consistent across image, video, audio, and voice models.
Does the gateway handle long-running generations?
Yes. Short generations return inline. Long generations (videos over a few seconds, batch jobs, anything past a request timeout) run as durable jobs you can poll or receive via webhook. Progress events are streamed where the underlying model supports it.
How does bring-your-own-keys (BYOK) work?
Plug your provider keys (OpenAI, Google, Runway, ElevenLabs, BFL, and others) into the gateway. We route requests using your keys at zero markup. You keep your provider relationship and any negotiated rates, and we add routing, failover, and observability on top.
Which regions does the gateway run in?
The gateway runs in multiple regions (US, EU, APAC). You can pin requests to a region, prefer the lowest-latency region, or let the gateway pick based on provider availability.
How are rate limits handled?
Plan-based rate limits sit in front of the gateway. Provider-side rate limits are smoothed: when a provider returns 429, the gateway retries on your fallback chain rather than passing the error through.
Where are generated outputs stored?
By default, outputs are stored on Segmind-managed S3 with signed URLs and configurable retention. You can also ship outputs directly to your own bucket via a per-key delivery destination.
What do the request logs include?
Per-request logs with the model used, fallback hops, provider response, cost, latency, and the signed output URL. Filter by model, customer-id, or hop. Export to CSV or stream to a warehouse.
Is there an uptime SLA?
Pro and Business plans include best-effort uptime. Scale and Enterprise plans include a contractual uptime SLA and a path to dedicated capacity for the highest-traffic accounts.
One API key.
Every media model.
Start building in under five minutes. No contract, list-price billing, pay as you go.
Segmind is an AI infrastructure platform with 500+ media models from 40+ providers.
Built for image, video, audio, and voice workloads at production scale.