20 models

Google Models

Google offers one of the broadest multimodal AI portfolios, covering language, image, and video from a single provider. Gemini 2.5 Pro and Flash deliver frontier reasoning with large context windows. Imagen 4 produces photorealistic images with accurate text rendering. The Veo family (Veo 2, 3, 3.1) generates cinematic video with realistic motion and natural audio. Gemini, Imagen, and Veo are all available through Segmind APIs, with no Google Cloud credentials needed. They can also be chained in Segmind Workflows to build automated, end-to-end content pipelines that run from strategy to finished video.
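As a rough illustration of what API access might look like, the sketch below builds a JSON POST request for a single model. It assumes the endpoint pattern `https://api.segmind.com/v1/<slug>` and an `x-api-key` header for authentication. The slug and payload fields are hypothetical placeholders, so check each model's API page for the real values.

```python
import json
import urllib.request

# Assumption: models are served at https://api.segmind.com/v1/<slug> and
# authenticated with an x-api-key header. Verify both on the model's API page.
BASE_URL = "https://api.segmind.com/v1"


def build_request(slug: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for one model."""
    return urllib.request.Request(
        url=f"{BASE_URL}/{slug}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def run_model(slug: str, payload: dict, api_key: str, timeout: float = 300.0) -> bytes:
    """Send the request and return the raw response body (JSON or media bytes)."""
    req = build_request(slug, payload, api_key)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()


# Hypothetical usage; "example-model" and the "prompt" field are placeholders:
#   body = run_model("example-model", {"prompt": "a red kite at dusk"}, "YOUR_API_KEY")
```

Video models in this catalog can take a minute or more per generation, which is why the sketch defaults to a generous timeout. Tune it to the model being called.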

Image To Video | Veo 3.1 Lite | 49.5s
Image To Image | Nano Banana 2 | 39.3s
Text To Audio | Gemini TTS 2.5 Flash | 17.6s
Text To Audio | Gemini TTS 2.5 Pro | 32.6s
LLM | Gemini 3 Pro | 31.4s
Image To Image | Nano Banana Pro | 61.1s
Image To Video | Veo 3.1 Fast | 99.3s
Image To Video | Veo 3.1 | 110.5s
LLM | Gemini 2.5 Flash | 11.1s
LLM | Gemini 2.5 Pro | 27.4s
Text To Image | Nano Banana | 14.3s
Text To Video | Veo 3 Fast | 80.4s
Text To Video | Google Veo 3 | 144.4s
Text To Audio | Lyria 2 | 27.2s
Text To Image | Imagen 4 | 11.4s
Image To Text | Google Translate | 0.7s
Image To Video | Google Veo 2 Image To Video | 40.6s
LLM | Gemini 2 Flash | 8.6s
Text To Video | Google Veo 2 | 39.9s
Text To Image | Imagen 3 | 8.2s