Gemini Embedding 2 — Text Embedding API

What is Gemini Embedding 2?

Gemini Embedding 2 is Google's state-of-the-art text embedding model, built to convert natural language into dense numerical vectors that capture semantic meaning. It posts an overall MTEB score of 68.16, ahead of OpenAI text-embedding-3-large (64.6) and Cohere embed-v4 (65.2), and delivers strong retrieval and similarity accuracy across 100+ languages.

Through the Segmind API, you send a text string and receive a float vector ready to index in any vector database (Pinecone, Qdrant, Weaviate, pgvector). Eight task types let you optimize the embedding for your specific use case, from document retrieval to code search to fact verification.
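A minimal request sketch using only the Python standard library. The endpoint slug, the `x-api-key` header, the request fields `text` and `task_type`, and the response key `embedding` are assumptions for illustration; check Segmind's model page for the actual schema. Only `output_dimensionality`, the eight task types, and the 256 to 768 range come from this page.

```python
import json
import urllib.request

# Hypothetical endpoint: the exact Segmind URL slug for this model is an assumption.
SEGMIND_URL = "https://api.segmind.com/v1/gemini-embedding-2"

# The eight task types listed under Key Features.
TASK_TYPES = {
    "RETRIEVAL_DOCUMENT", "RETRIEVAL_QUERY", "SEMANTIC_SIMILARITY",
    "CLASSIFICATION", "CLUSTERING", "QUESTION_ANSWERING",
    "FACT_VERIFICATION", "CODE_RETRIEVAL_QUERY",
}


def build_payload(text: str, task_type: str, output_dimensionality: int = 768) -> dict:
    """Assemble the JSON body. Field names "text" and "task_type" are assumptions."""
    if task_type not in TASK_TYPES:
        raise ValueError(f"unknown task type: {task_type}")
    if not 256 <= output_dimensionality <= 768:
        raise ValueError("output_dimensionality must be between 256 and 768")
    return {
        "text": text,
        "task_type": task_type,
        "output_dimensionality": output_dimensionality,
    }


def embed(text: str, task_type: str, api_key: str, output_dimensionality: int = 768) -> list[float]:
    """POST one string and return its vector (response key "embedding" is an assumption)."""
    body = json.dumps(build_payload(text, task_type, output_dimensionality)).encode("utf-8")
    request = urllib.request.Request(
        SEGMIND_URL,
        data=body,
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read())["embedding"]
```

Validating the task type and dimensionality locally catches typos before they cost a request.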

Key Features

•SOTA MTEB performance: 68.16 overall; 84.0 on code retrieval; 69.9 on multilingual tasks
•8 task types: RETRIEVAL_DOCUMENT, RETRIEVAL_QUERY, SEMANTIC_SIMILARITY, CLASSIFICATION, CLUSTERING, QUESTION_ANSWERING, FACT_VERIFICATION, CODE_RETRIEVAL_QUERY
•Configurable dimensionality: Output vectors from 256 to 768 dimensions — balance quality vs. storage cost
•100+ language support: Strong cross-lingual retrieval without translation preprocessing
•Synchronous API: Immediate JSON response with no polling required
•Segmind infrastructure: Low-latency, scalable endpoint — no GCP project or quota setup needed

Best Use Cases

Retrieval-Augmented Generation (RAG): Embed your knowledge base with RETRIEVAL_DOCUMENT and user questions with RETRIEVAL_QUERY. This asymmetric pairing typically retrieves more precisely than embedding both sides with a single task type.
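The asymmetric pairing can be sketched as a tiny in-memory retriever. `embed` is a hypothetical callable `(text, task_type) -> list[float]`, for example a thin wrapper around the Segmind request; a real deployment would keep the document vectors in a vector database rather than a Python list.

```python
import math
from typing import Callable, Sequence

# Hypothetical signature for any embedding wrapper: (text, task_type) -> vector.
EmbedFn = Callable[[str, str], list[float]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_index(chunks: Sequence[str], embed: EmbedFn) -> list[tuple[str, list[float]]]:
    # Knowledge-base side: every chunk is embedded as RETRIEVAL_DOCUMENT.
    return [(chunk, embed(chunk, "RETRIEVAL_DOCUMENT")) for chunk in chunks]


def search(
    question: str,
    index: Sequence[tuple[str, list[float]]],
    embed: EmbedFn,
    k: int = 3,
) -> list[tuple[str, float]]:
    # User side: the question is embedded as RETRIEVAL_QUERY, then ranked by cosine.
    q = embed(question, "RETRIEVAL_QUERY")
    scored = [(chunk, cosine(q, vec)) for chunk, vec in index]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

The retrieved chunks are what you would pass to the LLM as context.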

Semantic search: Embed a product catalog, documentation site, or support knowledge base. Users get semantically relevant results even when they phrase queries differently from how the content was written.

Classification and clustering: Feed embeddings into a lightweight classifier or a k-means clustering step to categorize support tickets, content tags, or customer feedback without fine-tuning a full LLM.
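A sketch of the clustering path: plain Lloyd's k-means over embedding vectors, written without dependencies so it runs anywhere. It uses squared Euclidean distance and random initialization; in practice you would more likely use scikit-learn's `KMeans` on vectors embedded with the CLUSTERING task type.

```python
import random
from typing import Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors: Sequence[Sequence[float]], k: int, max_iters: int = 100, seed: int = 0) -> list[int]:
    """Lloyd's k-means; returns one cluster id (0..k-1) per input vector."""
    rng = random.Random(seed)
    # Initialize centroids with k randomly chosen input vectors.
    centroids = [list(v) for v in rng.sample(list(vectors), k)]
    labels: list[int] = []
    for _ in range(max_iters):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # converged: no vector changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

The returned ids can be used directly as ticket or feedback groups, or inspected by reading a few texts per cluster.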

Code search: Embed natural-language queries with CODE_RETRIEVAL_QUERY and the indexed code (function signatures, docstrings, snippets) with RETRIEVAL_DOCUMENT. This pairing suits developer tools and IDE assistants.

Multilingual pipelines: With an MTEB multilingual score of 69.9, a single index can serve queries and documents in different languages without separate per-language models.

Prompt Tips and Output Quality

•Always pair task types correctly: Index documents with RETRIEVAL_DOCUMENT and query with RETRIEVAL_QUERY. Mixing types degrades recall.
•Chunk long documents: Split text into 200–500 token segments before embedding. Each chunk should represent one coherent idea.
•Dimensionality tradeoff: 768 (default) gives optimal quality. Drop to 512 or 256 if your vector database costs or query latency are a concern — vectors are truncated, not re-encoded.
•SEMANTIC_SIMILARITY is not for retrieval: Use it only when comparing two texts directly (e.g., paraphrase detection, duplicate finding). It underperforms RETRIEVAL_* pairs in RAG pipelines.
•Plan for throughput: The API embeds one string per request, so process a corpus by looping through it or by sending requests concurrently. Synchronous responses are fast enough for real-time pipelines at moderate scale.
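The chunking tip can be sketched as a greedy splitter that packs whole sentences into chunks. Counting words instead of tokens is a simplifying assumption: at the common rule of thumb of roughly 0.75 words per token, 300 words is about 400 tokens, inside the 200 to 500 token range above. The sentence split is naive; swap in a real tokenizer and sentence splitter if you need exact counts.

```python
import re


def chunk_text(text: str, max_words: int = 300) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_words words.

    Word count is a rough proxy for tokens. A single sentence longer than
    max_words becomes its own (oversized) chunk rather than being cut mid-sentence.
    """
    chunks: list[str] = []
    current: list[str] = []
    count = 0
    for paragraph in re.split(r"\n\s*\n", text):
        # Naive sentence split on ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph.strip()):
            if not sentence:
                continue
            words = len(sentence.split())
            if current and count + words > max_words:
                chunks.append(" ".join(current))
                current, count = [], 0
            current.append(sentence)
            count += words
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Each returned chunk would then be embedded with RETRIEVAL_DOCUMENT.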

FAQs

Q: What is the difference between RETRIEVAL_QUERY and RETRIEVAL_DOCUMENT? RETRIEVAL_QUERY embeds a user's question or search query; RETRIEVAL_DOCUMENT embeds the passages or documents in your index. Always use them as a matched pair: the model is optimized for this asymmetric setup, and it produces the best recall.

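Q: Can I mix task types when comparing vectors? Only within a matched pairing: query-side vectors (such as RETRIEVAL_QUERY) against document-side vectors (RETRIEVAL_DOCUMENT), or two vectors of the same symmetric type, such as SEMANTIC_SIMILARITY or CLUSTERING. Cosine similarity between vectors from arbitrary task types is not meaningful.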
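One way to enforce this in code is a lookup of which task types may be compared. The asymmetric pairs (QUESTION_ANSWERING, FACT_VERIFICATION, and CODE_RETRIEVAL_QUERY queries against RETRIEVAL_DOCUMENT passages) follow Google's Gemini API guidance; that Segmind's endpoint behaves the same way is an assumption, and `can_compare` is a hypothetical helper name.

```python
# Query-side or symmetric task type -> the task type of the vectors it is compared with.
# Asymmetric pairs follow Google's Gemini API guidance (assumed to hold on Segmind).
COMPATIBLE_TASK = {
    "RETRIEVAL_QUERY": "RETRIEVAL_DOCUMENT",
    "QUESTION_ANSWERING": "RETRIEVAL_DOCUMENT",
    "FACT_VERIFICATION": "RETRIEVAL_DOCUMENT",
    "CODE_RETRIEVAL_QUERY": "RETRIEVAL_DOCUMENT",
    # Symmetric tasks: both sides use the same type.
    "SEMANTIC_SIMILARITY": "SEMANTIC_SIMILARITY",
    "CLASSIFICATION": "CLASSIFICATION",
    "CLUSTERING": "CLUSTERING",
}


def can_compare(task_a: str, task_b: str) -> bool:
    """True if vectors from these two task types form one supported pairing (either order)."""
    return COMPATIBLE_TASK.get(task_a) == task_b or COMPATIBLE_TASK.get(task_b) == task_a
```

Calling this guard before a similarity computation turns a silent recall problem into an explicit error.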

Q: How does output_dimensionality work? The model produces a full-length vector and then truncates it to your specified size. A value of 768 is the recommended default. Smaller values (256, 512) reduce storage and query latency but may slightly lower retrieval accuracy.
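Because a smaller size is a truncation of the full vector, the tradeoff can be sketched locally. The helper below keeps the first `dim` values of a stored vector and rescales them to unit length: a prefix of a unit vector is generally shorter than 1, so re-normalizing keeps cosine and dot-product scores consistent. Whether Segmind already normalizes shorter outputs is not stated here, so treat that step as a precaution. `storage_bytes` is a hypothetical helper that estimates raw float32 size only, ignoring index overhead.

```python
import math
from typing import Sequence


def truncate_and_normalize(vector: Sequence[float], dim: int) -> list[float]:
    """Keep the first `dim` values, then rescale the result to unit length."""
    if not 0 < dim <= len(vector):
        raise ValueError(f"dim must be in 1..{len(vector)}")
    head = list(vector[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm else head


def storage_bytes(n_vectors: int, dim: int, bytes_per_value: int = 4) -> int:
    """Raw float32 storage for n_vectors vectors of size dim (no index overhead)."""
    return n_vectors * dim * bytes_per_value
```

For one million vectors, `storage_bytes` gives 3,072,000,000 bytes (about 3.1 GB) at 768 dimensions and 1,024,000,000 bytes (about 1.0 GB) at 256.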

Q: Is Gemini Embedding 2 better than OpenAI text-embedding-3-large? On MTEB, Gemini Embedding 2 scores 68.16 overall vs. 64.6 for text-embedding-3-large. Its strongest reported results are in code retrieval (84.0) and multilingual tasks (69.9), so those workloads are where the difference is most likely to matter.

Q: What vector databases work with Gemini Embedding 2 embeddings? Any database that accepts float arrays — Pinecone, Qdrant, Weaviate, Chroma, pgvector, Redis, Milvus. Set index dimensions to match your output_dimensionality setting (default: 768).

Q: Does the model support batch input? The Segmind API accepts a single string per request. For batch workloads, send concurrent requests or loop through your corpus sequentially.
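A sketch of the concurrent option using the standard library's thread pool. `embed_one` is a hypothetical single-string wrapper around the Segmind request (task type and API key already bound), and `max_workers=8` is an arbitrary starting point; Segmind's rate limits are not covered on this page, so tune it and add retries for production use.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def embed_corpus(
    texts: Sequence[str],
    embed_one: Callable[[str], list[float]],
    max_workers: int = 8,
) -> list[list[float]]:
    """Embed one string per request with several requests in flight."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in input order even if requests finish out of order.
        return list(pool.map(embed_one, texts))
```

Threads fit here because each call spends most of its time waiting on the network.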
