Embedding & Reranking Models
Embedding and reranking models on OLLM for semantic search and RAG, available in TEE and ZDR environments and used through the AI SDK embeddingModel() method.
Embedding models turn text into numeric vectors that capture meaning. Reranking models score how relevant a set of documents is to a query. Together they power semantic search and retrieval-augmented generation (RAG).
When to Use
- Embeddings: semantic search, clustering, deduplication, classification, and the retrieval step of a RAG pipeline
- Reranking: reordering an initial set of retrieved documents so the most relevant ones rank highest, improving precision before passing context to a language model
Embedding models return vectors, not text. Use a language model to generate the final answer from retrieved context.
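To make the retrieval step concrete, here is a minimal sketch of ranking documents against a query by cosine similarity once you already have vectors. The vectors are hand-written toy values rather than real model output, and `rankBySimilarity` is an illustrative helper name, not part of OLLM. The AI SDK also exports its own `cosineSimilarity` helper from `ai`; a local version is used here only to keep the example self-contained.

```typescript
// Toy sketch: rank documents against a query by cosine similarity.
// The vectors below are placeholders, not real embeddings.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction, so treat its similarity as 0.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface Ranked {
  index: number; // position of the document in the input array
  score: number; // cosine similarity to the query
}

// Hypothetical helper: returns documents ordered from most to least similar.
function rankBySimilarity(query: number[], docs: number[][]): Ranked[] {
  return docs
    .map((doc, index) => ({ index, score: cosineSimilarity(query, doc) }))
    .sort((x, y) => y.score - x.score);
}

const queryVector = [1, 0, 0];
const docVectors = [
  [0, 1, 0],
  [0.9, 0.1, 0],
  [0.5, 0.5, 0],
];

const ranked = rankBySimilarity(queryVector, docVectors);
console.log(ranked.map((r) => r.index)); // [ 1, 2, 0 ]
```

In a real pipeline the query and document vectors would come from embed and embedMany, and the top-ranked documents would be passed to a language model as context.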
AI SDK Method
Embedding models are accessed with embeddingModel() and used with the AI SDK's embed and embedMany:
```typescript
import { createOLLM } from '@orgn/gateway';
import { embed, embedMany } from 'ai';

const ollm = createOLLM({ apiKey: process.env.OLLM_API_KEY });

// Single vector
const { embedding } = await embed({
  model: ollm.embeddingModel('near_qwen3_embedding_0_6b'),
  value: 'OLLM routes confidential LLM traffic.',
});

// Batch
const { embeddings } = await embedMany({
  model: ollm.embeddingModel('vercel_text_embedding_3_small'),
  values: [
    'Confidential computing protects data in use.',
    'TEEs use hardware-level encryption.',
  ],
});
```

Discover available embedding models with ollm.listModels({ outputModality: 'embedding' }).
Reranking models are reachable through the OLLM API but are not wired into the AI SDK provider's embeddingModel() interface. Call the gateway's rerank endpoint over raw HTTP to use them.
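The following is only a sketch of what a raw HTTP rerank call might look like. The `/rerank` path, the request fields (`model`, `query`, `documents`, `top_n`), and the response shape (`results` with `index` and `relevance_score`) are assumptions modeled on common rerank APIs, not the documented OLLM contract; check the OLLM API reference for the actual endpoint and schema. The helper names `rerank` and `applyRerank` are hypothetical.

```typescript
// Hypothetical sketch: URL path, field names, and response shape are
// assumptions, not the documented OLLM rerank API.
interface RerankResult {
  index: number;           // position of the document in the request
  relevance_score: number; // higher means more relevant (assumed)
}

async function rerank(
  baseUrl: string, // gateway base URL, supplied by the caller
  apiKey: string,
  model: string,   // reranker model ID, supplied by the caller
  query: string,
  documents: string[],
  topN: number,
): Promise<RerankResult[]> {
  const response = await fetch(`${baseUrl}/rerank`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({ model, query, documents, top_n: topN }),
  });
  if (!response.ok) {
    throw new Error(`Rerank request failed with status ${response.status}`);
  }
  const data = (await response.json()) as { results: RerankResult[] };
  return data.results;
}

// Reorder the original documents by descending relevance score.
function applyRerank(documents: string[], results: RerankResult[]): string[] {
  return [...results]
    .sort((a, b) => b.relevance_score - a.relevance_score)
    .map((r) => documents[r.index]);
}
```

The base URL and model ID are passed in rather than hard-coded because this section does not list them for rerankers. `applyRerank` is kept separate from the network call so the reordering logic can be tested without hitting the gateway.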
TEE Catalog
Embedding and reranking models running in Trusted Execution Environments, on NEAR and Phala infrastructure with Intel TDX + NVIDIA H100 confidential compute.
| Model | Provider | Infrastructure | Context |
|---|---|---|---|
| Qwen3 Embedding 0.6B | Alibaba | near | 33K |
| Qwen3 Embedding 8B | Alibaba | phala | 33K |
| Qwen3 Reranker 0.6B | Alibaba | near | 41K |
ZDR Catalog
Embedding and reranking models running on Vercel's AI infrastructure under zero data retention (ZDR) agreements with the model providers.
Embedding Models
| Model | Provider | Context |
|---|---|---|
| OpenAI text-embedding-3-large | OpenAI | 8K |
| OpenAI text-embedding-3-small | OpenAI | 8K |
| text-embedding-ada-002 | OpenAI | — |
| Voyage 3 Large | Voyage | 32K |
| Voyage 3.5 | Voyage | — |
| Voyage 3.5 Lite | Voyage | — |
| Voyage 4 | Voyage | 32K |
| Voyage 4 Large | Voyage | 32K |
| Voyage 4 Lite | Voyage | 32K |
| Voyage Code 2 | Voyage | — |
| Voyage Code 3 | Voyage | 32K |
| Voyage Finance 2 | Voyage | — |
| Voyage Law 2 | Voyage | — |
| Mistral Embed | Mistral | 8K |
| Codestral Embed | Mistral | — |
| Cohere Embed v4.0 | Cohere | 128K |
| Gemini Embedding 001 | Google | 2K |
| Gemini Embedding 2 | Google | — |
| Text Embedding 005 | Google | — |
| Text Multilingual Embedding 002 | Google | — |
| Titan Text Embeddings V2 | Amazon | — |
| Qwen3 Embedding 0.6B | Alibaba | 33K |
| Qwen3 Embedding 4B | Alibaba | 33K |
| Qwen3 Embedding 8B | Alibaba | 33K |
Reranking Models
| Model | Provider | Context |
|---|---|---|
| Cohere Rerank 3.5 | Cohere | 4K |
| Cohere Rerank 4 Fast | Cohere | 32K |
| Cohere Rerank 4 Pro | Cohere | 32K |
| Voyage Rerank 2.5 | Voyage | 32K |
| Voyage Rerank 2.5 Lite | Voyage | 32K |