Embedding & Reranking Models

Embedding and reranking models on OLLM for semantic search and RAG, available in TEE and ZDR environments and used through the AI SDK embeddingModel() method.

Embedding models turn text into numeric vectors that capture meaning. Reranking models score how relevant each document in a candidate set is to a query. Together they power semantic search and retrieval-augmented generation (RAG).

When to Use

  • Embeddings: semantic search, clustering, deduplication, classification, and the retrieval step of a RAG pipeline
  • Reranking: reordering an initial set of retrieved documents so the most relevant ones rank highest, improving precision before passing context to a language model

Embedding models return vectors, not text. Use a language model to generate the final answer from retrieved context.
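
Because an embedding model returns vectors rather than text, the search step itself is arithmetic on those vectors. The following is a minimal sketch of ranking documents by cosine similarity, assuming the vectors have already been produced by an embedding model. The helper names `cosine` and `topK` are illustrative and not part of OLLM or the AI SDK, and the toy 3-dimensional vectors stand in for real embeddings.

```typescript
// Illustrative helpers only: not part of OLLM or the AI SDK.

// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('Vector lengths differ');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat it as unrelated to everything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface Scored {
  index: number; // position of the document in the input array
  score: number; // cosine similarity to the query
}

// Return the k documents most similar to the query, best first.
function topK(query: number[], docs: number[][], k: number): Scored[] {
  return docs
    .map((vec, index) => ({ index, score: cosine(query, vec) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy vectors standing in for real embeddings.
const docVectors = [
  [1, 0, 0],
  [0.6, 0.8, 0],
  [0, 0, 1],
];
const hits = topK([1, 0, 0], docVectors, 2);
console.log(hits.map((h) => h.index));
```

The indices in `hits` point back into the original document array, so the matching text can be looked up and passed to a language model as context. The AI SDK also exports a `cosineSimilarity` helper that can replace the hand-written `cosine` here.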

AI SDK Method

Embedding models are accessed with embeddingModel() and used with the AI SDK's embed and embedMany:

embedding-model.ts
import { createOLLM } from '@orgn/gateway';
import { embed, embedMany } from 'ai';

const ollm = createOLLM({ apiKey: process.env.OLLM_API_KEY });

// Single vector
const { embedding } = await embed({
  model: ollm.embeddingModel('near_qwen3_embedding_0_6b'),
  value: 'OLLM routes confidential LLM traffic.',
});

// Batch
const { embeddings } = await embedMany({
  model: ollm.embeddingModel('vercel_text_embedding_3_small'),
  values: [
    'Confidential computing protects data in use.',
    'TEEs use hardware-level encryption.',
  ],
});

Discover available embedding models with ollm.listModels({ outputModality: 'embedding' }).

Reranking models are reachable through the OLLM API but are not wired into the AI SDK provider's embeddingModel() interface. Call the gateway's rerank endpoint over raw HTTP to use them.
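
This page does not document the rerank endpoint, so the following is only a sketch of what a raw HTTP call might look like. The path `/v1/rerank`, the Bearer authorization header, the request fields (`model`, `query`, `documents`), the response item shape (`index`, `score`), the base URL, and the model ID placeholder are all assumptions made for illustration; check the OLLM API reference for the real values before using it.

```typescript
// ASSUMPTIONS (not confirmed by this page): the endpoint path, auth header,
// request body fields, and response shape are hypothetical placeholders.
const RERANK_PATH = '/v1/rerank'; // hypothetical path

interface RerankRequest {
  url: string;
  init: { method: 'POST'; headers: Record<string, string>; body: string };
}

// Hypothetical response item: a document's position in the input plus its score.
interface RerankResult {
  index: number;
  score: number;
}

// Build the URL and fetch options for a rerank call without sending it.
function buildRerankRequest(
  baseUrl: string,
  apiKey: string,
  model: string,
  query: string,
  documents: string[],
): RerankRequest {
  return {
    url: baseUrl.replace(/\/+$/, '') + RERANK_PATH,
    init: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${apiKey}`, // assumed auth scheme
      },
      body: JSON.stringify({ model, query, documents }),
    },
  };
}

// Reorder the original documents by score, best first, keeping the top N.
function applyRerank(documents: string[], results: RerankResult[], topN: number): string[] {
  return [...results]
    .sort((a, b) => b.score - a.score)
    .slice(0, topN)
    .map((r) => documents[r.index]);
}

const docs = [
  'TEEs isolate data in use.',
  'Bananas are yellow.',
  'Intel TDX is a TEE technology.',
];
const request = buildRerankRequest(
  'https://gateway.example', // placeholder base URL
  'test-key',
  'YOUR_RERANK_MODEL_ID', // placeholder model ID
  'What is a TEE?',
  docs,
);
// Sending it would look like: const response = await fetch(request.url, request.init);

// Stand-in for a parsed response, used here instead of a live call.
const reranked = applyRerank(
  docs,
  [
    { index: 0, score: 0.91 },
    { index: 1, score: 0.02 },
    { index: 2, score: 0.77 },
  ],
  2,
);
console.log(reranked);
```

Keeping request construction separate from the network call makes the payload easy to inspect and adjust once the actual endpoint contract is known.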

TEE Catalog

These embedding and reranking models run in Trusted Execution Environments (TEEs) on NEAR and Phala infrastructure, using Intel TDX and NVIDIA H100 confidential compute.

| Model | Provider | Infrastructure | Context |
| --- | --- | --- | --- |
| Qwen3 Embedding 0.6B | Alibaba | near | 33K |
| Qwen3 Embedding 8B | Alibaba | phala | 33K |
| Qwen3 Reranker 0.6B | Alibaba | near | 41K |
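
Inputs longer than a model's context length need to be split before they are embedded. The following is a minimal sketch, assuming the Context column is a token limit and using a rough heuristic of about 4 characters per token. Both are assumptions: a real tokenizer gives exact counts, and `chunkText` is an illustrative helper rather than an OLLM function.

```typescript
// Rough heuristic, assumed for illustration: about 4 characters per token.
const CHARS_PER_TOKEN = 4;

// Split text on blank lines into chunks whose estimated size stays under maxTokens.
function chunkText(text: string, maxTokens: number): string[] {
  const maxChars = maxTokens * CHARS_PER_TOKEN;
  const chunks: string[] = [];
  let current = '';
  for (const para of text.split(/\n\s*\n/)) {
    const piece = para.trim();
    if (piece === '') continue;
    // Hard-split any paragraph that is too long on its own.
    for (let i = 0; i < piece.length; i += maxChars) {
      const part = piece.slice(i, i + maxChars);
      // Start a new chunk if adding this part (plus separator) would overflow.
      if (current !== '' && current.length + 2 + part.length > maxChars) {
        chunks.push(current);
        current = '';
      }
      current = current === '' ? part : current + '\n\n' + part;
    }
  }
  if (current !== '') chunks.push(current);
  return chunks;
}

// A tiny limit (5 tokens, about 20 characters) makes the splitting visible.
const chunks = chunkText('alpha beta\n\ngamma delta\n\nepsilon', 5);
console.log(chunks);
```

Each chunk can then be embedded as one entry in a batch, keeping a mapping from chunk back to its source document.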

ZDR Catalog

These embedding and reranking models run on Vercel's AI infrastructure under zero data retention (ZDR) agreements with the model providers.

Embedding Models

| Model | Provider | Context |
| --- | --- | --- |
| OpenAI text-embedding-3-large | OpenAI | 8K |
| OpenAI text-embedding-3-small | OpenAI | 8K |
| text-embedding-ada-002 | OpenAI | |
| Voyage 3 Large | Voyage | 32K |
| Voyage 3.5 | Voyage | |
| Voyage 3.5 Lite | Voyage | |
| Voyage 4 | Voyage | 32K |
| Voyage 4 Large | Voyage | 32K |
| Voyage 4 Lite | Voyage | 32K |
| Voyage Code 2 | Voyage | |
| Voyage Code 3 | Voyage | 32K |
| Voyage Finance 2 | Voyage | |
| Voyage Law 2 | Voyage | |
| Mistral Embed | Mistral | 8K |
| Codestral Embed | Mistral | |
| Cohere Embed v4.0 | Cohere | 128K |
| Gemini Embedding 001 | Google | 2K |
| Gemini Embedding 2 | Google | |
| Text Embedding 005 | Google | |
| Text Multilingual Embedding 002 | Google | |
| Titan Text Embeddings V2 | Amazon | |
| Qwen3 Embedding 0.6B | Alibaba | 33K |
| Qwen3 Embedding 4B | Alibaba | 33K |
| Qwen3 Embedding 8B | Alibaba | 33K |

Reranking Models

| Model | Provider | Context |
| --- | --- | --- |
| Cohere Rerank 3.5 | Cohere | 4K |
| Cohere Rerank 4 Fast | Cohere | 32K |
| Cohere Rerank 4 Pro | Cohere | 32K |
| Voyage Rerank 2.5 | Voyage | 32K |
| Voyage Rerank 2.5 Lite | Voyage | 32K |