OLLM

Models

OLLM provides access to two model execution environments: Trusted Execution Environments (TEE) for cryptographically verifiable private inference, and Zero Data Retention (ZDR) via Vercel for policy-based privacy with a broader model catalog.

Both environments ensure your inference data is not stored or logged, but they differ significantly in how that guarantee is enforced and what evidence you receive.

Execution Types

Trusted Execution Environment (TEE)

TEE models run inside hardware-isolated secure enclaves. The CPU and GPU execute inference in an environment that is encrypted and isolated from the host operating system, the hypervisor, and any infrastructure personnel, including OLLM and the model provider.

Every inference request processed inside a TEE produces a cryptographic attestation receipt: hardware-signed evidence proving which model ran, inside which verified environment, and that the execution was not tampered with. This is hardware-enforced privacy: not a contractual assurance, but a mathematical guarantee you can independently verify.
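
As a heavily simplified sketch of what "independently verifiable" means, the snippet below checks a signed receipt against a trusted key. The receipt field names here are hypothetical, and a symmetric HMAC stands in for the asymmetric hardware signatures that real receipts chain to Intel and NVIDIA public PKI, so that the example stays dependency-free.

```python
import hashlib
import hmac
import json

# Sketch only: real TEE receipts carry asymmetric hardware signatures that
# chain to Intel and NVIDIA public key infrastructure. An HMAC stands in for
# that here, and the receipt field names are hypothetical.
def verify_receipt(receipt: dict, trusted_key: bytes) -> bool:
    """Return True if the receipt body matches its signature under trusted_key."""
    body = json.dumps(receipt["body"], sort_keys=True).encode()
    expected = hmac.new(trusted_key, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, receipt["signature"])

key = b"vendor-root-key"  # stand-in for the hardware vendor's key material
body = {"model": "deepseek-v3.1", "enclave": "intel-tdx", "nonce": "abc123"}
receipt = {
    "body": body,
    "signature": hmac.new(key, json.dumps(body, sort_keys=True).encode(),
                          hashlib.sha256).hexdigest(),
}

print(verify_receipt(receipt, key))   # True: receipt is untampered
body["model"] = "other-model"         # tamper with the attested body
print(verify_receipt(receipt, key))   # False: signature no longer matches
```

The point of the exercise: the verifier needs only the receipt and the vendor's public key material, not any trust in OLLM or the infrastructure operator.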

TEE models on OLLM run on infrastructure provided by NEAR and Phala Network, both of which operate Intel TDX–based confidential virtual machines with NVIDIA H100 GPU attestation.

What TEE guarantees:

  • Prompts and responses are encrypted in memory during execution, invisible to the host OS, hypervisor, cloud provider, and OLLM
  • Hardware-signed attestation receipt per request, independently verifiable against Intel and NVIDIA public infrastructure
  • Cryptographic proof that the exact model you requested ran inside a genuine, unmodified TEE
  • Zero data retention: no prompts or outputs stored or logged

TEE infrastructure providers on OLLM:

Provider                Technology
NEAR (near)             Intel TDX + NVIDIA H100 confidential compute
Phala Network (phala)   Intel TDX + NVIDIA H100 confidential compute

Zero Data Retention (ZDR)

ZDR models run on Vercel's AI infrastructure and are governed by a contractual zero data retention commitment from the underlying model providers. Vercel's AI gateway enforces that inference providers do not store, log, or use your prompts and responses for any purpose, including model training.

ZDR does not use hardware-isolated execution environments. There is no attestation receipt and no cryptographic proof of execution. The privacy guarantee is enforced through provider agreements and Vercel's data handling policies, not through hardware isolation.

ZDR opens up a dramatically larger catalog: nearly every major frontier model from Anthropic, OpenAI, Google, Meta, Mistral, and dozens more, as well as image generation, video generation, and embedding models that are not available in TEE environments.

What ZDR guarantees:

  • Inference providers do not store or log your prompts or outputs
  • No training on your data
  • Policy-enforced zero retention by Vercel and the underlying model providers
  • Access to the broadest frontier model catalog

ZDR infrastructure provider on OLLM:

Provider          Technology
Vercel (vercel)   AI gateway with zero data retention provider agreements

Comparison

                                   TEE                                                    ZDR
Privacy enforcement                Hardware-enforced, cryptographic                       Policy-enforced, contractual
Attestation receipt                Yes, per request                                       No
Independent verification           Yes, against Intel and NVIDIA public PKI               No
Prompt visibility to OLLM          Never, hardware-enforced                               Never, policy-enforced
Data retention                     None                                                   None
Model catalog                      Focused set of open-weight models                      Broad frontier model catalog
Image / video / embedding models   Limited                                                Extensive
Infrastructure providers           NEAR, Phala                                            Vercel
Best for                           Regulated environments, auditability, sensitive data   General use, broadest model access

Choosing Between TEE and ZDR

Choose TEE when:

  • You operate in a regulated industry (healthcare, finance, legal) and need hardware-level data isolation
  • You need cryptographic proof of execution for audit or compliance purposes
  • Your threat model includes infrastructure-level compromise or insider risk at the provider
  • You require independently verifiable privacy guarantees per request

Choose ZDR when:

  • You need access to frontier closed-weight models (Claude, GPT-5, Gemini) not yet available in TEE environments
  • Your use case requires image generation, video generation, or advanced embedding and reranking models
  • Policy-enforced zero retention satisfies your compliance requirements
  • You want the broadest possible model catalog under a single API key

Both model types operate through the same OpenAI-compatible API and the same OLLM endpoint. The model ID you select determines which execution environment is used.
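
To make the "one API, two environments" point concrete, the sketch below builds identical OpenAI-style chat request bodies where only the model ID differs. The base URL and model ID strings are placeholders, not confirmed OLLM identifiers; check your OLLM dashboard for the real values.

```python
from typing import Any

# Placeholder endpoint -- substitute the real OLLM base URL.
OLLM_BASE_URL = "https://api.ollm.example/v1"

def chat_payload(model: str, prompt: str) -> dict[str, Any]:
    """Identical OpenAI-compatible request body for TEE and ZDR models;
    the model ID alone decides which execution environment serves it."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

# Hypothetical model IDs for illustration only.
tee_request = chat_payload("phala/deepseek-v3.1", "Summarize this contract.")
zdr_request = chat_payload("vercel/claude-opus-4.5", "Summarize this contract.")

# Everything except the model field is identical between the two requests.
print(tee_request["model"], zdr_request["model"])
```

In practice you would pass `OLLM_BASE_URL` and your API key to any OpenAI-compatible client and send either payload unchanged; no code changes are needed to move between TEE and ZDR beyond swapping the model ID.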


TEE Model Catalog

TEE models run on NEAR and Phala infrastructure with Intel TDX + NVIDIA H100 confidential compute. Every request produces a cryptographic attestation receipt.

Language Models

Model                   Provider   Infrastructure   Context
DeepSeek V3.1           DeepSeek   near             128K
DeepSeek V3.1           DeepSeek   phala            164K
GLM 4.7                 ZAI        near             205K
GLM 4.7                 ZAI        phala            203K
GLM 4.7 Flash           ZAI        phala            203K
GLM 5                   ZAI        near             203K
GLM 5.1                 ZAI        near             203K
Kimi K2.5               Moonshot   phala            262K
GPT-OSS 120B            OpenAI     near             131K
GPT-OSS 120B            OpenAI     phala            131K
GPT-OSS 20B             OpenAI     phala            131K
Qwen3 30B               Alibaba    near             262K
Qwen3 30B               Alibaba    phala            262K
Qwen 2.5 7B             Alibaba    phala            32K
Qwen2.5 7B Instruct     Alibaba    phala            33K
Qwen3.5 122B            Alibaba    near             131K
Qwen3.5 27B             Alibaba    phala            262K
Venice Uncensored 24B   Venice     phala            33K
Gemma 3 27B             Google     phala            53K
Llama 3.3 70B           Meta       phala            131K

Vision Models

Model                       Provider   Infrastructure   Context
Qwen3 VL 30B                Alibaba    near             256K
Qwen3 VL 30B                Alibaba    phala            262K
Qwen3 VL 30B A3B Instruct   Alibaba    phala            128K
Qwen2.5 VL 72B              Alibaba    phala            128K

Embedding & Reranking

Model                  Provider   Infrastructure   Context
Qwen3 Embedding 0.6B   Alibaba    near             33K
Qwen3 Embedding 8B     Alibaba    phala            33K
Qwen3 Reranker 0.6B    Alibaba    near             41K

Audio

Model              Provider   Infrastructure
Whisper Large V3   OpenAI     near

Image Generation

Model             Provider   Infrastructure
Flux.2 Klein 4B   BFL        near

ZDR Model Catalog

ZDR models run on Vercel's AI infrastructure with zero data retention provider agreements. No attestation receipts are generated.

Language Models

Anthropic

Model               Context
Claude 3 Haiku      200K
Claude 3.5 Haiku    200K
Claude 3.7 Sonnet   200K
Claude Haiku 4.5    200K
Claude Sonnet 4     1M
Claude Sonnet 4.5   1M
Claude Sonnet 4.6   1M
Claude Opus 4       200K
Claude Opus 4.1     200K
Claude Opus 4.5     200K
Claude Opus 4.6     1M
Claude Opus 4.7     1M

OpenAI

Model                   Context
GPT-4o                  8K
GPT-4o mini             8K
GPT-4.1                 8K
GPT-4.1 mini            8K
GPT-4.1 nano            1M
GPT-5                   400K
GPT-5 mini              400K
GPT-5 nano              400K
GPT-5 Codex             400K
GPT-5.1 Instant         128K
GPT-5.1 Codex           400K
GPT-5 Chat              128K
GPT-5.1 Codex Max       400K
GPT-5.1 Codex Mini      400K
GPT-5.1 Thinking        400K
GPT-5.2                 400K
GPT-5.2 Chat            128K
GPT-5.2 Codex           400K
GPT-5.3 Codex           400K
GPT-5.4                 1.1M
GPT-5.4 Mini            400K
GPT-5.4 Nano            400K
GPT-5.4 Pro             1.1M
GPT-OSS 20B             131K
GPT-OSS 120B            131K
GPT-OSS Safeguard 20B   131K
o1                      200K
o3-mini
o4-mini

Google

Model                                  Context
Gemini 2.0 Flash                       1M
Gemini 2.0 Flash-Lite                  1M
Gemini 2.5 Flash-Lite                  1M
Gemini 2.5 Flash                       1M
Gemini 2.5 Pro                         1M
Gemini 3 Flash                         1M
Gemini 3 Pro Preview                   1M
Gemini 3 Pro Image                     66K
Gemini 3.1 Flash Lite Preview          1M
Gemini 3.1 Flash Image Preview         131K
Gemini 3.1 Pro Preview                 1M
Gemma 4 26B A4B IT                     262K
Gemma 4 31B IT                         262K
Nano Banana (Gemini 2.5 Flash Image)   33K

Meta

Model                           Context
Llama 3.1 8B                    131K
Llama 3.1 70B                   131K
Llama 3.2 1B                    128K
Llama 3.2 3B                    128K
Llama 3.2 11B Vision Instruct   128K
Llama 3.2 90B Vision Instruct   128K
Llama 3.3 70B                   128K
Llama 4 Scout                   131K
Llama 4 Maverick                524K

Mistral

Model                        Context
Mistral Small                32K
Mistral Medium               128K
Mistral Large 3              256K
Mistral Nemo                 131K
Ministral 3B                 128K
Ministral 8B                 128K
Ministral 14B                256K
Mixtral MoE 8x22B Instruct   66K
Magistral Small              128K
Magistral Medium             128K
Codestral                    128K
Devstral 2                   256K
Devstral Small               128K
Devstral Small 2             256K
Pixtral 12B                  128K
Pixtral Large                128K

Alibaba (Qwen)

Model                 Context
Qwen 3 14B            41K
Qwen 3 30B            41K
Qwen 3 32B            131K
Qwen 3 235B           131K
Qwen3 235B Thinking   262K
Qwen3 Coder           262K
Qwen3 Coder 30B       262K
Qwen3 Coder Next      256K
Qwen3 Next 80B        262K
Qwen3 VL Instruct     262K
Qwen 3.6 Plus         1M

DeepSeek

Model           Context
DeepSeek R1     164K
DeepSeek V3     164K
DeepSeek V3.1   164K
DeepSeek V3.2   164K

Moonshot

Model                    Context
Kimi K2                  131K
Kimi K2 Turbo            256K
Kimi K2 0905             256K
Kimi K2 Thinking         262K
Kimi K2 Thinking Turbo   262K
Kimi K2.5                262K

ZAI

Model           Context
GLM 4.6         205K
GLM 4.7         205K
GLM 4.7 Flash   200K
GLM 5           203K
GLM 5.1         203K

Other Language Models

Model                        Provider         Context
MiniMax M2.1                 MiniMax          205K
MiniMax M2.5                 MiniMax          205K
MiniMax M2.7                 MiniMax          205K
Morph V3 Fast                Morph            82K
Morph V3 Large               Morph            82K
INTELLECT 3                  PrimeIntellect   131K
Nemotron 3 Nano 30B          NVIDIA           262K
Nemotron Nano 12B v2 VL      NVIDIA           131K
Nemotron Nano 9B v2          NVIDIA           131K
Nemotron 3 Super 120B A12B   NVIDIA           256K
Nova 2 Lite                  Amazon           1M
Nova Lite                    Amazon           300K
Nova Micro                   Amazon           128K
Nova Pro                     Amazon           300K

Image Generation

Model                    Provider
Flux Schnell             BFL
FLUX.1 Fill [pro]        BFL
FLUX.1 Kontext Max       BFL
FLUX.1 Kontext Pro       BFL
FLUX.2 [flex]            BFL
FLUX.2 [klein] 4B        BFL
FLUX.2 [klein] 9B        BFL
FLUX.2 [max]             BFL
FLUX.2 [pro]             BFL
FLUX1.1 [pro]            BFL
FLUX1.1 [pro] Ultra      BFL
GPT Image 1              OpenAI
GPT Image 1 Mini         OpenAI
GPT Image 1.5            OpenAI
GPT Image 2              OpenAI
Imagen 4                 Google
Imagen 4 Fast            Google
Imagen 4 Ultra           Google
Grok Imagine             xAI
Grok Imagine Image       xAI
Grok Imagine Image Pro   xAI
Recraft V2               Recraft
Recraft V3               Recraft
Recraft V4               Recraft
Recraft V4 Pro           Recraft
Seedream 4.0             ByteDance
Seedream 4.5             ByteDance
Seedream 5.0 Lite        ByteDance

Video Generation

Model                               Provider
Veo 3.0                             Google
Veo 3.0 Fast Generate               Google
Veo 3.1                             Google
Veo 3.1 Fast Generate               Google
Kling v2.5 Turbo Image-to-Video     Kuaishou
Kling v2.5 Turbo Text-to-Video      Kuaishou
Kling v2.6 Image-to-Video           Kuaishou
Kling v2.6 Motion Control           Kuaishou
Kling v2.6 Text-to-Video            Kuaishou
Kling v3.0 Image-to-Video           Kuaishou
Kling v3.0 Text-to-Video            Kuaishou
Seedance 2.0                        ByteDance
Seedance 2.0 Fast                   ByteDance
Seedance v1.0 Lite Image-to-Video   ByteDance
Seedance v1.0 Lite Text-to-Video    ByteDance
Seedance v1.0 Pro                   ByteDance
Seedance v1.0 Pro Fast              ByteDance
Seedance v1.5 Pro                   ByteDance
Wan v2.5 Text-to-Video Preview      Alibaba
Wan v2.6 Image-to-Video             Alibaba
Wan v2.6 Image-to-Video Flash       Alibaba
Wan v2.6 Reference-to-Video Flash   Alibaba
Wan v2.6 Text-to-Video              Alibaba

Embedding Models

Model                             Provider   Context
text-embedding-3-large            OpenAI     8K
text-embedding-3-small            OpenAI     8K
text-embedding-ada-002            OpenAI
Voyage 3 Large                    Voyage     32K
Voyage 3.5                        Voyage
Voyage 3.5 Lite                   Voyage
Voyage 4                          Voyage     32K
Voyage 4 Large                    Voyage     32K
Voyage 4 Lite                     Voyage     32K
Voyage Code 2                     Voyage
Voyage Code 3                     Voyage     32K
Voyage Finance 2                  Voyage
Voyage Law 2                      Voyage
Mistral Embed                     Mistral    8K
Codestral Embed                   Mistral
Cohere Embed v4.0                 Cohere     128K
Gemini Embedding 001              Google     2K
Gemini Embedding 2                Google
Text Embedding 005                Google
Text Multilingual Embedding 002   Google
Titan Text Embeddings V2          Amazon
Qwen3 Embedding 0.6B              Alibaba    33K
Qwen3 Embedding 4B                Alibaba    33K
Qwen3 Embedding 8B                Alibaba    33K

Reranking Models

Model                    Provider   Context
Cohere Rerank 3.5        Cohere     4K
Cohere Rerank 4 Fast     Cohere     32K
Cohere Rerank 4 Pro      Cohere     32K
Voyage Rerank 2.5        Voyage     32K
Voyage Rerank 2.5 Lite   Voyage     32K
