OLLM

Models

OLLM provides access to two model execution environments: Trusted Execution Environments (TEE) for cryptographically verifiable private inference, and Zero Data Retention (ZDR) via Vercel for policy-based privacy with a broader model catalog.

Both environments ensure your inference data is not stored or logged, but they differ significantly in how that guarantee is enforced and what evidence you receive.

Execution Types

Trusted Execution Environment (TEE)

TEE models run inside hardware-isolated secure enclaves. The CPU and GPU execute inference in an environment that is encrypted and isolated from the host operating system, the hypervisor, and any infrastructure personnel, including OLLM and the model provider.

Every inference request processed inside a TEE produces a cryptographic attestation receipt: hardware-signed evidence proving which model ran, inside which verified environment, and that the execution was not tampered with. This is hardware-enforced privacy: not a contractual assurance, but a mathematical guarantee you can independently verify.
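
As a heavily simplified sketch of what "independently verifiable" means, the snippet below checks a signed receipt against a trusted key. The receipt field names here are hypothetical, and a symmetric HMAC stands in for the asymmetric hardware signatures that real receipts chain to Intel and NVIDIA public PKI, so that the example stays dependency-free.

```python
import hashlib
import hmac
import json

# Sketch only: real TEE receipts carry asymmetric hardware signatures that
# chain to Intel and NVIDIA public key infrastructure. An HMAC stands in for
# that here, and the receipt field names are hypothetical.
def verify_receipt(receipt: dict, trusted_key: bytes) -> bool:
    """Return True if the receipt body matches its signature under trusted_key."""
    body = json.dumps(receipt["body"], sort_keys=True).encode()
    expected = hmac.new(trusted_key, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, receipt["signature"])

key = b"vendor-root-key"  # stand-in for the hardware vendor's key material
body = {"model": "deepseek-v3.1", "enclave": "intel-tdx", "nonce": "abc123"}
receipt = {
    "body": body,
    "signature": hmac.new(key, json.dumps(body, sort_keys=True).encode(),
                          hashlib.sha256).hexdigest(),
}

print(verify_receipt(receipt, key))   # True: receipt is untampered
body["model"] = "other-model"         # tamper with the attested body
print(verify_receipt(receipt, key))   # False: signature no longer matches
```

The point of the exercise: the verifier needs only the receipt and the vendor's public key material, not any trust in OLLM or the infrastructure operator.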

TEE models on OLLM run on infrastructure provided by NEAR and Phala Network, both of which operate Intel TDX–based confidential virtual machines with NVIDIA H100 GPU attestation.

What TEE guarantees:

  • Prompts and responses are encrypted in memory during execution, invisible to the host OS, hypervisor, cloud provider, and OLLM
  • Hardware-signed attestation receipt per request, independently verifiable against Intel and NVIDIA public infrastructure
  • Cryptographic proof that the exact model you requested ran inside a genuine, unmodified TEE
  • Zero data retention: no prompts or outputs stored or logged

TEE infrastructure providers on OLLM:

Provider                Technology
NEAR (near)             Intel TDX + NVIDIA H100 confidential compute
Phala Network (phala)   Intel TDX + NVIDIA H100 confidential compute

Zero Data Retention (ZDR)

ZDR models run on Vercel's AI infrastructure and are governed by a contractual zero data retention commitment from the underlying model providers. Vercel's AI gateway enforces that inference providers do not store, log, or use your prompts and responses for any purpose, including model training.

ZDR does not use hardware-isolated execution environments. There is no attestation receipt and no cryptographic proof of execution. The privacy guarantee is enforced through provider agreements and Vercel's data handling policies, not through hardware isolation.

ZDR opens up a dramatically larger catalog: nearly every major frontier model from Anthropic, OpenAI, Google, Meta, Mistral, and dozens more, as well as image generation, video generation, and embedding models that are not available in TEE environments.

What ZDR guarantees:

  • Inference providers do not store or log your prompts or outputs
  • No training on your data
  • Policy-enforced zero retention by Vercel and the underlying model providers
  • Access to the broadest frontier model catalog

ZDR infrastructure provider on OLLM:

Provider          Technology
Vercel (vercel)   AI gateway with zero data retention provider agreements

Comparison

                                   TEE                                                    ZDR
Privacy enforcement                Hardware-enforced, cryptographic                       Policy-enforced, contractual
Attestation receipt                Yes, per request                                       No
Independent verification           Yes, against Intel and NVIDIA public PKI               No
Prompt visibility to OLLM          Never, hardware-enforced                               Never, policy-enforced
Data retention                     None                                                   None
Model catalog                      Focused set of open-weight models                      Broad frontier model catalog
Image / video / embedding models   Limited                                                Extensive
Infrastructure providers           NEAR, Phala                                            Vercel
Best for                           Regulated environments, auditability, sensitive data   General use, broadest model access

Choosing Between TEE and ZDR

Choose TEE when:

  • You operate in a regulated industry (healthcare, finance, legal) and need hardware-level data isolation
  • You need cryptographic proof of execution for audit or compliance purposes
  • Your threat model includes infrastructure-level compromise or insider risk at the provider
  • You require independently verifiable privacy guarantees per request

Choose ZDR when:

  • You need access to frontier closed-weight models (Claude, GPT-5, Gemini) not yet available in TEE environments
  • Your use case requires image generation, video generation, or advanced embedding and reranking models
  • Policy-enforced zero retention satisfies your compliance requirements
  • You want the broadest possible model catalog under a single API key

Both model types operate through the same OpenAI-compatible API and the same OLLM endpoint. The model ID you select determines which execution environment is used.
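
To make the "one API, two environments" point concrete, the sketch below builds identical OpenAI-style chat request bodies where only the model ID differs. The base URL and model ID strings are placeholders, not confirmed OLLM identifiers; check your OLLM dashboard for the real values.

```python
from typing import Any

# Placeholder endpoint -- substitute the real OLLM base URL.
OLLM_BASE_URL = "https://api.ollm.example/v1"

def chat_payload(model: str, prompt: str) -> dict[str, Any]:
    """Identical OpenAI-compatible request body for TEE and ZDR models;
    the model ID alone decides which execution environment serves it."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

# Hypothetical model IDs for illustration only.
tee_request = chat_payload("phala/deepseek-v3.1", "Summarize this contract.")
zdr_request = chat_payload("vercel/claude-opus-4.5", "Summarize this contract.")

# Everything except the model field is identical between the two requests.
print(tee_request["model"], zdr_request["model"])
```

In practice you would pass `OLLM_BASE_URL` and your API key to any OpenAI-compatible client and send either payload unchanged; no code changes are needed to move between TEE and ZDR beyond swapping the model ID.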


TEE Model Catalog

TEE models run on NEAR and Phala infrastructure with Intel TDX + NVIDIA H100 confidential compute. Every request produces a cryptographic attestation receipt.

Language Models

Model                   Provider   Infrastructure   Context
DeepSeek V3.1           DeepSeek   near             128K
DeepSeek V3.1           DeepSeek   phala            164K
GLM 4.7                 ZAI        near             205K
GLM 4.7                 ZAI        phala            203K
GLM 4.7 Flash           ZAI        phala            203K
GLM 5                   ZAI        near             203K
GLM 5.1                 ZAI        near             203K
Kimi K2.5               Moonshot   phala            262K
GPT-OSS 120B            OpenAI     near             131K
GPT-OSS 120B            OpenAI     phala            131K
GPT-OSS 20B             OpenAI     phala            131K
Qwen3 30B               Alibaba    near             262K
Qwen3 30B               Alibaba    phala            262K
Qwen 2.5 7B             Alibaba    phala            32K
Qwen2.5 7B Instruct     Alibaba    phala            33K
Qwen3.5 122B            Alibaba    near             131K
Qwen3.5 27B             Alibaba    phala            262K
Venice Uncensored 24B   Venice     phala            33K
Gemma 3 27B             Google     phala            53K
Llama 3.3 70B           Meta       phala            131K

Vision Models

Model                       Provider   Infrastructure   Context
Qwen3 VL 30B                Alibaba    near             256K
Qwen3 VL 30B                Alibaba    phala            262K
Qwen3 VL 30B A3B Instruct   Alibaba    phala            128K
Qwen2.5 VL 72B              Alibaba    phala            128K

Embedding & Reranking

Model                  Provider   Infrastructure   Context
Qwen3 Embedding 0.6B   Alibaba    near             33K
Qwen3 Embedding 8B     Alibaba    phala            33K
Qwen3 Reranker 0.6B    Alibaba    near             41K

Audio

Model              Provider   Infrastructure
Whisper Large V3   OpenAI     near

Image Generation

Model             Provider   Infrastructure
Flux.2 Klein 4B   BFL        near

ZDR Model Catalog

ZDR models run on Vercel's AI infrastructure with zero data retention provider agreements. No attestation receipts are generated.

Language Models

Anthropic

Model               Context
Claude 3 Haiku      200K
Claude 3.5 Haiku    200K
Claude 3.7 Sonnet   200K
Claude Haiku 4.5    200K
Claude Sonnet 4     1M
Claude Sonnet 4.5   1M
Claude Sonnet 4.6   1M
Claude Opus 4       200K
Claude Opus 4.1     200K
Claude Opus 4.5     200K
Claude Opus 4.6     1M
Claude Opus 4.7     1M

OpenAI

Model                   Context
GPT-4o                  8K
GPT-4o mini             8K
GPT-4.1                 8K
GPT-4.1 mini            8K
GPT-4.1 nano            1M
GPT-5                   400K
GPT-5 mini              400K
GPT-5 nano              400K
GPT-5 Codex             400K
GPT-5.1 Instant         128K
GPT-5.1 Codex           400K
GPT-5 Chat              128K
GPT-5.1 Codex Max       400K
GPT-5.1 Codex Mini      400K
GPT-5.1 Thinking        400K
GPT-5.2                 400K
GPT-5.2 Chat            128K
GPT-5.2 Codex           400K
GPT-5.3 Codex           400K
GPT-5.4                 1.1M
GPT-5.4 Mini            400K
GPT-5.4 Nano            400K
GPT-5.4 Pro             1.1M
GPT-OSS 20B             131K
GPT-OSS 120B            131K
GPT-OSS Safeguard 20B   131K
o1                      200K
o3-mini
o4-mini

Google

Model                                  Context
Gemini 2.0 Flash                       1M
Gemini 2.0 Flash-Lite                  1M
Gemini 2.5 Flash-Lite                  1M
Gemini 2.5 Flash                       1M
Gemini 2.5 Pro                         1M
Gemini 3 Flash                         1M
Gemini 3 Pro Preview                   1M
Gemini 3 Pro Image                     66K
Gemini 3.1 Flash Lite Preview          1M
Gemini 3.1 Flash Image Preview         131K
Gemini 3.1 Pro Preview                 1M
Gemma 4 26B A4B IT                     262K
Gemma 4 31B IT                         262K
Nano Banana (Gemini 2.5 Flash Image)   33K

Meta

Model                           Context
Llama 3.1 8B                    131K
Llama 3.1 70B                   131K
Llama 3.2 1B                    128K
Llama 3.2 3B                    128K
Llama 3.2 11B Vision Instruct   128K
Llama 3.2 90B Vision Instruct   128K
Llama 3.3 70B                   128K
Llama 4 Scout                   131K
Llama 4 Maverick                524K

Mistral

Model                        Context
Mistral Small                32K
Mistral Medium               128K
Mistral Large 3              256K
Mistral Nemo                 131K
Ministral 3B                 128K
Ministral 8B                 128K
Ministral 14B                256K
Mixtral MoE 8x22B Instruct   66K
Magistral Small              128K
Magistral Medium             128K
Codestral                    128K
Devstral 2                   256K
Devstral Small               128K
Devstral Small 2             256K
Pixtral 12B                  128K
Pixtral Large                128K

Alibaba (Qwen)

Model                 Context
Qwen 3 14B            41K
Qwen 3 30B            41K
Qwen 3 32B            131K
Qwen 3 235B           131K
Qwen3 235B Thinking   262K
Qwen3 Coder           262K
Qwen3 Coder 30B       262K
Qwen3 Coder Next      256K
Qwen3 Next 80B        262K
Qwen3 VL Instruct     262K
Qwen 3.6 Plus         1M

DeepSeek

Model           Context
DeepSeek R1     164K
DeepSeek V3     164K
DeepSeek V3.1   164K
DeepSeek V3.2   164K

Moonshot

Model                    Context
Kimi K2                  131K
Kimi K2 Turbo            256K
Kimi K2 0905             256K
Kimi K2 Thinking         262K
Kimi K2 Thinking Turbo   262K
Kimi K2.5                262K

ZAI

Model           Context
GLM 4.6         205K
GLM 4.7         205K
GLM 4.7 Flash   200K
GLM 5           203K
GLM 5.1         203K

Other Language Models

Model                        Provider         Context
MiniMax M2.1                 MiniMax          205K
MiniMax M2.5                 MiniMax          205K
MiniMax M2.7                 MiniMax          205K
Morph V3 Fast                Morph            82K
Morph V3 Large               Morph            82K
INTELLECT 3                  PrimeIntellect   131K
Nemotron 3 Nano 30B          NVIDIA           262K
Nemotron Nano 12B v2 VL      NVIDIA           131K
Nemotron Nano 9B v2          NVIDIA           131K
Nemotron 3 Super 120B A12B   NVIDIA           256K
Nova 2 Lite                  Amazon           1M
Nova Lite                    Amazon           300K
Nova Micro                   Amazon           128K
Nova Pro                     Amazon           300K

Image Generation

Model                    Provider
Flux Schnell             BFL
FLUX.1 Fill [pro]        BFL
FLUX.1 Kontext Max       BFL
FLUX.1 Kontext Pro       BFL
FLUX.2 [flex]            BFL
FLUX.2 [klein] 4B        BFL
FLUX.2 [klein] 9B        BFL
FLUX.2 [max]             BFL
FLUX.2 [pro]             BFL
FLUX1.1 [pro]            BFL
FLUX1.1 [pro] Ultra      BFL
GPT Image 1              OpenAI
GPT Image 1 Mini         OpenAI
GPT Image 1.5            OpenAI
GPT Image 2              OpenAI
Imagen 4                 Google
Imagen 4 Fast            Google
Imagen 4 Ultra           Google
Grok Imagine             xAI
Grok Imagine Image       xAI
Grok Imagine Image Pro   xAI
Recraft V2               Recraft
Recraft V3               Recraft
Recraft V4               Recraft
Recraft V4 Pro           Recraft
Seedream 4.0             ByteDance
Seedream 4.5             ByteDance
Seedream 5.0 Lite        ByteDance

Video Generation

Model                               Provider
Veo 3.0                             Google
Veo 3.0 Fast Generate               Google
Veo 3.1                             Google
Veo 3.1 Fast Generate               Google
Kling v2.5 Turbo Image-to-Video     Kuaishou
Kling v2.5 Turbo Text-to-Video      Kuaishou
Kling v2.6 Image-to-Video           Kuaishou
Kling v2.6 Motion Control           Kuaishou
Kling v2.6 Text-to-Video            Kuaishou
Kling v3.0 Image-to-Video           Kuaishou
Kling v3.0 Text-to-Video            Kuaishou
Seedance 2.0                        ByteDance
Seedance 2.0 Fast                   ByteDance
Seedance v1.0 Lite Image-to-Video   ByteDance
Seedance v1.0 Lite Text-to-Video    ByteDance
Seedance v1.0 Pro                   ByteDance
Seedance v1.0 Pro Fast              ByteDance
Seedance v1.5 Pro                   ByteDance
Wan v2.5 Text-to-Video Preview      Alibaba
Wan v2.6 Image-to-Video             Alibaba
Wan v2.6 Image-to-Video Flash       Alibaba
Wan v2.6 Reference-to-Video Flash   Alibaba
Wan v2.6 Text-to-Video              Alibaba

Embedding Models

Model                             Provider   Context
text-embedding-3-large            OpenAI     8K
text-embedding-3-small            OpenAI     8K
text-embedding-ada-002            OpenAI
Voyage 3 Large                    Voyage     32K
Voyage 3.5                        Voyage
Voyage 3.5 Lite                   Voyage
Voyage 4                          Voyage     32K
Voyage 4 Large                    Voyage     32K
Voyage 4 Lite                     Voyage     32K
Voyage Code 2                     Voyage
Voyage Code 3                     Voyage     32K
Voyage Finance 2                  Voyage
Voyage Law 2                      Voyage
Mistral Embed                     Mistral    8K
Codestral Embed                   Mistral
Cohere Embed v4.0                 Cohere     128K
Gemini Embedding 001              Google     2K
Gemini Embedding 2                Google
Text Embedding 005                Google
Text Multilingual Embedding 002   Google
Titan Text Embeddings V2          Amazon
Qwen3 Embedding 0.6B              Alibaba    33K
Qwen3 Embedding 4B                Alibaba    33K
Qwen3 Embedding 8B                Alibaba    33K

Reranking Models

Model                    Provider   Context
Cohere Rerank 3.5        Cohere     4K
Cohere Rerank 4 Fast     Cohere     32K
Cohere Rerank 4 Pro      Cohere     32K
Voyage Rerank 2.5        Voyage     32K
Voyage Rerank 2.5 Lite   Voyage     32K
