Language Models
Text generation and chat models available through OLLM, covering both TEE and ZDR execution environments and their use with the AI SDK chatModel() method.
Language models handle text generation and chat: single-prompt completions, multi-turn conversations, reasoning, and tool use. They are the most common model type and back most OLLM applications.
When to Use
Use a language model when you need to generate or transform text:
- Chat assistants and multi-turn conversations
- Single-prompt generation, summarization, and rewriting
- Reasoning tasks (use a reasoning-capable model with reasoningEffort)
- Code generation and analysis
- Tool calling and structured output
For image input, see Vision. For turning speech into text, see Audio.
AI SDK Method
Language models are accessed with chatModel() and used with the AI SDK's generateText and streamText:
import { createOLLM } from '@orgn/gateway';
import { generateText } from 'ai';

const ollm = createOLLM({ apiKey: process.env.OLLM_API_KEY });

const { text } = await generateText({
  model: ollm.chatModel('near_glm_5_1'),
  prompt: 'What is OLLM?',
});

For streaming, system messages, multi-turn conversations, and reasoning options, see the Vercel AI SDK integration.
The legacy /v1/completions endpoint is not supported. Every completion task can be expressed as a chat call with chatModel().
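As a rough illustration of that migration, a completion-style request can be mapped onto chat messages before it is handed to generateText. This is only a sketch and not part of the OLLM SDK: the `LegacyCompletionRequest` type and the `toChatMessages` helper are hypothetical names, and the message shape simply follows the role/content convention used by the AI SDK.

```typescript
// Hypothetical shape of a completion-style request (illustrative only).
interface LegacyCompletionRequest {
  prompt: string;
  // Optional instructions that an app may have prepended to the prompt by hand.
  instructions?: string;
}

// Role/content message shape, following the AI SDK's chat convention.
interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

// Map a completion-style request onto chat messages:
// instructions become a system message, the prompt becomes a user message.
function toChatMessages(req: LegacyCompletionRequest): ChatMessage[] {
  const messages: ChatMessage[] = [];
  if (req.instructions && req.instructions.trim() !== '') {
    messages.push({ role: 'system', content: req.instructions.trim() });
  }
  messages.push({ role: 'user', content: req.prompt });
  return messages;
}
```

The resulting array could then be passed as `messages` to generateText, alongside `model: ollm.chatModel(...)`, in place of the `prompt` field.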
TEE Catalog
These language models run in Trusted Execution Environments (TEEs) on NEAR and Phala infrastructure, using Intel TDX and NVIDIA H100 confidential compute. Every request produces a cryptographic attestation receipt.
| Model | Provider | Infrastructure | Context |
|---|---|---|---|
| DeepSeek V3.1 | DeepSeek | near | 128K |
| DeepSeek V3.1 | DeepSeek | phala | 164K |
| GLM 4.7 | ZAI | near | 205K |
| GLM 4.7 | ZAI | phala | 203K |
| GLM 4.7 Flash | ZAI | phala | 203K |
| GLM 5 | ZAI | near | 203K |
| GLM 5.1 | ZAI | near | 203K |
| Kimi K2.5 | Moonshot | phala | 262K |
| GPT-OSS 120B | OpenAI | near | 131K |
| GPT-OSS 120B | OpenAI | phala | 131K |
| GPT-OSS 20B | OpenAI | phala | 131K |
| Qwen3 30B | Alibaba | near | 262K |
| Qwen3 30B | Alibaba | phala | 262K |
| Qwen 2.5 7B | Alibaba | phala | 32K |
| Qwen2.5 7B Instruct | Alibaba | phala | 33K |
| Qwen3.5 122B | Alibaba | near | 131K |
| Qwen3.5 27B | Alibaba | phala | 262K |
| Venice Uncensored 24B | Venice | phala | 33K |
| Gemma 3 27B | Google | phala | 53K |
| Llama 3.3 70B | Meta | phala | 131K |
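Because the same model can have a different context window on each infrastructure (DeepSeek V3.1 is listed at 128K on near and 164K on phala), it can help to filter the catalog programmatically. The sketch below assumes a hand-copied subset of the table; the `TeeModel` type and the `findTeeModels` helper are hypothetical and not an OLLM API.

```typescript
type Infrastructure = 'near' | 'phala';

// One row of the TEE table; contextK is the context window in thousands of tokens.
interface TeeModel {
  name: string;
  provider: string;
  infrastructure: Infrastructure;
  contextK: number;
}

// A few rows copied by hand from the table above (not fetched from OLLM).
const teeModels: TeeModel[] = [
  { name: 'DeepSeek V3.1', provider: 'DeepSeek', infrastructure: 'near', contextK: 128 },
  { name: 'DeepSeek V3.1', provider: 'DeepSeek', infrastructure: 'phala', contextK: 164 },
  { name: 'GLM 5.1', provider: 'ZAI', infrastructure: 'near', contextK: 203 },
  { name: 'Kimi K2.5', provider: 'Moonshot', infrastructure: 'phala', contextK: 262 },
  { name: 'Llama 3.3 70B', provider: 'Meta', infrastructure: 'phala', contextK: 131 },
];

// Return models on the given infrastructure with at least minContextK
// thousand tokens of context, largest context first.
function findTeeModels(infrastructure: Infrastructure, minContextK: number): TeeModel[] {
  return teeModels
    .filter((m) => m.infrastructure === infrastructure && m.contextK >= minContextK)
    .sort((a, b) => b.contextK - a.contextK);
}
```

For example, `findTeeModels('phala', 150)` keeps only the phala rows listed at 150K or more, with the largest context window first.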
ZDR Catalog
These language models run on Vercel's AI infrastructure under zero data retention (ZDR) agreements with the model providers. No attestation receipts are generated.
Anthropic
| Model | Context |
|---|---|
| Claude 3 Haiku | 200K |
| Claude 3.5 Haiku | 200K |
| Claude 3.7 Sonnet | 200K |
| Claude Haiku 4.5 | 200K |
| Claude Sonnet 4 | 1M |
| Claude Sonnet 4.5 | 1M |
| Claude Sonnet 4.6 | 1M |
| Claude Opus 4 | 200K |
| Claude Opus 4.1 | 200K |
| Claude Opus 4.5 | 200K |
| Claude Opus 4.6 | 1M |
| Claude Opus 4.7 | 1M |
OpenAI
| Model | Context |
|---|---|
| GPT-4o | 8K |
| GPT-4o mini | 8K |
| GPT-4.1 | 8K |
| GPT-4.1 mini | 8K |
| GPT-4.1 nano | 1M |
| GPT-5 | 400K |
| GPT-5 mini | 400K |
| GPT-5 nano | 400K |
| GPT-5 Codex | 400K |
| GPT-5.1 Instant | 128K |
| GPT-5.1-Codex | 400K |
| GPT 5 Chat | 128K |
| GPT 5.1 Codex Max | 400K |
| GPT 5.1 Codex Mini | 400K |
| GPT 5.1 Thinking | 400K |
| GPT 5.2 | 400K |
| GPT 5.2 Chat | 128K |
| GPT 5.2 Codex | 400K |
| GPT 5.3 Codex | 400K |
| GPT 5.4 | 1.1M |
| GPT 5.4 Mini | 400K |
| GPT 5.4 Nano | 400K |
| GPT 5.4 Pro | 1.1M |
| GPT-OSS 20B | 131K |
| GPT-OSS 120B | 131K |
| GPT OSS Safeguard 20B | 131K |
| o1 | 200K |
| o3-mini | — |
| o4-mini | — |
Google
| Model | Context |
|---|---|
| Gemini 2.0 Flash | 1M |
| Gemini 2.0 Flash-Lite | 1M |
| Gemini 2.5 Flash-Lite | 1M |
| Gemini 2.5 Flash | 1M |
| Gemini 2.5 Pro | 1M |
| Gemini 3 Flash | 1M |
| Gemini 3 Pro Preview | 1M |
| Gemini 3.1 Flash Lite Preview | 1M |
| Gemini 3.1 Pro Preview | 1M |
| Gemma 4 26B A4B IT | 262K |
| Gemma 4 31B IT | 262K |
Meta
| Model | Context |
|---|---|
| Llama 3.1 8B | 131K |
| Llama 3.1 70B | 131K |
| Llama 3.2 1B | 128K |
| Llama 3.2 3B | 128K |
| Llama 3.3 70B | 128K |
| Llama 4 Scout | 131K |
| Llama 4 Maverick | 524K |
Mistral
| Model | Context |
|---|---|
| Mistral Small | 32K |
| Mistral Medium | 128K |
| Mistral Large 3 | 256K |
| Mistral Nemo | 131K |
| Ministral 3B | 128K |
| Ministral 8B | 128K |
| Ministral 14B | 256K |
| Mixtral MoE 8x22B Instruct | 66K |
| Magistral Small | 128K |
| Magistral Medium | 128K |
| Codestral | 128K |
| Devstral 2 | 256K |
| Devstral Small | 128K |
| Devstral Small 2 | 256K |
Alibaba (Qwen)
| Model | Context |
|---|---|
| Qwen 3 14B | 41K |
| Qwen 3 30B | 41K |
| Qwen 3 32B | 131K |
| Qwen 3 235B | 131K |
| Qwen3 235B Thinking | 262K |
| Qwen3 Coder | 262K |
| Qwen3 Coder 30B | 262K |
| Qwen3 Coder Next | 256K |
| Qwen3 Next 80B | 262K |
| Qwen 3.6 Plus | 1M |
DeepSeek
| Model | Context |
|---|---|
| DeepSeek R1 | 164K |
| DeepSeek V3 | 164K |
| DeepSeek V3.1 | 164K |
| DeepSeek V3.2 | 164K |
Moonshot
| Model | Context |
|---|---|
| Kimi K2 | 131K |
| Kimi K2 Turbo | 256K |
| Kimi K2 0905 | 256K |
| Kimi K2 Thinking | 262K |
| Kimi K2 Thinking Turbo | 262K |
| Kimi K2.5 | 262K |
ZAI
| Model | Context |
|---|---|
| GLM 4.6 | 205K |
| GLM 4.7 | 205K |
| GLM 4.7 Flash | 200K |
| GLM 5 | 203K |
| GLM 5.1 | 203K |
Other Language Models
| Model | Provider | Context |
|---|---|---|
| MiniMax M2.1 | MiniMax | 205K |
| MiniMax M2.5 | MiniMax | 205K |
| MiniMax M2.7 | MiniMax | 205K |
| Morph V3 Fast | Morph | 82K |
| Morph V3 Large | Morph | 82K |
| INTELLECT 3 | PrimeIntellect | 131K |
| Nemotron 3 Nano 30B | NVIDIA | 262K |
| Nemotron Nano 9B v2 | NVIDIA | 131K |
| NVIDIA Nemotron 3 Super 120B A12B | NVIDIA | 256K |
| Nova 2 Lite | Amazon | 1M |
| Nova Lite | Amazon | 300K |
| Nova Micro | Amazon | 128K |
| Nova Pro | Amazon | 300K |
Several models in this catalog also accept image input. Models with vision capability are listed on the Vision page.
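The context values in the tables above are abbreviated labels such as 128K or 1.1M, and a few rows show only a dash placeholder. A minimal sketch of turning those labels into token counts for a quick fit check is shown below. It assumes K means 1,000 and M means 1,000,000, which the catalog does not state, so the results should be treated as approximate. `parseContext` and `fitsContext` are hypothetical helper names.

```typescript
// Convert a catalog context label ("128K", "1.1M") into a token count.
// Assumption: K = 1,000 and M = 1,000,000. Returns undefined when the
// label is not a number followed by K or M (for example a dash placeholder).
function parseContext(label: string): number | undefined {
  const match = /^(\d+(?:\.\d+)?)([KM])$/.exec(label.trim());
  if (!match) return undefined;
  const multiplier = match[2] === 'K' ? 1000 : 1000000;
  // Round to avoid floating-point noise from values such as 1.1M.
  return Math.round(Number(match[1]) * multiplier);
}

// True only when a limit is listed and the request fits inside it.
function fitsContext(label: string, requiredTokens: number): boolean {
  const limit = parseContext(label);
  return limit !== undefined && requiredTokens <= limit;
}
```

A check like this is only a first filter; the required token count still has to come from a tokenizer or an estimate appropriate to the chosen model.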