Image & Video Models
Image and video generation models on OLLM, reachable through the OpenAI-compatible API but not wired into the AI SDK provider interface.
Image and video generation models produce visual media from text prompts (and, for some models, from reference images or video).
When to Use
- Image generation: creating, editing, or inpainting images from a text prompt
- Video generation: text-to-video, image-to-video, and motion-controlled clips
To understand an existing image rather than generate one, use a Vision model instead.
How to Access
Image and video models are not available through the AI SDK provider. Calling ollm.imageModel() throws a NoSuchModelError, and there is no AI SDK helper for video generation.
Image-output and video-output models are reachable through the OpenAI-compatible OLLM API over raw HTTP. They appear in ollm.listModels() results (image models, for example, list 'image' in output_modalities), so you can discover model IDs at runtime. The generation request itself, however, must be sent directly to the gateway endpoint rather than through generateText or streamText.
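The sketch below illustrates both steps: picking model IDs by output modality, then calling the gateway over plain HTTP. It is a minimal sketch under stated assumptions, not OLLM's documented client code. The `/images/generations` path mirrors the OpenAI Images API and is assumed to be what the gateway exposes. The base URL is a hypothetical placeholder. The `ModelInfo` type declares only `id` and `output_modalities`, and the real `listModels()` entries may carry more fields. `modelsWithOutput`, `buildImageRequest`, and `generateImage` are illustrative helper names.

```typescript
// Minimal sketch, assuming an OpenAI-style image route on the OLLM gateway.
// The base URL and route below are placeholders, not documented OLLM values.

// Assumed shape of a listModels() entry; only the fields used here are declared.
interface ModelInfo {
  id: string;
  output_modalities?: string[];
}

// Return the IDs of models whose output modalities include the requested kind.
function modelsWithOutput(models: ModelInfo[], modality: string): string[] {
  return models
    .filter((m) => m.output_modalities?.includes(modality) ?? false)
    .map((m) => m.id);
}

const OLLM_BASE_URL = "https://gateway.example.com/v1"; // hypothetical placeholder

// Build (but do not send) an OpenAI-compatible image generation request.
function buildImageRequest(apiKey: string, model: string, prompt: string): Request {
  return new Request(`${OLLM_BASE_URL}/images/generations`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model, prompt }),
  });
}

// Send the request and return the parsed JSON body, throwing on non-2xx responses.
async function generateImage(apiKey: string, model: string, prompt: string): Promise<unknown> {
  const res = await fetch(buildImageRequest(apiKey, model, prompt));
  if (!res.ok) {
    throw new Error(`Image request failed: ${res.status} ${await res.text()}`);
  }
  return res.json();
}
```

To discover image models, pass the awaited result of ollm.listModels() to `modelsWithOutput(models, "image")` (assuming it resolves to an array shaped like `ModelInfo`), then pass one of the returned IDs to `generateImage`. Building the `Request` in a separate function keeps it easy to inspect in tests without making a network call.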
TEE Catalog
Image generation models running in Trusted Execution Environments (TEEs) on NEAR infrastructure, using Intel TDX and NVIDIA H100 confidential computing.
| Model | Provider | Infrastructure |
|---|---|---|
| Flux.2 Klein 4B | BFL | near |
There are currently no video generation models in the TEE catalog.
ZDR Catalog
Image and video generation models served through Vercel's AI infrastructure under zero data retention (ZDR) agreements with the model providers.
Image Generation
| Model | Provider |
|---|---|
| Flux Schnell | BFL |
| FLUX.1 Fill [pro] | BFL |
| FLUX.1 Kontext Max | BFL |
| FLUX.1 Kontext Pro | BFL |
| FLUX.2 [flex] | BFL |
| FLUX.2 [klein] 4B | BFL |
| FLUX.2 [klein] 9B | BFL |
| FLUX.2 [max] | BFL |
| FLUX.2 [pro] | BFL |
| FLUX1.1 [pro] | BFL |
| FLUX1.1 [pro] Ultra | BFL |
| GPT Image 1 | OpenAI |
| GPT Image 1 Mini | OpenAI |
| GPT Image 1.5 | OpenAI |
| GPT Image 2 | OpenAI |
| Imagen 4 | Google |
| Imagen 4 Fast | Google |
| Imagen 4 Ultra | Google |
| Grok Imagine | xAI |
| Grok Imagine Image | xAI |
| Grok Imagine Image Pro | xAI |
| Recraft V2 | Recraft |
| Recraft V3 | Recraft |
| Recraft V4 | Recraft |
| Recraft V4 Pro | Recraft |
| Seedream 4.0 | ByteDance |
| Seedream 4.5 | ByteDance |
| Seedream 5.0 Lite | ByteDance |
Several Google Gemini models also produce image output (for example Gemini 3 Pro Image, Gemini 3.1 Flash Image Preview, and Nano Banana).
Video Generation
| Model | Provider |
|---|---|
| Veo 3.0 | Google |
| Veo 3.0 Fast Generate | Google |
| Veo 3.1 | Google |
| Veo 3.1 Fast Generate | Google |
| Kling v2.5 Turbo Image-to-Video | Kuaishou |
| Kling v2.5 Turbo Text-to-Video | Kuaishou |
| Kling v2.6 Image-to-Video | Kuaishou |
| Kling v2.6 Motion Control | Kuaishou |
| Kling v2.6 Text-to-Video | Kuaishou |
| Kling v3.0 Image-to-Video | Kuaishou |
| Kling v3.0 Text-to-Video | Kuaishou |
| Seedance 2.0 | ByteDance |
| Seedance 2.0 Fast | ByteDance |
| Seedance v1.0 Lite Image-to-Video | ByteDance |
| Seedance v1.0 Lite Text-to-Video | ByteDance |
| Seedance v1.0 Pro | ByteDance |
| Seedance v1.0 Pro Fast | ByteDance |
| Seedance v1.5 Pro | ByteDance |
| Wan v2.5 Text-to-Video Preview | Alibaba |
| Wan v2.6 Image-to-Video | Alibaba |
| Wan v2.6 Image-to-Video Flash | Alibaba |
| Wan v2.6 Reference-to-Video Flash | Alibaba |
| Wan v2.6 Text-to-Video | Alibaba |