Audio Models
Speech-to-text transcription models on OLLM, running in Trusted Execution Environments and used through the AI SDK transcriptionModel() method.
Audio models on OLLM provide speech-to-text transcription: they take an audio recording and return a written transcript.
When to Use
- Transcribing meetings, calls, voice notes, and interviews
- Generating captions or subtitles
- Feeding spoken input into a text or RAG pipeline
- Segment-level timestamps for navigation or alignment
Transcription returns text from audio. There is no text-to-speech (TTS) capability: the OLLM catalog has no models that produce audio output.
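For the captions and timestamp use cases, one possible approach is to turn transcript segments into an SRT subtitle file. The sketch below assumes segments carry `text`, `startSecond`, and `endSecond` fields, as in the `segments` array shown on this page. The `Segment` interface and the helpers `toSrtTimestamp` and `segmentsToSrt` are illustrative names, not part of OLLM or the AI SDK.

```typescript
// Shape of one transcript segment (assumed to match the fields used on this page).
interface Segment {
  text: string;
  startSecond: number;
  endSecond: number;
}

// Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm
function toSrtTimestamp(seconds: number): string {
  const totalMs = Math.max(0, Math.round(seconds * 1000));
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  const pad = (n: number, width = 2) => String(n).padStart(width, '0');
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms, 3)}`;
}

// Build an SRT document: numbered cues, each followed by a blank line.
function segmentsToSrt(segments: Segment[]): string {
  return segments
    .map(
      (seg, i) =>
        `${i + 1}\n` +
        `${toSrtTimestamp(seg.startSecond)} --> ${toSrtTimestamp(seg.endSecond)}\n` +
        `${seg.text.trim()}\n`,
    )
    .join('\n');
}
```

Passing `result.segments` from a transcription call to `segmentsToSrt` would yield a string that can be written to a `.srt` file; WebVTT output would need a `WEBVTT` header and `.` instead of `,` in the timestamps.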
AI SDK Method
Audio models are accessed with transcriptionModel() and used with the AI SDK's experimental_transcribe helper:
```typescript
import { experimental_transcribe as transcribe } from 'ai';
import { readFile } from 'node:fs/promises';
import { createOLLM } from '@orgn/gateway';

const ollm = createOLLM({ apiKey: process.env.OLLM_API_KEY });
const audio = await readFile('meeting.mp3');

const result = await transcribe({
  model: ollm.transcriptionModel('near_whisper_large_v3'),
  audio,
});

console.log(result.text); // full transcript
console.log(result.language); // e.g. "english"
console.log(result.durationInSeconds); // total audio length

for (const seg of result.segments) {
  console.log(`[${seg.startSecond}s – ${seg.endSecond}s] ${seg.text}`);
}
```

Whisper-specific options (language, temperature, prompt) can be passed through providerOptions.ollm. See the Vercel AI SDK integration for details.
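A minimal sketch of building the `providerOptions` object for such a call follows. The three option names come from the sentence above; the value formats (for example `'en'` as the language value) and the helper name `whisperProviderOptions` are assumptions for illustration, so check the Vercel AI SDK integration page for the accepted values.

```typescript
// Option names taken from this page; value formats are assumptions.
interface WhisperOptions {
  language?: string;
  temperature?: number;
  prompt?: string;
}

// Build a providerOptions object keyed by "ollm", skipping unset fields
// so that undefined values are never sent along with the request.
function whisperProviderOptions(
  opts: WhisperOptions,
): { ollm: Record<string, string | number> } {
  const ollm: Record<string, string | number> = {};
  if (opts.language !== undefined) ollm.language = opts.language;
  if (opts.temperature !== undefined) ollm.temperature = opts.temperature;
  if (opts.prompt !== undefined) ollm.prompt = opts.prompt;
  return { ollm };
}

const providerOptions = whisperProviderOptions({
  language: 'en', // assumed format; the service may expect a different form
  temperature: 0,
  prompt: 'Speakers: Ana, Raj. Topic: Q3 roadmap.',
});

// Intended use (not executed here; it needs network access and an API key):
// const result = await transcribe({
//   model: ollm.transcriptionModel('near_whisper_large_v3'),
//   audio,
//   providerOptions,
// });
```

Keeping the options in a small typed helper makes it harder to misspell a key, since unknown provider options may be ignored without an error.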
Supported audio types: audio/mpeg, audio/wav, audio/mp4 (m4a), audio/webm, audio/flac, audio/ogg.
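It can be useful to reject unsupported files before uploading them. The sketch below checks a filename against the media types listed above. The extension-to-type mapping and the helper name `audioMimeType` are assumptions for illustration; file extensions are only a hint and do not guarantee the actual container format.

```typescript
// Media types listed on this page as supported.
const SUPPORTED_AUDIO_TYPES = new Set<string>([
  'audio/mpeg',
  'audio/wav',
  'audio/mp4',
  'audio/webm',
  'audio/flac',
  'audio/ogg',
]);

// Assumed mapping from common file extensions to media types.
const EXTENSION_TO_MIME: Record<string, string> = {
  mp3: 'audio/mpeg',
  wav: 'audio/wav',
  m4a: 'audio/mp4',
  mp4: 'audio/mp4',
  webm: 'audio/webm',
  flac: 'audio/flac',
  ogg: 'audio/ogg',
};

// Return the supported media type for a filename, or undefined when the
// extension is missing or maps to nothing in the supported list.
function audioMimeType(filename: string): string | undefined {
  const dot = filename.lastIndexOf('.');
  if (dot < 0) return undefined;
  const ext = filename.slice(dot + 1).toLowerCase();
  const mime = EXTENSION_TO_MIME[ext];
  return mime !== undefined && SUPPORTED_AUDIO_TYPES.has(mime) ? mime : undefined;
}
```

A caller could run `audioMimeType('meeting.mp3')` before reading the file and skip the transcription request when the result is `undefined`.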
Discover available transcription models with ollm.listModels({ inputModality: 'audio' }).
TEE Catalog
Audio models in this catalog run in Trusted Execution Environments on NEAR infrastructure, using Intel TDX and NVIDIA H100 confidential compute. Every request produces a cryptographic attestation receipt.
| Model | Provider | Infrastructure | Context |
|---|---|---|---|
| Whisper Large V3 | OpenAI | near | — |
ZDR Catalog
There are currently no speech-to-text models in the ZDR catalog. Transcription on OLLM runs exclusively in TEE environments.