Audio Models

Speech-to-text transcription models on OLLM, running in Trusted Execution Environments and used through the AI SDK transcriptionModel() method.

Audio models on OLLM provide speech-to-text transcription: they take an audio recording and return a written transcript.

When to Use

Transcribing meetings, calls, voice notes, and interviews
Generating captions or subtitles
Feeding spoken input into a text or RAG pipeline
Producing segment-level timestamps for navigation or alignment

Transcription returns text from audio. There is no text-to-speech (TTS) capability: the OLLM catalog has no models that produce audio output.

AI SDK Method

Audio models are accessed with transcriptionModel() and used with the AI SDK's experimental_transcribe helper:

transcription-model.ts

import { experimental_transcribe as transcribe } from 'ai';
import { readFile } from 'node:fs/promises';
import { createOLLM } from '@orgn/gateway';

const ollm = createOLLM({ apiKey: process.env.OLLM_API_KEY });
const audio = await readFile('meeting.mp3');

const result = await transcribe({
  model: ollm.transcriptionModel('near_whisper_large_v3'),
  audio,
});

console.log(result.text);                // full transcript
console.log(result.language);            // e.g. "english"
console.log(result.durationInSeconds);   // total audio length
for (const seg of result.segments) {
  console.log(`[${seg.startSecond}s – ${seg.endSecond}s] ${seg.text}`);
}
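
For the captions use case, the segments can be converted to SubRip (SRT) cues. The following is a minimal sketch, not part of the OLLM SDK: the Segment interface only mirrors the three fields read in the example above, and toSrtTime and segmentsToSrt are hypothetical helper names.

```typescript
// Mirrors the segment fields used in the example above (text, startSecond, endSecond).
interface Segment {
  text: string;
  startSecond: number;
  endSecond: number;
}

// Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm
function toSrtTime(seconds: number): string {
  const totalMs = Math.max(0, Math.round(seconds * 1000));
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  const pad = (n: number, width = 2) => String(n).padStart(width, '0');
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms, 3)}`;
}

// Build an SRT document: numbered cues separated by blank lines.
function segmentsToSrt(segments: Segment[]): string {
  const cues = segments.map(
    (seg, i) =>
      `${i + 1}\n${toSrtTime(seg.startSecond)} --> ${toSrtTime(seg.endSecond)}\n${seg.text.trim()}`,
  );
  return cues.join('\n\n') + '\n';
}
```

Calling segmentsToSrt(result.segments) and writing the returned string to a .srt file should give a caption track that most video players can load alongside the recording.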

Whisper-specific options (language, temperature, prompt) can be passed through providerOptions.ollm. See the Vercel AI SDK integration for details.
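
As an illustration, the options object might be assembled as below. This is a sketch under assumptions: the three field names come from the sentence above, but the value conventions (an ISO-639-1 language code, a temperature between 0 and 1, a free-text prompt) follow common Whisper usage and are not confirmed here for OLLM. WhisperOptions and whisperProviderOptions are hypothetical names.

```typescript
// Hypothetical shape for the Whisper options named above.
// Field names are from the text; the value conventions are assumptions.
interface WhisperOptions {
  language?: string;    // assumed: ISO-639-1 code such as "en"
  temperature?: number; // assumed: sampling temperature between 0 and 1
  prompt?: string;      // assumed: free-text hint, e.g. names or jargon to expect
}

// Wrap the options under the `ollm` key expected by providerOptions.
function whisperProviderOptions(options: WhisperOptions): { ollm: WhisperOptions } {
  const t = options.temperature;
  if (t !== undefined && !(t >= 0 && t <= 1)) {
    throw new RangeError('temperature is assumed to be between 0 and 1');
  }
  return { ollm: { ...options } };
}

const providerOptions = whisperProviderOptions({
  language: 'en',
  temperature: 0,
  prompt: 'Speakers: Alice, Bob. Product names: OLLM, NEAR.',
});

// Pass it alongside model and audio:
//   await transcribe({ model, audio, providerOptions });
```

The range check is only a local guard for the assumed convention; the integration page remains the reference for which values the gateway accepts.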

Supported audio types: audio/mpeg, audio/wav, audio/mp4 (m4a), audio/webm, audio/flac, audio/ogg.
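
A file can be checked against this list before it is uploaded. The sketch below is illustrative only: the mapping from file extensions to media types is an assumption based on common conventions, and audioMediaType is a hypothetical helper, not an SDK function.

```typescript
// Extension to media type, limited to the supported types listed above.
// The extension choices are an assumption based on common file naming.
const MEDIA_TYPES = new Map<string, string>([
  ['mp3', 'audio/mpeg'],
  ['wav', 'audio/wav'],
  ['m4a', 'audio/mp4'],
  ['webm', 'audio/webm'],
  ['flac', 'audio/flac'],
  ['ogg', 'audio/ogg'],
]);

// Return the media type for a filename, or undefined if it is not in the list.
function audioMediaType(filename: string): string | undefined {
  const dot = filename.lastIndexOf('.');
  if (dot < 0) return undefined; // no extension at all
  return MEDIA_TYPES.get(filename.slice(dot + 1).toLowerCase());
}
```

An undefined result suggests converting the recording to one of the listed formats before calling transcribe.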

Discover available transcription models with ollm.listModels({ inputModality: 'audio' }).

TEE Catalog

The audio models in this catalog run in Trusted Execution Environments on NEAR infrastructure with Intel TDX and NVIDIA H100 confidential compute. Every request produces a cryptographic attestation receipt.

Model	Provider	Infrastructure	Context
Whisper Large V3	OpenAI	near	—

ZDR Catalog

There are currently no speech-to-text models in the ZDR catalog. Transcription on OLLM runs exclusively in TEE environments.
