OLLM Architecture and Confidential Inference Flow
A deep dive into the OLLM architecture, including the router control plane, hardware-backed Trusted Execution Environments, and the attestation layer. Learn how user-controlled model selection, Intel TDX, and NVIDIA GPU attestation deliver verifiable confidential LLM inference end to end.
OLLM’s architecture is designed to provide verifiable, confidential LLM inference while keeping control firmly with the user. It separates request orchestration, secure execution, and verification, ensuring that sensitive data is processed only inside hardware-enforced trust boundaries.
OLLM does not perform automatic model selection or dynamic routing. The model specified in the user's request is executed.
High-level components
Client application
Your application sends requests using an OpenAI-compatible API, explicitly specifying the model to use.
The client:
- Selects the model in code or request parameters
- Sends prompts and inference parameters
- Receives model responses
- Receives attestation and verification metadata for the same request
OLLM does not modify, override, or substitute the requested model.
OLLM router (control plane)
The OLLM router acts as a secure orchestration layer, responsible for:
- Authenticating requests
- Validating model availability and permissions
- Enforcing security and execution constraints
- Coordinating attestation and verification data
The router does not choose models, does not inspect prompt or response data, and does not perform inference.
Execution environments (data plane)
OLLM routes requests to one of two execution environments depending on the model selected.
TEE models (NEAR and Phala infrastructure) run inside hardware-backed Trusted Execution Environments:
- Hardware-enforced memory isolation from host OS, hypervisor, and infrastructure
- Encryption of data in use via Intel TDX confidential VMs and NVIDIA H100 GPU attestation
- Cryptographic attestation receipt generated per request
ZDR models (Vercel infrastructure) run under zero data retention provider agreements:
- No storage or logging of prompts and responses by Vercel or the underlying model provider
- Broad access to frontier closed-weight and multimodal models
- No hardware isolation or attestation receipt
The model identifier you specify in the request determines which environment is used. OLLM does not select or override the execution path.
Attestation and verification layer
For TEE model requests, the execution environment produces attestation artifacts that prove:
- The specified model ran inside a valid TEE
- The execution environment matched expected measurements
- The response was generated within the trusted boundary
These artifacts are available per request in the OLLM Explorer, enabling independent verification of secure execution.
ZDR model requests do not produce attestation artifacts. Privacy is enforced through Vercel's zero data retention agreements with model providers.
Request lifecycle
The client sends a request to OLLM, explicitly specifying the model to use.
OLLM authenticates the request and verifies that the specified model is available and supported.
The request is forwarded to the selected model’s TEE-backed execution environment.
Hardware attestation data is generated as part of the execution process.
The model output and corresponding verification metadata are returned to the client.
At no point does OLLM alter the model choice or access plaintext prompt or response data outside the TEE.
Trust boundaries and guarantees
OLLM makes its trust model explicit:
- Model choice is always user-controlled OLLM does not perform automatic routing or model substitution.
- OLLM does not access inference data Prompts and outputs remain confined to hardware-isolated environments.
- Security enforcement depends on model type TEE models: trust anchored in hardware TEE guarantees and cryptographic attestation. ZDR models: trust anchored in Vercel's zero data retention provider agreements.
This architecture allows teams to run sensitive LLM workloads with full control over model selection, while still gaining verifiable privacy and execution integrity.
What is OLLM - Confidential AI Gateway
OLLM is an enterprise confidential AI gateway that routes LLM requests through hardware-backed Trusted Execution Environments, providing cryptographic proof of privacy per request.
Quickstart Guide for OLLM Integration
Get started with OLLM in minutes. Send your first verified inference request through a Trusted Execution Environment and inspect the attestation receipt.