Ollm Architecture
How Ollm delivers verifiable, confidential LLM inference with user-controlled model selection.
Ollm’s architecture provides verifiable, confidential LLM inference while keeping control firmly with the user. It separates request orchestration, secure execution, and verification so that sensitive data is processed only inside hardware-enforced trust boundaries.
Ollm does not perform automatic model selection or dynamic routing. The model specified in the user's request is executed.
High-level components
Client application
Your application sends requests using an OpenAI-compatible API, explicitly specifying the model to use.
The client:
- Selects the model in code or request parameters
- Sends prompts and inference parameters
- Receives model responses
- Receives attestation and verification metadata for the same request
Ollm does not modify, override, or substitute the requested model.
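As a minimal sketch of what the client sends, the payload below shows an OpenAI-compatible chat completion body with the model chosen explicitly in code. The model name, prompt, and parameter values are illustrative assumptions, not real Ollm identifiers:

```python
import json

def build_request(model: str, prompt: str, temperature: float = 0.2) -> str:
    """Build the JSON body for an OpenAI-compatible chat completion call.

    The model field is passed through exactly as given; Ollm executes
    the model the client names, with no substitution.
    """
    body = {
        "model": model,  # user-controlled, never overridden
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }
    return json.dumps(body)

# Hypothetical model name for illustration only.
payload = build_request("example-model", "Summarize this contract clause.")
```

The same body could then be POSTed to Ollm's chat completions endpoint with any standard HTTP client.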
Ollm router (control plane)
The Ollm router acts as a secure orchestration layer, responsible for:
- Authenticating requests
- Validating model availability and permissions
- Enforcing security and execution constraints
- Coordinating attestation and verification data
The router does not choose models, does not inspect prompt or response data, and does not perform inference.
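The control-plane checks above can be sketched as a routing function that sees only request metadata (an API key and a model name) and never the prompt. The key, model names, and backend address format are illustrative assumptions:

```python
from dataclasses import dataclass

AVAILABLE_MODELS = {"example-model-a", "example-model-b"}  # illustrative catalog
VALID_KEYS = {"sk-demo"}                                    # illustrative credential

@dataclass
class RouteDecision:
    model: str        # forwarded exactly as requested
    tee_backend: str  # where the TEE-backed execution happens

def route(api_key: str, model: str) -> RouteDecision:
    """Authenticate and validate a request without touching its payload."""
    if api_key not in VALID_KEYS:
        raise PermissionError("authentication failed")
    if model not in AVAILABLE_MODELS:
        raise ValueError(f"model not available: {model}")
    # The router never rewrites the model field and never inspects the prompt.
    return RouteDecision(model=model, tee_backend="tee://" + model)
```

Note that the prompt is not even a parameter of the routing decision, mirroring the guarantee that the router performs no inference and no data inspection.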
Trusted execution environments (data plane)
All inference runs inside hardware-backed Trusted Execution Environments (TEEs) provided by Ollm’s supported LLM providers.
Key properties:
- Hardware-enforced memory isolation
- Protection from host OS, hypervisor, and infrastructure access
- Encryption of data while in use
- Execution integrity enforced by the platform
Depending on the selected model, execution occurs using technologies such as:
- Intel TDX–based confidential virtual machines
- NVIDIA GPU attestation for secure GPU-based inference
Only models that support TEE-backed execution are exposed through Ollm.
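The "only TEE-backed models are exposed" rule amounts to filtering the model catalog on its confidential-execution profile. The catalog entries and the `tee` field are assumptions made for illustration:

```python
# Illustrative catalog: each entry records which TEE technology backs it,
# or None if the model has no confidential execution profile.
CATALOG = [
    {"name": "model-a", "tee": "intel-tdx"},
    {"name": "model-b", "tee": "nvidia-gpu-attestation"},
    {"name": "model-c", "tee": None},  # no TEE support: never exposed
]

def exposed_models(catalog: list[dict]) -> list[str]:
    """Return only the models eligible for TEE-backed execution."""
    return [m["name"] for m in catalog if m["tee"] is not None]
```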
Attestation and verification layer
For every request, the execution environment produces attestation artifacts that prove:
- The specified model ran inside a valid TEE
- The execution environment matched expected measurements
- The response was generated within the trusted boundary
These artifacts are returned with the response, enabling independent verification of secure execution on a per-request basis.
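A simplified sketch of per-request verification follows. Real attestation relies on hardware-rooted quotes and certificate chains; here an HMAC stands in for the platform signature, and the artifact field names are assumptions:

```python
import hashlib
import hmac

# Illustrative expected measurement of the trusted enclave image.
EXPECTED_MEASUREMENT = hashlib.sha256(b"trusted-enclave-image").hexdigest()

def verify_attestation(artifact: dict, platform_key: bytes) -> bool:
    """Check that an attestation artifact matches expectations.

    Verifies (1) the environment measurement, and (2) that the artifact
    is bound to this specific request and signed by the platform.
    """
    # 1. The execution environment matched expected measurements.
    if artifact["measurement"] != EXPECTED_MEASUREMENT:
        return False
    # 2. The signature covers both the measurement and the request id,
    #    binding the artifact to this request.
    mac = hmac.new(
        platform_key,
        (artifact["measurement"] + artifact["request_id"]).encode(),
        hashlib.sha256,
    ).hexdigest()
    return hmac.compare_digest(mac, artifact["signature"])
```

The per-request binding is what makes verification independent: a client can reject any individual response whose artifact fails these checks, without trusting the transport in between.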
Request lifecycle
1. The client sends a request to Ollm, explicitly specifying the model to use.
2. Ollm authenticates the request and verifies that the specified model is available and supported.
3. The request is forwarded to the selected model’s TEE-backed execution environment.
4. Hardware attestation data is generated as part of the execution process.
5. The model output and corresponding verification metadata are returned to the client.
At no point does Ollm alter the model choice or access plaintext prompt or response data outside the TEE.
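The lifecycle above can be traced end to end in a small in-process simulation. Every component here is simulated; the key, model name, output format, and attestation fields are assumptions for illustration:

```python
def handle_request(api_key: str, model: str, prompt: str) -> dict:
    """Simulate the five lifecycle steps in one pass."""
    # Steps 1-2: authenticate, then validate the requested model.
    if api_key != "sk-demo":                 # illustrative credential
        raise PermissionError("authentication failed")
    if model not in {"example-model"}:       # illustrative catalog
        raise ValueError("model not available")
    # Steps 3-4: the TEE executes inference and emits attestation.
    # The model field passes through untouched at every stage.
    output = f"[{model}] response ({len(prompt)} chars of input)"  # stand-in
    attestation = {"model": model, "in_tee": True}                 # stand-in
    # Step 5: output and verification metadata return together.
    return {"model": model, "output": output, "attestation": attestation}
```

The invariant to notice is that the `model` string is never rewritten between the request and the response, mirroring the no-substitution guarantee.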
Trust boundaries and guarantees
Ollm makes its trust model explicit:
- Model choice is always user-controlled. Ollm does not perform automatic routing or model substitution.
- Ollm does not access inference data. Prompts and outputs remain confined to hardware-isolated environments.
- Security is enforced by hardware, not policy. Trust is anchored in TEE guarantees and cryptographic attestation.
This architecture allows teams to run sensitive LLM workloads with full control over model selection, while still gaining verifiable privacy and execution integrity.