Ollm Architecture

How Ollm delivers verifiable, confidential LLM inference with user-controlled model selection.

Ollm’s architecture is designed to provide verifiable, confidential LLM inference while keeping control firmly with the user. It separates request orchestration, secure execution, and verification, ensuring that sensitive data is processed only inside hardware-enforced trust boundaries.

Ollm does not perform automatic model selection or dynamic routing. The model specified in the user's request is executed.

High-level components

Client application

Your application sends requests using an OpenAI-compatible API, explicitly specifying the model to use.

The client:

  • Selects the model in code or request parameters
  • Sends prompts and inference parameters
  • Receives model responses
  • Receives attestation and verification metadata for the same request

Ollm does not modify, override, or substitute the requested model.
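As a minimal sketch, a client might assemble an OpenAI-compatible request body like this. The helper name, model name, and prompt are illustrative only; the point is that the `model` field is transmitted exactly as the caller wrote it:

```python
import json

def build_chat_request(model: str, prompt: str, temperature: float = 0.2) -> str:
    """Build an OpenAI-compatible chat completion payload as a JSON string.

    The model field is passed through as given; nothing rewrites it.
    """
    payload = {
        "model": model,  # user-selected, never substituted
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,  # ordinary inference parameters
    }
    return json.dumps(payload)

# Example: explicitly selecting a model (name is illustrative).
body = build_chat_request("example-model", "Summarize this contract clause.")
print(json.loads(body)["model"])  # prints "example-model"
```

The same payload shape works with any OpenAI-compatible client library; only the base URL and credentials change.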

Ollm router (control plane)

The Ollm router acts as a secure orchestration layer, responsible for:

  • Authenticating requests
  • Validating model availability and permissions
  • Enforcing security and execution constraints
  • Coordinating attestation and verification data

The router does not choose models, does not inspect prompt or response data, and does not perform inference.
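The control-plane checks above can be sketched as follows. All names (the registry, the function, the error choices) are hypothetical; the real router's interface is not specified here. What the sketch shows is the invariant: the request either fails validation or is forwarded with the model name untouched.

```python
# Hypothetical model registry exposed by the control plane.
AVAILABLE_MODELS = {"example-model-a", "example-model-b"}

def validate_request(api_key: str, model: str, permitted: set) -> str:
    """Authenticate and validate a request without touching the prompt.

    Returns the model name unchanged on success: no routing, no substitution.
    """
    if not api_key:
        raise PermissionError("unauthenticated request")
    if model not in AVAILABLE_MODELS:
        raise ValueError(f"model {model!r} is not available")
    if model not in permitted:
        raise PermissionError(f"key lacks permission for {model!r}")
    return model  # forwarded exactly as requested
```

Note that the prompt never appears in the function signature: validation operates only on credentials and the model identifier.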

Trusted execution environments (data plane)

All inference runs inside hardware-backed Trusted Execution Environments (TEEs) provided by Ollm’s supported LLM providers.

Key properties:

  • Hardware-enforced memory isolation
  • Protection from host OS, hypervisor, and infrastructure access
  • Encryption of data while in use
  • Execution integrity enforced by the platform

Depending on the selected model, execution occurs using technologies such as:

  • Intel TDX–based confidential virtual machines
  • NVIDIA GPU attestation for secure GPU-based inference

Only models that support TEE-backed execution are exposed through Ollm.

Attestation and verification layer

For every request, the execution environment produces attestation artifacts that prove:

  • The specified model ran inside a valid TEE
  • The execution environment matched expected measurements
  • The response was generated within the trusted boundary

These artifacts are returned with the response, enabling independent verification of secure execution on a per-request basis.
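Conceptually, per-request verification reduces to comparing a reported environment measurement against a pinned expected value. The sketch below is a simplification: real attestation artifacts (Intel TDX quotes, NVIDIA GPU attestation reports) are signed binary structures verified against vendor certificate chains, not bare hashes, and the field names here are invented for illustration.

```python
import hashlib
import hmac

def verify_measurement(artifact: dict, expected_measurement: str) -> bool:
    """Check that the reported environment measurement matches the pinned one.

    Uses a constant-time comparison to avoid timing side channels.
    """
    reported = artifact.get("measurement", "")
    return hmac.compare_digest(reported, expected_measurement)

# Pin the measurement of the environment we expect (illustrative value).
expected = hashlib.sha256(b"trusted-image-v1").hexdigest()

artifact = {"measurement": expected, "model": "example-model"}
print(verify_measurement(artifact, expected))  # prints True
```

Because the artifact is returned alongside each response, this check can run per request rather than relying on a one-time deployment audit.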

Request lifecycle

Request submission

The client sends a request to Ollm, explicitly specifying the model to use.

Request validation

Ollm authenticates the request and verifies that the specified model is available and supported.

Secure inference execution

The request is forwarded to the selected model’s TEE-backed execution environment.

Attestation generation

Hardware attestation data is generated as part of the execution process.

Response and verification delivery

The model output and corresponding verification metadata are returned to the client.

At no point does Ollm alter the model choice or access plaintext prompt or response data outside the TEE.
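The five lifecycle steps can be tied together in one sketch. Everything here is illustrative: the real execution happens inside a remote TEE-backed provider, which this stand-in function only simulates, and the attestation fields are placeholders.

```python
import hashlib

def run_inference(model: str, prompt: str) -> dict:
    """Simulate the request lifecycle: validate, execute, attest, return."""
    # Steps 1-2: submission and validation; the model name passes through unchanged.
    if model != "example-model":  # illustrative registry of one
        raise ValueError(f"model {model!r} is not available")
    # Step 3: secure inference execution (stand-in for the TEE-backed call).
    output = f"[{model}] response to: {prompt}"
    # Step 4: attestation generated as part of execution (placeholder values).
    attestation = {
        "tee": "intel-tdx",
        "measurement": hashlib.sha256(b"trusted-image-v1").hexdigest(),
    }
    # Step 5: model output and verification metadata returned together.
    return {"output": output, "attestation": attestation}
```

The key property the sketch preserves is that the response and its attestation travel together, so the client can verify each answer independently.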

Trust boundaries and guarantees

Ollm makes its trust model explicit:

  • Model choice is always user-controlled: Ollm does not perform automatic routing or model substitution.
  • Ollm does not access inference data: prompts and outputs remain confined to hardware-isolated environments.
  • Security is enforced by hardware, not policy: trust is anchored in TEE guarantees and cryptographic attestation.

This architecture allows teams to run sensitive LLM workloads with full control over model selection, while still gaining verifiable privacy and execution integrity.