Migrate to OLLM from OpenAI Apps

Migrate your existing OpenAI application to OLLM with minimal code changes. Configure the OpenAI SDK to route requests through OLLM for hardware-attested, confidential inference.

This guide shows how to use OLLM with the official OpenAI SDK.

Because OLLM exposes an OpenAI-compatible API, you can keep using the SDK you already use for OpenAI. The only changes are the base_url and api_key passed to the client.

Prerequisites

An OLLM account
An OLLM API key
Python 3.8+ (for the examples below)

Install the official OpenAI SDK:

pip install openai

Basic Configuration

To use OLLM, initialize the OpenAI client with:

base_url="https://api.ollm.com/v1"
api_key="your-api-key"

from openai import OpenAI

client = OpenAI(
    base_url="https://api.ollm.com/v1",
    api_key="your-api-key"
)

No additional configuration is required.

Make a Chat Completion Request

You must explicitly specify the model in each request.

response = client.chat.completions.create(
    model="near/GLM-4.6",
    messages=[
        {"role": "user", "content": "Why is the sky blue?"}
    ]
)

print(response.choices[0].message.content)

The response format follows the OpenAI-compatible schema.

Using System Messages

You can provide system instructions in the same way as standard OpenAI requests.

response = client.chat.completions.create(
    model="near/GLM-4.6",
    messages=[
        {"role": "system", "content": "You are a concise technical assistant."},
        {"role": "user", "content": "Explain TLS in two sentences."}
    ]
)

print(response.choices[0].message.content)

Handling the Response Safely

In production systems, always validate the response structure before rendering output.

if response and response.choices:
    content = response.choices[0].message.content
    print(content)
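The check above can be wrapped in a small helper so every call site validates the response the same way. This is a minimal sketch, not part of the OLLM or OpenAI SDK: `extract_content` is a hypothetical name, and it relies only on the `choices`, `message`, and `content` attributes used in the examples above. The `SimpleNamespace` object is a stand-in for a real response.

```python
from types import SimpleNamespace
from typing import Any, Optional


def extract_content(response: Any) -> Optional[str]:
    """Return the first choice's text, or None if the response is empty or malformed."""
    choices = getattr(response, "choices", None)
    if not choices:
        return None
    message = getattr(choices[0], "message", None)
    # content may itself be None, so the caller still has to handle that case
    return getattr(message, "content", None)


# Stand-in object shaped like a chat completion, for illustration only
fake = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(content="Hello"))]
)
print(extract_content(fake))
print(extract_content(None))
```

Returning None instead of raising lets the caller decide whether an empty response should be retried, logged, or shown as an error.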

You can also access usage metadata for cost tracking:

# usage is optional in the response schema, so check it before reading
if response.usage:
    print(response.usage.total_tokens)
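For cost tracking across several requests, the usage values can be accumulated. The sketch below is one possible approach: `UsageTotals` is a hypothetical helper, and it assumes OLLM returns the standard `prompt_tokens`, `completion_tokens`, and `total_tokens` fields of the OpenAI-compatible schema. Responses without usage data are skipped.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any


@dataclass
class UsageTotals:
    """Running token counts across multiple responses."""
    prompt_tokens: int = 0
    completion_tokens: int = 0
    total_tokens: int = 0

    def add(self, response: Any) -> None:
        usage = getattr(response, "usage", None)
        if usage is None:
            return  # nothing to count for this response
        self.prompt_tokens += getattr(usage, "prompt_tokens", 0) or 0
        self.completion_tokens += getattr(usage, "completion_tokens", 0) or 0
        self.total_tokens += getattr(usage, "total_tokens", 0) or 0


# Stand-in responses, for illustration only
totals = UsageTotals()
totals.add(SimpleNamespace(usage=SimpleNamespace(
    prompt_tokens=10, completion_tokens=5, total_tokens=15)))
totals.add(SimpleNamespace(usage=None))
print(totals.total_tokens)
```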

Streaming Responses

If you want to stream partial results:

stream = client.chat.completions.create(
    model="near/GLM-4.6",
    messages=[{"role": "user", "content": "Write a short paragraph about secure AI."}],
    stream=True
)

for chunk in stream:
    # delta is an object, not a dict, and its content can be None
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Streaming works the same way as with OpenAI’s API.
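If you need the complete text once the stream ends, you can collect the pieces as they arrive. The following is a minimal sketch: `collect_stream` is a hypothetical helper, and the `SimpleNamespace` chunks stand in for the chunk objects the SDK yields. It skips chunks that have no choices or whose `delta.content` is None.

```python
from types import SimpleNamespace
from typing import Any, Iterable, Optional


def collect_stream(stream: Iterable[Any]) -> str:
    """Join the text fragments of a streamed chat completion into one string."""
    parts = []
    for chunk in stream:
        if not chunk.choices:
            continue  # some chunks carry no choices at all
        text = getattr(chunk.choices[0].delta, "content", None)
        if text:
            parts.append(text)
    return "".join(parts)


def fake_chunk(text: Optional[str]) -> SimpleNamespace:
    """Build a stand-in chunk, for illustration only."""
    return SimpleNamespace(
        choices=[SimpleNamespace(delta=SimpleNamespace(content=text))]
    )


chunks = [fake_chunk("Secure "), fake_chunk(None),
          SimpleNamespace(choices=[]), fake_chunk("AI")]
print(collect_stream(chunks))
```

With a real client, pass the object returned by `client.chat.completions.create(..., stream=True)` as the `stream` argument.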

Switching Models

To use a different model, change the model parameter:

response = client.chat.completions.create(
    model="near/GLM-4.7",
    messages=[{"role": "user", "content": "Summarize the concept of confidential computing."}]
)

Ensure that the model ID is available in your OLLM account.
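One way to check availability from code is to list the models visible to your key. This sketch assumes OLLM serves the OpenAI-compatible model listing endpoint, which this guide does not confirm. `client.models.list()` is the OpenAI SDK call, and `is_model_available` is a hypothetical helper. The fake client exists only so the example runs without network access.

```python
from types import SimpleNamespace
from typing import Any


def is_model_available(client: Any, model_id: str) -> bool:
    """Return True if model_id appears in the models listed for this client."""
    return any(model.id == model_id for model in client.models.list())


# Stand-in client, for illustration only; a real OpenAI client
# exposes the same client.models.list() call.
fake_client = SimpleNamespace(
    models=SimpleNamespace(
        list=lambda: [SimpleNamespace(id="near/GLM-4.6"),
                      SimpleNamespace(id="near/GLM-4.7")]
    )
)
print(is_model_available(fake_client, "near/GLM-4.7"))
print(is_model_available(fake_client, "near/unknown-model"))
```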

Environment Variable Configuration (Recommended)

Instead of hardcoding your API key, export it as an environment variable:

export OLLM_API_KEY="your-api-key"

Then initialize the client:

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.ollm.com/v1",
    api_key=os.environ["OLLM_API_KEY"]
)

This prevents accidental key exposure in source code.
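Reading `os.environ["OLLM_API_KEY"]` directly raises a bare `KeyError` when the variable is missing. A small loader can fail with a clearer message instead. This is a sketch under assumptions: `ollm_client_kwargs` is a hypothetical helper, and the optional `env` parameter exists only to make it easy to test.

```python
import os
from typing import Dict, Mapping, Optional

OLLM_BASE_URL = "https://api.ollm.com/v1"


def ollm_client_kwargs(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build the keyword arguments for OpenAI(...) from the environment."""
    env = os.environ if env is None else env
    api_key = env.get("OLLM_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError(
            "OLLM_API_KEY is not set; export it before starting the application."
        )
    return {"base_url": OLLM_BASE_URL, "api_key": api_key}


# Illustration with an explicit mapping instead of the real environment
print(ollm_client_kwargs({"OLLM_API_KEY": "example-key"})["base_url"])
```

In application code, the result can be passed straight to the client: `client = OpenAI(**ollm_client_kwargs())`.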

Common Errors

401 Unauthorized

If you receive a 401 response:

Verify your API key
Confirm base_url is set to https://api.ollm.com/v1
Ensure the key has not been revoked

Model Not Found

If the request fails because the model cannot be found:

Verify the model ID is correct
Ensure the model is available in your account
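The checks above can be surfaced in code by mapping HTTP status codes to hints. This is a minimal sketch: `hint_for_error` and `HINTS` are hypothetical names, and it assumes OLLM returns 404 for an unknown model, which this guide does not state. It reads the `status_code` attribute that the OpenAI SDK sets on its `APIStatusError` exceptions.

```python
from typing import Optional

# Status code to troubleshooting hint; the 404 mapping is an assumption
HINTS = {
    401: ("Verify your API key, confirm base_url is https://api.ollm.com/v1, "
          "and ensure the key has not been revoked."),
    404: "Verify the model ID and ensure the model is available in your account.",
}


def hint_for_error(exc: Exception) -> Optional[str]:
    """Return a troubleshooting hint for an API error, or None if unknown."""
    return HINTS.get(getattr(exc, "status_code", None))


# In application code (assuming the openai package is installed):
#     try:
#         response = client.chat.completions.create(model=..., messages=...)
#     except openai.APIStatusError as exc:
#         print(hint_for_error(exc) or f"Request failed: {exc}")


class FakeStatusError(Exception):
    """Stand-in for an SDK error carrying a status code, for illustration only."""
    def __init__(self, status_code: int) -> None:
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


print(hint_for_error(FakeStatusError(401)))
```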
