
Troubleshoot

Common issues and solutions when integrating OLLM with the OpenAI SDK.

If your requests fail, return unexpected responses, or behave inconsistently, this guide helps you isolate the problem. Use the sections below to diagnose and resolve the most common integration issues.

Client Configuration Issues

Incorrect base_url

If requests fail immediately or appear to route to OpenAI instead of OLLM, verify that your client is initialized correctly.

The client must be initialized with OLLM's base URL:

from openai import OpenAI

client = OpenAI(
    base_url="https://api.ollm.com/v1",
    api_key="your-api-key"
)

Common mistakes include:

  • Omitting base_url
  • Adding a trailing endpoint such as /chat/completions
  • Using https://api.ollm.com without /v1

The correct base URL is:

https://api.ollm.com/v1
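A quick way to catch these mistakes is to validate the URL before constructing the client. The helper below is an illustrative sketch, not part of the SDK:

```python
from urllib.parse import urlparse

def check_base_url(url: str) -> str:
    """Reject the common base_url mistakes listed above (illustrative helper)."""
    parsed = urlparse(url)
    if parsed.scheme != "https":
        raise ValueError("base_url must use https")
    path = parsed.path.rstrip("/")
    if path.endswith("/chat/completions"):
        raise ValueError("do not append endpoint paths such as /chat/completions")
    if not path.endswith("/v1"):
        raise ValueError("base_url must include the /v1 suffix")
    return url

check_base_url("https://api.ollm.com/v1")  # valid: returns the URL unchanged
```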

API Key Not Loaded

If you receive authentication errors or requests fail silently, verify that your API key is being passed correctly.

If using environment variables:

echo $OLLM_API_KEY

If nothing prints, the variable is not set.

Ensure your client uses:

import os

client = OpenAI(
    base_url="https://api.ollm.com/v1",
    api_key=os.environ["OLLM_API_KEY"]
)

If the environment variable is misconfigured, the SDK will send an empty or invalid key.
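To fail fast at startup rather than send an empty key, you can resolve the variable explicitly before building the client; a minimal sketch:

```python
import os

def load_api_key(var: str = "OLLM_API_KEY") -> str:
    """Return the key from the environment, failing loudly if it is missing."""
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"{var} is not set; export it before starting the app")
    return key
```

A missing variable then surfaces as one clear startup error instead of a confusing 401 later.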

Authentication Errors (401)

If you receive:

401 Unauthorized

The request reached OLLM but was rejected.

Common causes:

  • Invalid API key
  • Revoked or rotated key
  • Incorrect base_url
  • Passing the wrong environment variable

Resolution Steps

  1. Regenerate the API key in the OLLM dashboard.
  2. Confirm the base_url is correct.
  3. Restart your application after updating environment variables.

Do not retry repeatedly without correcting credentials.

Model Errors

Model Not Found

If the SDK returns a model-related error:

  • Verify the model ID is correct (e.g., near/GLM-4.6)
  • Ensure the model is available in your OLLM account
  • Check for typos or case mismatches

Example of correct usage:

response = client.chat.completions.create(
    model="near/GLM-4.6",
    messages=[{"role": "user", "content": "Test"}]
)

If the model ID is invalid, the request will fail even if authentication succeeds.
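Case mismatches are easy to miss by eye. A small illustrative check can surface them explicitly; `KNOWN_MODELS` is a placeholder here, and the real list comes from your OLLM account:

```python
KNOWN_MODELS = ["near/GLM-4.6"]  # placeholder: consult your dashboard for the real list

def resolve_model(model_id: str) -> str:
    """Return the canonical model ID, flagging case mismatches explicitly."""
    if model_id in KNOWN_MODELS:
        return model_id
    for known in KNOWN_MODELS:
        if model_id.lower() == known.lower():
            raise ValueError(f"case mismatch: did you mean {known!r}?")
    raise ValueError(f"unknown model ID {model_id!r}")
```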

Response Handling Issues

Attribute Errors When Accessing Response

If your code crashes at:

response.choices[0].message.content

The likely causes are:

  • The response contains an error envelope
  • choices is empty
  • The request failed but you are not checking status

Always guard response access:

if hasattr(response, "choices") and response.choices:
    print(response.choices[0].message.content)

Do not assume the response always contains a valid completion.
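The guard above can be wrapped into a helper that returns `None` instead of raising; a sketch:

```python
def extract_content(response):
    """Return the first completion's text, or None if the response has no usable choice."""
    choices = getattr(response, "choices", None)
    if not choices:
        return None
    message = getattr(choices[0], "message", None)
    return getattr(message, "content", None) if message else None
```

Because it uses `getattr` throughout, it degrades gracefully whether the object is a normal completion or an error envelope.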

Empty or Unexpected Output

If the response returns successfully but content is empty:

  • Confirm the model received meaningful input
  • Log the full response object for inspection
  • Check token usage

print(response)
print(response.usage.total_tokens)

If token usage is unusually low, the prompt may be malformed.

Streaming Issues

If streaming responses fail or produce no output:

stream = client.chat.completions.create(..., stream=True)

Check:

  • The model supports streaming
  • You are iterating correctly over chunks
  • You are checking for delta before accessing content

Example guard:

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

If streaming hangs, test the same request without stream=True to isolate whether the issue is streaming-specific.
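To debug streaming separately from printing, you can collect the deltas into one string; an illustrative helper, assuming the v1 SDK chunk shape where each delta's `content` may be `None`:

```python
def collect_stream(chunks) -> str:
    """Join streamed delta fragments into the full reply, skipping empty deltas."""
    parts = []
    for chunk in chunks:
        if chunk.choices and chunk.choices[0].delta.content:
            parts.append(chunk.choices[0].delta.content)
    return "".join(parts)
```

Comparing the collected string against the non-streaming response for the same prompt helps confirm whether content is being lost in your streaming loop.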

Token and Context Errors

If you encounter context length or token limit errors:

  • Your prompt may exceed the model’s context window
  • Large inputs may need truncation or chunking

Example safeguard:

MAX_CHARS = 20000
prompt = prompt[:MAX_CHARS]

Monitor:

response.usage.total_tokens

Excessive token usage may also increase latency.
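If simple head truncation discards the most recent context, keeping both ends of the prompt is a common compromise. A rough character-budget sketch (character counts only approximate tokens):

```python
MAX_CHARS = 20000

def truncate_prompt(prompt: str, max_chars: int = MAX_CHARS) -> str:
    """Keep the start and end of an oversized prompt.

    The output slightly exceeds max_chars by the length of the marker line.
    """
    if len(prompt) <= max_chars:
        return prompt
    head = max_chars * 2 // 3
    tail = max_chars - head
    return prompt[:head] + "\n...[truncated]...\n" + prompt[-tail:]
```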

Network or Timeout Errors

If requests time out or fail intermittently:

  • Check network connectivity
  • Add timeout controls
  • Retry only on transient failures

Example with basic timeout:

response = client.chat.completions.create(
    model="near/GLM-4.6",
    messages=[{"role": "user", "content": "Test"}],
    timeout=30
)

Avoid retrying 401 or model-not-found errors.
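One way to enforce "retry only transient failures" is a small wrapper with exponential backoff. `ApiError` below is an illustrative stand-in for whatever exception your client raises with an HTTP status code attached:

```python
import time

class ApiError(Exception):
    """Illustrative stand-in for an HTTP error carrying a status code."""
    def __init__(self, status_code: int):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code

RETRYABLE = {408, 429, 500, 502, 503, 504}  # transient; 401 and 404 are excluded

def call_with_retries(fn, max_attempts: int = 3, base_delay: float = 0.1):
    """Call fn, retrying only transient failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except ApiError as e:
            if e.status_code not in RETRYABLE or attempt == max_attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))
```

A 401 or model-not-found error re-raises immediately, while a 503 is retried up to the attempt limit.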

Verification & Dashboard Cross-Check

If you are unsure whether the request reached OLLM:

  • Check the OLLM dashboard
  • Confirm the request appears in logs
  • Verify status (Success / Failed / Verified)

If no request appears in the dashboard, the issue is likely local configuration.

Debugging Checklist

Before escalating issues, confirm:

  • base_url is exactly https://api.ollm.com/v1
  • The API key is valid and loaded
  • The model ID is correct
  • The request is reaching OLLM (visible in dashboard)
  • You are guarding response parsing
  • You are not exceeding token limits

Working through these checks isolates most integration problems quickly.
