
Troubleshoot

Common issues and solutions when integrating OLLM with the OpenAI SDK.

If your requests fail, return unexpected responses, or behave inconsistently, this guide helps you isolate the problem. Use the sections below to diagnose and resolve the most common integration issues.

Client Configuration Issues

Incorrect base_url

If requests fail immediately or appear to route to OpenAI instead of OLLM, verify that your client is initialized correctly.

The client must be initialized with OLLM's base URL:

from openai import OpenAI

client = OpenAI(
    base_url="https://api.ollm.com/v1",
    api_key="your-api-key"
)

Common mistakes include:

  • Omitting base_url
  • Adding a trailing endpoint such as /chat/completions
  • Using https://api.ollm.com without /v1

The correct base URL is:

https://api.ollm.com/v1
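A quick way to catch these mistakes is to validate the URL before constructing the client. The helper below is an illustrative sketch, not part of the SDK:

```python
from urllib.parse import urlparse

def check_base_url(url: str) -> str:
    """Reject the common base_url mistakes listed above (illustrative helper)."""
    parsed = urlparse(url)
    if parsed.scheme != "https":
        raise ValueError("base_url must use https")
    path = parsed.path.rstrip("/")
    if path.endswith("/chat/completions"):
        raise ValueError("do not append endpoint paths such as /chat/completions")
    if not path.endswith("/v1"):
        raise ValueError("base_url must include the /v1 suffix")
    return url

check_base_url("https://api.ollm.com/v1")  # valid: returns the URL unchanged
```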

API Key Not Loaded

If you receive authentication errors or requests fail silently, verify that your API key is being passed correctly.

If using environment variables:

echo $OLLM_API_KEY

If nothing prints, the variable is not set.

Ensure your client uses:

import os

client = OpenAI(
    base_url="https://api.ollm.com/v1",
    api_key=os.environ["OLLM_API_KEY"]
)

If the environment variable is misconfigured, the SDK will send an empty or invalid key.
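To fail fast at startup rather than send an empty key, you can resolve the variable explicitly before building the client; a minimal sketch:

```python
import os

def load_api_key(var: str = "OLLM_API_KEY") -> str:
    """Return the key from the environment, failing loudly if it is missing."""
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"{var} is not set; export it before starting the app")
    return key
```

A missing variable then surfaces as one clear startup error instead of a confusing 401 later.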

Authentication Errors (401)

If you receive:

401 Unauthorized

The request reached OLLM but was rejected.

Common causes:

  • Invalid API key
  • Revoked or rotated key
  • Incorrect base_url
  • Passing the wrong environment variable

Resolution Steps

  1. Regenerate the API key in the OLLM dashboard.
  2. Confirm the base_url is correct.
  3. Restart your application after updating environment variables.

Do not retry repeatedly without correcting credentials.

Model Errors

Model Not Found

If the SDK returns a model-related error:

  • Verify the model ID is correct (e.g., near/GLM-4.6)
  • Ensure the model is available in your OLLM account
  • Check for typos or case mismatches

Example of correct usage:

response = client.chat.completions.create(
    model="near/GLM-4.6",
    messages=[{"role": "user", "content": "Test"}]
)

If the model ID is invalid, the request will fail even if authentication succeeds.
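Case mismatches are easy to miss by eye. A small illustrative check can surface them explicitly; `KNOWN_MODELS` is a placeholder here, and the real list comes from your OLLM account:

```python
KNOWN_MODELS = ["near/GLM-4.6"]  # placeholder: consult your dashboard for the real list

def resolve_model(model_id: str) -> str:
    """Return the canonical model ID, flagging case mismatches explicitly."""
    if model_id in KNOWN_MODELS:
        return model_id
    for known in KNOWN_MODELS:
        if model_id.lower() == known.lower():
            raise ValueError(f"case mismatch: did you mean {known!r}?")
    raise ValueError(f"unknown model ID {model_id!r}")
```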

Response Handling Issues

Attribute Errors When Accessing Response

If your code crashes at:

response.choices[0].message.content

The likely causes are:

  • The response contains an error envelope
  • choices is empty
  • The request failed but you are not checking status

Always guard response access:

if hasattr(response, "choices") and response.choices:
    print(response.choices[0].message.content)

Do not assume the response always contains a valid completion.
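The guard above can be wrapped into a helper that returns `None` instead of raising; a sketch:

```python
def extract_content(response):
    """Return the first completion's text, or None if the response has no usable choice."""
    choices = getattr(response, "choices", None)
    if not choices:
        return None
    message = getattr(choices[0], "message", None)
    return getattr(message, "content", None) if message else None
```

Because it uses `getattr` throughout, it degrades gracefully whether the object is a normal completion or an error envelope.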

Empty or Unexpected Output

If the response returns successfully but content is empty:

  • Confirm the model received meaningful input
  • Log the full response object for inspection
  • Check token usage

print(response)
print(response.usage.total_tokens)

If token usage is unusually low, the prompt may be malformed.

Streaming Issues

If streaming responses fail or produce no output:

stream = client.chat.completions.create(..., stream=True)

Check:

  • The model supports streaming
  • You are iterating correctly over chunks
  • You are checking for delta before accessing content

Example guard:

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

If streaming hangs, test the same request without stream=True to isolate whether the issue is streaming-specific.
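To debug streaming separately from printing, you can collect the deltas into one string; an illustrative helper, assuming the v1 SDK chunk shape where each delta's `content` may be `None`:

```python
def collect_stream(chunks) -> str:
    """Join streamed delta fragments into the full reply, skipping empty deltas."""
    parts = []
    for chunk in chunks:
        if chunk.choices and chunk.choices[0].delta.content:
            parts.append(chunk.choices[0].delta.content)
    return "".join(parts)
```

Comparing the collected string against the non-streaming response for the same prompt helps confirm whether content is being lost in your streaming loop.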

Token and Context Errors

If you encounter context length or token limit errors:

  • Your prompt may exceed the model’s context window
  • Large inputs may need truncation or chunking

Example safeguard:

MAX_CHARS = 20000
prompt = prompt[:MAX_CHARS]

Monitor:

response.usage.total_tokens

Excessive token usage may also increase latency.
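If simple head truncation discards the most recent context, keeping both ends of the prompt is a common compromise. A rough character-budget sketch (character counts only approximate tokens):

```python
MAX_CHARS = 20000

def truncate_prompt(prompt: str, max_chars: int = MAX_CHARS) -> str:
    """Keep the start and end of an oversized prompt.

    The output slightly exceeds max_chars by the length of the marker line.
    """
    if len(prompt) <= max_chars:
        return prompt
    head = max_chars * 2 // 3
    tail = max_chars - head
    return prompt[:head] + "\n...[truncated]...\n" + prompt[-tail:]
```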

Network or Timeout Errors

If requests time out or fail intermittently:

  • Check network connectivity
  • Add timeout controls
  • Retry only on transient failures

Example with basic timeout:

response = client.chat.completions.create(
    model="near/GLM-4.6",
    messages=[{"role": "user", "content": "Test"}],
    timeout=30
)

Avoid retrying 401 or model-not-found errors.
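One way to enforce "retry only transient failures" is a small wrapper with exponential backoff. `ApiError` below is an illustrative stand-in for whatever exception your client raises with an HTTP status code attached:

```python
import time

class ApiError(Exception):
    """Illustrative stand-in for an HTTP error carrying a status code."""
    def __init__(self, status_code: int):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code

RETRYABLE = {408, 429, 500, 502, 503, 504}  # transient; 401 and 404 are excluded

def call_with_retries(fn, max_attempts: int = 3, base_delay: float = 0.1):
    """Call fn, retrying only transient failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except ApiError as e:
            if e.status_code not in RETRYABLE or attempt == max_attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))
```

A 401 or model-not-found error re-raises immediately, while a 503 is retried up to the attempt limit.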

Verification & Dashboard Cross-Check

If you are unsure whether the request reached OLLM:

  • Check the OLLM dashboard
  • Confirm the request appears in logs
  • Verify status (Success / Failed / Verified)

If no request appears in the dashboard, the issue is likely local configuration.

Debugging Checklist

Before escalating issues, confirm:

  • base_url is exactly https://api.ollm.com/v1
  • The API key is valid and loaded
  • The model ID is correct
  • The request is reaching OLLM (visible in dashboard)
  • You are guarding response parsing
  • You are not exceeding token limits

Working through these checks isolates most integration problems quickly.
