OLLM

Requests & Responses

How Ollm structures requests, responses, and verification metadata.

A request is what your client sends to the API: HTTP method, endpoint, headers, and JSON payload.

A response is what comes back: either a success payload containing model output or a structured error that must be handled deterministically.

When integrating with Ollm, you are calling a Confidential AI Gateway over HTTP. From an engineering perspective, this remains a standard request/response contract. The only difference is that the payload represents a model invocation executed inside a Trusted Execution Environment (TEE).

This page focuses on the response side of that contract. The request shape is intentionally minimal and stable. What varies—and what must be handled correctly in production—is the response envelope.

The Request

A minimal chat-style request contains:

  • A model selector (explicitly chosen by you)
  • A messages array containing the user input

The user input you want the model to complete lives in messages[].content.

Minimal JSON Body (Language-Agnostic)

request-body.json
{
  "model": "near/gpt-oss-120b",
  "messages": [
    {
      "role": "user",
      "content": "Why is the sky blue? (1-2 sentences)"
    }
  ]
}

Authentication, headers, timeouts, retries, and error handling depend on your stack and are covered elsewhere. This page assumes a properly authenticated request to:

API Endpoint
POST /v1/chat/completions
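
For illustration, here is a minimal Python sketch of building that authenticated request with only the standard library. The base URL and environment-variable names (`OLLM_BASE_URL`, `OLLM_API_KEY`) are placeholders chosen for this example, not Ollm-defined values:

```python
import json
import os
import urllib.request

# Placeholder configuration -- substitute your gateway's actual base URL
# and credential source.
BASE_URL = os.environ.get("OLLM_BASE_URL", "https://api.example.com")
API_KEY = os.environ.get("OLLM_API_KEY", "sk-placeholder")

def build_request(prompt: str, model: str = "near/gpt-oss-120b") -> urllib.request.Request:
    """Build an authenticated POST to /v1/chat/completions."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/chat/completions",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )
```

Passing the result to `urllib.request.urlopen` would send it; timeouts, retries, and error handling depend on your stack, as noted above.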

The Response Contract

Every response falls into one of two categories:

Success (2xx HTTP status)

Contains a completion payload.

Failure (4xx / 5xx HTTP status)

Contains an error envelope. You must not attempt to read model output from it.

Your production logic should branch strictly on this distinction.
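
A minimal sketch of that branch in Python (the return shape here is illustrative, not part of the Ollm API):

```python
def handle_response(status: int, payload: dict) -> dict:
    """Branch strictly on the HTTP status class. On failure, surface
    the error envelope and never touch model-output fields."""
    if 200 <= status < 300:
        return {"ok": True, "payload": payload}
    # 4xx / 5xx: structured error only -- do not read `choices` from it.
    return {"ok": False, "error": payload.get("error", payload)}
```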

Success: chat.completion

A typical successful response returns the model’s output along with usage metadata.

HTTP Status

200 OK

Default render target

choices[0].message.content

Example Response

Success Response (200 OK)
{
  "id": "chatcmpl-893c78e06a795cea",
  "created": 1767623515,
  "model": "openai/gpt-oss-120b",
  "object": "chat.completion",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "The sky appears blue because molecules in Earth's atmosphere scatter short-wavelength (blue) light from the Sun more efficiently than longer-wavelength (red) light--a phenomenon known as Rayleigh scattering. This scattered blue light reaches our eyes from all directions, giving the sky its characteristic hue.",
        "role": "assistant",
        "reasoning_content": "We need to answer briefly. Provide 1-2 sentences explanation: scattering of short wavelengths (Rayleigh scattering)."
      },
      "provider_specific_fields": {
        "stop_reason": null,
        "token_ids": null
      }
    }
  ],
  "usage": {
    "completion_tokens": 92,
    "prompt_tokens": 81,
    "total_tokens": 173
  }
}

Interpreting a Successful Response

Your responsibility is to extract the assistant output safely and record usage metadata for operational control.

Extract the assistant output

Read:

choices[0].message.content

Before rendering, ensure:

  • choices exists
  • choices[0] exists
  • choices[0].message.content is present

Do not assume fields exist without validation.
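
Those checks can be sketched as one defensive extraction helper in Python, returning `None` rather than raising when any level of the path is missing or malformed:

```python
def extract_content(payload: dict):
    """Safely read choices[0].message.content, validating each level."""
    choices = payload.get("choices")
    if not isinstance(choices, list) or not choices:
        return None
    first = choices[0]
    if not isinstance(first, dict):
        return None
    message = first.get("message")
    if not isinstance(message, dict):
        return None
    content = message.get("content")
    return content if isinstance(content, str) else None
```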

Record usage metadata

Usage fields are critical for:

  • Cost tracking
  • Rate-limit enforcement
  • Observability

Look under:

usage.prompt_tokens

usage.completion_tokens

usage.total_tokens
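
A small sketch of recording those counters, defaulting missing fields to 0 rather than raising (the return shape is illustrative):

```python
def record_usage(payload: dict) -> dict:
    """Pull the usage counters for cost tracking and observability."""
    usage = payload.get("usage") or {}
    return {
        "prompt_tokens": usage.get("prompt_tokens", 0),
        "completion_tokens": usage.get("completion_tokens", 0),
        "total_tokens": usage.get("total_tokens", 0),
    }
```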

Common Pitfalls

Most integration bugs occur due to incorrect assumptions about response shape:

  • Using choices[0].message instead of choices[0].message.content
  • Assuming choices always exists or is non-empty
  • Attempting to render output when an error envelope is present
  • Ignoring token usage metadata

Your application should render model output only when:

  • HTTP status is 2xx
  • No error object exists
  • choices[0].message.content is valid

Error Responses

Error responses must be handled deterministically. You must not attempt to extract model output from error payloads.

Authentication Error (401)

If your API key is invalid or missing, the gateway rejects the request before it reaches the model.

HTTP Status

401 Unauthorized

When this occurs, no model invocation takes place. The request is blocked at the authentication layer.

This should be treated as a hard failure. Your application should stop normal rendering and surface a clear authentication error to the user or logs.

Retrying will not resolve the issue until the underlying credentials or configuration are corrected.

Common causes include:

  • Missing Authorization header
  • Incorrect Bearer token format
  • Expired or rotated API key
  • Wrong environment configuration loaded

To resolve this:

  • Verify that the Authorization: Bearer YOUR_API_KEY header is present and correct
  • Confirm the correct environment configuration is being used
  • Rotate or recreate the API key if necessary
  • Validate the request using a minimal curl example

Example 401 Response

Error Response (401 Unauthorized)
{
  "error": {
    "message": "Authentication Error, Invalid proxy server token passed. Received API Key = sk-...QTPg, Key Hash (Token) =0e8b5545f7e4be9664401218a81712ccd59b094fbc1da836029560957d82cbbb. Unable to find token in cache or `LiteLLM_VerificationTokenTable`",
    "type": "token_not_found_in_db",
    "param": "key",
    "code": "401"
  }
}

No choices field exists. No completion was generated.
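
Handling such an envelope means logging a summary, never extracting output. A hedged Python sketch (the summary format is this example's choice, not an Ollm convention):

```python
def summarize_error(payload: dict) -> str:
    """Produce a log-safe summary from an error envelope without
    touching any model-output fields."""
    err = payload.get("error") or {}
    code = err.get("code", "unknown")
    etype = err.get("type", "unknown")
    return f"gateway error {code} ({etype})"
```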

Method Not Allowed (405)

If the wrong HTTP method is used—for example, sending GET instead of POST—the endpoint rejects the request.

HTTP Status

405 Method Not Allowed

In this case, the request never reaches the model. No completion is generated.

This typically indicates a client or proxy misconfiguration. The endpoint path may be correct, but the HTTP method is not.

Common scenarios include:

  • Client code accidentally using GET
  • SDK misconfiguration
  • Reverse proxy rewriting request methods
  • Incorrect routing configuration

To resolve the issue:

  • Ensure the request method is POST
  • Confirm the endpoint path is /v1/chat/completions
  • Validate using a minimal curl or Postman request with a JSON body

Example 405 Response

Error Response (405 Method Not Allowed)
{
  "detail": "Method Not Allowed"
}

No model output exists in this response.

Retry Strategy

Retries should be limited to transient failures such as:

  • Network timeouts
  • 5xx server errors

Do not retry automatically for:

  • 401 authentication failures
  • 400-level validation errors
  • 405 method errors

Retries should implement:

  • Exponential backoff
  • Upper retry limits
  • Explicit timeout control
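
The policy above can be sketched in Python. Exponential backoff with full jitter is one common implementation choice; the retryable status set and parameters here are assumptions, not Ollm-mandated values:

```python
import random

RETRYABLE_STATUSES = {500, 502, 503, 504}  # transient 5xx only
MAX_RETRIES = 3

def is_retryable(status: int) -> bool:
    """Never retry 4xx failures (401, 405, validation errors)."""
    return status in RETRYABLE_STATUSES

def backoff_delay(attempt: int, base: float = 0.5, cap: float = 8.0) -> float:
    """Exponential backoff with full jitter, capped at `cap` seconds."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))
```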

Production Checklist

Before rendering any model output:

  • Confirm HTTP status is 2xx
  • Confirm no error object exists
  • Confirm choices[0].message.content exists
  • Record usage.total_tokens

In production systems, the core engineering task is not simply “call the model,” but to build a reliable and secure pipeline around it—including authentication, structured error handling, retries, strict response validation, and controlled rendering.
