Requests & Responses
How Ollm structures requests, responses, and verification metadata.
A request is what your client sends to the API: HTTP method, endpoint, headers, and JSON payload.
A response is what comes back: either a success payload containing model output or a structured error that must be handled deterministically.
When integrating with Ollm, you are calling a Confidential AI Gateway over HTTP. From an engineering perspective, this remains a standard request/response contract. The only difference is that the payload represents a model invocation executed inside a Trusted Execution Environment (TEE).
This page focuses on the response side of that contract. The request shape is intentionally minimal and stable. What varies—and what must be handled correctly in production—is the response envelope.
The Request
A minimal chat-style request contains:
- A model selector (explicitly chosen by you)
- A messages array containing the user input
The user input you want the model to complete lives in messages[].content.
Minimal JSON Body (Language-Agnostic)
{
"model": "near/gpt-oss-120b",
"messages": [
{
"role": "user",
"content": "Why is the sky blue? (1-2 sentences)"
}
]
}

Authentication, headers, timeouts, retries, and error handling depend on your stack and are covered elsewhere. This page assumes a properly authenticated request to:

POST /v1/chat/completions

The Response Contract
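As a concrete sketch, the minimal body above can be assembled in Python. The base URL and the OLLM_API_KEY environment variable are placeholders for illustration, not part of the documented API:

```python
import json
import os

# Placeholder values -- substitute your gateway host and real credentials.
BASE_URL = "https://gateway.example.com"        # hypothetical host
API_KEY = os.environ.get("OLLM_API_KEY", "")    # hypothetical env var

def build_chat_request(model, user_content):
    """Build the minimal chat-completions body shown above."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_content}],
    }

body = build_chat_request("near/gpt-oss-120b",
                          "Why is the sky blue? (1-2 sentences)")
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}
# POST json.dumps(body) with these headers to BASE_URL + "/v1/chat/completions".
print(json.dumps(body, indent=2))
```

Any HTTP client works here; the gateway only requires the JSON body and the Authorization header.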
Every response falls into one of two categories:
Success (2xx HTTP status)
Contains a completion payload.
Failure (4xx / 5xx HTTP status)
Contains an error envelope. You must not attempt to read model output from it.
Your production logic should branch strictly on this distinction.
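That branching rule can be expressed as a single dispatch point. This is an illustrative sketch, not a prescribed client API:

```python
def classify_response(status: int, payload: dict) -> str:
    """Return 'success' only when model output may be read from the payload."""
    if 200 <= status < 300 and "error" not in payload:
        return "success"  # safe to read choices[0].message.content
    return "error"        # handle the error envelope; never read model output
```

Note that a 2xx status alone is not sufficient: the payload must also be free of an error envelope before output is rendered.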
Success: chat.completion
A typical successful response returns the model’s output along with usage metadata.
HTTP Status
200 OK

Default render target
choices[0].message.content
Example Response
{
"id": "chatcmpl-893c78e06a795cea",
"created": 1767623515,
"model": "openai/gpt-oss-120b",
"object": "chat.completion",
"choices": [
{
"finish_reason": "stop",
"index": 0,
"message": {
"content": "The sky appears blue because molecules in Earth's atmosphere scatter short-wavelength (blue) light from the Sun more efficiently than longer-wavelength (red) light--a phenomenon known as Rayleigh scattering. This scattered blue light reaches our eyes from all directions, giving the sky its characteristic hue.",
"role": "assistant",
"reasoning_content": "We need to answer briefly. Provide 1-2 sentences explanation: scattering of short wavelengths (Rayleigh scattering)."
},
"provider_specific_fields": {
"stop_reason": null,
"token_ids": null
}
}
],
"usage": {
"completion_tokens": 92,
"prompt_tokens": 81,
"total_tokens": 173
}
}

Interpreting a Successful Response
Your responsibility is to extract the assistant output safely and record usage metadata for operational control.
Extract the assistant output
Read:
choices[0].message.content
Before rendering, ensure:
- choices exists
- choices[0] exists
- choices[0].message.content is present
Do not assume fields exist without validation.
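The validation steps above can be sketched as a defensive accessor that returns None instead of raising when any level is missing (the function name is an assumption for illustration):

```python
def extract_content(payload: dict):
    """Return choices[0].message.content, or None if any level is missing."""
    choices = payload.get("choices")
    if not isinstance(choices, list) or not choices:
        return None
    message = choices[0].get("message")
    if not isinstance(message, dict):
        return None
    content = message.get("content")
    return content if isinstance(content, str) else None
```

Applied to an error envelope such as the 401 example later on this page, this returns None rather than crashing, which keeps the render path safe.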
Record usage metadata
Usage fields are critical for:
- Cost tracking
- Rate-limit enforcement
- Observability
Look under:
usage.prompt_tokens
usage.completion_tokens
usage.total_tokens
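A minimal sketch for capturing those fields, defaulting absent counts to zero so logging never fails on a malformed payload (record_usage is a hypothetical name):

```python
def record_usage(payload: dict) -> dict:
    """Pull token counts for cost tracking and observability; default to 0."""
    usage = payload.get("usage") or {}
    return {
        "prompt_tokens": usage.get("prompt_tokens", 0),
        "completion_tokens": usage.get("completion_tokens", 0),
        "total_tokens": usage.get("total_tokens", 0),
    }
```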
Common Pitfalls
Most integration bugs occur due to incorrect assumptions about response shape:
- Using choices[0].message instead of choices[0].message.content
- Assuming choices always exists or is non-empty
- Attempting to render output when an error envelope is present
- Ignoring token usage metadata
Your application should render model output only when:
- HTTP status is 2xx
- No error object exists
- choices[0].message.content is valid
Error Responses
Error responses must be handled deterministically. You must not attempt to extract model output from error payloads.
Authentication Error (401)
If your API key is invalid or missing, the gateway rejects the request before it reaches the model.
HTTP Status
401 Unauthorized

When this occurs, no model invocation takes place. The request is blocked at the authentication layer.
This should be treated as a hard failure. Your application should stop normal rendering and surface a clear authentication error to the user or logs.
Retrying will not resolve the issue until the underlying credentials or configuration are corrected.
Common causes include:
- Missing Authorization header
- Incorrect Bearer token format
- Expired or rotated API key
- Wrong environment configuration loaded
To resolve this:
- Verify that the Authorization: Bearer YOUR_API_KEY header is present and correct
- Confirm the correct environment configuration is being used
- Rotate or recreate the API key if necessary
- Validate the request using a minimal curl example
Example 401 Response
{
"error": {
"message": "Authentication Error, Invalid proxy server token passed. Received API Key = sk-...QTPg, Key Hash (Token) =0e8b5545f7e4be9664401218a81712ccd59b094fbc1da836029560957d82cbbb. Unable to find token in cache or `LiteLLM_VerificationTokenTable`",
"type": "token_not_found_in_db",
"param": "key",
"code": "401"
}
}

No choices field exists. No completion was generated.
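For logging, the envelope's type and code fields can be pulled out defensively. This is a sketch; the field names follow the 401 example above:

```python
def describe_error(payload: dict) -> str:
    """Summarize an error envelope for logs without touching output fields."""
    err = payload.get("error") or {}
    code = err.get("code", "unknown")
    etype = err.get("type", "unknown")
    return f"gateway error {code} ({etype})"
```

Log the summary string rather than the full envelope if your error messages may echo back sensitive material such as partial API keys.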
Method Not Allowed (405)
If the wrong HTTP method is used—for example, sending GET instead of POST—the endpoint rejects the request.
HTTP Status
405 Method Not Allowed

In this case, the request never reaches the model. No completion is generated.
This typically indicates a client or proxy misconfiguration. The endpoint path may be correct, but the HTTP method is not.
Common scenarios include:
- Client code accidentally using GET
- SDK misconfiguration
- Reverse proxy rewriting request methods
- Incorrect routing configuration
To resolve the issue:
- Ensure the request method is POST
- Confirm the endpoint path is /v1/chat/completions
- Validate using a minimal curl or Postman request with a JSON body
Example 405 Response
{
"detail": "Method Not Allowed"
}

No model output exists in this response.
Retry Strategy
Retries should be limited to transient failures such as:
- Network timeouts
- 5xx server errors
Do not retry automatically for:
- 401 authentication failures
- 400-level validation errors
- 405 method errors
Retries should implement:
- Exponential backoff
- Upper retry limits
- Explicit timeout control
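The policy above can be sketched as a retry wrapper: only 5xx responses are retried, with exponentially growing delays and a hard attempt ceiling. The with_retries name and delay values are assumptions, not part of the API:

```python
import time

def with_retries(call, max_attempts=3, base_delay=0.5):
    """Retry only 5xx responses, backing off exponentially between attempts."""
    status, payload = call()
    for attempt in range(1, max_attempts):
        if status < 500:  # success or non-retryable 4xx: stop immediately
            break
        time.sleep(base_delay * (2 ** (attempt - 1)))  # 0.5s, 1s, 2s, ...
        status, payload = call()
    return status, payload
```

Network timeouts typically surface as exceptions rather than status codes, so a production wrapper would also catch the transport-level timeout error of whichever HTTP client is in use and treat it as retryable.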
Production Checklist
Before rendering any model output:
- Confirm HTTP status is 2xx
- Confirm no error object exists
- Confirm choices[0].message.content exists
- Record usage.total_tokens
In production systems, the core engineering task is not simply “call the model,” but to build a reliable and secure pipeline around it—including authentication, structured error handling, retries, strict response validation, and controlled rendering.