
Troubleshoot

Common issues and solutions when implementing the website theme generation workflow using OLLM.


The workflow involves:

  1. Fetching a sitemap
  2. Extracting relevant URLs
  3. Scraping page content
  4. Sending the combined content to OLLM
  5. Parsing the model response

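The steps above can be sketched as a pair of small helpers. This is an illustrative outline, not the workflow's actual code; the function names and the keyword list are assumptions:

```python
import requests
import xml.etree.ElementTree as ET

def fetch_sitemap_urls(sitemap_url: str, timeout: int = 10) -> list[str]:
    """Steps 1-2: fetch the sitemap and return every <loc> URL it lists."""
    response = requests.get(sitemap_url, timeout=timeout)
    response.raise_for_status()
    root = ET.fromstring(response.content)
    # {*} matches any XML namespace, so plain and namespaced sitemaps both work.
    return [loc.text for loc in root.findall(".//{*}loc") if loc.text]

def filter_relevant(urls: list[str],
                    keywords: tuple[str, ...] = ("about", "services", "products")) -> list[str]:
    """Step 2: keep only URLs whose path mentions a keyword of interest."""
    return [u for u in urls if any(k in u.lower() for k in keywords)]
```

The remaining steps (scraping, calling OLLM, parsing the response) are covered in their own sections below.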
If any step fails, the pipeline may break or produce incomplete results. The sections below help you isolate and resolve common issues.

Sitemap Issues

Sitemap Not Found (404)

If your request to sitemap.xml fails:

response = requests.get(sitemap_url, timeout=10)
print(response.status_code)

Possible causes:

  • The website does not expose a public sitemap
  • The sitemap is located at a different path
  • The site blocks automated requests

How to fix

  • Check https://example.com/robots.txt to find the correct sitemap location
  • Verify the URL manually in your browser
  • Add a User-Agent header to your request:
headers = {"User-Agent": "Mozilla/5.0"}
response = requests.get(sitemap_url, headers=headers)
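Because robots.txt usually declares the sitemap location on `Sitemap:` lines, you can extract it programmatically. A minimal sketch (the function name is illustrative):

```python
def find_sitemaps(robots_txt: str) -> list[str]:
    """Return every sitemap URL declared in a robots.txt body."""
    return [
        line.split(":", 1)[1].strip()
        for line in robots_txt.splitlines()
        if line.lower().startswith("sitemap:")
    ]
```

Feed it the body of the robots.txt response and try each returned URL in order.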

XML Parsing Errors

If you see errors such as:

xml.etree.ElementTree.ParseError

The sitemap may:

  • Be malformed
  • Contain namespaces not handled properly
  • Be compressed

How to fix

Ensure you handle namespaces correctly:

root.findall(".//{*}loc")

If the sitemap is compressed (.gz), download and decompress it before parsing.
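Both fixes can be combined in one parsing helper. This sketch detects gzip by its two magic bytes rather than by file extension (the function name is illustrative):

```python
import gzip
import xml.etree.ElementTree as ET

def parse_sitemap(raw: bytes) -> list[str]:
    """Decompress gzipped sitemaps (magic bytes 0x1f 0x8b), then read <loc> entries."""
    if raw[:2] == b"\x1f\x8b":
        raw = gzip.decompress(raw)
    root = ET.fromstring(raw)
    # {*} matches any XML namespace.
    return [loc.text for loc in root.findall(".//{*}loc") if loc.text]
```

Pass it `response.content` (bytes), not `response.text`, so the gzip check works.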

Scraping Issues

Empty or Incomplete Content

If scrape_page() returns very little text:

  • The site may use client-side rendering (JavaScript)
  • Content may load dynamically

How to fix

For JavaScript-heavy sites, use a headless browser tool such as:

  • Playwright
  • Selenium

Basic requests + BeautifulSoup will not execute JavaScript.
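A minimal Playwright sketch, assuming `playwright` and its Chromium browser are installed (`pip install playwright && playwright install chromium`); the `looks_client_rendered` heuristic and both function names are illustrative:

```python
def looks_client_rendered(text: str, min_chars: int = 200) -> bool:
    """Heuristic: suspiciously little extracted text suggests client-side rendering."""
    return len(text.strip()) < min_chars

def render_page_text(url: str, timeout_ms: int = 15000) -> str:
    """Fetch a page with a headless browser so client-side JavaScript runs first."""
    from playwright.sync_api import sync_playwright
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url, timeout=timeout_ms)
        page.wait_for_load_state("networkidle")  # wait for dynamic content
        text = page.inner_text("body")
        browser.close()
        return text
```

You could fall back to `render_page_text()` only when the plain-requests result triggers `looks_client_rendered()`, since headless browsing is much slower.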

Blocked Requests (403)

If you receive:

403 Forbidden

The website may be blocking automated scraping.

How to fix

  • Add a User-Agent header
  • Respect robots.txt
  • Avoid scraping at high request frequency
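One way to combine all three fixes, sketched with the standard-library `RobotFileParser` (function names and the one-second delay are illustrative):

```python
import time
import requests
from urllib.robotparser import RobotFileParser

HEADERS = {"User-Agent": "Mozilla/5.0"}

def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Check a URL against the rules in a robots.txt body."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(agent, url)

def polite_fetch(urls: list[str], robots_txt: str, delay_seconds: float = 1.0) -> dict[str, str]:
    """Fetch allowed pages sequentially, pausing between requests."""
    pages = {}
    for url in urls:
        if not allowed(robots_txt, HEADERS["User-Agent"], url):
            continue  # skip URLs disallowed by robots.txt
        pages[url] = requests.get(url, headers=HEADERS, timeout=10).text
        time.sleep(delay_seconds)  # avoid hammering the server
    return pages
```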

OLLM API Issues

401 Unauthorized

If you receive a 401 error from OLLM:

  • Verify your API key
  • Ensure base_url="https://api.ollm.com/v1" is set correctly
  • Confirm the key has not been revoked

Example client initialization:

from openai import OpenAI

client = OpenAI(
    base_url="https://api.ollm.com/v1",
    api_key="your-api-key"
)

Model Not Found

If you see an error indicating the model is unavailable:

  • Verify the model ID (near/GLM-4.6)
  • Confirm the model is available in your account
  • Ensure there are no typos in the model string

Empty Model Output

If response.choices[0].message.content is empty:

  • Confirm that the request succeeded (HTTP 200)
  • Check that response.choices exists
  • Print the full response object for debugging

Example guard:

if response and response.choices:
    print(response.choices[0].message.content)

Token and Input Size Issues

Content Too Large

If you receive token limit errors or unusually slow responses, the combined website content may exceed the model’s context window.

How to fix

  • Truncate large pages
  • Chunk content into smaller segments
  • Summarize sections individually before combining results

Example truncation:

MAX_CHARS = 20000
combined_content = combined_content[:MAX_CHARS]
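If truncation loses too much information, a simple chunking helper lets you summarize segments separately before combining the results (illustrative sketch; a real splitter might prefer paragraph boundaries):

```python
def chunk_text(text: str, max_chars: int = 20000) -> list[str]:
    """Split combined content into segments that each fit in the context window."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)] or [""]
```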

Performance Issues

Slow Execution

The workflow may slow down due to:

  • Large sitemaps
  • Sequential page scraping
  • Large combined prompts

Improvements

  • Add concurrency when scraping
  • Cache scraped pages
  • Filter sitemap URLs more strictly
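Sequential scraping is usually the biggest cost. A thread pool is a low-effort speedup; in this sketch, `scrape_page` is assumed to be your existing single-page scraper:

```python
from concurrent.futures import ThreadPoolExecutor

def scrape_all(urls: list[str], scrape_page, max_workers: int = 8) -> dict[str, str]:
    """Scrape pages concurrently; results stay keyed by URL, in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(urls, pool.map(scrape_page, urls)))
```

Keep `max_workers` modest so concurrency does not turn into the high request frequency warned about above.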

Response Parsing Errors

If your application crashes while reading:

response.choices[0].message.content

The likely cause is that the response is an error envelope rather than a completion.

Always validate before accessing fields:

if hasattr(response, "choices") and response.choices:
    content = response.choices[0].message.content

Do not attempt to parse output if an error object is present.

Unexpected Output Quality

If the generated themes are:

  • Too generic
  • Not structured
  • Missing important insights

You can improve results by:

  • Adding stronger system instructions
  • Asking for structured output (e.g., JSON format)
  • Reducing noisy scraped content (menus, footers, boilerplate)

Example improved prompt:

{
    "role": "system",
    "content": "Extract structured business themes. Return a clear bullet list grouped by category."
}
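If you ask for JSON, parse it defensively: models sometimes wrap output in a Markdown code fence. This illustrative helper strips such a fence before parsing:

```python
import json

def parse_themes(raw: str) -> dict:
    """Parse model output as JSON, tolerating a surrounding Markdown code fence."""
    cleaned = raw.strip().strip("`")  # drop any backtick fence characters
    if cleaned.startswith("json"):
        cleaned = cleaned[len("json"):]  # drop the fence's language tag
    return json.loads(cleaned)
```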

Verification & Security Context

All inference requests sent through OLLM are processed inside Trusted Execution Environments (TEEs). If you need to validate execution integrity:

  • Check verification metadata in the OLLM dashboard
  • Confirm request status is “Verified”

This does not affect the scraping workflow, but it may be relevant for audit or compliance requirements.

Debugging Checklist

Before escalating issues, verify:

  • Sitemap URL is correct
  • Relevant pages are being extracted
  • Scraped content is non-empty
  • API key is valid
  • Model ID is correct
  • Response status is 2xx
  • choices[0].message.content exists

This isolates failures quickly and helps determine whether the issue is in scraping logic, request construction, or response handling.
