Troubleshoot
Common issues and solutions when implementing the website theme generation workflow using OLLM.
The workflow involves:
- Fetching a sitemap
- Extracting relevant URLs
- Scraping page content
- Sending the combined content to OLLM
- Parsing the model response
If any step fails, the pipeline may break or produce incomplete results. The sections below help you isolate and resolve common issues.
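The steps above can be sketched as a minimal pipeline. All function names here are illustrative placeholders, not the workflow's actual implementation; `scrape` and `generate` are passed in as callables so each stage can be tested in isolation:

```python
import xml.etree.ElementTree as ET

def extract_urls(sitemap_xml: str, keyword: str) -> list[str]:
    """Pull <loc> entries from a sitemap, keeping only URLs that contain keyword."""
    root = ET.fromstring(sitemap_xml)
    # The {*} wildcard matches any XML namespace (Python 3.8+)
    urls = [el.text for el in root.findall(".//{*}loc") if el.text]
    return [u for u in urls if keyword in u]

def run_pipeline(sitemap_xml: str, keyword: str, scrape, generate) -> str:
    """Scrape each matching page, combine the text, and hand it to the model."""
    combined = "\n\n".join(scrape(u) for u in extract_urls(sitemap_xml, keyword))
    return generate(combined)
```

Keeping the scraping and model-call stages behind plain callables makes it easy to pinpoint which stage fails when the pipeline breaks.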
Sitemap Issues
Sitemap Not Found (404)
If your request to `sitemap.xml` fails, check the HTTP status code:

```python
response = requests.get(sitemap_url)
print(response.status_code)
```

Possible causes:
- The website does not expose a public sitemap
- The sitemap is located at a different path
- The site blocks automated requests
How to fix
- Check `https://example.com/robots.txt` to find the correct sitemap location
- Verify the URL manually in your browser
- Add a User-Agent header to your request:
```python
headers = {"User-Agent": "Mozilla/5.0"}
response = requests.get(sitemap_url, headers=headers)
```

XML Parsing Errors
If you see errors such as `xml.etree.ElementTree.ParseError`, the sitemap may:
- Be malformed
- Contain namespaces not handled properly
- Be compressed
How to fix
Ensure you handle namespaces correctly:

```python
root.findall(".//{*}loc")
```

If the sitemap is compressed (`.gz`), download and decompress it before parsing.
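A sketch combining both fixes, assuming the sitemap bytes have already been downloaded (the helper name is hypothetical):

```python
import gzip
import xml.etree.ElementTree as ET

def parse_sitemap(raw: bytes) -> list[str]:
    """Parse sitemap XML (optionally gzip-compressed) and return all <loc> URLs."""
    # Gzip streams start with the magic bytes 0x1f 0x8b
    if raw[:2] == b"\x1f\x8b":
        raw = gzip.decompress(raw)
    root = ET.fromstring(raw)
    # {*} is the namespace wildcard, so this works regardless of the
    # sitemap's declared xmlns (Python 3.8+)
    return [el.text.strip() for el in root.findall(".//{*}loc") if el.text]
```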
Scraping Issues
Empty or Incomplete Content
If `scrape_page()` returns very little text:
- The site may use client-side rendering (JavaScript)
- Content may load dynamically
How to fix
For JavaScript-heavy sites, use a headless browser tool such as:
- Playwright
- Selenium
Basic requests + BeautifulSoup will not execute JavaScript.
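A quick heuristic can flag pages that likely need a headless browser before you invest in one. This hypothetical helper uses only the standard library: a page whose static HTML carries almost no visible text but does include scripts is a common sign of client-side rendering (the threshold is an assumption to tune):

```python
from html.parser import HTMLParser

class _TextAndScripts(HTMLParser):
    """Count visible text characters and <script> tags in static HTML."""
    def __init__(self):
        super().__init__()
        self.in_script = False
        self.text_chars = 0
        self.script_count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True
            self.script_count += 1

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if not self.in_script:
            self.text_chars += len(data.strip())

def looks_client_rendered(html: str, min_text_chars: int = 200) -> bool:
    """Heuristic: little static text plus scripts suggests a JS-rendered page."""
    parser = _TextAndScripts()
    parser.feed(html)
    return parser.text_chars < min_text_chars and parser.script_count > 0
```

Pages flagged this way can be routed to Playwright or Selenium, while plain pages stay on the faster requests path.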
Blocked Requests (403)
If you receive `403 Forbidden`, the website may be blocking automated scraping.
How to fix
- Add a User-Agent header
- Respect robots.txt
- Avoid scraping at high request frequency
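Respecting `robots.txt` can be automated with the standard library's `urllib.robotparser`. A sketch, assuming the `robots.txt` rules have already been fetched (the helper name is hypothetical):

```python
import urllib.robotparser

def allowed_by_robots(robots_lines: list[str], url: str, agent: str = "*") -> bool:
    """Check a URL against already-fetched robots.txt rules."""
    rp = urllib.robotparser.RobotFileParser()
    rp.parse(robots_lines)
    return rp.can_fetch(agent, url)
```

Combine this with a short `time.sleep()` between requests to keep the request frequency low.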
OLLM API Issues
401 Unauthorized
If you receive a 401 error from OLLM:
- Verify your API key
- Ensure `base_url="https://api.ollm.com/v1"` is set correctly
- Confirm the key has not been revoked
Example client initialization:
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.ollm.com/v1",
    api_key="your-api-key",
)
```

Model Not Found
If you see an error indicating the model is unavailable:
- Verify the model ID (`near/GLM-4.6`)
- Confirm the model is available in your account
- Ensure there are no typos in the model string
Empty Model Output
If `response.choices[0].message.content` is empty:
- Confirm that the request succeeded (HTTP 200)
- Check that `response.choices` exists
- Print the full response object for debugging
Example guard:
```python
if response and response.choices:
    print(response.choices[0].message.content)
```

Token and Input Size Issues
Content Too Large
If you receive token limit errors or unusually slow responses, the combined website content may exceed the model’s context window.
How to fix
- Truncate large pages
- Chunk content into smaller segments
- Summarize sections individually before combining results
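A chunking helper along these lines can replace blunt truncation. This is a hypothetical sketch that splits on paragraph boundaries where possible, so sentences stay intact, and hard-splits only oversized paragraphs:

```python
def chunk_text(text: str, max_chars: int = 20000) -> list[str]:
    """Split text into chunks of at most max_chars, preferring paragraph breaks."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        # Start a new chunk if adding this paragraph would overflow
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
        # A single paragraph longer than max_chars is hard-split
        while len(current) > max_chars:
            chunks.append(current[:max_chars])
            current = current[max_chars:]
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be summarized individually, and the summaries combined into the final prompt.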
Example truncation:
```python
MAX_CHARS = 20000
combined_content = combined_content[:MAX_CHARS]
```

Performance Issues
Slow Execution
The workflow may slow down due to:
- Large sitemaps
- Sequential page scraping
- Large combined prompts
Improvements
- Add concurrency when scraping
- Cache scraped pages
- Filter sitemap URLs more strictly
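Concurrency for the scraping step can be added with the standard library's `ThreadPoolExecutor`; threads work well here because the work is network-bound. In this sketch, `scrape_fn` stands in for the guide's `scrape_page()`:

```python
from concurrent.futures import ThreadPoolExecutor

def scrape_all(urls, scrape_fn, max_workers: int = 8):
    """Scrape pages concurrently; results keep the order of the input URLs."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(scrape_fn, urls))
```

Keep `max_workers` modest to avoid triggering the rate-limiting and 403 responses discussed above.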
Response Parsing Errors
If your application crashes while reading `response.choices[0].message.content`, the likely cause is that the response is an error envelope rather than a completion.
Always validate before accessing fields:
```python
if hasattr(response, "choices") and response.choices:
    content = response.choices[0].message.content
```

Do not attempt to parse output if an error object is present.
Unexpected Output Quality
If the generated themes are:
- Too generic
- Not structured
- Missing important insights
You can improve results by:
- Adding stronger system instructions
- Asking for structured output (e.g., JSON format)
- Reducing noisy scraped content (menus, footers, boilerplate)
Example improved prompt:
```json
{
  "role": "system",
  "content": "Extract structured business themes. Return a clear bullet list grouped by category."
}
```

Verification & Security Context
All inference requests sent through OLLM are processed inside Trusted Execution Environments (TEEs). If you need to validate execution integrity:
- Check verification metadata in the OLLM dashboard
- Confirm request status is “Verified”
This does not affect the scraping workflow, but it may be relevant for audit or compliance requirements.
Debugging Checklist
Before escalating issues, verify:
- Sitemap URL is correct
- Relevant pages are being extracted
- Scraped content is non-empty
- API key is valid
- Model ID is correct
- Response status is 2xx
- `choices[0].message.content` exists
This isolates failures quickly and helps determine whether the issue is in scraping logic, request construction, or response handling.