Use Cases

Generate Themes from a Website

Example workflow for scraping a website and generating themes using OLLM.

In this example, we build a simple workflow that analyzes a company’s website and generates high-level business themes using OLLM.

The workflow proceeds as follows:

  1. Read the website’s sitemap.xml
  2. Identify relevant pages (such as /services, /product, or /platform)
  3. Scrape the textual content from those pages
  4. Send the combined content to an OLLM model
  5. Generate structured themes based on the site’s messaging

This type of workflow is commonly used for:

  • Competitive analysis
  • Market positioning research
  • Automated website audits
  • Internal strategy research

The only AI component in this pipeline is the theme generation step. All other steps are standard web data extraction.

Step 1: Read the Sitemap and Identify Relevant Pages

Most websites expose a sitemap.xml file that lists all indexable pages. Instead of scraping the entire domain blindly, we first read the sitemap and extract only pages relevant to product or service messaging.

import requests
import xml.etree.ElementTree as ET

def get_relevant_urls(sitemap_url):
   response = requests.get(sitemap_url)
   root = ET.fromstring(response.content)

   urls = []
   for url in root.findall(".//{*}loc"):
       link = url.text
       if any(path in link for path in ["/services", "/product", "/platform"]):
           urls.append(link)

   return urls

This ensures we focus only on pages that describe what the company offers, rather than blog posts or legal pages.
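The filtering logic above can be exercised offline against an inline sitemap string, which is useful for testing without a network call. The URLs below are hypothetical, used purely for illustration:

```python
import xml.etree.ElementTree as ET

# A small inline sitemap for illustration -- the URLs are hypothetical.
SITEMAP_XML = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/product/analytics</loc></url>
  <url><loc>https://example.com/blog/changelog</loc></url>
  <url><loc>https://example.com/services/consulting</loc></url>
</urlset>"""

def filter_urls(sitemap_xml, paths=("/services", "/product", "/platform")):
    # Same filtering logic as get_relevant_urls, but parsing from a string
    # instead of fetching over HTTP.
    root = ET.fromstring(sitemap_xml)
    return [
        url.text
        for url in root.findall(".//{*}loc")
        if any(p in url.text for p in paths)
    ]

print(filter_urls(SITEMAP_XML))
# -> ['https://example.com/product/analytics', 'https://example.com/services/consulting']
```

Note that the `{*}` wildcard namespace syntax in `findall` requires Python 3.8 or later.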

Step 2: Scrape Page Content

Once we have the relevant URLs, we extract the visible text from each page.

from bs4 import BeautifulSoup

def scrape_page(url):
   response = requests.get(url)
   soup = BeautifulSoup(response.text, "html.parser")

   text = soup.get_text(separator=" ", strip=True)
   return " ".join(text.split())

In a production environment, you may want to remove navigation menus, footers, or repeated elements. For simplicity, this example extracts full page text.
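If you do want to strip boilerplate elements, one common approach is to remove them from the parse tree before extracting text. The tag list below is a reasonable baseline, not an exhaustive rule; real sites may need site-specific selectors:

```python
from bs4 import BeautifulSoup

def extract_main_text(html):
    soup = BeautifulSoup(html, "html.parser")
    # Remove elements that rarely carry product messaging. This tag list
    # is a common baseline and may need adjusting per site.
    for tag in soup(["nav", "header", "footer", "aside", "script", "style"]):
        tag.decompose()
    text = soup.get_text(separator=" ", strip=True)
    return " ".join(text.split())

html = """
<html><body>
  <nav>Home | About | Contact</nav>
  <main><h1>Our Platform</h1><p>We automate data pipelines.</p></main>
  <footer>Copyright 2024 Example Inc.</footer>
</body></html>
"""
print(extract_main_text(html))  # -> "Our Platform We automate data pipelines."
```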

Step 3: Generate Themes Using OLLM

Now that we have the combined website content, we send it to OLLM for analysis.

OLLM is OpenAI-compatible, so we can use the official OpenAI SDK by setting the base_url to the OLLM endpoint.

from openai import OpenAI

client = OpenAI(
   base_url="https://api.ollm.com/v1",
   api_key="your-api-key"
)

def generate_themes(content):
   response = client.chat.completions.create(
       model="near/GLM-4.6",
       messages=[
           {
               "role": "system",
               "content": "You are an analyst extracting high-level business themes from website content."
           },
           {
               "role": "user",
               "content": f"""
               Analyze the following website content and extract:

               1. Core product or service themes
               2. Target audience segments
               3. Key value propositions
               4. Repeated messaging patterns

               Website content:
               {content}
               """
           }
       ]
   )
    return response.choices[0].message.content

This call sends the scraped website data to the selected model (near/GLM-4.6). The model analyzes the text and returns structured thematic insights.

Step 4: Full Workflow Example

Below is a simplified end-to-end example combining all steps.

def run_analysis():
   sitemap_url = "https://example.com/sitemap.xml"

   urls = get_relevant_urls(sitemap_url)

   combined_content = ""

   for url in urls:
       page_text = scrape_page(url)
       combined_content += "\n\n" + page_text

   themes = generate_themes(combined_content)

   print("Generated Themes:\n")
   print(themes)

run_analysis()

When executed, this script will:

  • Identify relevant product/service pages
  • Extract their textual content
  • Send the content to OLLM
  • Print the generated themes

Expected Output

The model will typically return structured insights such as:

  • Primary product categories
  • Core differentiators
  • Messaging consistency
  • Target industries or user personas

You can optionally post-process this output into JSON if you require structured downstream usage.
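One way to do this is to ask the model (via the prompt) to reply with a JSON object, then parse the reply defensively. Models often wrap JSON in markdown code fences, so stripping those first is a common precaution. The reply string below is a hypothetical model output, shown only to illustrate the parsing step:

```python
import json

def parse_theme_json(model_output):
    # Strip markdown code fences if the model wrapped its JSON in them.
    cleaned = model_output.strip()
    if cleaned.startswith("```"):
        cleaned = cleaned.split("\n", 1)[1]    # drop the opening fence line
        cleaned = cleaned.rsplit("```", 1)[0]  # drop the closing fence
    return json.loads(cleaned)

# Hypothetical model reply, for illustration only.
reply = '{"themes": ["data automation", "analytics"], "audiences": ["ops teams"]}'
print(parse_theme_json(reply)["themes"])  # -> ['data automation', 'analytics']
```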

Production Considerations

When applying this workflow in a real system:

  • Limit or chunk large content to avoid excessive token usage
  • Check HTTP status codes before parsing scraped pages or model responses
  • Record token usage (response.usage.total_tokens) for cost tracking
  • Handle network failures and timeouts gracefully
  • Avoid scraping websites that prohibit automated access
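For the first point, a minimal sketch of character-based chunking is shown below. The sizes are assumptions, not OLLM limits: a rough rule of thumb is ~4 characters per token, so 12,000 characters is in the neighborhood of 3,000 tokens. Each chunk could then be summarized separately and the summaries merged in a final generate_themes call:

```python
def chunk_text(text, max_chars=12000, overlap=200):
    # Character-based chunking with a small overlap so sentences that
    # straddle a boundary appear in both chunks. The sizes here are
    # illustrative assumptions, not model limits.
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        start += max_chars - overlap
    return chunks

sample = "x" * 25000
print([len(c) for c in chunk_text(sample)])  # -> [12000, 12000, 1400]
```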

Because OLLM processes all inference inside Trusted Execution Environments (TEEs), the scraped website content is analyzed within a confidential computing boundary.
