Book a Meeting

Technical Expertise in AI Visibility

How ChatGPT Sees the Web

ChatGPT uses 13 tools to interact with the web, but discoverability hinges on two key web.run pipelines: open and search_query.

The "Open" Path

Open retrieves a post-processed, clean-text version of your page. It strips JSON-LD and HTML tags, so ChatGPT never sees the full page, only the excerpts relevant to the query.

Note: It does not execute JavaScript. CSR-only apps appear blank to the agent.
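
To make the point concrete, here is a minimal sketch of what a reader that does not execute JavaScript could recover from a page. It only approximates the behavior described above using Python's standard `html.parser`; the class name, function name, and sample pages are hypothetical, and this is not ChatGPT's actual processing pipeline.

```python
from html.parser import HTMLParser


class TextOnlyView(HTMLParser):
    """Collects visible text and ignores anything inside <script> or <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are outside script/style elements.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def text_only(html):
    """Returns the text a non-JavaScript reader could recover from the markup."""
    parser = TextOnlyView()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical sample pages: one server-rendered, one client-side rendered only.
server_rendered = (
    "<html><body><h1>Trail Shoes</h1><p>Built for mud.</p>"
    '<script type="application/ld+json">{"@type": "Product"}</script>'
    "</body></html>"
)
csr_only = '<html><body><div id="root"></div><script src="/bundle.js"></script></body></html>'

print(repr(text_only(server_rendered)))  # 'Trail Shoes Built for mud.'
print(repr(text_only(csr_only)))         # ''
```

The server-rendered page keeps its heading and paragraph but loses the JSON-LD block, while the CSR-only shell yields an empty string.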

The "Search_Query" Path

A separate system that behaves like a lookup against a Bing API index. It does not hit your server directly or share a cache with Open. It takes three parameters:

  • q: User query (e.g. "best shoes")
  • recency: Timeframe (e.g. 1 day for news)
  • domains: Source filtering
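
The three parameters above can be sketched as a small payload builder. This assumes the payload simply mirrors the field names listed here (`q`, `recency`, `domains`) with recency expressed in days; the helper name `build_search_query` and the example domain are hypothetical.

```python
import json
from typing import List, Optional


def build_search_query(q: str, recency: Optional[int] = None,
                       domains: Optional[List[str]] = None) -> dict:
    """Builds one search_query entry; optional fields are left out when unset."""
    if not q.strip():
        raise ValueError("q must be a non-empty query string")
    entry = {"q": q}
    if recency is not None:
        if recency < 0:
            raise ValueError("recency must be a non-negative number of days")
        entry["recency"] = recency
    if domains:
        entry["domains"] = list(domains)
    return entry


# Assumed shape: search_query is an array of entries.
payload = {
    "search_query": [
        build_search_query("best shoes", recency=1, domains=["example.com"])
    ]
}
print(json.dumps(payload, indent=2))
```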

Why It Matters

Bing Indexing: We've confirmed the bot uses Bing's index. If you're not on Bing, you're invisible to this tool.

JSON-LD Criticality: While Open ignores it, Bing relies on it. 88% of recommended sites have robust JSON-LD, helping Bing construct the snippets ChatGPT retrieves.
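
A quick way to see which JSON-LD a page actually ships is to pull the `application/ld+json` blocks out of its HTML and list their `@type` values. The sketch below uses only the Python standard library; the class and function names and the sample markup are hypothetical, and it makes no claim about how Bing itself parses structured data.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the parsed contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # Malformed JSON-LD is skipped rather than guessed at.


def jsonld_types(html):
    """Returns every @type found in the page's JSON-LD, in document order."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    found = []
    for block in collector.blocks:
        top_level = block if isinstance(block, list) else [block]
        for item in top_level:
            if not isinstance(item, dict):
                continue
            # "@graph" bundles several entities into one block.
            for node in item.get("@graph", [item]):
                if isinstance(node, dict) and "@type" in node:
                    value = node["@type"]
                    found.extend(value if isinstance(value, list) else [value])
    return found


sample_page = """
<html><head>
<script type="application/ld+json">{"@context": "https://schema.org", "@type": "Organization", "name": "Example"}</script>
<script type="application/ld+json">{"@context": "https://schema.org", "@graph": [{"@type": "WebSite"}, {"@type": "BreadcrumbList"}]}</script>
</head><body><p>Hello</p></body></html>
"""
print(jsonld_types(sample_page))  # ['Organization', 'WebSite', 'BreadcrumbList']
```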

Process Architecture

graph TD
  %% Styles
  classDef default fill:#111,stroke:#333,stroke-width:1px,color:#e5e5e5,rx:5,ry:5;
  classDef blue fill:#111,stroke:#1E65F8,stroke-width:2px,color:#1E65F8;
  classDef orange fill:#111,stroke:#FF9800,stroke-width:2px,color:#FF9800;
  classDef green fill:#111,stroke:#4CAF50,stroke-width:2px,color:#4CAF50;
  classDef user fill:#2D3748,stroke:#4A5568,color:#fff;
  classDef router fill:#10a37f,stroke:#fff,color:#fff,shape:rhombus;
  classDef noServer fill:#2B0E0E,stroke:#F56565,color:#FC8181;

  User("User: Makes a query related to your product"):::user --> Router{web.run}:::router

  subgraph OpenPath [Open Path]
    direction TB
    Router -- open --> InputOpen["INPUT (JSON) ref_id: https://your-site.com"]:::orange
    InputOpen --> InternalCache{Internal Cache}:::orange
    InternalCache -.->|Yes| Hit[HIT]:::orange
    InternalCache -->|No| Miss[MISS]:::green
    Miss --> Server[YOUR SERVER]:::default
    Server --> Processing[PRE-PROCESSING]:::default
    Processing --> OutputOpen["OUTPUT TO CHATGPT"]:::orange
    Hit -.-> OutputOpen
  end

  subgraph SearchPath [Search Path]
    direction TB
    Router -- search_query --> InputSearch["INPUT (JSON) q: 'Product Query' recency: 7"]:::blue
    InputSearch --> BingCache(BING CACHE):::blue
    BingCache -.->|Cached| OutputSearch["OUTPUT TO CHATGPT title: 'Product Title...' snippet: 'Summary...'"]:::blue
    BingCache -- No Connection --> NoServer[NO CONNECTION]:::noServer
  end

  linkStyle default stroke:#718096,stroke-width:2px;

The 13 Tools of ChatGPT -> Web.run

We leverage advanced techniques to extract the tools and system prompts ChatGPT uses. Understanding its operational logic when navigating the web gives us a strategic edge in planning and engineering content. We don't just write content; we are AI engineers, engineering content.

web.run
The central hub for web interaction. It orchestrates searches, page visits, and data extraction.
Parameters: search_query, open, find, ...
{
  "tool": "web.run",
  "description": "Multi-purpose web access tool used by ChatGPT...",
  "parameters": {
    "search_query": { "type": "array", "description": "Perform web searches" },
    "open": { "type": "array", "description": "Open a specific web page" },
    "find": { "type": "array", "description": "Search for text inside pages" },
    "screenshot": { "type": "array", "description": "Take PDF screenshots" },
    "image_query": { "type": "array", "description": "Search for images" },
    "product_query": { "type": "object", "description": "Search retail products" },
    "sports": { "type": "array", "description": "Sports schedules/standings" },
    "finance": { "type": "array", "description": "Stock/crypto prices" },
    "weather": { "type": "array", "description": "Weather forecasts" },
    "calculator": { "type": "array", "description": "Arithmetic expressions" },
    "time": { "type": "array", "description": "Current time info" }
  }
}

search_query
Executes targeted searches on Bing, filtering by domain and recency to find fresh content.
Parameters: q, recency, domains
{
  "search_query": {
    "type": "array",
    "items_type": "SearchQuery",
    "fields": {
      "q": { "type": "string", "description": "The search query string" },
      "recency": { "type": "integer", "description": "Recency in days" },
      "domains": { "type": "array", "description": "Limit results to domains" }
    }
  }
}

open
Retrieves a simplified, text-only version of a webpage. Crucially, it strips most styling and scripts.
Parameters: ref_id, lineno
{
  "open": {
    "type": "array",
    "items_type": "OpenToolInvocation",
    "fields": {
      "ref_id": { "type": "string", "description": "URL or reference ID to open" },
      "lineno": { "type": "integer", "description": "Viewport line hint" }
    }
  }
}

find
Scans the content of a page retrieved by open for specific keywords or phrases to locate relevant sections.
Parameters: ref_id, pattern
{
  "find": {
    "type": "array",
    "fields": {
      "ref_id": { "type": "string", "description": "Page reference ID" },
      "pattern": { "type": "string", "description": "Text pattern to find" }
    }
  }
}

screenshot
Captures visual evidence from PDF documents to verify information or layout.
Parameters: ref_id, pageno
{
  "screenshot": {
    "type": "array",
    "description": "Works on PDFs only",
    "fields": {
      "ref_id": { "type": "string", "description": "PDF reference ID" },
      "pageno": { "type": "integer", "description": "0-indexed page number" }
    }
  }
}

image_query
Searches for relevant images across the web to enrich the conversation context.
Parameters: q, recency, domains
{
  "image_query": {
    "type": "array",
    "fields": {
      "q": { "type": "string", "description": "Image search query" },
      "recency": { "type": "integer", "description": "Recency filter" },
      "domains": { "type": "array", "description": "Domain restriction" }
    }
  }
}

product_query
A specialized index for finding products, prices, and availability in real time.
Parameters: search, lookup
{
  "product_query": {
    "type": "object",
    "fields": {
      "search": { "type": "array", "description": "Free-text product queries" },
      "lookup": { "type": "array", "description": "Product IDs/names" }
    }
  }
}

sports
Direct access to live sports data, scores, and team standings.
Parameters: fn, league, team
{
  "sports": {
    "type": "array",
    "fields": {
      "fn": { "enum": ["schedule", "standings"] },
      "league": { "enum": ["nba", "nfl", "epl", "mlb", ...] },
      "team": { "type": "string", "description": "Team alias" }
    }
  }
}

finance
Real-time financial data pipeline for stocks, crypto, and market indices.
Parameters: ticker, type
{
  "finance": {
    "type": "array",
    "fields": {
      "ticker": { "type": "string", "description": "Symbol e.g. BTC" },
      "type": { "enum": ["equity", "crypto", "index"] }
    }
  }
}

weather
Hyperlocal weather forecasting and historical climate data.
Parameters: location, start, duration
{
  "weather": {
    "type": "array",
    "fields": {
      "location": { "type": "string", "description": "City/Region" },
      "start": { "type": "string", "description": "Start date" },
      "duration": { "type": "integer", "default": 7 }
    }
  }
}

calculator
A precise mathematical engine for complex calculations and data analysis.
Parameters: expression
{
  "calculator": {
    "type": "array",
    "fields": {
      "expression": { "type": "string", "description": "e.g. '1 + 2'" }
    }
  }
}

time
Timezone-aware clock utility for scheduling and temporal reasoning.
Parameters: utc_offset
{
  "time": {
    "type": "array",
    "fields": {
      "utc_offset": { "type": "string", "description": "e.g. '+01:00'" }
    }
  }
}

click
Simulates user interaction on a page. Note: This tool is marked as experimental/fragile.
Parameters: ref_id, id
{
  "click": {
    "type": "array",
    "description": "Follow specific links inside pages",
    "fields": {
      "ref_id": { "type": "string", "description": "Source page ID" },
      "id": { "type": "integer", "description": "Link index" }
    },
    "note": "May be fragile/error-prone."
  }
}
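
The schemas above suggest how a single web.run call might be composed from several sub-tools. The following is a sketch only: it assumes the call is a plain mapping from the parameter names listed above to arrays (or an object for product_query). The helper `build_web_run_call` and the example values are hypothetical, and the real invocation format may differ.

```python
# Parameter names taken from the schemas listed above.
ARRAY_PARAMS = {
    "search_query", "open", "find", "screenshot", "image_query",
    "sports", "finance", "weather", "calculator", "time", "click",
}
OBJECT_PARAMS = {"product_query"}


def build_web_run_call(params):
    """Validates parameter names and container types, then returns the call."""
    call = {}
    for name, value in params.items():
        if name in ARRAY_PARAMS:
            if not isinstance(value, list):
                raise TypeError(f"{name} expects an array")
        elif name in OBJECT_PARAMS:
            if not isinstance(value, dict):
                raise TypeError(f"{name} expects an object")
        else:
            raise ValueError(f"unknown web.run parameter: {name}")
        call[name] = value
    return call


# Hypothetical example: open a page, then look for a phrase inside it.
call = build_web_run_call({
    "open": [{"ref_id": "https://your-site.com"}],
    "find": [{"ref_id": "https://your-site.com", "pattern": "pricing"}],
})
print(sorted(call))  # ['find', 'open']
```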

Context Window & Chunking

How much can it see?

After preprocessing, only the "meaningful content" of the HTML remains. The effective retrieval limit sits between 11,000 and 15,000 tokens (approx. 75k characters).

The Risk

If your content exceeds this, the bot hallucinates the rest. It cannot jump to specific lines; it jumps to "chunks" it hasn't seen.

The Solution

Front-load key data. Ensure critical JSON-LD and entity definitions appear in the first 15k tokens.
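
A rough way to check front-loading is to estimate where a key phrase lands relative to the window. The sketch below assumes the ratio implied by the figures above (about 75k characters for 15,000 tokens, so roughly 5 characters per token). Real tokenizers vary, and the function names and sample page are hypothetical.

```python
import math

TOKEN_LIMIT = 15_000
CHARS_PER_TOKEN = 75_000 / TOKEN_LIMIT  # 5.0, derived from the figures above


def estimate_tokens(text):
    """Very rough token estimate based on character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def within_window(text, phrase, limit=TOKEN_LIMIT):
    """True if the phrase ends inside the first `limit` estimated tokens."""
    index = text.find(phrase)
    if index == -1:
        return False
    return estimate_tokens(text[: index + len(phrase)]) <= limit


# Hypothetical page: key fact first, pricing buried after a lot of filler.
page = "Acme makes trail shoes. " + "filler " * 20_000 + "Pricing: $99."

print(within_window(page, "Acme makes trail shoes."))  # True
print(within_window(page, "Pricing: $99."))            # False
```

Running a check like this on the cleaned text of a page, rather than the raw HTML, gives a closer picture of what lands inside the window.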

The Architecture of Visibility

Schema markup is not just SEO hygiene; it is the language AI uses to understand your entity. Our research on 107,352 citations reveals a strict hierarchy of value.

Layer 3: Strategic (High Risk / High Reward)

Answer-Oriented Schema. Only for specific, high-value pages.

+15-20% Citation Uplift (Conditional)
Schema types: FAQPage, HowTo, Review

Layer 2: Content-Type (Vertical Unlock)

Template-Specific. Defines what the content is. Correctness is the differentiator here (only 10.7% of products do this right).

76% of AI Citations use this
Schema types: Article, Product, Event

Layer 1: Baseline (Mandatory Hygiene)

Site-Wide Identity. Tells the AI "Who is speaking?". Without this, you are invisible to the Knowledge Graph.

82% Presence in AI Results
Schema types: Organization, WebSite, BreadcrumbList
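
As an illustration of the Layer 1 types, the sketch below assembles a minimal JSON-LD graph with Organization, WebSite, and BreadcrumbList entries using schema.org vocabulary. The helper names, the company name, and the URLs are hypothetical placeholders. A production setup would carry more properties than shown here.

```python
import json


def baseline_jsonld(name, url, breadcrumbs):
    """Builds a minimal site-identity graph.

    breadcrumbs: list of (label, url) pairs from the site root to the page.
    """
    return {
        "@context": "https://schema.org",
        "@graph": [
            {"@type": "Organization", "name": name, "url": url},
            {"@type": "WebSite", "name": name, "url": url},
            {
                "@type": "BreadcrumbList",
                "itemListElement": [
                    {"@type": "ListItem", "position": i, "name": label, "item": link}
                    for i, (label, link) in enumerate(breadcrumbs, start=1)
                ],
            },
        ],
    }


def as_script_tag(data):
    """Serializes the graph into a script tag, escaping "</" for safe embedding."""
    body = json.dumps(data).replace("</", "<\\/")
    return '<script type="application/ld+json">' + body + "</script>"


graph = baseline_jsonld(
    "Example Co",
    "https://example.com",
    [("Home", "https://example.com"), ("Shoes", "https://example.com/shoes")],
)
print(as_script_tag(graph))
```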

⚠️ The FAQ Schema Trap

Warning: Do not put FAQ Schema everywhere. In Aug 2023, Google restricted FAQ rich results to well-known, authoritative government and health websites.

BAD PATTERN (NOISE)

FAQ on every page.
Repetitive Q&A.
Risk: Spam Penalty.

GOOD PATTERN (SIGNAL)

Dedicated FAQ Page.
High-intent objections.
Result: +20% Visibility.

The Logic of Discovery

graph LR
  %% Styles
  classDef default fill:#111,stroke:#333,stroke-width:1px,color:#e5e5e5,rx:5,ry:5;
  classDef blue fill:#111,stroke:#1E65F8,stroke-width:2px,color:#1E65F8;
  classDef green fill:#111,stroke:#6ee7b7,stroke-width:2px,color:#6ee7b7;
  classDef red fill:#2B0E0E,stroke:#F56565,color:#FC8181;

  Content[Raw Content] --> check{Has Schema?}
  check -- No --> Unstructured[Unstructured HTML]:::default
  check -- Yes --> JSON[JSON-LD Object]:::blue
  Unstructured -.-> Ambiguity[Ambiguity]:::red
  Ambiguity --> Ignore[AI Ignored]:::red
  JSON --> Entity[Entity Recognition]:::blue
  Entity --> Graph{Knowledge Graph}:::green
  Graph -- Low Trust --> Candidates[Candidate Pool]:::default
  Graph -- High Trust --> AIAnswer[AI Answer Citation]:::green

  linkStyle default stroke:#718096,stroke-width:2px;

GEO Audit Executive Summary

Generated on: November 27, 2025 at 13:21

A market-wide analysis of AI visibility readiness. We ran an analysis across 1K sites and audited over 10 different technical data points.

🛡️ Bot Access & Market Accessibility

83.8% of sites were successfully audited. However, 8.2% (61 sites) blocked our crawler (Bot Detection/WAF). This indicates that a segment of the market is actively protecting its data.

Why it matters: If a site blocks bots (such as ChatGPT's crawlers or search engine crawlers), AI search engines cannot index it, which defeats generative engine optimization (GEO). "BLOCKED" means our audit tool was rejected by the site's firewall.
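
Firewall-level blocking, as measured in the audit, can only be observed by actually requesting the site. A related, easier check is whether robots.txt disallows AI-related crawlers. The sketch below uses Python's standard `urllib.robotparser` on robots.txt text you supply. The user-agent tokens are assumptions to verify against each vendor's documentation, and the function name is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Assumed crawler tokens; confirm them against the vendors' published lists.
AI_USER_AGENTS = ["GPTBot", "OAI-SearchBot", "ChatGPT-User", "bingbot"]


def blocked_agents(robots_txt, url="https://example.com/"):
    """Returns the listed user agents that robots.txt forbids from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in AI_USER_AGENTS if not parser.can_fetch(agent, url)]


# Hypothetical robots.txt that blocks one crawler and allows everyone else.
sample_robots = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""
print(blocked_agents(sample_robots))  # ['GPTBot']
```

A clean result here does not rule out a WAF or bot-detection block, which happens at the network layer rather than in robots.txt.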

📊 Technical SEO Maturity

Companies excel at Title (96.6% pass rate) but struggle significantly with Favicon (19.3% pass rate). Improving Favicon represents the biggest opportunity for competitive advantage.

Why it matters: This radar chart shows the "Technical Health" of the market. A full shape means perfect optimization. Dips indicate widespread weaknesses (e.g., no one uses structured data).

🏢 Company Size vs. GEO Score

Companies with 101–200 employees have the highest technical SEO maturity (average score: 62.5/100). In contrast, companies with 201–500 employees lag behind at 57.2/100.

How Score is Calculated: We take the average pass rate of all 10 technical checks (Title, Meta, Canonical, JSON-LD, etc.) for every company. A score of 100 means a perfect technical setup.
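
The scoring rule described above can be sketched in a few lines. The example uses only the check names mentioned in this summary (Title, Meta, Canonical, JSON-LD, Favicon), so it covers five of the ten checks; the function names and the pass/fail values are made up for illustration.

```python
from statistics import mean


def company_score(checks):
    """Score out of 100: the share of technical checks a company passes."""
    if not checks:
        raise ValueError("at least one check result is required")
    passed = sum(1 for ok in checks.values() if ok)
    return 100 * passed / len(checks)


def segment_average(companies):
    """Average score across all companies in a segment, to one decimal place."""
    return round(mean([company_score(checks) for checks in companies]), 1)


# Illustrative data only; not taken from the audit.
company_a = {"Title": True, "Meta": True, "Canonical": True, "JSON-LD": False, "Favicon": False}
company_b = {"Title": True, "Meta": True, "Canonical": True, "JSON-LD": True, "Favicon": False}

print(company_score(company_a))                 # 60.0
print(segment_average([company_a, company_b]))  # 70.0
```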

🔍 Rich Results Strategy

The most adopted Schema type is WebSite (used by 234 sites). Rich results strategies are heavily focused on this type.

Why it matters: "Schema" (JSON-LD) is the code that helps AI understand context (e.g., "This is a Job Posting"). The most popular types show where competitors are focusing their efforts.

Advanced AI Visibility Technology. Built on cutting-edge infrastructure and expertise.

  • LLM Tracking
  • AEO Platform
  • Analytics Engine
  • API Integration
  • Custom Solutions

Technical Capabilities

Our platform leverages state-of-the-art machine learning models, real-time data processing, and advanced analytics to deliver unprecedented visibility into AI recommendation systems.

  • Real-time LLM response monitoring across major platforms
  • Advanced natural language processing for citation analysis
  • Automated content optimization recommendations
  • API-first architecture for seamless integrations

Our Technical Services

Infrastructure
  • Scalable Cloud Architecture
  • Real-time Data Processing
  • High Availability Systems
  • Enterprise Security Standards

AI & Machine Learning
  • LLM Response Analysis
  • NLP-powered Content Audit
  • Predictive Citation Modeling
  • Automated Pattern Recognition

Analytics & Insights
  • Custom Dashboards & Reports
  • Competitive Benchmarking
  • Trend Analysis & Forecasting
  • Attribution Modeling

Integration & APIs
  • RESTful API Access
  • Webhook Support
  • Third-party Integrations
  • Custom Development

Ready to leverage our expertise?
