OpenAI
Bot User-Agent:openai
🤖 Overview
OpenAI operates two legitimate web crawlers: GPTBot (announced August 7, 2023) and OAI-SearchBot (launched August 21, 2024). GPTBot collects publicly accessible web content to train and improve OpenAI’s generative AI models, including GPT-4, GPT-4 Turbo, and the upcoming GPT-5, as detailed on platform.openai.com/docs/gptbot. OAI-SearchBot indexes pages to power search features in ChatGPT and other OpenAI products, so that they can retrieve current information. Both agents are non-malicious and rate-limited, and both honor the crawl preferences that site owners declare in robots.txt.
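Because the two crawlers serve different purposes, a site owner may want to treat them differently, for example by opting out of training crawls while staying visible in search. The sketch below uses Python's standard `urllib.robotparser` to evaluate such a policy. The robots.txt content, the `allowed` helper, and the example.com URL are hypothetical and only illustrate the idea; they are not taken from OpenAI's documentation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block the training crawler, allow the search crawler.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /
"""


def allowed(user_agent_token: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent_token to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent_token, url)


if __name__ == "__main__":
    page = "https://example.com/articles/1"  # placeholder URL
    for token in ("GPTBot", "OAI-SearchBot"):
        print(token, "allowed" if allowed(token, page) else "blocked")
```

The parser matches on the product token (`GPTBot`, `OAI-SearchBot`) rather than the full User-Agent header, so the rules keep working if the version number in the header changes.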
🌐 Technical Behavior
GPTBot uses a custom HTTP client with a User-Agent string of “Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; GPTBot/1.0; +https://openai.com/gptbot”. Site operators report a moderate request rate of roughly one request every 2–10 seconds per host. OAI-SearchBot identifies as “Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; OAI-SearchBot/1.0; +https://openai.com/searchbot”. Both crawlers originate from OpenAI’s published IP ranges, which include 20.115.0.0/16, 23.98.0.0/16, 40.126.0.0/16, and 152.195.0.0/16 (the current lists are published at openai.com/gptbot.json and openai.com/searchbot.json). Crawl requests use standard HTTP/1.1 over HTTPS, support gzip compression, and do not execute JavaScript. The crawlers follow href links only; they ignore