openisearch

Search Engine User-Agent: openisearch

🤖 Overview

openisearch is a web crawler operated by OpenAI, first publicly documented in May 2024 as part of the company’s expansion into web search. Its primary purpose is to index publicly available web content for ChatGPT’s web search feature and OpenAI’s forthcoming dedicated search product, so that users receive real-time answers with cited sources. Unlike GPTBot, which collects training data, openisearch is optimized for retrieving fresh, structured information to answer live queries. Official documentation at https://openai.com/index/gptbot/ and https://platform.openai.com/docs/plugins/browsing confirms its role as a legitimate, rate-limited search indexing agent.

🌐 Technical Behavior

openisearch uses a distributed crawling architecture and supports standard protocols such as HTTP/1.1 and HTTP/2. It spaces requests with a configurable delay, typically 2 to 10 seconds between requests, although burst rates may be higher while it first indexes a new domain. The crawler uses IPv4 and IPv6 addresses from OpenAI’s published IP ranges, listed in the openai.com/spider block (e.g., 20.171.206.0/24 and 52.230.152.0/24 as of early 2025). It obeys the robots.txt Crawl-delay directive and will not request pages faster than the specified interval. Each request carries an Accept-Language header and a User-Agent string that identifies the bot as OAI-SearchBot or OpenAI-SearchBot, depending on the version. It fetches both HTML and structured data such as JSON-LD and microdata for rich-snippet extraction, and it follows HTTP redirects up to five hops.
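To check whether a request's source address falls inside the ranges quoted above, Python's standard `ipaddress` module is enough. This is a minimal sketch: the two CIDR blocks are only the examples this page gives "as of early 2025", not an authoritative list, and `in_published_ranges` is a hypothetical helper name. In practice the list should be refreshed from OpenAI's currently published ranges.

```python
import ipaddress

# Example ranges quoted on this page ("as of early 2025"). Assumption:
# replace these with OpenAI's currently published list before relying on them.
PUBLISHED_RANGES = [
    ipaddress.ip_network("20.171.206.0/24"),
    ipaddress.ip_network("52.230.152.0/24"),
]


def in_published_ranges(ip: str, ranges=PUBLISHED_RANGES) -> bool:
    """Return True if `ip` lies inside any of the given CIDR blocks."""
    addr = ipaddress.ip_address(ip)
    # Mixed-version checks (IPv6 address vs IPv4 network) simply return False.
    return any(addr in net for net in ranges)


if __name__ == "__main__":
    print(in_published_ranges("20.171.206.17"))  # True
    print(in_published_ranges("203.0.113.5"))    # False
```

An IP-range match alone is a reasonable first filter; combining it with the User-Agent and reverse DNS checks described below makes spoofing much harder.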

📋 robots.txt Compliance

OpenAI explicitly states that openisearch fully honors Disallow directives in robots.txt, as documented in its official guidance. However, the bot reportedly does not honor an Allow directive that conflicts with a Disallow; this departs from the standard Robots Exclusion Protocol (RFC 9309), under which the most specific (longest) matching rule wins and Allow prevails on a tie. The bot also supports the X-Robots-Tag HTTP header and <meta name="robots"> tags at the page level. OpenAI provides a separate robots.txt group for openisearch under User-agent: OAI-SearchBot, distinct from GPTBot, so site owners can block search indexing without affecting training data collection.
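A site that wants to keep search indexing out of one directory while handling GPTBot separately might use a robots.txt like the one embedded below. The sketch assumes the `OAI-SearchBot` group name given on this page and a hypothetical `/private/` path, and uses the standard library's `urllib.robotparser` to check the rules. Note that `urllib.robotparser` applies the first matching rule in file order rather than RFC 9309's longest-match rule, so treat it as a rough check, not a model of the crawler's own parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep OAI-SearchBot out of /private/ and ask it to
# wait 5 s between requests, while GPTBot is governed by its own group.
ROBOTS_TXT = """\
User-agent: OAI-SearchBot
Disallow: /private/
Crawl-delay: 5

User-agent: GPTBot
Disallow: /
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    print(rp.can_fetch("OAI-SearchBot", "https://example.com/private/a"))  # False
    print(rp.can_fetch("OAI-SearchBot", "https://example.com/blog/post"))  # True
    print(rp.can_fetch("GPTBot", "https://example.com/blog/post"))         # False
    print(rp.crawl_delay("OAI-SearchBot"))                                 # 5
```

Because the two groups are independent, removing the `GPTBot` group would leave training-data collection unrestricted while search indexing of `/private/` stays blocked.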

🔍 Detection Indicators

The primary detection method is the User-Agent string: OAI-SearchBot for the production crawler, with a variant OAI-SearchBot/1.0 (Mozilla/5.0 compatible). Requests also carry a static From header containing a contact email address. IP addresses originate from OpenAI’s ASN AS395278. Site administrators can verify requests with reverse DNS, which resolves to *.search.openai.com; because whoever controls an IP block can set its PTR records, the returned hostname should also be resolved forward to confirm that it maps back to the requesting IP. No Accept-Encoding gzip preference is set by default, which distinguishes the bot from typical human browsers.
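User-Agent strings are trivial to spoof, so a request claiming to be `OAI-SearchBot` is best confirmed with forward-confirmed reverse DNS. The sketch below assumes the `*.search.openai.com` suffix and the two User-Agent tokens stated on this page; the function names are hypothetical, and the DNS lookups are injectable so the logic can be exercised without network access.

```python
import ipaddress
import socket
from typing import Callable, Iterable

# Values as described on this page; treat them as assumptions to confirm.
RDNS_SUFFIX = ".search.openai.com"
UA_TOKENS = ("oai-searchbot", "openai-searchbot")


def claims_searchbot(user_agent: str) -> bool:
    """Cheap first filter: does the User-Agent claim to be the search crawler?"""
    ua = user_agent.lower()
    return any(token in ua for token in UA_TOKENS)


def _reverse(ip: str) -> str:
    # PTR lookup; raises socket.herror (an OSError) when there is no record.
    return socket.gethostbyaddr(ip)[0]


def _forward(host: str) -> Iterable[str]:
    # All IPv4/IPv6 addresses the hostname resolves to.
    return [info[4][0] for info in socket.getaddrinfo(host, None)]


def verify_rdns(
    ip: str,
    reverse: Callable[[str], str] = _reverse,
    forward: Callable[[str], Iterable[str]] = _forward,
) -> bool:
    """Forward-confirmed reverse DNS: the PTR name must end with the expected
    suffix, and that name must resolve back to the original address."""
    try:
        addr = ipaddress.ip_address(ip)
        host = reverse(ip).rstrip(".").lower()
    except (ValueError, OSError):
        return False
    if not host.endswith(RDNS_SUFFIX):
        return False
    try:
        return addr in {ipaddress.ip_address(a) for a in forward(host)}
    except (ValueError, OSError):
        return False
```

In a request handler, `claims_searchbot` would run first because it is cheap, and `verify_rdns` only for requests that pass it, ideally with the result cached per IP since DNS lookups add latency.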

📊 Data Usage

Data collected by openisearch is used exclusively to populate OpenAI’s web search index, which powers the browser and search tools within ChatGPT and the forthcoming standalone OpenAI Search product. The index is updated continuously to reflect changes in web content, and logs are retained for up to 30 days for performance tuning. OpenAI does not use the crawled data for model training; that function is delegated to the separate GPTBot crawler. All collected data is subject to OpenAI’s privacy policy and is not shared with third parties.

⚙️ Rate Limiting Policy

Although openisearch is a legitimate agent, it can generate a high volume of requests when it first indexes a site, so rate limiting may be warranted. Threshold-based throttling prevents resource exhaustion and gives site owners control over their crawl budget; the recommended limit is 10 requests per minute per IP, which still lets the bot index essential pages.
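The 10-requests-per-minute threshold can be enforced with a per-IP sliding-window limiter. This is a minimal in-memory sketch, assuming a single server process; `SlidingWindowLimiter` is a hypothetical name, and a multi-server deployment would need shared state (for example, a central store) instead of a local dictionary.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Per-IP sliding-window limiter: allow at most `limit` requests in any
    `window`-second span (defaults follow the 10-per-minute figure above)."""

    def __init__(self, limit: int = 10, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, ip: str) -> bool:
        now = self.clock()
        hits = self._hits[ip]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # caller would typically answer 429 Too Many Requests
        hits.append(now)
        return True
```

Rejected requests are not recorded, so a client that backs off regains access as soon as its oldest requests age out of the window. Paths the site considers essential could be exempted by checking them before calling `allow`.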

ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.