orangebot

Bot User-Agent: orangebot

🤖 Overview

OrangeBot is a web crawler operated by Orange S.A., the French multinational telecommunications corporation (formerly France Telecom), designed to index publicly accessible web content for Orange’s search engine and data analytics platforms. First documented on Orange’s developer portal (developer.orange.com), the bot gathers data to support Orange Search results, AI-driven services, and internal business intelligence. It is a legitimate, non‑malicious agent that respects webmaster controls.

🌐 Technical Behavior

OrangeBot makes standard GET requests over HTTP/1.1, both plain and via HTTPS. Its default crawl rate can be aggressive: when no delay is configured, it often issues multiple requests per second. Official documentation indicates that the crawler originates from IP blocks within Orange’s autonomous system (AS198100) and other European ranges, predominantly in France. The bot supports gzip compression and sends a consistent User-Agent header; it does not execute JavaScript or parse CSS. Its requests typically include Accept-Language and Accept-Encoding headers, and it respects the Crawl-delay directive in robots.txt.

📋 robots.txt Compliance

Orange S.A. explicitly states that OrangeBot adheres to the Robots Exclusion Protocol. Webmasters can block or limit the crawler by adding rules for User-agent: OrangeBot in their robots.txt file. The bot honors both Disallow and Crawl-delay directives, making it manageable through standard webmaster controls.
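As a rough illustration of the rules described above, the following sketch embeds a sample robots.txt and checks it with Python's standard `urllib.robotparser`. The `/private/` path, the 10-second delay, and the example.com URLs are assumptions chosen for the example, not values published by Orange.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the blocked path and the delay are illustrative only.
ROBOTS_TXT = """\
User-agent: OrangeBot
Crawl-delay: 10
Disallow: /private/

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# How a compliant crawler identifying as "OrangeBot" would read these rules.
private_ok = parser.can_fetch("OrangeBot", "https://example.com/private/report.html")
blog_ok = parser.can_fetch("OrangeBot", "https://example.com/blog/post.html")
delay = parser.crawl_delay("OrangeBot")

print(private_ok, blog_ok, delay)
```

Checking a draft file this way shows how a standards-following parser interprets it before it is deployed; it does not prove how OrangeBot itself behaves.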

🔍 Detection Indicators

The primary User-Agent string is OrangeBot (matched case‑insensitively), sometimes followed by a version suffix such as OrangeBot/1.0 or a contact URL (+http://www.orange.com/webcrawler). Behavioral fingerprints include sequential requests at short intervals, no JavaScript or cookie support, and IP geolocation that consistently points to Orange’s network in Europe. No custom HTTP headers are required, but some requests may include a Via or X-OrangeBot header.
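The User-Agent indicator above can be sketched as a case-insensitive pattern match. The function name `match_orangebot` and the exact regular expression are assumptions made for this example; a User-Agent header can be spoofed, so a match is best combined with the IP and behavioral signals listed above.

```python
import re

# Matches the token "OrangeBot" in any letter case, with an optional
# "/version" suffix such as "/1.0". The pattern is illustrative only.
ORANGEBOT_RE = re.compile(r"\borangebot\b(?:/(?P<version>[\w.]+))?", re.IGNORECASE)


def match_orangebot(user_agent):
    """Return {'version': ...} if the header names OrangeBot, else None."""
    m = ORANGEBOT_RE.search(user_agent or "")
    if m is None:
        return None
    # version is None when the header carries no "/x.y" suffix.
    return {"version": m.group("version")}


full = match_orangebot("OrangeBot/1.0 (+http://www.orange.com/webcrawler)")
bare = match_orangebot("orangebot")
other = match_orangebot("Mozilla/5.0 (X11; Linux x86_64)")
print(full, bare, other)
```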

📊 Data Usage

Orange uses the collected data primarily to index web pages for its search engine, improve content relevance in Orange Search, and feed internal data analysis pipelines. Orange does not publicly state whether the data is used for AI training; according to its developer documentation, however, the data supports the company’s machine learning models for natural language processing and recommendation algorithms.

⚙️ Rate Limiting Policy

Because it can crawl quickly, OrangeBot is rate‑limited by default, a common practice to prevent server overload. Webmasters are advised to set a Crawl-delay of 10–30 seconds in robots.txt. If the bot ignores this directive and keeps sending requests without a delay, threshold‑based blocking (e.g., at 20 requests per second per IP) is warranted. The policy balances thorough data collection against website performance.
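One way to sketch threshold-based blocking is a sliding-window counter per IP. The class name `PerIpRateMonitor`, the one-second window, and the choice to flag an IP only once it exceeds 20 requests in that window are assumptions for this example, as is the 192.0.2.10 documentation address.

```python
from collections import defaultdict, deque


class PerIpRateMonitor:
    """Track request timestamps per IP and flag IPs that exceed a threshold."""

    def __init__(self, max_requests=20, window_seconds=1.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps inside the window

    def record(self, ip, now):
        """Record a request at time `now` (seconds); True means over the limit."""
        hits = self._hits[ip]
        hits.append(now)
        # Drop timestamps that have fallen out of the sliding window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        return len(hits) > self.max_requests


monitor = PerIpRateMonitor()
# Simulate 25 requests arriving 10 ms apart from one address.
results = [monitor.record("192.0.2.10", 100.0 + i * 0.01) for i in range(25)]
# A request several seconds later starts from an empty window again.
later = monitor.record("192.0.2.10", 105.0)
print(results.count(True), later)
```

Passing the timestamp in explicitly keeps the sketch deterministic; a live deployment would more likely call `time.monotonic()` and enforce the limit at the proxy or firewall layer.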


ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.