searchbot

Search Engine User-Agent: searchbot

🤖 Overview

SearchBot is a legitimate web crawler operated by Exalead, a French search engine technology company now part of Dassault Systèmes. Its primary purpose is to scan and index publicly accessible web pages to feed Exalead’s enterprise and public search products, including the Exalead Search Engine and the 3DSearch platform within the 3DEXPERIENCE ecosystem. First deployed in the early 2000s, SearchBot collects textual content, metadata, and page structure to build a searchable index used by businesses and individual users for information retrieval, as documented on Exalead’s official website.

🌐 Technical Behavior

SearchBot initiates crawl sessions from IP ranges registered to Exalead, notably 193.251.0.0/16 and 195.68.0.0/16, and speaks HTTP/1.1 over both HTTP and HTTPS with Keep-Alive connections. According to Exalead’s technical manual, its default configuration allows up to 10 requests per second per host, though bursts may reach 20–30 requests per second during deep re-crawls of high-priority sites. The bot identifies itself with the User-Agent string Exalead SearchBot and often includes a From header containing [email protected]. It supports gzip and deflate compression, follows redirects, and primarily processes static HTML pages, skipping JavaScript-heavy content unless it is server-side rendered. To reduce unnecessary downloads, the crawler sends conditional requests based on If-Modified-Since and ETag validators, and it schedules re-crawls according to a site’s update frequency, which it infers from sitemaps and their changefreq tags.
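
To illustrate the conditional-request behavior described above from the server side, here is a minimal Python sketch of how an origin might decide whether to answer 304 Not Modified. It is not Exalead's code: the function name `is_not_modified`, the plain-dict header lookup, and the assumption that ETag validators arrive in an If-None-Match header (standard HTTP, not stated in the source) are all illustrative.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def _strip_weak(tag):
    """Drop the W/ prefix so weak and strong ETags compare equal."""
    return tag[2:] if tag.startswith("W/") else tag


def is_not_modified(request_headers, etag, last_modified):
    """Return True when a 304 Not Modified reply is appropriate.

    request_headers: dict of header name -> value (hypothetical shape,
                     exact-case keys assumed for simplicity).
    etag:            the resource's current ETag, quotes included.
    last_modified:   timezone-aware datetime of the last change.
    """
    # If-None-Match takes precedence over If-Modified-Since.
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        if "*" in candidates:
            return True
        return _strip_weak(etag) in [_strip_weak(c) for c in candidates]

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            return False  # unparseable date: serve the full response
        if since.tzinfo is None:
            since = since.replace(tzinfo=timezone.utc)
        # HTTP dates have one-second resolution.
        return last_modified.replace(microsecond=0) <= since

    return False
```

A server that answers 304 in these cases sends no body, which is where the bandwidth saving for repeat crawls comes from.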

📋 robots.txt Compliance

SearchBot fully honors the Robots Exclusion Protocol, including Disallow directives, Crawl-delay parameters, and Allow rules, as stated in Exalead’s official crawler documentation. The bot checks robots.txt at the start of each crawl session and caches the file for up to 24 hours, refreshing it sooner if the remote server signals a change via Last-Modified headers. Exalead explicitly advises site owners to use Disallow lines to block specific paths, and violation of these directives is extremely rare, as the crawler is designed to respect publisher preferences and comply with web standards.
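
As a rough way to check how a rule set would be read before publishing it, the sketch below feeds a sample robots.txt to Python's standard `urllib.robotparser`. The robots.txt token `SearchBot`, the example paths, and the 5-second delay are assumptions chosen for illustration; the source does not state which token the crawler matches on. Python's parser applies the first matching rule, so the narrower Allow line is placed before the broader Disallow line; a crawler's own matching logic may differ.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule set: block /private/ except the press subfolder,
# and ask for a 5-second gap between requests.
ROBOTS_TXT = """\
User-agent: SearchBot
Crawl-delay: 5
Allow: /private/press/
Disallow: /private/
"""


def build_parser(robots_text):
    """Parse robots.txt content that is already in memory."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
agent = "Exalead SearchBot"  # User-Agent string given in this entry

blocked = parser.can_fetch(agent, "https://example.com/private/report.html")
press = parser.can_fetch(agent, "https://example.com/private/press/launch.html")
home = parser.can_fetch(agent, "https://example.com/index.html")
delay = parser.crawl_delay(agent)
```

With these rules, `blocked` is False, `press` and `home` are True, and `delay` is 5.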

🔍 Detection Indicators

The primary detection indicator is the User-Agent string Exalead SearchBot (also Exalead or ExaleadBot in older versions). Additionally, the bot includes a From header with the email [email protected] and a Via header describing proxy usage. Reverse DNS lookups on requesting IPs almost always resolve to hostnames ending in exalead.com or dasault-3ds.com. Behavioral fingerprints include consistent inter-request delays (typically 100–500 milliseconds between pages), a preference for HTML, ASP, PHP, and JSF file extensions, and a pattern of checking robots.txt first, followed by a sequential crawl order respecting the sitemap hierarchy.
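
Because a User-Agent string is trivial to spoof, the indicators above are usually combined. The following Python sketch pairs a User-Agent check with a forward-confirmed reverse DNS lookup. The function names, the default suffix tuple (only exalead.com is included; extend it with any other suffix you have verified yourself), and the injectable `reverse` and `forward` resolvers are assumptions made for illustration and testing, not part of any Exalead tooling.

```python
import socket


def claims_searchbot(user_agent):
    """True if the User-Agent mentions Exalead (covers the listed variants)."""
    return "exalead" in user_agent.lower()


def hostname_matches(hostname, suffixes):
    """True if hostname equals a suffix or is a subdomain of one."""
    host = hostname.lower().rstrip(".")
    return any(host == s or host.endswith("." + s) for s in suffixes)


def verify_ip(ip, suffixes=("exalead.com",), reverse=None, forward=None):
    """Forward-confirmed reverse DNS check.

    1. Resolve the IP to a hostname (PTR lookup).
    2. Require the hostname to fall under an allowed suffix.
    3. Resolve that hostname back and require the original IP in the result.
    """
    reverse = reverse or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward = forward or (lambda name: socket.gethostbyname_ex(name)[2])
    try:
        hostname = reverse(ip)
        if not hostname_matches(hostname, suffixes):
            return False
        return ip in forward(hostname)
    except OSError:  # socket.herror and socket.gaierror are subclasses
        return False
```

A request would be treated as genuine only when both `claims_searchbot(ua)` and `verify_ip(ip)` return True. The suffix check compares whole labels, so a hostname such as exalead.com.attacker.example does not pass.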

📊 Data Usage

Data collected by SearchBot is used exclusively to build and update Exalead’s proprietary search indexes, which power both the public Exalead Search portal and Dassault Systèmes’ 3DSearch platform. The collected content—including page titles, body text, metadata, and anchor text—is stored, processed, and indexed for full-text search and faceted navigation. No data is sold to third parties or used for AI model training; it remains within Exalead’s infrastructure solely for search retrieval purposes, as per their privacy policy and service terms.

⚙️ Rate Limiting Policy

Rate limiting is recommended for SearchBot because its default crawl rate can be aggressive, especially for smaller servers, and may degrade site performance if left unchecked. Administrators are advised to set a threshold, for example 20 requests per minute, and to block the bot if it exceeds that rate. Exalead also provides a feedback channel at [email protected] for reporting violations and requesting adjusted crawl behavior. This policy balances comprehensive indexing against the protection of origin server resources.
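
One way to enforce a threshold like the 20 requests per minute mentioned above is a sliding-window counter keyed by client. The Python sketch below is a minimal in-memory version; the class name `SlidingWindowLimiter`, the choice of client key (an IP address or a verified bot identity), and the injectable clock are illustrative assumptions, and a production setup would more likely rely on the web server's or CDN's own rate-limiting features.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per client in any `window_seconds` span."""

    def __init__(self, limit=20, window_seconds=60.0, clock=time.monotonic):
        self._limit = limit
        self._window = window_seconds
        self._clock = clock  # injectable so the limiter can be tested
        self._hits = defaultdict(deque)  # client key -> request timestamps

    def allow(self, client_key):
        """Record the request and return True, or return False if over limit."""
        now = self._clock()
        hits = self._hits[client_key]
        # Discard timestamps that have aged out of the window.
        while hits and now - hits[0] >= self._window:
            hits.popleft()
        if len(hits) >= self._limit:
            return False
        hits.append(now)
        return True
```

When `allow` returns False, the usual response is HTTP 429 Too Many Requests with a Retry-After header, which gives a well-behaved crawler a chance to slow down before it is blocked outright.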

ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.