exooba

Bot User-Agent: exooba

🤖 Overview

Exooba is a web crawler operated by Exooba, a company based in the United States. It collects publicly accessible web content to build a knowledge graph and an AI-powered search index. The bot was first observed in early 2024 and primarily feeds data into Exooba’s proprietary knowledge base platform, which supports enterprise search, natural language query answering, and AI training datasets.

🌐 Technical Behavior

The crawler performs HTTP GET requests at a moderate frequency, typically between 1 and 3 requests per second per source IP. It sends conditional requests (If-Modified-Since, and If-None-Match against previously received ETag values) to avoid redundant downloads.

Exooba operates from a pool of IP addresses belonging to AWS (Amazon Web Services) and Google Cloud Platform, with ranges often associated with US-based data centers. The bot uses both IPv4 and IPv6 addresses. It reportedly rotates between variants of its User-Agent string, and its requests are spread across the address pool, so blocking a single IP address or one exact string is unreliable; it does not, however, attempt to disguise itself as a different client.

According to official documentation published on Exooba’s website, the crawler follows a breadth-first crawl strategy, starting from a seed list of high-authority domains and expanding outward via hyperlinks. It does not follow nofollow links or submit forms, and it ignores JavaScript-rendered content unless a static HTML fallback is available.
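As an illustration of how a site can benefit from the crawler's conditional requests, the following is a minimal server-side sketch of the standard HTTP revalidation check. The function name is_not_modified and the plain-dict header representation are assumptions made for this example and are not taken from Exooba's documentation.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def _strip_weak(tag: str) -> str:
    """Drop the weak-validator prefix so tags can be compared weakly."""
    return tag[2:] if tag.startswith("W/") else tag


def is_not_modified(request_headers: dict, etag: str, last_modified: datetime) -> bool:
    """Return True when a 304 Not Modified response is appropriate.

    Assumes header names are already normalized to the casing used here
    and that last_modified is a timezone-aware datetime.
    """
    # If-None-Match takes precedence over If-Modified-Since.
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        candidates = {_strip_weak(t.strip()) for t in if_none_match.split(",")}
        return "*" in candidates or _strip_weak(etag) in candidates

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            return False  # unparseable date: serve the full response
        if since.tzinfo is None:
            since = since.replace(tzinfo=timezone.utc)
        # HTTP dates have one-second resolution.
        return last_modified.replace(microsecond=0) <= since

    return False
```

When the function returns True, the server can answer with 304 and an empty body, which saves bandwidth for any client that revalidates this way.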

📋 robots.txt Compliance

Exooba officially states that it fully honors robots.txt directives, including Disallow rules, and will cease crawling any path or directory explicitly blocked. No documented cases of non-compliance have been reported in security advisories or webmaster forums, and the bot’s GitHub repository (github.com/exooba/crawler-policy) includes a reference implementation of the robots.txt parser used. The bot also respects Crawl-Delay directives if specified in the robots.txt file, as confirmed by Exooba’s published technical whitepaper.
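To check how a given robots.txt would apply to this crawler before deploying it, one option is Python's standard urllib.robotparser module. The sketch below is illustrative only: the rules, paths, and the 10-second delay are made-up values, and it assumes the crawler matches on the token exooba listed as the bot's user-agent above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration; adjust paths and delay to your site.
ROBOTS_TXT = """\
User-agent: exooba
Disallow: /private/
Crawl-delay: 10

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Pass the short product token, not the full User-Agent header:
# robotparser only looks at the text before the first "/" of the agent name.
print(parser.can_fetch("exooba", "/private/report.html"))
print(parser.can_fetch("exooba", "/blog/post.html"))
print(parser.crawl_delay("exooba"))
```

Crawl-delay is a non-standard extension, so other crawlers may ignore it even if Exooba honors it as stated.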

🔍 Detection Indicators

The primary User-Agent string reported in server logs is Mozilla/5.0 (compatible; Exooba/1.0; +https://exooba.com/bot), though variants with different version numbers have been observed. Additional identifying markers include a custom HTTP header X-Exooba-Crawler: 1 and a reverse DNS name ending in .crawl.exooba.com. Behavioral signatures include a consistent request rate of 1–3 requests per second and an Accept header of text/html,application/xhtml+xml with no other MIME types.
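The indicators above can be combined into a simple check. The sketch below is one possible approach, not a vetted detection rule: the function names, the regular expression, and the forward-confirmed reverse DNS step are assumptions added for illustration, and headers are assumed to arrive as a plain dict with the casing shown.

```python
import ipaddress
import re
import socket

# Matches "Exooba/<version>" anywhere in the User-Agent, any version number.
UA_PATTERN = re.compile(r"\bExooba/\d+(?:\.\d+)*", re.IGNORECASE)
RDNS_SUFFIX = ".crawl.exooba.com"


def claims_exooba(headers: dict) -> bool:
    """True if the request identifies itself as Exooba (easily forged)."""
    ua = headers.get("User-Agent", "")
    return bool(UA_PATTERN.search(ua)) or headers.get("X-Exooba-Crawler") == "1"


def rdns_confirms(ip, reverse_lookup=socket.gethostbyaddr,
                  forward_lookup=socket.getaddrinfo) -> bool:
    """Forward-confirmed reverse DNS: PTR must end in the expected suffix
    and the hostname must resolve back to the same address."""
    try:
        hostname = reverse_lookup(ip)[0]
    except OSError:
        return False
    if not hostname.lower().rstrip(".").endswith(RDNS_SUFFIX):
        return False
    try:
        infos = forward_lookup(hostname, None)
    except OSError:
        return False
    target = ipaddress.ip_address(ip)
    return any(ipaddress.ip_address(info[4][0]) == target for info in infos)
```

Header checks alone only show what a client claims to be; the DNS step is what ties the claim to the operator's domain, so a request that passes claims_exooba but fails rdns_confirms is worth treating as an impersonator.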

📊 Data Usage

Collected data is used to build a structured knowledge graph that powers Exooba’s enterprise search and AI query-answering platform. The company states in its privacy policy that extracted text, metadata, and link relationships are stored and may be used to train machine learning models for natural language processing tasks. According to its data handling documentation, Exooba does not retain personally identifiable information (PII) and strips sensitive data fields as part of its processing pipeline.

⚙️ Rate Limiting Policy

Rate limiting is recommended because Exooba can generate sustained request volumes that may degrade server performance, especially on smaller websites. A commonly used threshold is 100 requests per minute per IP: requests beyond that are blocked or throttled, which keeps server resources available for human visitors. For reference, Exooba’s own stated crawl rate of 1–3 requests per second corresponds to 60–180 requests per minute per IP, so this threshold falls inside the crawler’s normal range and will throttle it whenever it runs near the upper end. Sustained traffic above 3 requests per second may indicate misconfiguration or aggressive behavior.
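A per-IP threshold like the one described can be enforced in several ways. The sketch below shows one of them, a sliding-window counter held in memory; the class name SlidingWindowLimiter and its parameters are hypothetical, and a production deployment would more likely rely on the web server, CDN, or a shared store.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window_seconds` for each IP."""

    def __init__(self, limit=100, window_seconds=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock              # injectable so the limiter can be tested
        self._hits = defaultdict(deque)  # ip -> timestamps of accepted requests

    def allow(self, ip: str) -> bool:
        now = self.clock()
        hits = self._hits[ip]
        # Discard timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # over the threshold: block or throttle
        hits.append(now)
        return True
```

With the defaults, a caller would create one limiter at startup and call allow(client_ip) per request, answering with HTTP 429 when it returns False. Rejected requests are not recorded, so a client regains capacity as soon as its oldest accepted request leaves the window.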

ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.