unchaos

Bot User-Agent: unchaos

🤖 Overview

Unchaos is a web crawler operated by Unchaos Inc., an AI research and data company based in San Francisco, California. Its purpose is to index publicly available web content to build training datasets for large language models and machine learning systems, as documented on their official website unchaos.com and in their publicly posted crawler policy.

🌐 Technical Behavior

Unchaos uses a distributed architecture with IP addresses from AWS and Google Cloud, listed at unchaos.com/ips. The bot sends requests with the User-Agent string "Unchaos/1.0" or "UnchaosBot/1.0" and the custom header X-Unchaos-Crawler: 1. It enforces a minimum 1‑second delay between requests to the same domain, respects Crawl-delay in robots.txt, and fetches only text-based formats (HTML, plain text, JSON, XML), ignoring binary files. According to its technical documentation, the crawler schedules fetches from a queue and caches robots.txt rules for up to 24 hours.
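
Site owners can check the documented politeness rules against their own logs. The sketch below is a minimal example, not Unchaos code. It assumes you have already extracted per-request timestamps (in seconds, with sub-second precision) for Unchaos traffic to one domain. It flags gaps shorter than the larger of the documented 1-second minimum and any Crawl-delay you set. It also tests whether a response Content-Type is in the documented text-only set. The function names and the mapping of "XML" to both text/xml and application/xml are assumptions.

```python
from typing import List, Optional, Sequence, Tuple

# Documented behavior: at least 1 s between requests to the same domain.
DOCUMENTED_MIN_DELAY = 1.0

# Assumed media types for the documented "HTML, plain text, JSON, XML" set.
DOCUMENTED_TEXT_TYPES = {
    "text/html",
    "text/plain",
    "application/json",
    "application/xml",
    "text/xml",
}


def politeness_violations(
    timestamps: Sequence[float], crawl_delay: Optional[float] = None
) -> List[Tuple[float, float]]:
    """Return (timestamp, gap) pairs where a request followed the previous one
    sooner than the larger of the 1 s minimum and the site's Crawl-delay."""
    required = max(DOCUMENTED_MIN_DELAY, crawl_delay or 0.0)
    ordered = sorted(timestamps)
    violations = []
    for prev, cur in zip(ordered, ordered[1:]):
        gap = cur - prev
        if gap < required:
            violations.append((cur, gap))
    return violations


def is_documented_text_type(content_type: str) -> bool:
    """True if a Content-Type value (parameters such as charset ignored)
    is in the documented text-only set."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in DOCUMENTED_TEXT_TYPES
```

For example, the timestamps [0.0, 1.5, 2.0, 5.0] produce one violation: the 0.5 s gap ending at 2.0. Access logs that record time only to the whole second cannot reveal sub-second gaps, so this check needs finer-grained timestamps.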

📋 robots.txt Compliance

Unchaos fully supports the Robots Exclusion Protocol and honors all Disallow directives. Its documentation at unchaos.com/robots reports no known violations. Site owners who want to block the bot without editing robots.txt can use the opt-out form at unchaos.com/opt-out.
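
A robots.txt group that names both documented User-Agent tokens blocks Unchaos site-wide and leaves other crawlers under a general rule. The sketch below checks such a file with Python's standard urllib.robotparser. The rules, the example.com URL, and the 10-second Crawl-delay for other bots are hypothetical. CPython's parser treats the group name as a case-insensitive substring of the crawler's product token, and other parsers may not do this. Listing both tokens avoids depending on that behavior.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one group blocks both Unchaos tokens everywhere;
# all other bots get a Crawl-delay and one disallowed directory.
ROBOTS_TXT = """\
User-agent: Unchaos
User-agent: UnchaosBot
Disallow: /

User-agent: *
Crawl-delay: 10
Disallow: /private/
"""

UNCHAOS_AGENTS = ["Unchaos/1.0", "UnchaosBot/1.0"]


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it."""
    rp = RobotFileParser()
    rp.parse(text.splitlines())
    return rp


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    for ua in UNCHAOS_AGENTS + ["OtherBot/2.0"]:
        allowed = rp.can_fetch(ua, "https://example.com/blog/post")
        print(ua, "allowed:", allowed, "crawl-delay:", rp.crawl_delay(ua))
```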

🔍 Detection Indicators

Primary indicators are the User-Agent strings "Unchaos/1.0" and "UnchaosBot/1.0" plus the header X-Unchaos-Crawler: 1. IP ranges include 3.80.0.0/12 (AWS) and 34.64.0.0/10 (Google Cloud). Behavioral fingerprints: the bot always fetches /robots.txt first, then spreads requests across sitemap URLs.
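
The indicators above can be combined into a per-request check. The sketch below uses Python's ipaddress module with the two example ranges from this profile. It assumes you refresh those ranges from unchaos.com/ips. The function names, the returned labels, and the triage rule are hypothetical. AWS and Google Cloud ranges are shared by many tenants, so an IP match alone does not identify Unchaos. The rule treats a self-declared Unchaos request from outside the listed ranges as a possible spoof.

```python
import ipaddress
from typing import Dict, Mapping

# Documented User-Agent tokens and marker header, matched case-insensitively.
UA_TOKENS = ("unchaos/1.0", "unchaosbot/1.0")
MARKER_HEADER = "x-unchaos-crawler"

# Example ranges from this profile; refresh from unchaos.com/ips before use.
UNCHAOS_NETWORKS = (
    ipaddress.ip_network("3.80.0.0/12"),   # AWS
    ipaddress.ip_network("34.64.0.0/10"),  # Google Cloud
)


def unchaos_signals(
    user_agent: str, headers: Mapping[str, str], remote_ip: str
) -> Dict[str, bool]:
    """Report which documented Unchaos indicators a single request matches."""
    ua = user_agent.lower()
    lowered = {k.lower(): v.strip() for k, v in headers.items()}
    addr = ipaddress.ip_address(remote_ip)
    return {
        "user_agent": any(token in ua for token in UA_TOKENS),
        "header": lowered.get(MARKER_HEADER) == "1",
        "ip_range": any(addr in net for net in UNCHAOS_NETWORKS),
    }


def classify(signals: Dict[str, bool]) -> str:
    """Hypothetical triage: trust a self-declared Unchaos request only when
    its source IP is also inside a listed range."""
    claims = signals["user_agent"] or signals["header"]
    if claims and signals["ip_range"]:
        return "likely-unchaos"
    if claims:
        return "claims-unchaos"  # declared, but outside the listed ranges
    return "other"  # an IP match alone is not enough on shared cloud ranges
```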

📊 Data Usage

Collected data is used solely for training and improving Unchaos’s proprietary AI models, including their flagship language model. The company states they anonymize datasets and do not store personally identifiable information beyond provenance needs. An annual transparency report is published at unchaos.com/transparency.

⚙️ Rate Limiting Policy

Rate limiting is recommended for Unchaos because even polite crawling can burden smaller sites. Recommended policy: set a conservative rate limit (e.g., 10 requests per minute) and add threshold-based blocking that protects server resources under heavy load while still allowing indexing when load permits.
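
The policy can be sketched in Python as a sliding-window limiter with a load cutoff. The sketch assumes the application can already attribute requests to Unchaos, for example with the detection indicators above, and can read server load as a fraction between 0 and 1. The class name, the 0.8 load threshold, and the choice of 429 for over-rate and 503 for overload are illustrative assumptions, not values published by Unchaos.

```python
from collections import deque
from typing import Deque


class BotRateLimiter:
    """Allow at most `limit` requests per `window` seconds for one bot, and
    refuse every request while server load is at or above `load_threshold`."""

    def __init__(self, limit: int = 10, window: float = 60.0,
                 load_threshold: float = 0.8) -> None:
        self.limit = limit
        self.window = window
        self.load_threshold = load_threshold
        self._hits: Deque[float] = deque()  # timestamps of admitted requests

    def decide(self, now: float, server_load: float) -> int:
        """Return an HTTP status: 200 to serve, 429 when over the rate limit,
        503 when the server is too busy to accept bot traffic."""
        if server_load >= self.load_threshold:
            return 503
        # Drop admitted requests that have left the sliding window.
        while self._hits and now - self._hits[0] >= self.window:
            self._hits.popleft()
        if len(self._hits) >= self.limit:
            return 429
        self._hits.append(now)
        return 200
```

Rejected requests are not recorded, so a crawler that backs off regains capacity as soon as older entries leave the 60-second window. Use one limiter instance per bot, or per bot and domain. The state is held in process memory, so a deployment with several worker processes would need a shared store to enforce one combined limit.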

ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information are the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.