wauuu
Bot User-Agent: wauuu
🤖 Overview
Wauuu is a web crawler operated by Wauuu GmbH, a German technology company specializing in web data aggregation and AI training datasets. The bot indexes publicly available web content for the company's proprietary search engine and supplies text to machine learning models for natural language processing. The crawler first appeared in Wauuu's official documentation in 2021 and operates under a published policy, with rate limiting guidelines available on the company's website.
🌐 Technical Behavior
The Wauuu crawler uses a distributed architecture, sending requests from multiple IP addresses within ASN ASxxxxx (Wauuu's registered range). Each IP averages 5 requests per second, with bursts of up to 10 requests per second during peak indexing. The crawler speaks both HTTP/1.1 and HTTP/2 and uses the Accept-Language header to request localized content. It obeys robots.txt directives, including Crawl-Delay. The bot identifies itself with the product token Wauuu/2.0 in its User-Agent string, which also carries a link to its policy page.
📋 robots.txt Compliance
According to Wauuu's official documentation at https://www.wauuu.com/crawler, the bot fully respects robots.txt rules and will not crawl any URL matched by a Disallow directive. It also supports the Crawl-Delay directive to slow down requests. However, it may ignore Allow directives that conflict with a broader Disallow. This differs from the precedence described in RFC 9309, under which the most specific (longest) matching rule wins, so an Allow exception nested inside a disallowed path may not be honored by Wauuu.
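As a rough way to preview how such rules would apply, the sketch below feeds a robots.txt to Python's standard urllib.robotparser. The robots.txt content, the example.com URLs, and the 10-second delay are hypothetical, and the bare token "Wauuu" is assumed from the documented User-Agent string. The standard-library parser applies its own matching logic, which may not be identical to what the Wauuu crawler actually does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration; the paths and the delay
# value are assumptions, not taken from Wauuu's documentation.
ROBOTS_TXT = """\
User-agent: Wauuu
Disallow: /private/
Crawl-delay: 10

User-agent: *
Disallow:
"""


def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# Ask whether the "Wauuu" group permits each URL.
private_ok = rules.can_fetch("Wauuu", "https://example.com/private/report.html")
public_ok = rules.can_fetch("Wauuu", "https://example.com/blog/post.html")

# Crawl-delay declared for the "Wauuu" group, in seconds.
delay = rules.crawl_delay("Wauuu")

print(private_ok, public_ok, delay)
```

Passing the short token rather than the full User-Agent string is deliberate: the standard-library parser matches on the part of the agent name before the first slash.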
🔍 Detection Indicators
The primary identifying User-Agent is Mozilla/5.0 (compatible; Wauuu/2.0; +https://www.wauuu.com/bot). As additional fingerprints, the bot sends a custom HTTP header, X-Wauuu-Crawl: 1, and requests a known verification page at /wauuu-verify. Its IP ranges are registered under Wauuu's ASN and can be looked up in public WHOIS databases. Log entries show a consistent pattern: the bot fetches robots.txt first, then follows links with a 2-second delay between requests.
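The header-based indicators above could be checked in server-side code along the following lines. This is a minimal sketch: the function names are hypothetical, and the version pattern is an assumption generalized from the single documented token Wauuu/2.0. Both the User-Agent and the custom header are trivially spoofable, so a match is only a hint; confirming the source IP against Wauuu's registered range is not shown here.

```python
import re
from typing import Dict, Mapping

# Matches the product token from the documented User-Agent, e.g. "Wauuu/2.0".
# Accepting other version numbers is an assumption.
WAUUU_UA = re.compile(r"\bWauuu/\d+(?:\.\d+)*", re.IGNORECASE)


def wauuu_signals(user_agent: str, headers: Mapping[str, str]) -> Dict[str, bool]:
    """Report which of the documented indicators a request exhibits."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value.strip() for name, value in headers.items()}
    return {
        "user_agent": bool(WAUUU_UA.search(user_agent)),
        "custom_header": lowered.get("x-wauuu-crawl") == "1",
    }


def looks_like_wauuu(user_agent: str, headers: Mapping[str, str]) -> bool:
    """True if at least one indicator matches (a hint, not proof)."""
    return any(wauuu_signals(user_agent, headers).values())


sample_ua = "Mozilla/5.0 (compatible; Wauuu/2.0; +https://www.wauuu.com/bot)"
print(wauuu_signals(sample_ua, {"X-Wauuu-Crawl": "1"}))
```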
📊 Data Usage
Collected data is used to build Wauuu's search index and to create high-quality training datasets for AI models, including text summarization and semantic understanding. Wauuu GmbH states that personal or sensitive data is filtered out before storage, and all data is processed in compliance with GDPR. The bot does not store cookies or track user sessions.
⚙️ Rate Limiting Policy
Rate limiting for Wauuu is recommended to prevent server overload, as the bot can be aggressive during initial index crawls. A threshold of 20 requests per minute from a single IP is advised, after which a temporary block of 60 seconds should be applied. This policy aligns with Wauuu's own guidelines for fair use.
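The threshold described above (20 requests per minute per IP, then a 60-second block) could be enforced with a sliding-window counter. The sketch below is one possible in-memory approach, not Wauuu's or Boteraser's implementation; the class name and parameters are hypothetical, and a production setup would more likely rely on the web server's or CDN's own rate limiting.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class IpRateLimiter:
    """Sliding-window limiter: `limit` requests per `window` seconds per IP,
    followed by a temporary block of `block` seconds once exceeded."""

    def __init__(self, limit: int = 20, window: float = 60.0, block: float = 60.0):
        self.limit = limit
        self.window = window
        self.block = block
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)
        self._blocked_until: Dict[str, float] = {}

    def allow(self, ip: str, now: Optional[float] = None) -> bool:
        """Return True if the request should be served, False if rejected."""
        now = time.monotonic() if now is None else now

        # Reject while a temporary block is still active for this IP.
        if now < self._blocked_until.get(ip, float("-inf")):
            return False

        hits = self._hits[ip]
        # Discard timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()

        if len(hits) >= self.limit:
            # Threshold exceeded: start the block and reset the counter.
            self._blocked_until[ip] = now + self.block
            hits.clear()
            return False

        hits.append(now)
        return True


limiter = IpRateLimiter()
# Simulate 21 requests, one per second, from a documentation-range IP.
results = [limiter.allow("203.0.113.7", now=float(i)) for i in range(21)]
print(results.count(True), results[-1])
```

The optional `now` argument exists so the behavior can be exercised with fixed timestamps; in real use it would be omitted and the monotonic clock used instead.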
ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.