Web Auto
Bot User-Agent: web-auto
🤖 Overview
Web Auto is a legitimate web crawler operated by WebAuto.io, a data-services company that provides structured web content for AI model training and business analytics. First publicly documented in 2020, the bot systematically indexes publicly accessible websites to build the company’s proprietary dataset, which enterprise clients use for natural language processing and market research. Unlike search engine bots, Web Auto focuses on high-value, dynamic content such as product catalogs, news articles, and forum discussions, and is explicitly designed for commercial data acquisition under fair-use terms.
🌐 Technical Behavior
The crawler uses a distributed architecture with IP ranges announced via ASN AS203856 (WebAuto Ltd.), verified in BGP records and WHOIS data. Each IP averages one request every 2–5 seconds, but the crawler can scale to parallel bursts of up to 50 requests during deep crawls of large sites. It identifies itself via the User-Agent string Mozilla/5.0 (compatible; WebAuto/1.0; +https://webauto.io/bot) and makes HTTP/1.1 GET requests with Accept: text/html,application/xhtml+xml. The crawler respects Last-Modified and ETag headers to avoid redundant downloads and tags every request with a unique X-WebAuto-ID header for rate-limit tracking. Official documentation at https://webauto.io/docs/crawler describes its crawl pattern as breadth-first with a maximum crawl depth of 10 levels.
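To benefit from the crawler's Last-Modified and ETag handling, the origin server has to answer conditional requests. The following is a minimal Python sketch of that server-side decision, assuming the crawler revalidates with the standard If-None-Match and If-Modified-Since request headers (the source does not name them). The function name is_not_modified is hypothetical and not part of any WebAuto tooling.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def _strip_weak(tag):
    """Drop the weak-validator prefix so W/"x" and "x" compare equal."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def is_not_modified(request_headers, etag, last_modified):
    """Return True when a conditional GET can be answered with 304.

    request_headers: dict of request header names to values.
    etag: the resource's current ETag, including quotes.
    last_modified: timezone-aware datetime of the last change.
    """
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # If-None-Match takes precedence over If-Modified-Since.
        candidates = {_strip_weak(t) for t in if_none_match.split(",")}
        return "*" in candidates or _strip_weak(etag) in candidates

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            return False  # unparseable date: serve the full response
        if since.tzinfo is None:
            since = since.replace(tzinfo=timezone.utc)
        # HTTP dates have one-second resolution.
        return last_modified.replace(microsecond=0) <= since

    return False
```

A handler would call this before rendering the page and return 304 with an empty body when it yields True.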
📋 robots.txt Compliance
Web Auto fully honors robots.txt directives, as confirmed in its published crawler policy. It reads the file at the root of each domain before crawling and respects both Disallow and Crawl-delay directives. A GitHub repository at https://github.com/webauto-io/robots-rules provides a reference implementation of its parser. The bot also applies site-specific Disallow rules when it only retrieves them after an earlier robots.txt fetch was missed.
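Site owners can check how a compliant parser would read their rules before relying on them. The sketch below uses Python's standard urllib.robotparser, not WebAuto's own parser. It assumes the robots.txt token is web-auto, taken from the Bot User-Agent field at the top of this page; the paths and example.com URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep the bot out of /private/ and ask for
# the 2-second delay this page cites as the bot's suggested value.
ROBOTS_TXT = """\
User-agent: web-auto
Disallow: /private/
Crawl-delay: 2

User-agent: *
Disallow:
"""


def build_parser(text):
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Blocked path for the assumed token.
print(parser.can_fetch("web-auto", "https://example.com/private/report.html"))  # False
# Anything else remains crawlable.
print(parser.can_fetch("web-auto", "https://example.com/catalog/"))  # True
# Requested delay, in seconds, for the assumed token.
print(parser.crawl_delay("web-auto"))  # 2
```

If the crawler matches on a different token, such as WebAuto, the User-agent line needs to change accordingly.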
🔍 Detection Indicators
Key identifying fingerprints include the User-Agent WebAuto/1.0 with the URL https://webauto.io/bot, the custom X-WebAuto-ID header containing a UUID, and a reverse-DNS lookup resolving to *.webauto.io. Behavioral indicators: the crawler always requests robots.txt first, uses a consistent Accept-Language: en-US,en;q=0.9 header, and sends requests with Connection: keep-alive. The IP ranges 185.234.xx.xx are registered under WebAuto Ltd. in RIPE.
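The indicators above can be combined into a simple check. The following Python sketch is one possible approach, not an official verification method: header_score counts the matching header fingerprints, and rdns_matches tests the *.webauto.io reverse-DNS pattern. The forward-confirmation step (resolving the hostname back to the IP) is an extra safeguard added here as an assumption, because PTR records alone can be spoofed. All function names are hypothetical.

```python
import re
import socket
import uuid

UA_PATTERN = re.compile(
    r"\(compatible; WebAuto/1\.0; \+https://webauto\.io/bot\)"
)


def has_valid_id(value):
    """True if the X-WebAuto-ID value parses as a UUID."""
    try:
        uuid.UUID(value)
        return True
    except (ValueError, TypeError, AttributeError):
        return False


def header_score(headers):
    """Count how many of the four header indicators are present (0-4)."""
    h = {k.lower(): v for k, v in headers.items()}
    checks = [
        bool(UA_PATTERN.search(h.get("user-agent", ""))),
        has_valid_id(h.get("x-webauto-id")),
        h.get("accept-language") == "en-US,en;q=0.9",
        h.get("connection", "").lower() == "keep-alive",
    ]
    return sum(checks)


def rdns_matches(ip,
                 reverse_lookup=socket.gethostbyaddr,
                 forward_lookup=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS check against *.webauto.io (IPv4).

    The lookup functions are injectable so the logic can be exercised
    without network access.
    """
    try:
        hostname = reverse_lookup(ip)[0].lower().rstrip(".")
        if not hostname.endswith(".webauto.io"):
            return False
        # Forward-confirm: the hostname must resolve back to the same IP.
        return ip in forward_lookup(hostname)[2]
    except OSError:
        return False
```

Headers are trivial to forge, so a high header_score is best treated as a hint and the DNS check as the stronger signal.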
📊 Data Usage
Collected data is aggregated and curated for AI training corpora (targeting GPT-scale models), named-entity recognition datasets, and competitive intelligence feeds. WebAuto.io also offers a "Web Auto Insights" product that provides trend analysis from crawled content. Data is stored with timestamped metadata and anonymized to remove personally identifiable information before being sold to enterprise clients. The company publishes a data usage report biannually on https://webauto.io/transparency.
⚙️ Rate Limiting Policy
Although legitimate, Web Auto’s aggressive crawl frequency can inadvertently stress origin servers, especially on small websites. Rate limiting is recommended with a threshold of 50 requests per minute per IP, as documented in community guidelines and supported by the bot’s own suggested Crawl-delay value of 2 seconds. A 2-second delay works out to at most 30 requests per minute per IP, so the 50-request threshold leaves headroom for compliant crawling while still capping bursts. The policy rationale is to protect server resources while still allowing timely data collection for commercial AI applications.
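As an illustration of the recommended threshold, here is a minimal in-memory sliding-window limiter in Python that allows 50 requests per 60 seconds per IP. It is a sketch under simplifying assumptions (single process, no eviction of idle IPs), and the class name SlidingWindowLimiter is hypothetical. Production deployments would more commonly enforce the limit at the reverse proxy or CDN.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window_seconds` per client IP."""

    def __init__(self, limit=50, window_seconds=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock  # injectable for testing
        self._hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def allow(self, ip):
        """Record the request and return True, or return False if over limit."""
        now = self.clock()
        hits = self._hits[ip]
        # Discard timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

A request handler would call limiter.allow(client_ip) and respond with HTTP 429 when it returns False.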
ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.