Evil
Bot User-Agent: evil
🤖 Overview
Evil is a legitimate web crawler operated by the Evil Project, an open‑source initiative documented on GitHub (github.com/evil‑project/crawler), designed to collect publicly accessible web content for research purposes including AI training datasets and web archiving. First announced in 2022, the bot is maintained by a community of volunteer developers and is used by academic institutions to study web evolution.
🌐 Technical Behavior
The Evil crawler uses a breadth-first crawl strategy with a default request frequency of 10 requests per second, adjustable via configuration files. It sends HTTP requests over both IPv4 and IPv6, with documented IP ranges including 198.51.100.0/24 (TEST-NET-2, RFC 5737) and 2001:db8::/32 (RFC 3849). Both are ranges reserved for documentation rather than routable production addresses. Supported protocols are HTTP/1.1 and HTTP/2, and the crawler issues GET requests only, never POST or PUT. It caches DNS lookups for 24 hours and respects TTL values from the domain’s name servers. Observed crawl patterns show a random delay of 0.5 to 2 seconds between consecutive requests to a single host, which reduces load on any one server.
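A minimal sketch of how a site operator might check whether a client address falls inside the ranges listed above, using Python's standard `ipaddress` module. The function name `in_documented_ranges` is hypothetical, and the range list contains only the two prefixes quoted in this entry.

```python
import ipaddress

# Ranges quoted in this entry; both are reserved documentation prefixes.
DOCUMENTED_RANGES = [
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]

def in_documented_ranges(addr: str) -> bool:
    """Return True if addr falls inside any of the listed ranges."""
    ip = ipaddress.ip_address(addr)
    # An IPv4 address tested against an IPv6 network (or the reverse)
    # is simply not a member, so a mixed list is safe to iterate.
    return any(ip in net for net in DOCUMENTED_RANGES)

if __name__ == "__main__":
    for candidate in ("198.51.100.7", "2001:db8::1", "203.0.113.5"):
        print(candidate, in_documented_ranges(candidate))
```

Because an address match alone is weak evidence, a check like this is probably best combined with the header indicators described further down.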
📋 robots.txt Compliance
According to the official documentation (github.com/evil-project/crawler/blob/main/ROBOTS.md), Evil fully implements the Robots Exclusion Protocol (RFC 9309) and honors all Disallow directives. The bot will not crawl any URL that matches a disallowed path. It also respects the Crawl-delay directive if present, a widely used extension that RFC 9309 itself does not define. Community testing confirms that it does not intentionally bypass robots.txt rules.
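To preview how a compliant crawler would read a given rule set, the standard library's `urllib.robotparser` can be pointed at a robots.txt body. This is a sketch under assumptions: the robots.txt content is invented for illustration, and `EvilBot` is assumed to be the product token the crawler matches against, which this entry does not confirm.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration; not taken from any real site.
ROBOTS_TXT = """\
User-agent: EvilBot
Disallow: /private/
Crawl-delay: 5

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def allowed(url: str, agent: str = "EvilBot") -> bool:
    """Return True if the rules above permit `agent` to fetch `url`."""
    return parser.can_fetch(agent, url)

if __name__ == "__main__":
    print(allowed("https://example.com/private/data.html"))
    print(allowed("https://example.com/public/index.html"))
    print(parser.crawl_delay("EvilBot"))
```

The empty `Disallow:` line in the wildcard group means other agents may fetch everything, so only the group addressed to the assumed `EvilBot` token restricts `/private/`.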
🔍 Detection Indicators
The primary User-Agent token is EvilBot/1.0, sent within a full string such as Mozilla/5.0 (compatible; EvilBot/1.0; +https://evil-project.org/bot). Behavioral fingerprints include a fixed request header From: crawler@evil-project.org and an X-Evil-Bot: true custom header. The bot never sends Cookie or Referer headers, and it always includes an Accept: text/html,application/xhtml+xml header.
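The indicators above can be combined into a simple header check. The following is an illustrative sketch only: the function name `evil_indicators`, the indicator labels, and the version-matching regular expression are assumptions, and any of these headers can be spoofed by another client.

```python
import re

# Matches the EvilBot product token with any dotted version number (assumed pattern).
UA_PATTERN = re.compile(r"\bEvilBot/\d+(\.\d+)*", re.IGNORECASE)

def evil_indicators(headers: dict[str, str]) -> list[str]:
    """Return the names of the listed indicators that a request matches."""
    # HTTP header names are case-insensitive, so normalize them first.
    h = {k.lower(): v for k, v in headers.items()}
    hits = []
    if UA_PATTERN.search(h.get("user-agent", "")):
        hits.append("user-agent")
    if h.get("from", "").strip().lower() == "crawler@evil-project.org":
        hits.append("from")
    if h.get("x-evil-bot", "").strip().lower() == "true":
        hits.append("x-evil-bot")
    # The bot is described as never sending Cookie or Referer headers.
    if "cookie" not in h and "referer" not in h:
        hits.append("no-cookie-or-referer")
    return hits
```

Treating the result as a score rather than a yes/no answer is one reasonable design choice, since a missing Cookie header on its own is common for ordinary first-time visitors.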
📊 Data Usage
Collected data is used to train large language models under the Evil Project’s open‑source license, to create public web archives stored on the Internet Archive, and to produce academic research on web structure. The bot does not scrape personal data or authentication‑protected content. All collected data is made available under Creative Commons licenses.
⚙️ Rate Limiting Policy
While not malicious, Evil can be aggressive when many instances run simultaneously. Site operators are encouraged to limit each IP to 5 requests per second using web server modules such as Apache’s mod_evasive or Nginx’s limit_req rate limiting, which keeps access fair for other crawlers and end users.
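For operators who enforce the limit in application code instead of at the web server, the 5 requests per second per IP policy can be sketched as a token bucket. This is an assumption-laden illustration, not how mod_evasive or Nginx implement their limits: the class name `PerIpRateLimiter`, the burst size of 5, and the injectable `clock` parameter are all hypothetical choices.

```python
import time
from collections import defaultdict

class PerIpRateLimiter:
    """Token bucket per client IP: `rate` tokens per second, at most `burst` stored."""

    def __init__(self, rate: float = 5.0, burst: int = 5, clock=time.monotonic):
        self.rate = rate
        self.burst = burst
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        # Each bucket is [tokens, last_refill_time]; a new IP starts with a full bucket.
        self.buckets = defaultdict(lambda: [float(burst), clock()])

    def allow(self, ip: str) -> bool:
        """Consume one token for `ip` if available; return whether the request may proceed."""
        bucket = self.buckets[ip]
        now = self.clock()
        tokens, last = bucket
        # Refill in proportion to elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        bucket[1] = now
        if tokens >= 1.0:
            bucket[0] = tokens - 1.0
            return True
        bucket[0] = tokens
        return False
```

A caller would typically answer a rejected request with HTTP 429 (Too Many Requests). Note that this sketch never evicts idle buckets, so a long-running service would need to prune entries for IPs that have gone quiet.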
ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.