anemone
Bot User-Agent: anemone
🤖 Overview
Anemone is a research-oriented web crawler operated by the University of Washington’s Web Research Group, originally developed for academic studies on web graph structure, link analysis, and crawling efficiency. It feeds data into public research datasets and supports projects like the WebBase repository.
🌐 Technical Behavior
Anemone employs a politeness policy with a default crawl delay of 1–2 seconds between requests, though it may be configured with shorter intervals during bulk academic crawls. It uses HTTP/1.1 with standard GET requests and follows redirects (up to 10 hops). Its IP addresses are typically drawn from the University of Washington’s autonomous system (AS73), and requests may also originate from other academic institutions when the crawler is run in distributed mode. It supports both breadth-first and focused crawling strategies, and it announces itself via the User-Agent header.
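The delay and redirect limits described above can be illustrated with a small offline sketch. This is not Anemone's code: the function names `resolve` and `politeness_delay` are hypothetical, and the redirect table is a plain dictionary standing in for real HTTP responses. It only assumes the figures stated above (a 1–2 second delay and at most 10 redirect hops).

```python
import random

MAX_REDIRECTS = 10  # hop limit taken from the description above


def resolve(url, redirects, max_hops=MAX_REDIRECTS):
    """Follow a chain of redirects held in a dict (hypothetical stand-in
    for HTTP 3xx responses). Returns (final_url, hops_taken)."""
    hops = 0
    while url in redirects:
        if hops >= max_hops:
            raise RuntimeError(f"gave up after {max_hops} redirects")
        url = redirects[url]
        hops += 1
    return url, hops


def politeness_delay(rng, low=1.0, high=2.0):
    """Pick a pause, in seconds, inside the default 1-2 second window."""
    return rng.uniform(low, high)


# A three-hop chain resolves; a crawler would sleep for `pause` seconds
# before issuing its next request to the same host.
chain = {"/a": "/b", "/b": "/c", "/c": "/d"}
final_url, hops = resolve("/a", chain)
pause = politeness_delay(random.Random(0))
```

A chain longer than 10 hops raises instead of looping forever, which is the practical reason crawlers cap redirects.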
📋 robots.txt Compliance
Anemone consistently honors robots.txt directives, including Disallow and Crawl-delay instructions, as documented in the crawler’s source code available on the GitHub repository (https://github.com/uw-web-research/anemone). It does not override explicit exclusions and will abort crawl paths that are disallowed.
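For site owners who want to verify how a set of rules would apply to this crawler, the following sketch uses Python's standard `urllib.robotparser`. The robots.txt content, the `/private/` path, the 5-second delay, and the `example.com` URLs are illustrative assumptions, not rules published by Anemone's operators.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a site owner might publish.
ROBOTS_TXT = """\
User-agent: anemone
Disallow: /private/
Crawl-delay: 5

User-agent: *
Disallow:
"""

# User-Agent string as listed under Detection Indicators.
ANEMONE_UA = "Anemone/1.0 (+http://webresearch.cs.washington.edu/anemone)"


def parse_rules(text):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = parse_rules(ROBOTS_TXT)
blocked = rules.can_fetch(ANEMONE_UA, "https://example.com/private/report.html")
allowed = rules.can_fetch(ANEMONE_UA, "https://example.com/index.html")
delay = rules.crawl_delay(ANEMONE_UA)
```

Under these assumed rules, `blocked` is `False`, `allowed` is `True`, and `delay` is `5`. A compliant crawler is expected to reach the same conclusions from the same file.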
🔍 Detection Indicators
The primary User-Agent string is Anemone/1.0 (+http://webresearch.cs.washington.edu/anemone). Additional fingerprints include a consistent lack of Accept-Encoding negotiation and a small window of concurrent connections (typically 2–4). No custom X-Forwarded-For or referrer headers are injected.
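The two header-level indicators above can be combined into a simple check. The sketch below is a heuristic under stated assumptions: the function name `looks_like_anemone` and the shape of its result are hypothetical, and it inspects only the User-Agent prefix and the absence of `Accept-Encoding`. Connection concurrency has to be measured elsewhere, and a User-Agent header is trivially spoofable, so a match is a hint and not proof.

```python
from typing import Mapping


def looks_like_anemone(headers: Mapping[str, str]) -> dict:
    """Score one request's headers against the indicators described above."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    user_agent = normalized.get("user-agent", "")
    ua_match = user_agent.lower().startswith("anemone/")
    no_accept_encoding = "accept-encoding" not in normalized
    return {
        "ua_match": ua_match,
        "no_accept_encoding": no_accept_encoding,
        "likely": ua_match and no_accept_encoding,
    }


crawler_request = {
    "User-Agent": "Anemone/1.0 (+http://webresearch.cs.washington.edu/anemone)",
    "Accept": "*/*",
}
browser_request = {
    "User-Agent": "Mozilla/5.0",
    "Accept-Encoding": "gzip, deflate, br",
}
crawler_verdict = looks_like_anemone(crawler_request)
browser_verdict = looks_like_anemone(browser_request)
```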
📊 Data Usage
Collected data is used exclusively for academic research, including web graph analysis, link prediction studies, and language modeling datasets that are made publicly available under permissive licenses. No commercial indexing or AI model training occurs without explicit consent.
⚙️ Rate Limiting Policy
Although Anemone is a legitimate crawler, its large-scale crawls can overwhelm server resources, so rate limiting it protects application performance. Threshold-based blocking (for example, more than 20 requests per minute from the same IP) is recommended rather than a permanent ban.
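One way to implement such a threshold is a per-IP sliding window with a temporary block. The sketch below is a minimal in-memory illustration, not a production rate limiter: the class name `SlidingWindowLimiter` and the 300-second block duration are assumptions, and only the 20-requests-per-minute threshold comes from the recommendation above. Timestamps are passed in explicitly so the behavior is easy to test.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Temporarily block an IP that exceeds max_requests per window."""

    def __init__(self, max_requests=20, window_seconds=60.0, block_seconds=300.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.block_seconds = block_seconds  # assumed value; tune per site
        self._hits = defaultdict(deque)     # ip -> recent request timestamps
        self._blocked_until = {}            # ip -> time the block expires

    def allow(self, ip, now):
        """Return True if the request at time `now` (seconds) may proceed."""
        if self._blocked_until.get(ip, 0.0) > now:
            return False
        hits = self._hits[ip]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        hits.append(now)
        if len(hits) > self.max_requests:
            # Threshold exceeded: block for a limited time, never permanently.
            self._blocked_until[ip] = now + self.block_seconds
            hits.clear()
            return False
        return True


limiter = SlidingWindowLimiter()
# 21 requests, one per second, from a documentation-range IP address.
decisions = [limiter.allow("192.0.2.1", float(t)) for t in range(21)]
```

Here the first 20 requests pass and the 21st triggers the block. Once `block_seconds` has elapsed the address is served again, which matches the advice to avoid permanent bans.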
ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.