anemone Bot — Detection, Blocking & Technical Analysis


Bot User-Agent: anemone

🤖 Overview

Anemone is a research-oriented web crawler operated by the University of Washington’s Web Research Group, originally developed for academic studies on web graph structure, link analysis, and crawling efficiency. It feeds data into public research datasets and supports projects like the WebBase repository.

🌐 Technical Behavior

Anemone applies a politeness policy with a default delay of 1–2 seconds between requests, although the interval can be configured lower during bulk academic crawls. It issues standard HTTP/1.1 GET requests and follows redirects up to a limit of 10 hops. Its requests typically originate from the University of Washington's autonomous system (AS73), but they may also come from other academic institutions when the crawl is distributed. It supports both breadth-first and focused crawling strategies and identifies itself in the User-Agent header.
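The following is a rough log-analysis sketch and is not part of Anemone itself. It computes the gaps between one client's request timestamps and checks whether their median reaches the 1-second lower bound of the documented delay. The function names and the 1.0 s threshold are assumptions. A crawler running several concurrent connections from one IP can legitimately produce shorter per-IP gaps, so treat a failed check as a hint rather than proof.

```python
from statistics import median


def politeness_gaps(timestamps: list[float]) -> list[float]:
    """Return the gaps, in seconds, between consecutive requests (after sorting)."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def looks_polite(timestamps: list[float], min_gap: float = 1.0) -> bool:
    """Heuristic: True if the median gap between requests is at least min_gap.

    min_gap=1.0 is an assumed threshold taken from the lower end of the
    1-2 s default delay described above, not a value published by the crawler.
    """
    gaps = politeness_gaps(timestamps)
    if not gaps:
        return True  # a single request says nothing about pacing
    return median(gaps) >= min_gap
```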

📋 robots.txt Compliance

Anemone consistently honors robots.txt directives, including Disallow and Crawl-delay, as documented in the crawler's source code in its GitHub repository (https://github.com/uw-web-research/anemone). It does not override explicit exclusions and skips any crawl path that is disallowed.
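The sketch below uses Python's standard urllib.robotparser to show how a compliant parser reads a robots.txt for this bot. The robots.txt content, the /private/ path, and the 2-second delay are illustrative assumptions. The full User-Agent string is the one listed under Detection Indicators. urllib.robotparser matches the `anemone` group because it compares the group's token against the product name before the first "/" in the User-Agent.

```python
from urllib.robotparser import RobotFileParser

# User-Agent string as listed in the Detection Indicators section.
ANEMONE_UA = "Anemone/1.0 (+http://webresearch.cs.washington.edu/anemone)"

# Illustrative robots.txt: the /private/ path and 2 s delay are assumptions.
ROBOTS_TXT = """\
User-agent: anemone
Crawl-delay: 2
Disallow: /private/

User-agent: *
Disallow:
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network fetch)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    parser.modified()  # record a fetch time so can_fetch() does not default to False
    return parser


rules = build_parser(ROBOTS_TXT)
print(rules.can_fetch(ANEMONE_UA, "https://example.com/private/report.html"))  # False
print(rules.can_fetch(ANEMONE_UA, "https://example.com/blog/"))  # True
print(rules.crawl_delay(ANEMONE_UA))  # 2
```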

🔍 Detection Indicators

The primary User-Agent string is Anemone/1.0 (+http://webresearch.cs.washington.edu/anemone). Other fingerprints are that the crawler consistently omits the Accept-Encoding header, so it never negotiates compression, and that it opens only a small number of concurrent connections (typically 2–4). It does not inject custom X-Forwarded-For or Referer headers.
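The indicators above can be combined into a simple per-request check. This is a minimal Python sketch. The function name and the regular expression are assumptions. Any client can send this User-Agent, so a match is only a hint unless the source IP range corroborates it. A reverse proxy in front of the application may add its own X-Forwarded-For header, so that flag is only meaningful for headers captured before the proxy.

```python
import re
from collections.abc import Mapping

# Matches "Anemone/<version>"; case-insensitive because the bot token is
# also listed in lowercase ("anemone"). The pattern itself is an assumption.
ANEMONE_UA_RE = re.compile(r"\banemone/\d+(?:\.\d+)*", re.IGNORECASE)


def anemone_indicators(headers: Mapping[str, str]) -> dict[str, bool]:
    """Check one request's headers against the indicators listed above.

    Header names are compared case-insensitively, as HTTP requires.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    user_agent = lowered.get("user-agent", "")
    return {
        "ua_match": bool(ANEMONE_UA_RE.search(user_agent)),
        "no_accept_encoding": "accept-encoding" not in lowered,
        "no_referer": "referer" not in lowered,
        "no_x_forwarded_for": "x-forwarded-for" not in lowered,
    }
```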

📊 Data Usage

Collected data is used exclusively for academic research, including web graph analysis, link prediction studies, and the construction of language modeling datasets that are made publicly available under permissive licenses. No commercial indexing or AI model training takes place without explicit consent.

⚙️ Rate Limiting Policy

Although Anemone is a legitimate crawler, its large-scale crawls can strain server resources, so rate limiting it is advisable to protect application performance. A threshold-based block (e.g., more than 20 requests per minute from the same IP) is recommended rather than a permanent ban.
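Here is a minimal in-memory sketch of that threshold policy in Python, assuming a single application process. Each IP may make up to 20 requests in any 60-second sliding window. The request that exceeds the limit triggers a temporary block rather than a permanent ban. The class name and the 300-second cool-down are hypothetical choices for illustration. In production this kind of limit is more often enforced at a reverse proxy or CDN than in application code.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class SlidingWindowLimiter:
    """Per-IP sliding-window limiter with a temporary (not permanent) block.

    limit=20 and window=60.0 mirror the ">20 requests/minute" example above;
    block_for=300.0 is an assumed cool-down chosen for illustration.
    """

    def __init__(self, limit: int = 20, window: float = 60.0,
                 block_for: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.block_for = block_for
        self.clock = clock  # injectable so the limiter can be tested deterministically
        self._hits: dict[str, deque[float]] = defaultdict(deque)
        self._blocked_until: dict[str, float] = {}

    def allow(self, ip: str) -> bool:
        """Record a request from ip and return False if it should be rejected."""
        now = self.clock()
        until = self._blocked_until.get(ip)
        if until is not None:
            if now < until:
                return False
            del self._blocked_until[ip]  # cool-down expired: lift the block
        hits = self._hits[ip]
        while hits and now - hits[0] >= self.window:
            hits.popleft()  # forget requests that fell out of the window
        hits.append(now)
        if len(hits) > self.limit:
            self._blocked_until[ip] = now + self.block_for
            hits.clear()
            return False
        return True
```

In a request handler, one would create a single `SlidingWindowLimiter()` and respond with HTTP 429 (Too Many Requests) whenever `allow(client_ip)` returns False.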

ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information are the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.
