crabbybot Bot — Detection, Blocking & Technical Analysis

Bot User-Agent: crabbybot

🤖 Overview

CrabbyBot is a legitimate web crawler operated by Crabby, Inc. (crabby.io), a company specializing in real-time website monitoring, uptime tracking, and change detection services. The bot was first publicly documented in 2018. Its primary purpose is to scan websites periodically for content changes, page availability, and structural modifications, and to feed the results into Crabby’s monitoring dashboard, which is used by site owners and developers. According to the official CrabbyBot documentation (crabby.io/docs/crawler), the bot is designed exclusively for non-commercial, opt-in monitoring and does not train AI models or build search indexes.

🌐 Technical Behavior

CrabbyBot performs HTTP GET requests at a default rate of one request every 30 seconds per domain, as stated in its official FAQ. Crawls are depth-first and typically cover only the root page and linked subpages up to a configurable depth limit (default: 2 levels). The bot uses IPv4 addresses drawn from the 185.199.108.0/22 range (ASN 54113, Fastly) and, sporadically, from Amazon AWS IP pools (e.g., 52.84.0.0/15). All requests use HTTP/2 with a fixed User-Agent string. The bot reportedly sends a Crawl-Delay header with a value of 30 in its initial request, which is unusual because Crawl-delay is normally a robots.txt directive rather than an HTTP request header. It respects the X-Robots-Tag HTTP response header if present. The bot does not follow JavaScript redirects or submit forms, and it caches DNS lookups for 24 hours.
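The IP ranges above lend themselves to a quick membership check on the client address in a server log. The following is a minimal sketch using Python's standard ipaddress module; the two networks are copied from this profile and should be treated as unverified and subject to change, and the function name ip_in_listed_ranges is purely illustrative.

```python
import ipaddress

# Ranges as listed in this profile. They are an assumption for this sketch:
# verify them against the operator's current documentation before relying on them.
LISTED_NETWORKS = [
    ipaddress.ip_network("185.199.108.0/22"),
    ipaddress.ip_network("52.84.0.0/15"),
]


def ip_in_listed_ranges(ip: str) -> bool:
    """Return True if `ip` parses as an address inside one of the listed networks."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        # Malformed input (for example an empty log field) never matches.
        return False
    return any(addr in net for net in LISTED_NETWORKS)
```

Because both ranges belong to shared hosting and CDN providers, a match only shows that the address is consistent with this profile. It does not prove that the request came from CrabbyBot.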

📋 robots.txt Compliance

According to Crabby’s published robots.txt guidelines (crabby.io/robots.txt-policy), the bot fully honors Disallow directives and applies a default crawl delay of 30 seconds unless a Crawl-Delay directive overrides it. Field testing by webmasters (reported on Stack Exchange and WebmasterWorld) confirms near-zero instances of non-compliance, with logs showing that the bot rechecks robots.txt every 12 hours. A 404 response to robots.txt causes CrabbyBot to abort the crawl entirely.
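To see how Disallow and Crawl-Delay rules aimed at this bot would be interpreted, a robots.txt draft can be checked locally before deployment. The sketch below uses Python's standard urllib.robotparser, which is not CrabbyBot's own parser, so the results only approximate what the bot would do. The sample rules, the /private/ path, and the 60-second delay are hypothetical values chosen for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: slow CrabbyBot to one request per 60 seconds and
# keep it out of /private/, while leaving other crawlers unrestricted.
SAMPLE_ROBOTS_TXT = """\
User-agent: crabbybot
Crawl-delay: 60
Disallow: /private/

User-agent: *
Disallow:
"""


def parse_rules(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = parse_rules(SAMPLE_ROBOTS_TXT)
# Pass the short product token, not the full User-Agent string: robotparser
# matches on the text before the first "/" of whatever it is given.
blocked = not rules.can_fetch("crabbybot", "https://example.com/private/report")
delay = rules.crawl_delay("crabbybot")
```

With these sample rules, blocked is True and delay is 60, while other crawlers remain free to fetch the same path.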

🔍 Detection Indicators

The primary User-Agent string is Mozilla/5.0 (compatible; CrabbyBot/2.0; +https://crabby.io/crawler/). A secondary string, CrabbyBot/1.0 (ChangeMonitor; +https://crabby.io), is used for legacy clients. Behavioral fingerprints include a fixed Accept header of text/html,application/xhtml+xml, no Accept-Encoding header (i.e., no gzip/deflate), and a distinctive X-Crawler-ID header set to CrabbyMonitor. Requests always originate from a small pool within the IP ranges listed under Technical Behavior and never include Cookie or Referer headers.
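These indicators can be combined into a simple header check. The sketch below is one possible heuristic, not an official detection rule: the function names and the threshold of five matching signals are assumptions made for illustration, and the header values are taken from the description above.

```python
import re

# Matches both documented product tokens, e.g. "CrabbyBot/2.0" and "CrabbyBot/1.0".
UA_PATTERN = re.compile(r"\bCrabbyBot/\d+(\.\d+)*", re.IGNORECASE)


def match_signals(headers: dict) -> dict:
    """Compare request headers against the fingerprints listed in this profile."""
    h = {name.lower(): value for name, value in headers.items()}
    accept = h.get("accept", "").replace(" ", "")
    return {
        "user_agent": bool(UA_PATTERN.search(h.get("user-agent", ""))),
        "crawler_id": h.get("x-crawler-id", "") == "CrabbyMonitor",
        "accept": accept == "text/html,application/xhtml+xml",
        "no_accept_encoding": "accept-encoding" not in h,
        "no_cookie": "cookie" not in h,
        "no_referer": "referer" not in h,
    }


def looks_like_crabbybot(headers: dict, min_signals: int = 5) -> bool:
    """Require the User-Agent token plus most of the other fingerprints.

    The threshold is an arbitrary choice for this sketch. A User-Agent match
    alone is easy to spoof, so it is never sufficient on its own.
    """
    signals = match_signals(headers)
    return signals["user_agent"] and sum(signals.values()) >= min_signals
```

A request that carries the CrabbyBot User-Agent together with typical browser headers (Accept-Encoding, Cookie, a broad Accept value) fails this check, which makes it a candidate for treatment as a spoofed client. Pairing the header check with a source IP check tightens the match further.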

📊 Data Usage

The collected data (page title, meta description, the first 2000 bytes of body text, HTTP status code, and response time) is used exclusively for change detection within the Crabby dashboard. Crabby does not sell, share, or repurpose the data for advertising, AI training, or search indexing. Users configure alerts that fire when specific changes occur (e.g., a 404 status or a modified price). The data is retained for 90 days and then anonymized.

⚙️ Rate Limiting Policy

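Because CrabbyBot can re-crawl every 30 seconds indefinitely, it may add noticeable load on small or poorly configured servers. At the documented default rate, a single domain receives about 20 requests in 10 minutes (600 s / 30 s). A threshold-based rate limit (e.g., block after 500 requests in 10 minutes) is a standard security practice that therefore leaves the bot’s legitimate monitoring function untouched and triggers only on traffic far above its documented behavior, such as a misconfigured or spoofed client.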
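A threshold of this kind can be sketched as a sliding-window counter keyed by client. The example below is a minimal in-memory illustration, not a production rate limiter. The class name SlidingWindowLimiter is hypothetical, the defaults mirror the 500-requests-in-10-minutes example above, and timestamps are passed in explicitly so the behavior is easy to test.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` per client within any `window_seconds` span."""

    def __init__(self, max_requests: int = 500, window_seconds: float = 600.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # One queue of accepted request timestamps per client key (IP, UA, ...).
        self._hits = defaultdict(deque)

    def allow(self, client_key: str, now: float) -> bool:
        """Record a request at time `now` (seconds); return False if it should be blocked."""
        hits = self._hits[client_key]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


limiter = SlidingWindowLimiter()
# A client requesting once every 30 seconds never has more than 20 hits in the window.
compliant_ok = all(limiter.allow("crabbybot", i * 30.0) for i in range(200))
```

In a real deployment the same logic would usually live in the web server, CDN, or a shared store rather than in process memory, and `now` would come from a monotonic clock.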

ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.
