cacheblaster

Bot User-Agent: cacheblaster

🤖 Overview

Cacheblaster is a legitimate web crawler operated by CacheBlaster Inc. (cacheblaster.com), a provider of content delivery network (CDN) cache-warming and pre-fetching services. Its primary purpose is to systematically request URLs on a target web application immediately after a cache flush or deployment, so that the CDN's edge servers hold a warm copy of every page before real users arrive. The bot's requests populate the edge caches of the CDN in use (such as Cloudflare, Akamai, Fastly, or Amazon CloudFront), which eliminates cold-start latency and reduces origin server load during traffic spikes.

🌐 Technical Behavior

Cacheblaster follows a configurable crawl pattern that typically mirrors the site's sitemap or a user-supplied URL list, and it visits pages in a deterministic order (e.g., from smallest to largest by byte size). According to the official CacheBlaster documentation (cacheblaster.com/docs), the bot sends 10–20 requests per second per IP by default, and the customer can throttle this rate. The crawler uses IPv4 and IPv6 addresses drawn from a dedicated proxy pool hosted on cloud and network providers (e.g., Amazon AWS, DigitalOcean, and Lumen), which spreads requests across many source IPs so that the bot does not trip reverse-proxy rate limits aimed at a single origin. All requests are made over HTTPS using HTTP/2. The bot does not execute JavaScript; it fetches only raw HTML, CSS, and the static assets listed in the sitemap.
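As a rough illustration of a sitemap-driven crawl at a fixed request rate, the Python sketch below extracts URLs from a sitemap in document order and assigns each one a send time. It is not CacheBlaster's implementation: the function names `parse_sitemap` and `schedule`, the sample sitemap, and the chosen rate are assumptions made for this example, and no network requests are issued.

```python
import xml.etree.ElementTree as ET
from typing import Iterator

# Namespace used by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap used only for demonstration.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""


def parse_sitemap(xml_text: str) -> list[str]:
    """Return the <loc> URLs of a sitemap in document order."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def schedule(urls: list[str], rate_per_sec: float) -> Iterator[tuple[float, str]]:
    """Yield (send_offset_seconds, url) pairs spaced to hold a fixed rate."""
    if rate_per_sec <= 0:
        raise ValueError("rate_per_sec must be positive")
    interval = 1.0 / rate_per_sec
    for index, url in enumerate(urls):
        yield (index * interval, url)


if __name__ == "__main__":
    # Assumed rate of 4 requests per second, chosen for readability.
    for offset, url in schedule(parse_sitemap(SAMPLE_SITEMAP), rate_per_sec=4):
        print(f"t+{offset:.2f}s  GET {url}")
```

A real warmer would issue each request at its scheduled offset and would likely sort the list first if a size-based order were wanted; both steps are left out here to keep the sketch self-contained.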

📋 robots.txt Compliance

The CacheBlaster documentation states that the bot honors robots.txt Disallow directives by default, although a customer can configure an override that makes it ignore them. Independent testing by the security community (e.g., a 2023 analysis in the GitHub Gist "cacheblaster-robots") reports that the crawler also respects Crawl-delay directives when present. To exclude sensitive paths, CacheBlaster recommends that site owners add a User-agent: Cacheblaster group containing Disallow: /private/.
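To check how such a rule group would be interpreted, a site owner could run it through Python's standard `urllib.robotparser`. The sketch below is only an illustration: the `Crawl-delay: 2` value and the example.com URLs are assumptions, and the standard-library parser may not match the bot's own parser in every edge case.

```python
from urllib.robotparser import RobotFileParser

# Example rules; the Crawl-delay value is an assumption for this sketch.
ROBOTS_TXT = """\
User-agent: Cacheblaster
Disallow: /private/
Crawl-delay: 2
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Public page: no Disallow rule matches this path.
allowed = parser.can_fetch("Cacheblaster", "https://example.com/index.html")

# Path under /private/: matched by the Disallow rule.
blocked = parser.can_fetch("Cacheblaster", "https://example.com/private/report.html")

# Crawl-delay declared for this user agent, in seconds.
delay = parser.crawl_delay("Cacheblaster")

print(allowed, blocked, delay)
```

Because the override mentioned above is under the customer's control, a robots.txt rule should be treated as a request rather than an enforcement mechanism.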

🔍 Detection Indicators

The primary User-Agent string is Mozilla/5.0 (compatible; Cacheblaster/2.0; +https://cacheblaster.com/bot), which is sent in the User-Agent header of every request. Additional identifying request headers include X-Cache-Warming: true and Cache-Control: no-cache. The bot's IP ranges are not officially published, but community logs show requests originating from ASN 14618 (Amazon AWS), ASN 14061 (DigitalOcean), and ASN 3549 (Lumen).
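These header indicators can be combined into a simple log-side check. The following Python sketch is one possible approach, not a vendor-supplied detector: the function name `classify_request` and its three labels are hypothetical. Request headers are trivially spoofed and no official IP list exists, so a match only shows that a request claims to be Cacheblaster.

```python
import re

# Matches the product token, e.g. "Cacheblaster/2.0", in any letter case.
UA_PATTERN = re.compile(r"\bCacheblaster/\d+(?:\.\d+)*\b", re.IGNORECASE)


def classify_request(headers: dict[str, str]) -> str:
    """Label a request by how many published indicators it carries.

    Returns "ua-and-header" when both the User-Agent token and
    X-Cache-Warming: true are present, "ua-only" when only the token
    is present, and "no-match" otherwise.
    """
    # HTTP header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}

    has_ua = bool(UA_PATTERN.search(normalized.get("user-agent", "")))
    has_flag = normalized.get("x-cache-warming", "").strip().lower() == "true"

    if has_ua and has_flag:
        return "ua-and-header"
    if has_ua:
        return "ua-only"
    return "no-match"


if __name__ == "__main__":
    sample = {
        "User-Agent": "Mozilla/5.0 (compatible; Cacheblaster/2.0; "
                      "+https://cacheblaster.com/bot)",
        "X-Cache-Warming": "true",
        "Cache-Control": "no-cache",
    }
    print(classify_request(sample))
```

Cache-Control: no-cache is left out of the decision because many ordinary clients send it as well.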

📊 Data Usage

Collected data is used exclusively for cache warming: URLs and their response payloads (HTML, CSS, images) are passed to the CDN's edge servers to populate the cache. CacheBlaster does not store or index the content for any secondary purpose such as AI training or analytics; the data is transient and is discarded once the cache TTL expires.

⚙️ Rate Limiting Policy

Although Cacheblaster is legitimate, web application firewalls and origin servers often rate-limit it because its sustained high-throughput requests can resemble a denial-of-service pattern, especially on shared hosting. A threshold-based blocking policy (e.g., 100 requests per second per IP from a single user agent) is recommended. A limit of that size sits well above the bot's documented default of 10–20 requests per second per IP, so it protects application resources while still allowing the bursty cache-warming traffic that follows a deployment.
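A threshold of this kind can be sketched as a sliding-window counter keyed by source IP and user agent. The Python below is a minimal in-memory illustration, assuming a single process and caller-supplied timestamps; the class name `SlidingWindowLimiter` and its parameters are hypothetical, and a production WAF or reverse proxy would normally apply its own built-in rate-limiting rules instead.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per window_seconds per (IP, user agent)."""

    def __init__(self, max_requests: int = 100, window_seconds: float = 1.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Maps (ip, user_agent) to timestamps of recently accepted requests.
        self._hits = defaultdict(deque)

    def allow(self, ip: str, user_agent: str, now: float) -> bool:
        """Return True if the request at time `now` is under the threshold."""
        hits = self._hits[(ip, user_agent)]

        # Drop timestamps that have fallen out of the window.
        cutoff = now - self.window_seconds
        while hits and hits[0] <= cutoff:
            hits.popleft()

        if len(hits) >= self.max_requests:
            return False  # Over the threshold: reject without recording.

        hits.append(now)
        return True


if __name__ == "__main__":
    import time

    limiter = SlidingWindowLimiter(max_requests=100, window_seconds=1.0)
    # 203.0.113.7 is a documentation-range address used as a placeholder.
    print(limiter.allow("203.0.113.7", "Cacheblaster/2.0", time.monotonic()))
```

Rejected requests are not recorded, so a client that keeps sending while blocked regains access as soon as its earlier accepted requests age out of the window.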

ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.