acquia-crawler

Crawler User-Agent: acquia-crawler

🤖 Overview

acquia-crawler is a legitimate web crawler operated by Acquia, the global digital experience platform company best known for its Drupal cloud hosting and enterprise content management solutions. The crawler was introduced as part of Acquia’s Site Factory and Acquia Lift offerings to automatically index site content, monitor page availability, and collect performance metrics for customer websites hosted on the Acquia Cloud Platform. Its primary purpose is to support automated quality assurance, SEO optimization, and real-time analytics within Acquia’s managed hosting ecosystem.

🌐 Technical Behavior

acquia-crawler runs scheduled crawls across customer domains, typically triggered by deployment events or maintenance windows. Request frequency is configurable and ranges from several requests per minute to a few per hour, depending on the customer’s plan. The crawler issues standard GET requests over HTTP/1.1, both plain and over TLS (HTTPS), and honors 304 Not Modified responses to avoid downloading unchanged content again. Requests originate from Acquia’s own cloud infrastructure, which resides primarily in AWS regions (us-east-1, eu-west-1, ap-southeast-1). Acquia’s support documentation refers to these address ranges, but specific IP lists are not publicly disclosed, to prevent abuse. The crawler also sends an Accept-Language: en-US,en;q=0.9 header and accepts gzip-compressed responses. It does not execute JavaScript or interact with page elements; it fetches only static HTML and linked resources.
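For site operators, the 304 Not Modified behavior described above can be illustrated with a minimal server-side sketch. It assumes the crawler sends an If-None-Match validator; the source does not say whether it uses ETag or Last-Modified, so that part is an assumption. The function name respond and the simplified comma splitting are hypothetical and for illustration only.

```python
from typing import Mapping, Tuple


def _strip_weak(tag: str) -> str:
    """Drop the weak-validator prefix so W/"v1" compares equal to "v1"."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def respond(request_headers: Mapping[str, str],
            current_etag: str,
            body: bytes) -> Tuple[int, bytes]:
    """Return (status, body) for a GET, answering 304 when the client copy is current.

    Assumption: the header key is spelled exactly "If-None-Match" and entity
    tags contain no commas (a simplification of the real header grammar).
    """
    raw = request_headers.get("If-None-Match", "")
    client_tags = {_strip_weak(t) for t in raw.split(",") if t.strip()}
    if "*" in client_tags or _strip_weak(current_etag) in client_tags:
        return 304, b""      # client already holds this version; send no body
    return 200, body         # no validator, or a stale one: send the full page


status, payload = respond({"If-None-Match": '"v1"'}, '"v1"', b"<html>...</html>")
print(status, len(payload))
```

A crawler that honors 304 responses keeps its cached copy when it receives the empty 304 reply, which is how redundant downloads are avoided.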

📋 robots.txt Compliance

acquia-crawler fully honors the robots.txt standard. According to Acquia’s official Acquia Cloud User Guide (published at docs.acquia.com), the crawler checks for Disallow directives before each request and skips any URL path or directory that is explicitly disallowed. In tests conducted by site administrators, the crawler has been observed to pause between requests when Crawl-delay is specified, respecting delays of at least 10 seconds. This compliance ensures that customer-defined access controls are strictly followed.
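As an illustration, the sketch below uses Python’s standard urllib.robotparser to show how a compliant client would read a robots.txt that targets this crawler. The rules, the /private/ path, and the example.com URLs are hypothetical; the code shows how such rules are interpreted, not how Acquia implements its crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block /private/ for acquia-crawler and ask for
# a 10-second pause between requests; leave all other bots unrestricted.
ROBOTS_TXT = """\
User-agent: acquia-crawler
Disallow: /private/
Crawl-delay: 10

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

blocked = parser.can_fetch("acquia-crawler", "https://example.com/private/report.html")
allowed = parser.can_fetch("acquia-crawler", "https://example.com/blog/post.html")
delay = parser.crawl_delay("acquia-crawler")

print(blocked)  # False: the path falls under Disallow: /private/
print(allowed)  # True: no rule matches this path
print(delay)    # 10
```

Running a check like this against your own robots.txt before deployment is a quick way to confirm that the rules say what you intend.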

🔍 Detection Indicators

The primary User-Agent string is acquia-crawler/1.0 (+https://www.acquia.com/en/products/acquia-crawler). Variants seen in logs include Acquia-Crawler/1.0 and acquia-crawler/2.0 (compatible; Acquia). The crawler also sends a custom header, X-Acquia-Crawler: true, that distinguishes it from other bots. Behavioral fingerprints include consistent request ordering (alphabetical by URL path), a user-agent token that stays the same apart from letter case and version number, and low variance in request interval timing.
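The header-based indicators above can be combined into a simple check. The following is a minimal sketch: the function name looks_like_acquia_crawler and the regular expression are assumptions written to cover the three User-Agent strings listed, not an official matching rule. Because both headers can be spoofed, a match is a hint and not proof of origin.

```python
import re
from typing import Mapping

# Matches "acquia-crawler/<major>.<minor>" in any letter case, followed by
# whitespace or end of string (covers the three variants listed above).
UA_PATTERN = re.compile(r"^acquia-crawler/\d+\.\d+(?:\s|$)", re.IGNORECASE)


def looks_like_acquia_crawler(headers: Mapping[str, str]) -> bool:
    """Return True when both the User-Agent and the X-Acquia-Crawler header match."""
    # HTTP header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}
    ua_matches = bool(UA_PATTERN.match(normalized.get("user-agent", "")))
    has_marker = normalized.get("x-acquia-crawler", "").strip().lower() == "true"
    return ua_matches and has_marker


sample = {
    "User-Agent": "acquia-crawler/2.0 (compatible; Acquia)",
    "X-Acquia-Crawler": "true",
}
print(looks_like_acquia_crawler(sample))
```

Requiring both signals reduces false positives from clients that copy only the User-Agent string; the behavioral fingerprints (request ordering, interval variance) would need log analysis and are not covered here.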

📊 Data Usage

The data collected by acquia-crawler is used exclusively for internal platform operations: it powers the health metrics in Acquia’s Site Factory Dashboard, generates performance alerts for broken links or SSL certificate errors, and feeds Acquia Lift’s content recommendation engine. Neither the raw crawl data nor the derived analytics are sold to third parties or used for advertising; all information remains within Acquia’s managed customer environment, as outlined in their privacy policy (www.acquia.com/privacy).

⚙️ Rate Limiting Policy

Because acquia-crawler can generate high request volumes during initial site indexing or after major content updates, rate limiting is recommended to prevent server overload. Threshold-based blocking (e.g., more than 20 requests per second) is safe to apply because the crawler retries after a delay. Acquia’s support team acknowledges that rate limiting does not disrupt essential monitoring functions.
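A threshold of this kind can be sketched as a sliding-window limiter. The class name SlidingWindowLimiter and its parameters are hypothetical, and the 20-requests-per-second default simply mirrors the example threshold above; production setups usually enforce this at the web server, CDN, or WAF layer instead.

```python
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_requests within any window_seconds-long window."""

    def __init__(self, max_requests: int = 20, window_seconds: float = 1.0) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._timestamps: deque = deque()  # arrival times of accepted requests

    def allow(self, now: float) -> bool:
        """Return True if a request arriving at time `now` (seconds) is accepted."""
        # Forget accepted requests that have aged out of the window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) >= self.max_requests:
            return False  # over the threshold: reject (e.g. answer HTTP 429)
        self._timestamps.append(now)
        return True


limiter = SlidingWindowLimiter()
# 25 requests arriving 10 ms apart: the first 20 pass, the remaining 5 are rejected.
results = [limiter.allow(i * 0.01) for i in range(25)]
print(results.count(True), results.count(False))
```

In practice you would keep one limiter per client (for example, keyed by source IP or by the crawler’s User-Agent token) and pass time.monotonic() as now.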

ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information are the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.