netcraft

Bot User-Agent: netcraft

🤖 Overview

Netcraft is a bot operated by Netcraft Ltd, a UK-based internet security company founded in 1995. Its primary purpose is to conduct web server surveys, collect site availability metrics, and identify phishing and malicious websites for the company's Anti-Phishing and Web Server Survey products. The crawler systematically probes millions of domains each month to compile the widely cited Netcraft Web Server Survey, which tracks the market share of Apache, Nginx, IIS, and other server software. According to Netcraft's official documentation, the bot also supports the company's Site Report tool and Phishing Site Takedown service.

🌐 Technical Behavior

The Netcraft crawler uses a distributed architecture, with source IP addresses drawn from Netcraft's own ASN (AS15511) and other leased blocks, including ranges such as 195.12.48.0/20 and 82.98.86.0/24. It issues HTTP/1.1 requests over both plain HTTP and HTTPS, typically at a rate of one request every few seconds per IP, but it can scale up to hundreds of concurrent connections spread across different source IPs. The crawler always follows redirects and fetches both the page content and the response headers in order to analyze server software and security configuration. It does not execute JavaScript or render client-side content; it looks only at server-rendered HTML and headers. The bot respects the Cache-Control header and uses conditional GET requests with If-Modified-Since to avoid unnecessary bandwidth consumption.
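As a rough illustration of how the address ranges mentioned above could be used, the following sketch checks whether a client IP falls inside them. The two CIDR blocks are simply the ones quoted in this section and are unlikely to be a complete or current list; the function name in_quoted_ranges is hypothetical.

```python
import ipaddress

# The two blocks quoted in the text above. Assumption: this list is
# illustrative only and is not a complete or current set of Netcraft ranges.
QUOTED_RANGES = [
    ipaddress.ip_network("195.12.48.0/20"),
    ipaddress.ip_network("82.98.86.0/24"),
]


def in_quoted_ranges(ip: str) -> bool:
    """Return True if `ip` parses as an address inside one of the quoted blocks."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        # Malformed input is treated as "not in range" rather than raising.
        return False
    return any(addr in net for net in QUOTED_RANGES)


if __name__ == "__main__":
    for candidate in ("195.12.50.7", "195.12.64.1", "82.98.86.200"):
        print(candidate, in_quoted_ranges(candidate))
```

A range match on its own only shows where a request came from; it is best combined with the other indicators described on this page.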

📋 robots.txt Compliance

Netcraft’s bot fully honors robots.txt directives, as stated in the company's official User-Agent documentation. Netcraft states that its crawler must be allowed access for the survey; if a site blocks a path with Disallow, however, the bot stops crawling that path. Historical evidence from server logs shows consistent compliance, with no recorded incidents of the bot ignoring robots.txt rules. To control access, Netcraft recommends using User-agent: NetcraftSurveyAgent or User-agent: Netcraft in robots.txt.
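To see how such a rule would be interpreted, a robots.txt file can be tested locally with Python's standard urllib.robotparser module. The sketch below is only an example: the /private/ path, the example.com URLs, and the helper name allowed are made up for illustration, and the standard-library parser is not necessarily how Netcraft's own crawler evaluates the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep the Netcraft survey agent out of /private/
# and leave everything open to all other crawlers.
ROBOTS_TXT = """\
User-agent: NetcraftSurveyAgent
Disallow: /private/

User-agent: *
Disallow:
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Evaluate `robots_txt` for the given user agent and URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(allowed("NetcraftSurveyAgent", "https://example.com/private/report"))
    print(allowed("NetcraftSurveyAgent", "https://example.com/index.html"))
```

Testing the file this way before deploying it helps catch rules that block more, or less, than intended.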

🔍 Detection Indicators

The primary User-Agent string is Mozilla/5.0 (compatible; NetcraftSurveyAgent/1.0; [email protected]). There is also a legacy string: Netcraft Web Server Survey. When configured to do so, the bot includes a From: [email protected] header, and it sends a Referer: http://www.netcraft.com/ header. The behavioral fingerprint is as follows: requests come from a set of known IP ranges, always carry a Host header that exactly matches the target domain, and never include Accept-Language or cookies.
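The header-based indicators above can be combined into a simple heuristic. The sketch below is an assumption-laden example rather than a definitive detector: the function name looks_like_netcraft and the marker substrings are chosen for illustration, and because User-Agent values are easy to spoof, a positive result should be treated as a hint only.

```python
# Lowercased substrings taken from the two User-Agent strings quoted above.
NETCRAFT_UA_MARKERS = ("netcraftsurveyagent", "netcraft web server survey")


def looks_like_netcraft(headers: dict) -> bool:
    """Heuristic match on request headers (hypothetical helper).

    Returns True when the User-Agent contains a Netcraft marker and the
    request carries neither Accept-Language nor Cookie, matching the
    fingerprint described in the text.
    """
    # HTTP header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}
    user_agent = normalized.get("user-agent", "").lower()
    if not any(marker in user_agent for marker in NETCRAFT_UA_MARKERS):
        return False
    return "accept-language" not in normalized and "cookie" not in normalized


if __name__ == "__main__":
    sample = {
        "Host": "example.com",
        "User-Agent": "Mozilla/5.0 (compatible; NetcraftSurveyAgent/1.0)",
    }
    print(looks_like_netcraft(sample))
```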

📊 Data Usage

Collected data is used for two main purposes: (1) The Netcraft Web Server Survey, a monthly public report that estimates the usage share of different web servers across all internet-facing domains; and (2) Netcraft Anti-Phishing, which uses crawling results to detect fraudulent websites and notify hosting providers. Additionally, raw server headers and response times feed into their Site Report and Security Rating tools, helping site owners assess SSL/TLS configurations and software versions.

⚙️ Rate Limiting Policy

Rate limiting is applied to Netcraft's crawler because it performs systematic surveys across millions of domains, and a survey cycle can send a significant amount of traffic to any single site. Sites with limited resources may experience load spikes, so a threshold-based block (for example, more than 10 requests per second from a single IP) is a reasonable defensive measure that protects server stability without undermining the bot's legitimate survey function.
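One way to express such a threshold is a per-IP sliding window. The following is a minimal sketch under stated assumptions: the class name PerIpThreshold is hypothetical, state is kept in memory for a single process, and the default of 10 requests per one-second window simply mirrors the example figure above. A production setup would more likely enforce this at the web server, firewall, or CDN.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class PerIpThreshold:
    """Allow at most `max_requests` per `window_seconds` for each client IP."""

    def __init__(self, max_requests: int = 10, window_seconds: float = 1.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Maps each IP to the timestamps of its recently allowed requests.
        self._hits = defaultdict(deque)

    def allow(self, ip: str, now: Optional[float] = None) -> bool:
        """Record a request from `ip`; return False if it exceeds the threshold."""
        if now is None:
            now = time.monotonic()
        hits = self._hits[ip]
        # Discard timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


if __name__ == "__main__":
    limiter = PerIpThreshold()
    # Eleven requests within roughly a tenth of a second from one address.
    decisions = [limiter.allow("203.0.113.5", now=i * 0.01) for i in range(11)]
    print(decisions)
```

Passing `now` explicitly makes the behavior easy to test; when it is omitted, the limiter falls back to a monotonic clock.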

ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.