netprospector

Bot User-Agent: netprospector

🤖 Overview

NetProspector is a web intelligence crawler operated by Netcraft Ltd., a UK-based internet services company founded in 1995. According to Netcraft’s official documentation, the bot is used to perform automated surveys of web servers, SSL/TLS certificates, hosting providers, and website security posture. The data feeds into Netcraft’s public-facing Web Server Survey, security reports, and their anti-phishing and fraud detection services, including the Netcraft Toolbar and the Netcraft Cybercrime Discovery System.

🌐 Technical Behavior

NetProspector connects over HTTP and HTTPS, occasionally using IPv6 transport. It crawls one domain at a time with sequential requests, typically probing common paths such as /, /robots.txt, /favicon.ico, and /cgi-bin/ to analyze server headers. Requests are spaced with variable delays, often 2 to 10 seconds apart, to avoid overwhelming servers. IP addresses originate from Netcraft’s own ASNs, primarily AS29838 and AS13272, with ranges such as 212.118.224.0/19 and 185.93.0.0/19. The crawler supports HTTP/1.1 and TLS 1.2+ and inspects responses for server software headers, cookies, and redirect chains. It also periodically rechecks sites for changes in SSL certificate validity and web application technology fingerprints.
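As a rough illustration of how the address ranges above might be used, the following sketch checks whether a client IP falls inside the two CIDR blocks quoted in this section. The ranges are copied from the text and are not independently verified, and the function name in_quoted_ranges is hypothetical; an IP match alone should not be treated as proof of identity.

```python
import ipaddress

# CIDR blocks quoted in the text above. They are assumptions taken from
# this page, not a verified or complete list of the operator's ranges.
QUOTED_RANGES = [
    ipaddress.ip_network("212.118.224.0/19"),
    ipaddress.ip_network("185.93.0.0/19"),
]


def in_quoted_ranges(addr: str) -> bool:
    """Return True if addr parses as an IP inside one of the quoted ranges."""
    try:
        ip = ipaddress.ip_address(addr)
    except ValueError:
        # Not a valid IPv4 or IPv6 address.
        return False
    # An IPv6 address is never contained in an IPv4 network, so this
    # simply yields False for IPv6 clients.
    return any(ip in net for net in QUOTED_RANGES)
```

For example, in_quoted_ranges("212.118.224.1") is true, while in_quoted_ranges("212.119.0.0") is false because it lies just past the end of the first block.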

📋 robots.txt Compliance

Netcraft states that NetProspector honors the Disallow directives in robots.txt. The bot reads the file at the start of each crawl session and does not fetch any path disallowed under User-agent: netprospector or under the wildcard User-agent: *. According to Netcraft’s support page, the preferred way for a site administrator to exclude a site from surveys is to add a Disallow rule for the netprospector user-agent in robots.txt.
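A minimal sketch of such a rule, checked with Python's standard urllib.robotparser, is shown below. The robots.txt content, the /private/ path, and the helper name is_allowed are illustrative assumptions; the parser only shows how a compliant crawler would interpret the file and says nothing about what the bot actually does.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt: block netprospector everywhere, and keep a
# hypothetical /private/ area off limits for every other crawler.
ROBOTS_TXT = """\
User-agent: netprospector
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if a compliant crawler with this user agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With this file, is_allowed("netprospector/1.0", "https://example.com/") is false, while another crawler is still allowed to fetch pages outside /private/.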

🔍 Detection Indicators

The primary User-Agent string is netprospector/1.0 (other versions exist, e.g., netprospector/2.0). Some crawls from Netcraft instead send Mozilla/5.0 (compatible; Netcraft Web Server Survey; +http://www.netcraft.com/survey/), but the canonical identifier is netprospector. Behaviorally, the bot does not execute JavaScript and does not request images or CSS unless they are part of a specific security probe. It often sends a From header containing an administrative contact email address. The absence of a Referer header and the exclusive use of GET requests are also telltale signs.
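The header-level indicators above can be combined into a simple check. The sketch below is only a heuristic under stated assumptions: the indicator names and the function matched_indicators are hypothetical, the regular expression covers just the netprospector token, and every one of these headers can be spoofed, so a match is a hint rather than a verdict.

```python
import re
from typing import List, Mapping

# Matches the "netprospector" token regardless of case or version suffix.
UA_PATTERN = re.compile(r"\bnetprospector\b", re.IGNORECASE)


def matched_indicators(method: str, headers: Mapping[str, str]) -> List[str]:
    """List which of the indicators described above a request matches."""
    # HTTP header names are case-insensitive, so normalize them first.
    h = {name.lower(): value for name, value in headers.items()}
    matched = []
    if UA_PATTERN.search(h.get("user-agent", "")):
        matched.append("user_agent")
    if "from" in h:
        matched.append("from_header")
    if "referer" not in h:
        matched.append("no_referer")
    if method.upper() == "GET":
        matched.append("get_only")
    return matched
```

A request matching all four indicators is a plausible candidate, but it is safer to confirm with a source IP check before trusting the self-declared identity.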

📊 Data Usage

Data collected by NetProspector is used to produce Netcraft’s market-share reports on web servers, SSL certificate deployment, and hosting providers. Additionally, the data supports Netcraft’s anti-phishing systems, where freshly crawled sites are compared against known phishing kits. The bot also helps maintain the Netcraft Toolbar’s site rating database, enabling real-time warnings for potentially dangerous websites. No personal or private user data is collected; only publicly accessible HTTP responses and DNS records are analyzed.

⚙️ Rate Limiting Policy

Although NetProspector is a legitimate survey bot, its persistent scanning across many domains can consume server resources and bandwidth. Security teams often rate-limit it to prevent resource exhaustion, with typical thresholds of 10 to 15 requests per minute per source IP. Such a limit protects application availability while still letting the bot gather the aggregate survey data that benefits network security research.
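One way such a threshold could be enforced is a per-IP sliding window. The sketch below is an illustration rather than a recommended production setup: the class name SlidingWindowLimiter is hypothetical, and the default of 12 requests per 60 seconds is simply an arbitrary value inside the 10 to 15 range mentioned above.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SlidingWindowLimiter:
    """Allow at most max_requests per source IP within window_seconds."""

    def __init__(self, max_requests: int = 12, window_seconds: float = 60.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Timestamps of accepted requests, kept separately for each IP.
        self._hits = defaultdict(deque)

    def allow(self, source_ip: str, now: Optional[float] = None) -> bool:
        """Record the request and return True if it is within the limit."""
        now = time.monotonic() if now is None else now
        hits = self._hits[source_ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

Rejected requests are not recorded, so a client that keeps retrying while blocked regains capacity as soon as its older accepted requests leave the window.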

ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.