squid-prefetch
Bot User-Agent: squid-prefetch
🤖 Overview
squid-prefetch is a web crawler component of the Squid caching proxy server, an open-source software project maintained by the Squid community (squid-cache.org). Its purpose is to prefetch objects referenced in HTML pages that are likely to be requested soon, aiming to reduce latency for users behind the proxy. This feature is built into Squid and is not a standalone product; it operates as an automated agent that assists in cache warming and performance optimization.
🌐 Technical Behavior
The agent parses HTML responses to extract hyperlinks, then fetches those linked resources in advance using HTTP/1.1 GET requests with concurrent connections. Request frequency is controlled by Squid configuration directives such as prefetch_max_connections (default 10) and prefetch_rate. The bot originates from the proxy server’s public IP address, so IP ranges vary per deployment. It does not follow redirects beyond one hop and sends an Accept: */* header. The agent typically does not send cookies or Referer headers, and its default User-Agent string is Mozilla/5.0 (compatible; squid-prefetch/4.0) (version reflects the Squid release).
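The link-extraction step described above can be illustrated with a short Python sketch. This is not Squid's code: the class `LinkExtractor`, the function `extract_prefetch_candidates`, and the `limit` parameter are hypothetical names chosen for illustration, and the default of 10 simply mirrors the connection limit quoted above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkExtractor(HTMLParser):
    """Collects absolute http(s) URLs from <a href="..."> tags (illustrative only)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                absolute = urljoin(self.base_url, value)
                # Keep only fetchable schemes and skip duplicates.
                if urlsplit(absolute).scheme in ("http", "https") and absolute not in self.links:
                    self.links.append(absolute)


def extract_prefetch_candidates(html, base_url, limit=10):
    """Return at most `limit` unique link targets, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links[:limit]
```

A real prefetcher would go on to issue GET requests for the returned URLs; the sketch stops at candidate selection so that it stays free of network access.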
📋 robots.txt Compliance
By default, Squid’s prefetch feature does not automatically honor robots.txt; it is a proxy-level optimization intended for internal networks. However, Squid’s official documentation describes the prefetch_respect_robots directive (introduced in version 4.0) that enables compliance when set to on. Many system administrators enable this to avoid overwhelming external websites. The agent also respects noindex and nofollow meta tags if configured to do so via ACLs.
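For site operators, the effect of a robots.txt rule on a compliant agent can be checked with Python's standard `urllib.robotparser`. The sketch below assumes the agent matches on the product token `squid-prefetch`, which the source does not confirm; the function name `is_prefetch_allowed` is hypothetical.

```python
from urllib.robotparser import RobotFileParser


def is_prefetch_allowed(robots_txt, url, agent="squid-prefetch"):
    """Evaluate robots.txt text against a URL for the given agent token.

    The agent token is an assumption; adjust it to whatever the
    deployment actually sends.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Example rule set that keeps the agent out of /private/ only.
EXAMPLE_ROBOTS = "User-agent: squid-prefetch\nDisallow: /private/\n"
```

This only models what a compliant client would do. An agent running with robots.txt handling disabled ignores these rules entirely.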
🔍 Detection Indicators
Key identifiers include the User-Agent string Mozilla/5.0 (compatible; squid-prefetch/4.0) (or similar version numbers like 5.0, 6.0). Additionally, the agent may send an optional X-Prefetch: yes header, though this is not standard. Behavioral fingerprints include concurrent requests to multiple URLs on the same domain within seconds, minimal HTTP headers, and no authentication. The agent does not include a From field and rarely sends Accept-Encoding.
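The header-based indicators above could be combined into a simple classifier along these lines. This is a minimal sketch: the function name `classify_request`, the signal labels, and the choice of which absent headers count as "minimal" are assumptions made for illustration, not an established detection rule.

```python
import re

# Matches the product token and version, e.g. "squid-prefetch/4.0".
UA_PATTERN = re.compile(r"squid-prefetch/(\d+(?:\.\d+)*)", re.IGNORECASE)


def classify_request(headers):
    """Inspect request headers (a dict) for the indicators listed above."""
    normalized = {key.lower(): value for key, value in headers.items()}
    match = UA_PATTERN.search(normalized.get("user-agent", ""))

    signals = []
    if match:
        signals.append("user-agent")
    # Optional, non-standard header mentioned above.
    if normalized.get("x-prefetch", "").strip().lower() == "yes":
        signals.append("x-prefetch")
    # The agent typically sends no cookies, Referer, From, or credentials.
    if not any(h in normalized for h in ("cookie", "referer", "from", "authorization")):
        signals.append("minimal-headers")

    return {
        "is_squid_prefetch": match is not None,
        "version": match.group(1) if match else None,
        "signals": signals,
    }
```

Because a User-Agent string is trivially spoofed, a match is best treated as a hint to be weighed alongside the behavioral fingerprints rather than as proof of origin.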
📊 Data Usage
Data collected by the prefetch agent is stored in the Squid cache and served to subsequent client requests. It is not used for AI training, indexing, or external analytics. The cached content improves response times for users of the proxy and reduces bandwidth consumption. The feature is entirely local to the proxy deployment, meaning no data is shared with third parties.
⚙️ Rate Limiting Policy
Because prefetching can generate bursts of tens of concurrent requests per IP, webmasters often rate-limit this agent to protect server resources. Threshold-based blocking at 10 requests per second per IP is typical: the agent is legitimate, but it can inadvertently cause load spikes when misconfigured or when prefetching is triggered by pages that contain a large number of links.
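One way to apply the 10 requests per second per IP threshold is a sliding-window counter, sketched below in Python. The class name `SlidingWindowLimiter` and its parameters are hypothetical, and a production setup would more likely rely on the web server's or firewall's own rate-limiting features.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allows at most `max_requests` per `window_seconds` for each client IP."""

    def __init__(self, max_requests=10, window_seconds=1.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of accepted requests

    def allow(self, ip, now=None):
        """Return True if the request is within the limit, False otherwise."""
        if now is None:
            now = time.monotonic()
        hits = self._hits[ip]
        # Discard timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

Rejected requests are not recorded, so a client that backs off regains access as soon as its earlier requests leave the window. Passing `now` explicitly makes the limiter easy to test without sleeping.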
ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.