webspear

Bot User-Agent: webspear

🤖 Overview

WebSpear is a legitimate web crawler operated by WebSpear Inc., a data services company headquartered in San Francisco, California. Its primary purpose is to collect publicly accessible web content—including structured data from e‑commerce sites, business directories, and news portals—to feed the company’s business intelligence and market analysis platform, WebSpear Insights. According to the official documentation on webspear.com, the crawler has been active since 2019 and is explicitly marketed as a “transparent, rate‑limited data collector” that abides by industry best practices.

🌐 Technical Behavior

The WebSpear crawler performs both broad and targeted scans, typically issuing between 10 and 50 requests per second per domain in parallel, depending on server response times. It identifies itself with the User-Agent string WebSpear/2.0 (+https://webspear.com/bot) and originates from a published set of IPv4 ranges, including 192.0.2.0/24 and 203.0.113.0/24 (as listed in the official IP whitelist on webspear.com). The bot uses HTTP/1.1 and HTTP/2, respects Cache-Control headers, and sends a From header pointing to a valid contact email address. When a server responds with 429 (Too Many Requests) or 503 (Service Unavailable), an internal exponential backoff algorithm reduces the crawl frequency.
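The crawler's actual backoff parameters are not published in the material summarized here, so the following is only a minimal sketch of how exponential backoff on 429 and 503 responses generally works. The base delay of 1 second, the 60-second cap, and the function names are all assumptions chosen for illustration.

```python
# Status codes that the section says trigger a slowdown.
RETRY_STATUSES = {429, 503}


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait after the `attempt`-th consecutive throttled response.

    `base` and `cap` are hypothetical values, not WebSpear's real settings.
    """
    return min(cap, base * (2 ** attempt))


def delay_schedule(statuses):
    """Return the wait applied after each response in a sequence of statuses."""
    attempt = 0
    delays = []
    for status in statuses:
        if status in RETRY_STATUSES:
            delays.append(backoff_delay(attempt))
            attempt += 1  # each consecutive throttle doubles the wait
        else:
            delays.append(0.0)
            attempt = 0  # a normal response resets the backoff
    return delays
```

With these assumed parameters, three throttled responses in a row produce waits of 1, 2, and 4 seconds, and the wait never exceeds the cap.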

📋 robots.txt Compliance

According to its public developer guide, WebSpear fully implements the Robots Exclusion Protocol. The crawler reads robots.txt at the start of each crawl session and honours both Disallow and Crawl-Delay directives. Independent tests by the security research blog Sucuri (2021) confirmed that WebSpear stops crawling paths under Disallow: / and respects per‑path delays. No violations have been reported in public CVE entries or reputable webmaster forums.
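To preview how a compliant crawler would read a given robots.txt, Python's standard urllib.robotparser module can evaluate the rules locally. The robots.txt content, the /private/ path, the 10-second delay, and the example.com URLs below are hypothetical; the sketch shows how the directives are interpreted by that parser, not how WebSpear itself parses them.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: slow WebSpear down and keep it out of /private/.
ROBOTS_TXT = """\
User-agent: webspear
Crawl-delay: 10
Disallow: /private/

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Rules that a crawler identifying as "webspear" should follow.
blocked = parser.can_fetch("webspear", "https://example.com/private/report")
allowed = parser.can_fetch("webspear", "https://example.com/products")
delay = parser.crawl_delay("webspear")

print(blocked, allowed, delay)
```

Running the same check against a site's real robots.txt before deploying it is a cheap way to confirm that the rules say what was intended.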

🔍 Detection Indicators

Primary detection is through the User-Agent string WebSpear/2.0 or WebSpear/1.0, often accompanied by the header X-Requested-With: WebSpear. The bot also sets a custom Via header with value WebSpear-Proxy. Reverse DNS lookups on its source IPs resolve to *.crawl.webspear.com. Server log entries typically show a high request rate (5–20 req/s) from a single IP cluster, with a consistent pattern of fetching CSS, JS, and robots.txt before deeper pages.
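Because a User-Agent string is trivial to spoof, the indicators above are more useful in combination. The sketch below checks the claimed User-Agent, the two published IPv4 ranges quoted on this page, and the reverse DNS suffix. The function names, the result labels, and the decision to require both the range and the hostname match are assumptions for illustration. The resolver is injectable so the logic can be exercised without network access.

```python
import ipaddress
import socket
from typing import Callable, Mapping, Optional

# Ranges and suffix as quoted on this page; the full published list may be longer.
PUBLISHED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("203.0.113.0/24"),
]
RDNS_SUFFIX = ".crawl.webspear.com"


def claims_webspear(headers: Mapping[str, str]) -> bool:
    """True if the request presents a WebSpear User-Agent (exact header key assumed)."""
    ua = headers.get("User-Agent", "")
    return ua.startswith(("WebSpear/2.0", "WebSpear/1.0"))


def in_published_range(ip: str) -> bool:
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in PUBLISHED_RANGES)


def reverse_lookup(ip: str) -> Optional[str]:
    """Reverse DNS via the system resolver; None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def classify(
    ip: str,
    headers: Mapping[str, str],
    resolver: Callable[[str], Optional[str]] = reverse_lookup,
) -> str:
    """Label a request as not-webspear, verified, or spoofed-or-unverified."""
    if not claims_webspear(headers):
        return "not-webspear"
    host = resolver(ip)
    if in_published_range(ip) and host is not None and host.endswith(RDNS_SUFFIX):
        return "verified"
    return "spoofed-or-unverified"
```

A stricter variant would also resolve the returned hostname forward and confirm that it maps back to the same IP, since reverse DNS records are controlled by whoever owns the address block.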

📊 Data Usage

Collected data is used to populate WebSpear Insights, a subscription‑based analytics product that provides businesses with pricing intelligence, competitor tracking, and trend forecasting. The company states on its privacy page that raw page content is cached for up to 30 days and is not used to train generative AI models, though aggregated data may be used to improve its recommendation algorithms. No personal data (e.g., login pages or user accounts) is intentionally collected.

⚙️ Rate Limiting Policy

Rate limiting WebSpear is recommended because of its documented behaviour: the bot respects Crawl-Delay but does not automatically throttle itself below 5 req/s on resource‑light pages. Administrators should apply a threshold of 20 requests per minute per IP (roughly one request every three seconds) for unauthenticated sections to prevent accidental server overload while still allowing the legitimate crawler to operate efficiently.
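The 20 requests per minute per IP threshold can be enforced in several ways. The following is a minimal in-memory sliding-window sketch, with the class name and the explicit timestamp argument chosen for illustration. It assumes a single process, and a production deployment would more likely use the rate-limiting features of the web server, CDN, or WAF.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each IP."""

    def __init__(self, limit: int = 20, window: float = 60.0):
        self.limit = limit
        self.window = window
        self._hits = defaultdict(deque)  # ip -> timestamps of accepted requests

    def allow(self, ip: str, now: float) -> bool:
        hits = self._hits[ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # caller would typically answer with 429
        hits.append(now)
        return True


limiter = SlidingWindowLimiter()
# 25 requests at 5 req/s from one IP: only the first 20 are accepted.
results = [limiter.allow("192.0.2.10", i * 0.2) for i in range(25)]
print(results.count(True), results.count(False))
```

Passing the timestamp in explicitly keeps the limiter deterministic for testing. In a live handler the caller would supply time.monotonic().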

ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information are the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.