webcheck

Bot User-Agent: webcheck

🤖 Overview

WebCheck is a legitimate website-monitoring and link-checking crawler operated by WebCheck Inc. (founded 2016, headquartered in San Francisco, CA). It automatically scans customer-owned websites for broken links, dead pages, SSL certificate errors, and performance anomalies, and feeds the results into the WebCheck Dashboard, a SaaS platform used by over 10,000 webmasters and SEO professionals. The bot is explicitly designed as a benign maintenance agent, not a search-engine indexer.

🌐 Technical Behavior

WebCheck sends requests at a configurable rate, typically one request every 2–5 seconds per domain, over HTTP/1.1 or HTTP/2. It skips media files (images, videos) unless explicitly instructed otherwise via the check-media parameter. The bot respects Cache-Control headers and issues conditional GETs using If-Modified-Since. Its IP ranges belong to Amazon Web Services (AS16509) and Cloudflare (AS13335), specifically the blocks 52.84.0.0/15 and 104.16.0.0/12. Crawls run from multiple geographic regions (US, EU, APAC) to simulate real-user journeys. WebCheck also supports authenticated crawling of password-protected staging sites by sending a pre-shared token in the Authorization header.
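To check whether a request's source address falls inside the two blocks listed in this profile, Python's standard ipaddress module is enough. This is a minimal sketch that hard-codes those CIDR blocks; the names WEBCHECK_RANGES and in_webcheck_ranges are hypothetical, and the list should be checked against whatever the operator currently publishes.

```python
import ipaddress

# CIDR blocks quoted in this profile (assumption: still current; verify before use).
WEBCHECK_RANGES = [
    ipaddress.ip_network("52.84.0.0/15"),   # Amazon Web Services (AS16509)
    ipaddress.ip_network("104.16.0.0/12"),  # Cloudflare (AS13335)
]


def in_webcheck_ranges(ip: str) -> bool:
    """Return True if the client IP falls inside one of the listed blocks."""
    addr = ipaddress.ip_address(ip)
    # An IPv6 address tested against an IPv4 network simply yields False.
    return any(addr in net for net in WEBCHECK_RANGES)


if __name__ == "__main__":
    for candidate in ("52.85.10.20", "104.20.1.1", "8.8.8.8", "2001:db8::1"):
        print(candidate, in_webcheck_ranges(candidate))
```

Both blocks belong to large shared AWS and Cloudflare networks that serve many other customers, so an address match on its own is weak evidence; it is most useful combined with the User-Agent and header indicators described under Detection Indicators.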

📋 robots.txt Compliance

WebCheck fully honors robots.txt directives, as documented in its official policy at https://webcheck.example.com/robots.txt. It reads the file at the start of each crawl session and will not visit any path that starts with /admin or /private or that matches a Disallow rule. The bot also supports the Crawl-Delay directive, treating its value as the number of seconds to wait between requests. No instances of WebCheck ignoring robots.txt have been reported in publicly available security advisories or community forums.
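Site owners can preview how a robots.txt-compliant crawler would read their rules with Python's standard urllib.robotparser. This is a minimal sketch assuming a hypothetical robots.txt with a group for the WebCheck token; the example.com host, the disallowed paths, and the 5-second delay are placeholders, not values from WebCheck's policy, and the helper name allowed is illustrative.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a site owner might serve to WebCheck.
SAMPLE_ROBOTS = """\
User-agent: WebCheck
Disallow: /admin
Disallow: /private
Crawl-delay: 5
"""

parser = RobotFileParser()
parser.parse(SAMPLE_ROBOTS.splitlines())  # parse() also marks the rules as loaded


def allowed(path: str) -> bool:
    """Would an agent identifying as 'WebCheck' be permitted to fetch this path?"""
    return parser.can_fetch("WebCheck", "https://example.com" + path)


if __name__ == "__main__":
    for p in ("/admin/settings", "/private", "/blog/post"):
        print(p, allowed(p))
    print("crawl delay:", parser.crawl_delay("WebCheck"))
```

robotparser matches a rule group against the part of the agent string before the first "/", lowercased. Pass the product token WebCheck (or WebCheck/2.0), not the full Mozilla/5.0 (...) string, which would be reduced to "mozilla" and would not match the WebCheck group.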

🔍 Detection Indicators

The primary User-Agent string is Mozilla/5.0 (compatible; WebCheck/2.0; +https://webcheck.example.com/bot). A secondary string, WebCheck/2.0 (link checker), is used during link-only scans. Behavioral fingerprints include a consistent request pattern, in which the bot fetches exactly one page from a domain before moving on to the next domain, and an X-WebCheck-ID header containing a unique crawl-session UUID. The bot does not execute JavaScript or load external resources such as fonts or trackers.
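The User-Agent token and the X-WebCheck-ID header can be combined into a simple request filter. This sketch assumes headers arrive as a plain dict (normalized to lowercase keys, since HTTP header names are case-insensitive); the function name looks_like_webcheck and the rule that both indicators must be present are illustrative choices, not part of WebCheck's documentation.

```python
import re
import uuid

# The two User-Agent strings quoted in this profile.
PRIMARY_UA = "Mozilla/5.0 (compatible; WebCheck/2.0; +https://webcheck.example.com/bot)"
LINK_ONLY_UA = "WebCheck/2.0 (link checker)"

# Matches a "WebCheck/<version>" product token anywhere in the header value.
_WEBCHECK_TOKEN = re.compile(r"\bWebCheck/\d+(?:\.\d+)*")


def looks_like_webcheck(headers: dict[str, str]) -> bool:
    """Heuristic: WebCheck UA token plus a well-formed UUID in X-WebCheck-ID."""
    lowered = {k.lower(): v for k, v in headers.items()}
    if not _WEBCHECK_TOKEN.search(lowered.get("user-agent", "")):
        return False
    try:
        # uuid.UUID raises ValueError for empty or malformed values.
        uuid.UUID(lowered.get("x-webcheck-id", ""))
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    sample = {"User-Agent": PRIMARY_UA, "X-WebCheck-ID": str(uuid.uuid4())}
    print(looks_like_webcheck(sample))
```

Both headers are trivially spoofable, so a match is best used as a label for logging rather than as proof; pairing it with the IP ranges listed under Technical Behavior gives a stronger signal before any allowlisting.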

📊 Data Usage

Collected data is used exclusively for the customer’s WebCheck Dashboard: broken-link reports, response-time graphs, certificate-expiry alerts, and sitemap validation. No data is sold, shared, or used for AI/ML training. The company’s privacy policy (available at https://webcheck.example.com/privacy) explicitly states that crawled content is stored for a maximum of 30 days and then anonymized. WebCheck Inc. is GDPR and CCPA compliant.

⚙️ Rate Limiting Policy

WebCheck is rate-limited because its steady, low-frequency crawling can still strain shared hosting or poorly optimized servers when many customers target the same domain. The policy recommends a threshold of 10 requests per minute from the WebCheck IP ranges, above which a temporary 24-hour block is applied. This threshold still lets the bot complete its scans, just over a longer period, balancing its utility against server stability.
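The recommended policy (a 24-hour block once a client exceeds 10 requests per minute) can be sketched as a per-IP sliding-window counter. This is a minimal in-memory illustration; the class name WebCheckLimiter, counting per IP rather than per range, and blocking on the 11th request inside a 60-second window are assumptions made for the example.

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60            # "per minute" from the policy text
MAX_REQUESTS = 10              # recommended threshold
BLOCK_SECONDS = 24 * 60 * 60   # temporary 24-hour block


class WebCheckLimiter:
    """Per-IP sliding-window counter that blocks an IP once the threshold is exceeded."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so the logic can be tested without sleeping
        self._hits: dict[str, deque[float]] = defaultdict(deque)
        self._blocked_until: dict[str, float] = {}

    def allow(self, ip: str) -> bool:
        now = self._clock()
        until = self._blocked_until.get(ip)
        if until is not None:
            if now < until:
                return False            # still inside the 24-hour block
            del self._blocked_until[ip]  # block has expired
        hits = self._hits[ip]
        # Drop timestamps that have fallen out of the 60-second window.
        while hits and now - hits[0] >= WINDOW_SECONDS:
            hits.popleft()
        if len(hits) >= MAX_REQUESTS:
            # The 11th request inside the window triggers the temporary block.
            self._blocked_until[ip] = now + BLOCK_SECONDS
            hits.clear()
            return False
        hits.append(now)
        return True
```

The arithmetic matters here: at the 2–5 second spacing described under Technical Behavior, a scan reaching your domain from a single IP sends 12 to 30 requests per minute, so it would trip this threshold unless the crawl rate is configured to one request every 6 seconds or slower.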

ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information are the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.