sitechecker.pro

Monitor User-Agent: sitechecker-pro

🤖 Overview

sitechecker.pro is a legitimate web crawler operated by Sitechecker Inc., a SaaS company based in the United States that provides website auditing, SEO analysis, and technical monitoring services. The bot is primarily used to scan websites for broken links, duplicate content, page speed issues, meta tag problems, and other SEO health metrics, feeding data directly into the Sitechecker dashboard for site owners and SEO professionals. According to the official Sitechecker documentation and public user-agent lists, the crawler is not used for AI training or content aggregation but exclusively for on-demand and scheduled site audits.

🌐 Technical Behavior

The Sitechecker crawler typically sends requests at a controlled rate of 1–2 requests per second per domain, though this can vary with the audit plan. It makes HTTP/1.1 requests over both HTTP and HTTPS, follows redirects (up to 5 hops), and parses HTML, CSS, and JavaScript to evaluate page structure. The bot does not crawl binary files such as images or PDFs unless it is explicitly configured to check image alt attributes. Sitechecker does not publish a static list of IP ranges, but traffic originates from AWS and Google Cloud data centers, with IPs often falling in the 34.x.x.x or 35.x.x.x blocks. The crawler respects Cache-Control and Last-Modified headers to avoid redundant fetches.

📋 robots.txt Compliance

According to the official Sitechecker FAQ (sitechecker.pro/documentation/crawler-robots-txt), the crawler fully honors robots.txt directives, including Disallow, Allow, and Crawl-delay. It also respects the X-Robots-Tag HTTP header for page-level blocking. If robots.txt blocks the crawler's user-agent from a set of pages, the crawler skips those pages entirely, even if the block was unintentional.
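
To check how a given robots.txt would apply before an audit runs, the rules can be evaluated locally. The sketch below uses Python's standard urllib.robotparser module. The robots.txt content, the example.com URLs, and the choice of "SiteChecker" as the matching token are assumptions for illustration; the token is taken from the User-Agent strings listed on this page, and the crawler's own matching logic may differ from the standard library's.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the paths and the 5-second delay are made up.
ROBOTS_TXT = """\
User-agent: SiteChecker
Disallow: /private/
Crawl-delay: 5

User-agent: *
Disallow:
"""

# urllib.robotparser compares against the part of the agent string before
# the first "/", so pass the short product token, not the full header value.
AGENT_TOKEN = "SiteChecker"


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
private_ok = parser.can_fetch(AGENT_TOKEN, "https://example.com/private/report.html")
blog_ok = parser.can_fetch(AGENT_TOKEN, "https://example.com/blog/post.html")
delay = parser.crawl_delay(AGENT_TOKEN)

print(private_ok, blog_ok, delay)
```

This only shows how Python's parser interprets the file. It is a quick way to catch an unintentional Disallow, not a guarantee of what the crawler will do.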

🔍 Detection Indicators

The primary User-Agent string for the sitechecker.pro crawler is Mozilla/5.0 (compatible; SiteChecker/1.0; +https://sitechecker.pro/robot). Some variations include SiteCheckerBot/1.0 or Sitecheckerpro/2.0. The crawler also sends a From header with the email address of the account owner (if provided). Behaviorally, it usually requests /robots.txt first, then sequentially crawls pages from the sitemap or seed URLs, with a Referer header pointing to the audit dashboard.
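
As a rough illustration of matching these strings in server logs or middleware, the sketch below tests a User-Agent header against the three product tokens listed above. The regular expression and the function name is_sitechecker are assumptions made for this example, not an official Sitechecker signature. A User-Agent header is also trivially spoofed, so a match is a hint rather than proof of origin.

```python
import re

# Matches SiteChecker/x.y, SiteCheckerBot/x.y and Sitecheckerpro/x.y,
# case-insensitively. Illustrative pattern only.
SITECHECKER_UA = re.compile(
    r"\bsitechecker(?:bot|pro)?/\d+(?:\.\d+)*",
    re.IGNORECASE,
)


def is_sitechecker(user_agent: str) -> bool:
    """Return True if the header value contains a Sitechecker product token."""
    return bool(SITECHECKER_UA.search(user_agent or ""))


samples = [
    "Mozilla/5.0 (compatible; SiteChecker/1.0; +https://sitechecker.pro/robot)",
    "SiteCheckerBot/1.0",
    "Sitecheckerpro/2.0",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0",
]
for ua in samples:
    print(is_sitechecker(ua), ua)
```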

📊 Data Usage

Collected data is used exclusively for website auditing: it identifies technical SEO issues, broken links, page load times, title/description duplication, and schema validation. Results are displayed in the Sitechecker dashboard and can be exported as PDF or CSV reports. The data is never sold or used to train AI models, per the company’s privacy policy (sitechecker.pro/privacy). The crawler does not index content for public search or any third-party service.

⚙️ Rate Limiting Policy

Although the Sitechecker crawler is non-malicious and respects site controls, many webmasters rate-limit it because a full-site audit can produce significant traffic (e.g., 10,000 pages in a few hours). Rate limiting is recommended to prevent server overload, especially on shared hosting. Threshold-based blocking (e.g., more than 50 requests per minute from the SiteChecker user-agent) helps keep the site responsive for human visitors while still allowing the audit to complete over time. Note that the typical rate of 1–2 requests per second cited under Technical Behavior works out to 60–120 requests per minute, so a 50-per-minute threshold would throttle the crawler even at its normal pace.
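
One way to implement such a threshold is a sliding-window counter. The sketch below is a minimal in-memory version, assuming the 50-requests-per-minute example figure from the paragraph above. The class name SlidingWindowLimiter and its parameters are hypothetical, and a production setup would more likely rely on the web server's or CDN's built-in rate limiting.

```python
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_requests within any window_seconds interval."""

    def __init__(self, max_requests: int = 50, window_seconds: float = 60.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._timestamps = deque()  # times of accepted requests, oldest first

    def allow(self, now: float) -> bool:
        """Record a request at time `now` (seconds) if it fits the budget."""
        # Drop accepted requests that have aged out of the window.
        cutoff = now - self.window_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        if len(self._timestamps) >= self.max_requests:
            return False  # over the threshold: caller should reject or delay
        self._timestamps.append(now)
        return True


# Simulate one request per second from the crawler's user-agent.
limiter = SlidingWindowLimiter()
results = [limiter.allow(float(second)) for second in range(51)]
print(results.count(True), "allowed,", results.count(False), "rejected")
```

In a live service the caller would pass time.monotonic() as `now`, keep one limiter per user-agent or client IP, and typically answer rejected requests with HTTP 429 so that the audit can resume later.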

ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.