PageThing.com

Bot User-Agent: pagething-com

🤖 Overview

PageThing.com is a web crawler operated by PageThing, a US-based company founded in 2019 that provides website monitoring and SEO auditing services. Its sole purpose is to automatically scan subscriber websites for technical issues such as broken links, page speed problems, and missing SEO tags, and then present the findings in the PageThing dashboard. Official documentation at pagething.com states that the crawler accesses only sites that paying users have explicitly configured.

🌐 Technical Behavior

The crawler uses a custom Node.js HTTP client that supports both HTTP/1.1 and HTTP/2. By default it requests one page every 3 seconds, though users can adjust this rate in their account settings. The User-Agent string is "PageThing.com/1.0"; the version number is reported to increment with new releases. It respects standard caching headers such as Cache-Control and ETag. It does not execute JavaScript or parse stylesheets; it fetches only HTML and linked resources. Requests originate from AWS and DigitalOcean IP addresses, as well as from residential proxies used for geo-targeted testing. Each request includes a Referer header tied to the user's session.
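Respecting Cache-Control and ETag generally means two things: reusing a cached page while it is still fresh, and otherwise sending a conditional GET with If-None-Match so the server can answer 304 Not Modified instead of resending the page. PageThing's client code is not public, so the following is only a minimal Python sketch of that general behavior. The CachedPage record and all function names are hypothetical, and the Cache-Control handling is deliberately simplified.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical cache record; field names are illustrative, not PageThing's.
@dataclass
class CachedPage:
    body: bytes
    fetched_at: float                # Unix time of the last fetch or revalidation
    etag: Optional[str] = None       # validator from the ETag response header
    max_age: Optional[int] = None    # seconds, from Cache-Control: max-age


def max_age_from(cache_control: str) -> Optional[int]:
    """Return max-age in seconds, or None if the copy must always be revalidated."""
    directives: Dict[str, str] = {}
    for part in cache_control.split(","):
        name, _, arg = part.strip().partition("=")
        if name:
            directives[name.lower()] = arg.strip('"')
    # Simplification: no-store is treated like no-cache (never reuse unchecked).
    if "no-store" in directives or "no-cache" in directives:
        return None
    arg = directives.get("max-age", "")
    return int(arg) if arg.isdigit() else None


def is_fresh(entry: CachedPage, now: float) -> bool:
    """True if the cached copy may be reused without contacting the server."""
    return entry.max_age is not None and now - entry.fetched_at < entry.max_age


def request_headers(entry: Optional[CachedPage]) -> Dict[str, str]:
    """Headers for a GET; adds If-None-Match when an ETag is cached."""
    headers = {"User-Agent": "PageThing.com/1.0"}
    if entry is not None and entry.etag:
        headers["If-None-Match"] = entry.etag
    return headers


def apply_response(entry: Optional[CachedPage], status: int,
                   headers: Dict[str, str], body: bytes, now: float) -> CachedPage:
    """Update the cache from a response (header names assumed canonical-case)."""
    max_age = max_age_from(headers.get("Cache-Control", ""))
    if status == 304 and entry is not None:
        # 304 Not Modified: keep the cached body, refresh the metadata.
        return CachedPage(entry.body, now, headers.get("ETag", entry.etag), max_age)
    return CachedPage(body, now, headers.get("ETag"), max_age)
```

From a site operator's point of view, a client that behaves this way costs little bandwidth on repeat scans. Unchanged pages are either skipped while fresh or answered with a bodyless 304.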

📋 robots.txt Compliance

PageThing's support knowledge base states that the crawler fully adheres to the Robots Exclusion Protocol. It reads robots.txt before each crawl and honors both Disallow and Crawl-delay directives. The 2023 changelog confirms that no changes were made that would bypass these restrictions. User forum posts corroborate this, and no violations have been reported.
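Site owners can check how their robots.txt applies to this crawler with Python's standard urllib.robotparser module. The sketch below makes one key assumption: that the crawler matches robots.txt groups on the "PageThing.com" product token from its User-Agent. The exact token PageThing honors is not stated here, and the page header lists "pagething-com" instead. The 3-second fallback is the documented default interval. The plan_crawl helper and the sample file are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Assumption: robots.txt groups are matched on the "PageThing.com" product token.
USER_AGENT = "PageThing.com/1.0"
DEFAULT_DELAY = 3.0  # seconds; the documented default when no Crawl-delay applies

# Hypothetical robots.txt: slow PageThing down and keep it out of /private/.
SAMPLE_ROBOTS_TXT = """\
User-agent: PageThing.com
Crawl-delay: 10
Disallow: /private/

User-agent: *
Disallow:
"""


def plan_crawl(robots_txt: str, urls: list) -> tuple:
    """Return (URLs the crawler may fetch, delay in seconds between requests)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    allowed = [url for url in urls if parser.can_fetch(USER_AGENT, url)]
    delay = parser.crawl_delay(USER_AGENT)  # None if no Crawl-delay applies
    if delay is None:
        delay = DEFAULT_DELAY
    return allowed, float(delay)
```

For example, plan_crawl(SAMPLE_ROBOTS_TXT, ["https://example.com/blog/", "https://example.com/private/report"]) keeps only the /blog/ URL and reports a 10-second delay. Python's parser applies the first matching rule in a group, which can differ from the longest-match precedence in RFC 9309 when Allow and Disallow rules overlap.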

🔍 Detection Indicators

The primary indicator is the User-Agent string "PageThing.com/1.0". The crawler also sends an "X-PageThing-Version" header (e.g., "1.0.3") and may include a "From" header containing the subscriber's email address. Behaviorally, it issues only HTTP GET requests, keeps a consistent interval of 2 to 5 seconds between requests, and does not handle cookies. These characteristics make it straightforward to identify in server logs.
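These indicators can be combined into a simple log-side check. The Python sketch below assumes that each client's requests have already been grouped, for example by IP address, into a method, a header dictionary, and a list of request timestamps. The function names and the all-signals-must-match rule are illustrative choices, not a published PageThing detection method.

```python
import re
from typing import Dict, List

# Documented User-Agent format, e.g. "PageThing.com/1.0".
UA_RE = re.compile(r"^PageThing\.com/\d+(?:\.\d+)*")


def header_signals(method: str, headers: Dict[str, str]) -> Dict[str, bool]:
    """Evaluate the per-request indicators listed in the profile."""
    h = {name.lower(): value for name, value in headers.items()}
    return {
        "user_agent": bool(UA_RE.match(h.get("user-agent", ""))),
        "version_header": "x-pagething-version" in h,
        "get_only": method.upper() == "GET",
        "no_cookies": "cookie" not in h,
    }


def steady_interval(timestamps: List[float], low: float = 2.0, high: float = 5.0) -> bool:
    """True if every gap between consecutive requests lies within [low, high] seconds."""
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    return bool(gaps) and all(low <= gap <= high for gap in gaps)


def looks_like_pagething(method: str, headers: Dict[str, str],
                         timestamps: List[float]) -> bool:
    """Hypothetical rule: every header signal plus the timing pattern must match."""
    signals = header_signals(method, headers)
    return all(signals.values()) and steady_interval(timestamps)
```

Every signal here is controlled by the client, so a scraper can copy the User-Agent and headers. The timing check needs at least two requests before it can say anything, and a match is best treated as likely rather than verified.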

📊 Data Usage

Data collected by the PageThing crawler is used exclusively for the subscriber's own website monitoring and SEO analysis – including broken link reports, page speed tracking, and content change detection. Their privacy policy explicitly states no data is sold, shared with third parties, or used for AI model training. Data is stored encrypted and automatically purged after 90 days unless manually deleted earlier.

⚙️ Rate Limiting Policy

Rate limiting is recommended because multiple PageThing subscribers may scan the same site simultaneously. Even a legitimate, well-behaved crawler can degrade site performance when several instances run concurrently. Threshold-based blocking (e.g., limiting requests per second per IP) keeps the site stable while still permitting legitimate crawling.
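Threshold-based blocking of this kind is commonly implemented as a sliding-window counter per client key. The Python sketch below is a generic, in-memory illustration and not a Boteraser or PageThing component. The class name, the limits, and the use of the IP address as the key are all assumptions.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `limit` requests per key within any `window`-second span."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str, now: float) -> bool:
        """Record and permit the request, or return False if over the threshold."""
        hits = self._hits[key]
        # Discard timestamps that have slid out of the current window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # over the threshold: reject, e.g. with HTTP 429
        hits.append(now)
        return True
```

For example, SlidingWindowLimiter(limit=2, window=1.0) lets an IP make two requests within any one-second span and rejects a third. Rejected requests are not recorded, so a client that backs off regains access as soon as its earlier requests age out. A per-IP key does not cap combined traffic from several subscribers scanning from different addresses. Keying on the User-Agent product token instead limits PageThing's total rate against the site.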

ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.