brokenlinkcheck com
Bot User-Agent: brokenlinkcheck-com
🤖 Overview
BrokenLinkCheck.com is a legitimate web crawler operated by the independent service of the same name and documented on the service's official bot information page at https://brokenlinkcheck.com/bot. Its primary purpose is to scan websites for broken or dead hyperlinks and generate detailed reports for webmasters, SEO professionals, and site owners who voluntarily submit their domains for testing. Crawl results feed the BrokenLinkCheck.com web application, which presents a dashboard of broken links, response codes, and redirect chains.
🌐 Technical Behavior
The crawler employs a multi-threaded, depth-first traversal pattern, typically starting from the submitted homepage URL and following all internal links up to a configurable depth (default is 5 levels). Request frequency is moderate, with an observed average of 2–4 requests per second per domain, though bursts of up to 10 requests per second occur during deep scans. The bot uses IPv4 addresses from a limited set of ranges, primarily 185.199.108.0/24 and 185.199.109.0/24, reportedly confirmed by reverse DNS lookups and public IP WHOIS records. It communicates exclusively over HTTP/1.1, using both plain HTTP and HTTPS, and sends the Accept-Encoding: gzip, deflate header. The crawler does not execute JavaScript and does not fetch CSS or external images; it parses only anchor tags (href) and form actions to identify URLs.
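The traversal described above can be illustrated with a minimal Python sketch. It is not the crawler's actual code: the names `LinkExtractor`, `extract_links`, and `crawl_depth_first` are hypothetical, and the `pages` dictionary is an assumed in-memory stand-in for real HTTP fetches. It collects URLs only from anchor `href` and form `action` attributes, follows same-host links depth-first, and stops at a depth limit of 5.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects URLs from <a href> and <form action>; all other tags are ignored."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "a" and attr_map.get("href"):
            self.links.append(urljoin(self.base_url, attr_map["href"]))
        elif tag == "form" and attr_map.get("action"):
            self.links.append(urljoin(self.base_url, attr_map["action"]))


def extract_links(html, base_url):
    """Return absolute URLs found in anchors and form actions, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


def crawl_depth_first(pages, start_url, max_depth=5):
    """Visit same-host pages depth-first, at most max_depth hops along each path.

    `pages` is a hypothetical map of URL -> HTML used instead of network requests.
    """
    host = urlparse(start_url).netloc
    visited, seen = [], set()

    def visit(url, depth):
        if url in seen or depth > max_depth:
            return
        seen.add(url)
        visited.append(url)
        for link in extract_links(pages.get(url, ""), url):
            if urlparse(link).netloc == host:  # internal links only
                visit(link, depth + 1)

    visit(start_url, 0)
    return visited
```

A real link checker would also record the HTTP status of every URL it requests; the sketch only shows the order in which pages would be reached.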
📋 robots.txt Compliance
According to the official documentation at https://brokenlinkcheck.com/bot, the bot fully honors standard robots.txt Disallow directives. It reads the file before each crawl session and will not access any path explicitly forbidden. The bot also respects Crawl-Delay directives when present, pausing for the specified number of seconds between requests. In the absence of a robots.txt file, it defaults to a polite crawl delay of 1 second.
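Site owners who want to preview how such rules would be interpreted can use Python's standard `urllib.robotparser`. The sketch below is illustrative only: the robots.txt token `BrokenLinkCheck.com` is an assumption (the exact token the bot matches is not stated here), `crawl_policy` is a hypothetical helper, and applying the 1-second default whenever no Crawl-delay applies is a simplification of the behavior described above.

```python
from urllib.robotparser import RobotFileParser

# Example rules; the user-agent token is an assumption, not a documented value.
ROBOTS_TXT = """\
User-agent: BrokenLinkCheck.com
Disallow: /private/
Crawl-delay: 5
"""

DEFAULT_DELAY_SECONDS = 1.0  # fallback delay described in the text above


def crawl_policy(robots_txt, user_agent, url):
    """Return (allowed, delay_seconds) for one URL under the given robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    allowed = parser.can_fetch(user_agent, url)
    delay = parser.crawl_delay(user_agent)
    return allowed, float(delay) if delay is not None else DEFAULT_DELAY_SECONDS
```

Passing an empty string as `robots_txt` models a site with no rules: every path is allowed and the default delay is used.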
🔍 Detection Indicators
The primary User-Agent string is Mozilla/5.0 (compatible; BrokenLinkCheck.com/2.0; +https://brokenlinkcheck.com/bot). A secondary legacy string, BrokenLinkCheck/1.0, may still be observed on older crawls. The bot sends no custom HTTP headers beyond the standard ones. Behavioral fingerprints include sequential URL requests with no Referer header and a fixed pattern of GET / followed by depth-first link extraction. The bot's IPs consistently resolve to the hostname crawl.brokenlinkcheck.com.
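These indicators can be combined in a simple log classifier. The following Python sketch is one possible approach, not an official detection method: `classify_request`, the regular expression, and the label strings are assumptions, and the IP ranges are the ones quoted on this page. Because a User-Agent string is trivial to spoof, the sketch treats a User-Agent match without a matching IP as unverified.

```python
import ipaddress
import re

# Matches both "BrokenLinkCheck.com/2.0" and the legacy "BrokenLinkCheck/1.0".
UA_PATTERN = re.compile(r"BrokenLinkCheck(?:\.com)?/\d+\.\d+", re.IGNORECASE)

# Ranges quoted on this page; confirm them independently before relying on them.
DOCUMENTED_RANGES = [
    ipaddress.ip_network("185.199.108.0/24"),
    ipaddress.ip_network("185.199.109.0/24"),
]


def classify_request(user_agent, remote_ip):
    """Label a request by whether its User-Agent and source IP match the indicators."""
    ua_match = bool(UA_PATTERN.search(user_agent))
    ip = ipaddress.ip_address(remote_ip)
    ip_match = any(ip in network for network in DOCUMENTED_RANGES)
    if ua_match and ip_match:
        return "likely-brokenlinkcheck"
    if ua_match:
        return "ua-only"  # claimed identity from an unlisted IP, possibly spoofed
    return "other"
```

A stricter check could add a reverse DNS lookup and confirm that the result is crawl.brokenlinkcheck.com and that the hostname resolves back to the same IP.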
📊 Data Usage
Collected data (broken link URLs, HTTP status codes, redirect targets, and page titles) is stored in the BrokenLinkCheck.com database for 30 days and presented to the requesting user via a private dashboard. No data is used for AI training or resold; it is used strictly to generate per-scan reports. The service does not index or store content beyond link metadata.
⚙️ Rate Limiting Policy
This bot is rate-limited because without thresholds its multi‑threaded scans can overwhelm low‑resource servers. Threshold‑based blocking (e.g., more than 20 requests per second from the same IP) is a proportionate response to protect site stability while still allowing legitimate link auditing.
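A threshold of this kind is often implemented as a per-IP sliding window. The Python sketch below is a minimal illustration under assumed names (`SlidingWindowLimiter`, `allow`); it is not the mechanism any particular product uses. It permits up to 20 requests per IP within any one-second window and rejects further requests until older ones age out.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allows at most max_requests per IP within a rolling window_seconds window."""

    def __init__(self, max_requests=20, window_seconds=1.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of accepted requests

    def allow(self, ip, now):
        """Return True if a request from ip at time `now` (seconds) is under the threshold."""
        hits = self._hits[ip]
        # Discard timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

In production the caller would pass `time.monotonic()` as `now`; taking the timestamp as a parameter keeps the sketch deterministic and easy to test.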
ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.