urlblaze
Bot User-Agent: urlblaze
🤖 Overview
UrlBlaze is a legitimate web crawling service operated by the company UrlBlaze, primarily designed for SEO auditing, site mapping, and continuous website monitoring. According to the official UrlBlaze documentation at urlblaze.com, its purpose is to help webmasters discover broken links, redirect chains, duplicate content, and other technical SEO issues. The crawler feeds its findings into the UrlBlaze platform’s dashboard, which provides actionable reports for site optimization. It is not a search engine bot but a dedicated analytics agent used by marketing professionals and web developers.
🌐 Technical Behavior
UrlBlaze recursively crawls linked pages within a domain, respecting the Crawl-delay directive in robots.txt when present. Based on the company’s published IP ranges (documented at urlblaze.com/crawler-ips), requests originate from a fixed set of IPv4 addresses. The 192.0.2.0/24 range is shown here only as an example; the actual ranges are provided on their site. The bot uses the HTTP/1.1 and HTTP/2 protocols, with a default request frequency of one request per second per domain unless overridden. It does not deep-crawl multiple subdomains at the same time. The crawler supports JavaScript rendering via headless Chromium to capture SPA-generated content, as noted in their GitHub repository at github.com/urlblaze/crawler-engine.
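One way to use a published IP list is to check each request's source address against it. The sketch below is a minimal illustration, assuming the placeholder 192.0.2.0/24 range from the text (a block reserved for documentation, not a real crawler range). The constant and function names are hypothetical, and the list would need to be replaced with the ranges the operator actually publishes.

```python
import ipaddress

# Assumption: placeholder range taken from the text above. 192.0.2.0/24 is a
# documentation-only block, so substitute the operator's real published ranges.
ASSUMED_URLBLAZE_RANGES = [ipaddress.ip_network("192.0.2.0/24")]


def is_from_published_range(remote_addr: str) -> bool:
    """Return True if remote_addr falls inside any of the assumed ranges."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Malformed addresses are never treated as verified crawler traffic.
        return False
    return any(addr in net for net in ASSUMED_URLBLAZE_RANGES)


if __name__ == "__main__":
    for candidate in ("192.0.2.17", "198.51.100.4", "not-an-ip"):
        print(candidate, is_from_published_range(candidate))
```

An IP check of this kind is usually combined with the User-Agent string, since either signal alone can be spoofed or go stale.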
📋 robots.txt Compliance
UrlBlaze fully honors the Disallow and Crawl-delay directives in robots.txt, according to both its official documentation and third-party verification by Search Engine Journal (2024 article). The bot checks robots.txt before each crawl session, caches the file for 24 hours, and re-fetches it when a site is revisited after the cache has expired. Webmasters can block the bot entirely by adding a User-agent: UrlBlaze group with a Disallow: / directive.
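Before deploying a rule, it can help to confirm how a compliant parser reads it. The following sketch uses Python's standard `urllib.robotparser` to evaluate two illustrative robots.txt files: one that blocks the bot entirely, and one that only slows it down. The file contents, the helper name, and the example.com URLs are assumptions for illustration, and the standard-library parser only approximates how UrlBlaze itself interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt that blocks the bot from the whole site.
BLOCK_ENTIRELY = """\
User-agent: UrlBlaze
Disallow: /
"""

# Illustrative robots.txt that throttles the bot and hides one directory.
THROTTLE_ONLY = """\
User-agent: UrlBlaze
Crawl-delay: 2
Disallow: /private/
"""


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    agent = "UrlBlaze/1.0"
    blocked = load_rules(BLOCK_ENTIRELY)
    throttled = load_rules(THROTTLE_ONLY)
    print(blocked.can_fetch(agent, "https://example.com/page"))
    print(throttled.can_fetch(agent, "https://example.com/page"))
    print(throttled.can_fetch(agent, "https://example.com/private/x"))
    print(throttled.crawl_delay(agent))
```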
🔍 Detection Indicators
The primary User-Agent string is UrlBlaze/1.0 (sometimes UrlBlaze/2.0 for newer versions), as listed in the official user-agent strings file at urlblaze.com/bot-agents. A secondary identifier is the X-UrlBlaze-Client header set to true. Behavioral fingerprints include a consistent Accept-Language: en-US header and no Referer header unless the crawler is following a link from within the same domain.
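The indicators above can be combined into a simple header check. This is a rough sketch: the function names are hypothetical, and the rule of requiring a User-Agent match plus at least one secondary signal is an assumed policy, not something the UrlBlaze documentation prescribes. Headers are easy to forge, so a match is best treated as a hint and not as proof.

```python
import re
from typing import Dict, Mapping

# Matches version tokens such as "UrlBlaze/1.0" or "UrlBlaze/2.0".
UA_PATTERN = re.compile(r"\bUrlBlaze/\d+\.\d+\b")


def urlblaze_signals(headers: Mapping[str, str]) -> Dict[str, bool]:
    """Report which of the documented indicators a request exhibits."""
    h = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    return {
        "user_agent": bool(UA_PATTERN.search(h.get("user-agent", ""))),
        "client_header": h.get("x-urlblaze-client", "").strip().lower() == "true",
        "accept_language": h.get("accept-language", "").strip() == "en-US",
    }


def looks_like_urlblaze(headers: Mapping[str, str]) -> bool:
    """Assumed policy: User-Agent match plus at least one secondary signal."""
    s = urlblaze_signals(headers)
    return s["user_agent"] and (s["client_header"] or s["accept_language"])


if __name__ == "__main__":
    sample = {
        "User-Agent": "UrlBlaze/1.0",
        "X-UrlBlaze-Client": "true",
        "Accept-Language": "en-US",
    }
    print(urlblaze_signals(sample), looks_like_urlblaze(sample))
```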
📊 Data Usage
Collected data is used exclusively for generating SEO audits and site health reports within the UrlBlaze platform. Key metrics include page load time, HTTP status codes, canonical tag compliance, and structured data errors. According to the company’s privacy policy (at urlblaze.com/privacy), no raw page content is stored permanently or used for AI training; only aggregated metadata is retained for 30 days.
⚙️ Rate Limiting Policy
Rate limiting is recommended for UrlBlaze because its default of one request per second can become excessive on high-traffic servers if left unchecked. Any blocking threshold should be set to at least 2 requests per second, so that the bot's normal one-request-per-second pace does not trigger false positives, as documented in the UrlBlaze Administrator Guide. This protects server resources while still allowing legitimate crawling for SEO analysis.
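A threshold of this kind can be sketched as a sliding-window counter. The example below is a minimal in-memory illustration, assuming a limit of 2 requests per rolling one-second window per client key. The class name and the choice of key are hypothetical, and a production setup would more likely enforce this at the web server or CDN layer.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class SlidingWindowLimiter:
    """Allow at most max_requests per client within a rolling time window."""

    def __init__(self, max_requests: int = 2, window_seconds: float = 1.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_key: str, now: float) -> bool:
        hits = self._hits[client_key]
        # Discard timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the threshold: reject or delay this request
        hits.append(now)
        return True


if __name__ == "__main__":
    limiter = SlidingWindowLimiter()
    # A crawler pacing itself at one request per second stays under the limit.
    print([limiter.allow("urlblaze", float(t)) for t in range(4)])
```

With these assumed settings, a client that keeps to one request per second is never rejected, while a third request inside the same second is.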
ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.