Scanbot Bot — Detection, Blocking & Technical Analysis

Scanner User-Agent: scanbot

🤖 Overview

Scanbot is a web crawler operated by Scanbot GmbH (scanbot.io), a German company providing website auditing and SEO analysis tools. Its primary purpose is to scan websites for technical issues, broken links, duplicate content, and on-page SEO factors, feeding data into the Scanbot Dashboard for site owners and SEO professionals. According to official documentation at scanbot.io/crawler, the bot is designed for constructive website improvement and is explicitly not used for AI model training or content republishing.

🌐 Technical Behavior

Scanbot identifies itself with a configurable User-Agent string that defaults to Scanbot/2.0 but can be customized by the website owner during setup. Requests originate from IP ranges registered to Scanbot GmbH, primarily in German datacenters (ASN 24940 and ASN 197540). By default the crawler waits at least one second between requests; this delay can be adjusted in the Scanbot Console. It supports both HTTP/1.1 and HTTP/2 and sends a standard Accept-Language: en-US,en;q=0.9 header. Each request also carries a unique X-Scanbot-Request-ID header for traceability, as noted in their technical blog at docs.scanbot.io/crawler-identification.

📋 robots.txt Compliance

Scanbot fully respects robots.txt directives, including Disallow and Crawl-delay rules. Its documentation explicitly states that if a site blocks the bot via robots.txt, Scanbot will not attempt to circumvent the restriction. The bot also honors noindex and nofollow meta tags in HTML pages, as stated on the official compliance page at scanbot.io/robots.
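To preview how a compliant crawler would read such rules, a draft robots.txt can be checked with Python's standard urllib.robotparser before it is deployed. This is a minimal sketch: the "Scanbot" group name is taken from the default User-Agent, while the /private/ path, the 5-second delay, and the example.com URLs are illustrative assumptions rather than values from Scanbot's documentation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the path and the 5-second delay are illustrative only.
ROBOTS_TXT = """\
User-agent: Scanbot
Disallow: /private/
Crawl-delay: 5

User-agent: *
Disallow:
"""


def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt content held in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# The parser matches on the product token before the slash ("Scanbot").
blocked = rules.can_fetch("Scanbot/2.0", "https://example.com/private/report")
allowed = rules.can_fetch("Scanbot/2.0", "https://example.com/blog/")
delay = rules.crawl_delay("Scanbot/2.0")

print(blocked, allowed, delay)
```

The same parser can be pointed at other user agents to confirm that a Scanbot-specific group does not change what other crawlers are allowed to fetch.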

🔍 Detection Indicators

The default User-Agent string is Scanbot/2.0, sent in full as, for example, Scanbot/2.0 (+https://scanbot.io/crawler). Because the string can be customized, it should not be treated as the only signal. Behavioral fingerprints include sequential request patterns with a consistent one-second crawl delay and the absence of JavaScript rendering. Additionally, the X-Scanbot-Request-ID header (a 32-character hex string) is a reliable identifier. Reverse DNS lookups for Scanbot IPs often resolve to *.scanbot.io, and the known IP ranges are published at scanbot.io/ip-ranges.
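The indicators above can be combined into a simple header check. The sketch below is one possible approach, not Scanbot's or any vendor's implementation: the function name detection_signals, the case-insensitive hex match, and the sample hostname are assumptions. The reverse DNS hostname is passed in as a plain string so the check stays offline; a real deployment would resolve it separately and confirm it with a forward lookup.

```python
import re
from typing import Dict, Optional

# Matches the product token, e.g. "Scanbot/2.0"; the version format is an assumption.
UA_PATTERN = re.compile(r"\bScanbot/\d+(\.\d+)*", re.IGNORECASE)
# 32 hex characters; accepting both upper and lower case is an assumption.
REQUEST_ID_PATTERN = re.compile(r"[0-9a-fA-F]{32}")


def detection_signals(headers: Dict[str, str],
                      rdns_host: Optional[str] = None) -> Dict[str, bool]:
    """Report which Scanbot indicators a single request matches."""
    lowered = {name.lower(): value for name, value in headers.items()}
    user_agent = lowered.get("user-agent", "")
    request_id = lowered.get("x-scanbot-request-id", "")
    host = (rdns_host or "").rstrip(".").lower()
    return {
        "user_agent": bool(UA_PATTERN.search(user_agent)),
        "request_id": bool(REQUEST_ID_PATTERN.fullmatch(request_id)),
        "reverse_dns": host.endswith(".scanbot.io"),
    }


sample_headers = {
    "User-Agent": "Scanbot/2.0 (+https://scanbot.io/crawler)",
    "X-Scanbot-Request-ID": "0123456789abcdef0123456789abcdef",
}
# "crawl-1.scanbot.io" is a made-up hostname for illustration.
signals = detection_signals(sample_headers, "crawl-1.scanbot.io")
print(signals)
```

Headers can be forged by any client, so a match on the User-Agent or request ID alone is weaker evidence than a match that also agrees with the source IP's reverse DNS.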

📊 Data Usage

Data collected by Scanbot is used exclusively for the website owner’s own auditing purposes: generating reports on SEO health, site structure, broken links, and performance metrics. The extracted content is not stored on Scanbot servers beyond the duration of the crawl (typically 24-72 hours) and is not shared with third parties. Scanbot GmbH’s privacy policy (scanbot.io/privacy) confirms that no text content is used for training language models or for sale.

⚙️ Rate Limiting Policy

Because Scanbot can be configured to run multiple parallel crawls at high speed (especially when a user schedules a full-site scan), site operators commonly rate-limit it to prevent excessive load on their servers. A typical web application firewall rule caps traffic from Scanbot ranges at 10 requests per second per IP. That ceiling sits well above the bot's default pace of roughly one request per second, so a default crawl passes untouched while aggressive parallel scans are throttled before they can overwhelm non-production environments.
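A per-IP threshold of 10 requests per second can be sketched as a sliding-window counter. This is an illustration of the general technique, not the rule any particular firewall ships: the class name SlidingWindowLimiter and the sample address (from the 203.0.113.0/24 documentation range) are assumptions, and timestamps are passed in explicitly so the behavior is easy to follow.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per IP within any window_seconds span."""

    def __init__(self, max_requests: int = 10, window_seconds: float = 1.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of accepted requests

    def allow(self, ip: str, now: float) -> bool:
        hits = self._hits[ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


limiter = SlidingWindowLimiter()
# Twelve requests spaced 50 ms apart from one hypothetical crawler IP.
burst = [limiter.allow("203.0.113.7", i * 0.05) for i in range(12)]
# One more request a full second after the first accepted one.
later = limiter.allow("203.0.113.7", 1.0)
print(burst, later)
```

In production the current time would normally come from time.monotonic(), and idle IP entries would need periodic eviction to keep memory bounded.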

ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.
