insumascout

Bot User-Agent: insumascout

🤖 Overview

InsumaScout is a web crawler operated by Insuma AG, a Swiss company headquartered in Zurich that specializes in competitive intelligence and web data extraction services. The bot systematically scans publicly accessible web content to feed into Insuma’s analytics platform, which provides clients with real-time monitoring of pricing, product availability, and market trends. According to Insuma’s official website, the crawler is used exclusively for business analytics and is not connected to any AI training or search indexing activities.

🌐 Technical Behavior

InsumaScout employs a distributed crawling architecture that originates from a pool of IPv4 and IPv6 addresses registered primarily in Switzerland (ASN 12350) and Germany. The bot normally sends 1–3 requests per second per IP, but can scale up to 10 requests per second when crawling large sites, as documented in Insuma’s technical whitepaper. It uses HTTP/1.1 with Keep-Alive enabled and supports both GET and POST methods, focusing on HTML pages, JSON endpoints, and RSS feeds. The crawler does not execute JavaScript or load images; it parses raw HTML using a custom extraction engine. It identifies itself with a distinct X-Insuma-Client header (value: web-crawler), and its requests include a valid Accept-Language header (en-US,en;q=0.9). InsumaScout respects HTTP caching headers and does not follow meta refresh redirects whose delay exceeds 5 seconds.

📋 robots.txt Compliance

InsumaScout fully honors robots.txt directives, including Disallow rules and Crawl-delay settings, as stated in Insuma’s official crawler policy page. Publicly available server logs and discussions on webmaster forums (e.g., WebmasterWorld, 2023) indicate that the bot consistently abides by robots.txt restrictions. Insuma also provides a dedicated contact form through which site owners can request adjustments to crawl behavior.
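As an illustration of the directives described above, the following minimal sketch uses Python's standard `urllib.robotparser` to show how a robots.txt group addressed to `insumascout` would be interpreted. The robots.txt content, the paths, and the 10-second delay are hypothetical examples, not values published by Insuma, and the parser reflects Python's interpretation of the file rather than the crawler's own implementation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: paths and delay are illustrative assumptions.
ROBOTS_TXT = """\
User-agent: insumascout
Crawl-delay: 10
Disallow: /pricing/
Disallow: /api/

User-agent: *
Disallow:
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Paths under /pricing/ and /api/ are off limits for the insumascout group.
print(parser.can_fetch("insumascout", "https://example.com/pricing/widgets"))
# Everything else remains crawlable for the bot.
print(parser.can_fetch("insumascout", "https://example.com/blog/post"))
# Crawl-delay requested for this user agent, in seconds.
print(parser.crawl_delay("insumascout"))
```

A site owner who wants to exclude the bot entirely could replace the two Disallow lines in the `insumascout` group with a single `Disallow: /`.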

🔍 Detection Indicators

The primary User-Agent string is InsumaScout/1.0 (or InsumaScout/2.0 for newer deployments). Additionally, the bot sends a custom HTTP header X-Insuma-Client: web-crawler and a From header containing a contact email address. Behavioral fingerprints include consistent request intervals, lack of cookie handling, and an Accept-Encoding header that advertises deflate but not gzip. The bot does not send a Referer header for the first request on a domain.
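The header-based indicators above can be combined into a simple check. The sketch below is one possible approach, not an official detection method: the function name `matched_signals`, the signal labels, and the sample email address are hypothetical, and because every one of these headers can be spoofed, the result is best treated as a hint rather than proof of identity.

```python
def matched_signals(headers: dict[str, str]) -> list[str]:
    """Return the InsumaScout indicators present in one request's headers."""
    # HTTP header names are case-insensitive, so normalize them first.
    h = {name.lower(): value for name, value in headers.items()}
    signals = []

    if "insumascout" in h.get("user-agent", "").lower():
        signals.append("user-agent")
    if h.get("x-insuma-client", "").strip().lower() == "web-crawler":
        signals.append("x-insuma-client")
    if h.get("from", "").strip():
        signals.append("from-header")
    if "cookie" not in h:
        signals.append("no-cookie")

    # Split "deflate;q=0.8, br" into the bare coding names.
    codings = {
        part.split(";")[0].strip().lower()
        for part in h.get("accept-encoding", "").split(",")
    }
    if "deflate" in codings and "gzip" not in codings:
        signals.append("deflate-without-gzip")

    return signals


# Hypothetical request resembling the fingerprint described above.
sample = {
    "User-Agent": "InsumaScout/2.0",
    "X-Insuma-Client": "web-crawler",
    "From": "crawler-contact@example.invalid",  # placeholder address
    "Accept-Encoding": "deflate",
    "Accept-Language": "en-US,en;q=0.9",
}
print(matched_signals(sample))
```

Requiring several signals at once (for example the User-Agent plus the X-Insuma-Client header) reduces false positives from ordinary clients that merely lack cookies.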

📊 Data Usage

Collected data is used exclusively for Insuma’s analytics platform, which offers clients dashboards and alerts for competitive pricing monitoring, stock-level changes, and content updates. Per Insuma’s privacy policy, no personally identifiable information is intentionally collected, and raw data is retained for a maximum of 30 days. The data is not used for training large language models or for general-purpose search indexing; it is solely for structured business intelligence reporting.

⚙️ Rate Limiting Policy

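InsumaScout is rate-limited because its systematic, distributed crawling can inadvertently overload under-optimized web servers, especially when multiple IPs target the same site concurrently. A threshold-based blocking policy (e.g., 20 requests per minute per IP) is recommended to prevent service degradation. Note that this example threshold is well below the 1–3 requests per second per IP (60–180 per minute) that Insuma documents, so it will throttle the bot's normal crawl rate rather than only cap bursts; sites that want to permit data collection at the documented rates should set a correspondingly higher limit.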
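A threshold of this kind can be sketched as a per-IP sliding-window counter. The class below is a minimal in-memory illustration under stated assumptions: the name `SlidingWindowLimiter` and its parameters are hypothetical, timestamps are passed in explicitly to keep the example deterministic, and a production deployment would more likely rely on the rate-limiting features of a web server, CDN, or WAF.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per IP within any window_seconds span."""

    def __init__(self, max_requests: int = 20, window_seconds: float = 60.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of allowed requests

    def allow(self, ip: str, now: float) -> bool:
        hits = self._hits[ip]
        # Discard timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the threshold: block or delay this request
        hits.append(now)
        return True


# Simulate one IP sending 2 requests per second for 30 seconds,
# which is inside the 1-3 requests per second range documented by Insuma.
limiter = SlidingWindowLimiter(max_requests=20, window_seconds=60.0)
decisions = [limiter.allow("192.0.2.10", i * 0.5) for i in range(60)]
print(sum(decisions), "allowed,", decisions.count(False), "blocked")
```

Because the counter is keyed by IP, a crawl distributed across many addresses stays under the limit on each one; limiting by User-Agent or by the X-Insuma-Client header is an alternative when the aggregate load matters more than the per-IP rate.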

ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.