besserscheitern-crawl Bot — Detection, Blocking & Technical Analysis

besserscheitern-crawl

Crawler User-Agent: besserscheitern-crawl

🤖 Overview

besserscheitern-crawl is a web crawler operated by the German publishing house Besser Scheitern (besserscheitern.de), a platform focused on failure culture and entrepreneurial learning. According to the project’s GitHub repository (github.com/besserscheitern/crawler), this bot is designed to periodically crawl partner websites and community blogs to aggregate content for the company’s internal content management system and newsletter feed. The crawler was first deployed in early 2022 and is maintained by a small engineering team based in Berlin, Germany.

🌐 Technical Behavior

The bot is built on the Scrapy crawling framework, with a configurable crawl depth of up to three levels. Official documentation on the project’s GitHub wiki indicates that requests are made at a fixed interval of 10 seconds between page fetches, and that the crawler respects HTTP 429 (Too Many Requests) responses by backing off for at least 60 seconds. The user-agent string is Mozilla/5.0 (compatible; besserscheitern-crawl/1.0; +https://besserscheitern.de/crawler). The bot only crawls websites listed in a pre-approved partner whitelist; it does not perform open-web discovery. All traffic originates from IP ranges belonging to Hetzner Online GmbH (AS24940), specifically the 116.203.0.0/16 block, as verified by reverse DNS lookups published in the bot’s documentation.
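As a rough illustration of how a site operator might check incoming requests against the profile described above, the following sketch tests whether a request's source IP falls inside the stated 116.203.0.0/16 block and whether its User-Agent contains the besserscheitern-crawl token. The function name matches_claimed_profile and the regular expression are assumptions made for this example, not part of the bot's documentation. The /16 block is shared by many other Hetzner customers, so a match is a hint and not proof of identity.

```python
import ipaddress
import re

# Values taken from the description above; treat them as claims to verify,
# not as an authoritative allowlist.
CLAIMED_NETWORK = ipaddress.ip_network("116.203.0.0/16")

# Hypothetical pattern: the product token followed by a version number.
UA_PATTERN = re.compile(r"besserscheitern-crawl/\d+(\.\d+)*", re.IGNORECASE)


def matches_claimed_profile(remote_ip: str, user_agent: str) -> bool:
    """Return True if both the source IP and the User-Agent fit the description."""
    try:
        ip = ipaddress.ip_address(remote_ip)
    except ValueError:
        # Malformed address strings never match.
        return False
    return ip in CLAIMED_NETWORK and bool(UA_PATTERN.search(user_agent))
```

A request that carries the user-agent string but arrives from outside the stated block fails this check, which is the case most worth logging when looking for spoofed traffic.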

📋 robots.txt Compliance

According to its source code (available on GitHub), the bot fully parses robots.txt directives before each crawl session. It checks both User-agent: besserscheitern-crawl and User-agent: * rules, and aborts the crawl entirely if the root path is disallowed. This behavior is confirmed in a blog post from Besser Scheitern’s engineering team dated March 2022.
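To see how such rules would be evaluated, the sketch below feeds a sample robots.txt to Python's standard urllib.robotparser. The robots.txt content, the /private/ path, and the second bot name SomeOtherBot are made up for illustration. The bot's own parser may differ in detail from the standard-library one, so this only approximates the described behavior.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block this crawler everywhere, and keep
# every other crawler out of /private/ only.
ROBOTS_TXT = """\
User-agent: besserscheitern-crawl
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The crawler-specific group disallows the root path, which per the
# description above would make the bot abort the whole crawl.
bot_allowed = parser.can_fetch("besserscheitern-crawl", "/")

# Other crawlers fall through to the wildcard group.
other_allowed = parser.can_fetch("SomeOtherBot", "/blog/post")
other_private = parser.can_fetch("SomeOtherBot", "/private/notes")

print(bot_allowed, other_allowed, other_private)
```

Here the short product token is passed to can_fetch, not the full Mozilla/5.0 string, because the standard-library parser matches on the part of the agent name before the first slash.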

🔍 Detection Indicators

The primary identification string is User-Agent: Mozilla/5.0 (compatible; besserscheitern-crawl/1.0; +https://besserscheitern.de/crawler). Additionally, the bot sends a custom HTTP header X-BSCRAWL: 1 on every request, which is documented in the GitHub wiki as a fingerprint for site operators. Requests always include an Accept-Language: de-DE,de;q=0.9 header, reflecting its German origin.
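The three indicators above can be combined into a simple score. The following is a minimal sketch, assuming request headers are available as a plain mapping; the function name fingerprint_score and the equal weighting of the indicators are choices made for this example only.

```python
from typing import Mapping


def fingerprint_score(headers: Mapping[str, str]) -> int:
    """Count how many of the three described indicators a request shows (0 to 3)."""
    # HTTP header names are case-insensitive, so normalize them first.
    h = {name.lower(): value for name, value in headers.items()}

    score = 0
    if "besserscheitern-crawl" in h.get("user-agent", "").lower():
        score += 1
    if h.get("x-bscrawl", "").strip() == "1":
        score += 1
    # The described value is "de-DE,de;q=0.9"; only the leading tag is checked.
    if h.get("accept-language", "").replace(" ", "").lower().startswith("de-de"):
        score += 1
    return score
```

A score of 3 matches the full documented fingerprint. A request that shows only the user-agent token, without the X-BSCRAWL header, may be worth a closer look, since any client can set a User-Agent string.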

📊 Data Usage

Collected content, including article titles, excerpts, and publication dates, is used exclusively for internal aggregation within the Besser Scheitern platform. The data powers a curated newsletter that highlights community-written failure narratives; no content is used for AI training, advertising, or resale. The bot's scope is strictly limited to text extraction; images and multimedia files are not downloaded.

⚙️ Rate Limiting Policy

At its fixed 10-second fetch interval, besserscheitern-crawl can generate up to 360 requests per hour to a single domain, which may exceed typical human traffic patterns. A monitoring threshold of 0.5 requests per second from the same IP block is recommended, with blocking triggered only if the bot ignores 429 responses, a scenario that has never been reported in its operational history.
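The 360 figure follows from the 10-second fetch interval: 3600 seconds per hour divided by 10 seconds per request gives 360 requests per hour, or 0.1 requests per second, well below the 0.5 requests per second monitoring threshold. One way to monitor that threshold is a sliding-window counter, sketched below. The class name RateMonitor and the 60-second window are assumptions for this example; the source specifies only the threshold itself.

```python
from collections import deque


class RateMonitor:
    """Sliding-window request counter for one IP block (illustrative only)."""

    def __init__(self, max_rate: float = 0.5, window: float = 60.0):
        self.max_rate = max_rate  # requests per second, from the text above
        self.window = window      # seconds; an assumed averaging window
        self._hits = deque()

    def record(self, timestamp: float) -> bool:
        """Record one request; return True if the windowed rate exceeds max_rate."""
        self._hits.append(timestamp)
        cutoff = timestamp - self.window
        # Drop requests that have fallen out of the window.
        while self._hits and self._hits[0] <= cutoff:
            self._hits.popleft()
        return len(self._hits) / self.window > self.max_rate


# A client fetching every 10 seconds, as the bot is described to do,
# stays at 6 requests per 60-second window and is never flagged.
monitor = RateMonitor()
compliant_flags = [monitor.record(float(t)) for t in range(0, 600, 10)]

# A client fetching every second crosses the threshold within the first minute.
burst_monitor = RateMonitor()
burst_flags = [burst_monitor.record(float(t)) for t in range(60)]
```

In line with the policy above, a flag from such a monitor would be a reason to answer with HTTP 429 and watch for the back-off, not a reason to block immediately.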


ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.