metaspinner
Bot User-Agent: metaspinner
🤖 Overview
Metaspinner is a legitimate web crawler operated by the German company Metaspinner GmbH, primarily used to collect publicly available web content to train and improve the company’s proprietary large language models (LLMs). The crawler feeds data into the Metaspinner AI platform, which provides automated content generation, rewriting, and summarization services for enterprise customers, as documented on their official website at https://metaspinner.com and in their public GitHub repository (github.com/metaspinner/crawler).
🌐 Technical Behavior
Metaspinner employs a polite crawl schedule, issuing requests at a mean rate of 1–3 requests per second per IP, with bursts limited to 10 requests per 30 seconds. The crawler uses IPv4 and IPv6 addresses drawn from ASN 20473 (The Constant Company) and ASN 15169 (Google Cloud), with ranges published in a metaspinner-ips.txt file referenced at https://metaspinner.com/robots.txt. It supports HTTP/1.1 and HTTP/2, sends a Via header set to “1.1 metaspinner-crawler”, and includes an X-Metaspinner-Crawler header with the value “1.0”. According to its technical documentation (https://metaspinner.com/docs/crawler-behavior), the bot downloads at most 25 MB per page and does not fetch binary files larger than 5 MB.
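Checking a request's source IP against the published ranges is a simple way to use that list. The sketch below assumes that metaspinner-ips.txt contains one CIDR block per line with optional # comments; the actual file format is not documented here. The sample ranges are documentation-only placeholders (RFC 5737 for IPv4, RFC 3849 for IPv6), not real Metaspinner addresses, and the function names are hypothetical.

```python
import ipaddress
from typing import Iterable, List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def parse_ranges(lines: Iterable[str]) -> List[Network]:
    """Parse one CIDR per line, skipping blank lines and '#' comments (assumed format)."""
    networks: List[Network] = []
    for line in lines:
        entry = line.split("#", 1)[0].strip()
        if entry:
            # strict=False tolerates entries with host bits set, e.g. 192.0.2.1/24
            networks.append(ipaddress.ip_network(entry, strict=False))
    return networks


def ip_in_ranges(ip: str, networks: Iterable[Network]) -> bool:
    """True if ip lies in any range; IPv4 vs IPv6 mismatches simply never match."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)


# Hypothetical file contents using documentation-only address blocks.
SAMPLE = """
# IPv4
192.0.2.0/24
# IPv6
2001:db8:1234::/48
"""

if __name__ == "__main__":
    ranges = parse_ranges(SAMPLE.splitlines())
    print(ip_in_ranges("192.0.2.17", ranges))        # True
    print(ip_in_ranges("2001:db8:1234::5", ranges))  # True
    print(ip_in_ranges("198.51.100.1", ranges))      # False
```

An IP match alone only shows that the request came from a listed network; both ASNs named above host many other tenants, so range checks work best alongside the header and reverse DNS indicators.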
📋 robots.txt Compliance
Metaspinner fully respects Disallow directives in robots.txt and also honors Crawl-Delay directives if present, with a default delay of 5 seconds when no delay is specified. Security audits conducted by the Metaspinner team (published in a 2023 whitepaper) confirm that the bot does not ignore Disallow rules and does not attempt to crawl /wp-admin/ or similar restricted paths unless explicitly allowed.
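The behavior described above can be modeled with Python's standard urllib.robotparser, which evaluates Disallow rules and reads Crawl-delay (a widely used but non-standard extension that is not part of RFC 9309). This is a minimal sketch: the sample robots.txt files, the "Metaspinner" product token used for matching, and the delay_for helper are illustrative assumptions; only the 5-second default comes from the text.

```python
from urllib.robotparser import RobotFileParser

AGENT = "Metaspinner"   # assumed product token; robotparser matches on the part before "/"
DEFAULT_DELAY = 5.0     # seconds, the stated default when no Crawl-delay is given


def delay_for(parser: RobotFileParser, agent: str = AGENT) -> float:
    """Return the Crawl-delay that applies to agent, or the 5-second default."""
    delay = parser.crawl_delay(agent)
    return float(delay) if delay is not None else DEFAULT_DELAY


def load(text: str) -> RobotFileParser:
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


# Hypothetical robots.txt with a dedicated Metaspinner group.
WITH_DELAY = """
User-agent: Metaspinner
Disallow: /wp-admin/
Crawl-delay: 10
"""

# Hypothetical robots.txt with only a catch-all group and no Crawl-delay.
WITHOUT_DELAY = """
User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    site = load(WITH_DELAY)
    print(site.can_fetch(AGENT, "https://example.com/wp-admin/options.php"))  # False
    print(site.can_fetch(AGENT, "https://example.com/blog/post"))            # True
    print(delay_for(site))                                                   # 10.0
    print(delay_for(load(WITHOUT_DELAY)))                                    # 5.0
```

Note that robotparser only inspects the text before the first "/" of the agent string, so passing the full "Mozilla/5.0 (compatible; Metaspinner/1.0; ...)" string would match on "Mozilla" instead of the Metaspinner group.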
🔍 Detection Indicators
The primary User-Agent string is Metaspinner/1.0 (Mozilla compatible), often appearing as Mozilla/5.0 (compatible; Metaspinner/1.0; +https://metaspinner.com/bot). Additional fingerprints include the X-Metaspinner-Crawler header, the headers Accept-Encoding: gzip, deflate and Accept-Language: en-US,en;q=0.9 sent in that order, and a typical reverse DNS name of crawler-*.metaspinner.net. The bot also sends a From header containing a contact email address.
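The indicators above can be combined into a simple check. This is a minimal sketch: the header names, the Via value, and the crawler-*.metaspinner.net pattern come from this page, while the function names and the assumption that the wildcard stands for a single DNS label are hypothetical. Because any client can forge headers, forward-confirmed reverse DNS (a PTR lookup whose hostname must resolve back to the same IP) is the stronger signal.

```python
import ipaddress
import re
import socket
from typing import List, Mapping

# Patterns derived from the detection indicators listed above.
UA_RE = re.compile(r"\bMetaspinner/\d+(?:\.\d+)*", re.IGNORECASE)
# Assumes the "*" in crawler-*.metaspinner.net covers exactly one DNS label.
RDNS_RE = re.compile(r"^crawler-[a-z0-9-]+\.metaspinner\.net$")


def header_indicators(headers: Mapping[str, str]) -> List[str]:
    """Return which Metaspinner header indicators appear in a request (spoofable)."""
    h = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    found: List[str] = []
    if UA_RE.search(h.get("user-agent", "")):
        found.append("user-agent")
    if "x-metaspinner-crawler" in h:
        found.append("x-metaspinner-crawler")
    if "metaspinner-crawler" in h.get("via", "").lower():
        found.append("via")
    return found


def rdns_name_matches(hostname: str) -> bool:
    """Check a PTR hostname (trailing dot allowed) against the naming pattern."""
    return bool(RDNS_RE.match(hostname.lower().rstrip(".")))


def forward_confirmed(ip: str) -> bool:
    """PTR name must match the pattern and resolve back to ip (performs DNS lookups)."""
    try:
        target = ipaddress.ip_address(ip)
        hostname, _aliases, _addrs = socket.gethostbyaddr(ip)
        if not rdns_name_matches(hostname):
            return False
        infos = socket.getaddrinfo(hostname, None)
        # info[4][0] is the resolved address string for both IPv4 and IPv6
        return any(ipaddress.ip_address(info[4][0]) == target for info in infos)
    except (OSError, ValueError):  # lookup failures or malformed addresses
        return False
```

A request showing all three header indicators but failing forward_confirmed is a likely impersonator rather than the real crawler.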
📊 Data Usage
Collected text data is used exclusively to train Metaspinner’s language models for AI-powered content generation (e.g., article spinning, summarization, and translation). According to its privacy policy (https://metaspinner.com/privacy), the company does not store or serve raw crawled content to third parties and deletes all downloaded pages after model training cycles.
⚙️ Rate Limiting Policy
Because Metaspinner can generate high request volumes during large-scale training runs, rate limiting is recommended to prevent server overload and ensure equitable resource usage. The recommended threshold is 50 requests per minute per IP, with a hard cap of 100 requests per minute per IP once the bot is observed to exceed its standard polite rates.
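One way to apply these two limits is a per-IP sliding-window log that throttles above 50 requests per minute and blocks above 100. This is a minimal sketch under that interpretation of the policy; the class name, the "allow"/"throttle"/"block" labels, and the choice to count rejected requests toward the window are hypothetical design decisions, not part of the stated policy.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional

THROTTLE_PER_MIN = 50    # recommended threshold from the policy above
HARD_CAP_PER_MIN = 100   # hard cap from the policy above
WINDOW_SECONDS = 60.0


class SlidingWindowLimiter:
    """Per-IP sliding-window log that labels each request allow/throttle/block."""

    def __init__(self, throttle: int = THROTTLE_PER_MIN,
                 hard_cap: int = HARD_CAP_PER_MIN,
                 window: float = WINDOW_SECONDS) -> None:
        self.throttle = throttle
        self.hard_cap = hard_cap
        self.window = window
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def check(self, ip: str, now: Optional[float] = None) -> str:
        now = time.monotonic() if now is None else now
        hits = self._hits[ip]
        # Evict timestamps that have fallen out of the 60-second window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        # Every request counts, including rejected ones, so a client that
        # keeps sending at a high rate stays limited until it slows down.
        hits.append(now)
        count = len(hits)
        if count > self.hard_cap:
            return "block"
        if count > self.throttle:
            return "throttle"
        return "allow"


if __name__ == "__main__":
    limiter = SlidingWindowLimiter()
    # 101 requests from one IP within ~10 seconds
    results = [limiter.check("192.0.2.1", now=i * 0.1) for i in range(101)]
    print(results[49], results[50], results[100])  # allow throttle block
```

In a web server, "throttle" could map to HTTP 429 Too Many Requests with a Retry-After header and "block" to a 403 or a dropped connection. A sliding-window log stores one timestamp per request, which is exact but uses more memory than a token bucket or fixed-window counter at high volumes.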
ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.