superlumin downloader Bot — Detection, Blocking & Technical Analysis

Downloader User-Agent: superlumin-downloader

🤖 Overview

superlumin downloader is a legitimate web crawler operated by Superluminal AI, a company focused on building large-scale training datasets for machine learning models. First documented in early 2024, its primary purpose is to systematically download publicly accessible web content, including text, images, and structured data, which is then used to train foundation models and improve natural language understanding. The bot feeds data into Superluminal's proprietary training pipeline, as described in their official documentation at https://superluminal.ai/crawler.

🌐 Technical Behavior

Superluminal Downloader employs a distributed crawling architecture that issues requests from a pool of IP addresses in ASN ASXXXXX (Superluminal Inc.), with ranges including 203.0.113.0/24 and 198.51.100.0/24. The bot observes a delay of 10 seconds between requests to the same host, as specified in its own configuration. It uses HTTP/1.1 and HTTP/2, and sends a From header containing a contact email address (obfuscated in the source page) so that webmasters can reach the operator. Requests have a default timeout of 30 seconds and follow up to five redirect hops. The crawler does not execute JavaScript; it fetches only static HTML and the linked resources (CSS, images, PDFs) needed for content extraction.
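A request's source address can be checked against the ranges listed above with Python's standard ipaddress module. This is a minimal sketch, not an official verification method: the helper name in_claimed_ranges is hypothetical, and both listed ranges fall in blocks reserved for documentation (RFC 5737), so they are probably placeholders that should be confirmed against the operator's published ranges before being used in any allow or deny rule.

```python
import ipaddress

# Ranges as listed in the text above. Both sit in RFC 5737 documentation
# blocks, so treat them as placeholders until verified with the operator.
CLAIMED_RANGES = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def in_claimed_ranges(remote_addr: str) -> bool:
    """Return True if remote_addr parses as an IP inside a listed range."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Malformed addresses (e.g. a garbled X-Forwarded-For value) never match.
        return False
    return any(addr in net for net in CLAIMED_RANGES)
```

A User-Agent match combined with an address outside these ranges may point to a client impersonating the crawler, although that conclusion only holds if the ranges are accurate.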

📋 robots.txt Compliance

According to Superluminal's official guidelines, the bot fully honors robots.txt directives, including Disallow and Crawl-delay rules. It checks the robots.txt file for each domain at the start of a crawl session and caches the rules for up to 24 hours. Evidence from multiple webmaster forums confirms that the bot stops crawling paths explicitly prohibited in robots.txt.
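If the bot honors robots.txt as described, a rule set like the one below should keep it out of a given path. The sketch uses Python's standard urllib.robotparser to show how a compliant crawler would interpret the rules. The user-agent token superlumin-downloader is taken from the header of this page, and the /private/ path and example.com URLs are illustrative assumptions; the crawler's actual matching behavior is not confirmed here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one directory for this bot and ask for a
# 10-second delay, while leaving all other crawlers unrestricted.
ROBOTS_TXT = """\
User-agent: superlumin-downloader
Disallow: /private/
Crawl-delay: 10

User-agent: *
Disallow:
"""

AGENT = "superlumin-downloader"

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# How a standards-following parser evaluates the rules for this agent.
blocked = parser.can_fetch(AGENT, "https://example.com/private/report.html")
allowed = parser.can_fetch(AGENT, "https://example.com/blog/post.html")
delay = parser.crawl_delay(AGENT)

print(blocked, allowed, delay)
```

Crawl-delay is a non-standard extension, so whether a particular crawler applies it depends on that crawler's implementation.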

🔍 Detection Indicators

The primary User-Agent string is Mozilla/5.0 (compatible; Superluminal-Downloader/1.0; +https://superluminal.ai/crawler). Later versions reportedly use a secondary User-Agent of Superluminal/2.0 (research). The bot is also reported to send an X-Robots-Tag header with value noarchive when accessing content that should not be cached; because X-Robots-Tag is normally a response header set by servers, seeing it on an incoming request would be unusual and is worth confirming in your own logs. Behavioral fingerprints include consistent request intervals and requests confined to paths that robots.txt does not block.
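The User-Agent patterns above can be matched with a single case-insensitive regular expression. This is a rough sketch: the function name looks_like_superlumin is hypothetical, and the pattern covers only the strings quoted on this page (including the shorter superlumin-downloader token from the header). User-Agent values are trivially spoofed, so a match is a hint rather than proof of origin.

```python
import re

# Matches "Superluminal-Downloader/1.0", the shorter "superlumin-downloader"
# token, and the later "Superluminal/2.0 (research)" form.
UA_PATTERN = re.compile(
    r"superlumin(?:al)?[-\s]downloader|\bsuperluminal/\d",
    re.IGNORECASE,
)


def looks_like_superlumin(user_agent):
    """Return True if the User-Agent resembles one of the quoted patterns."""
    return bool(UA_PATTERN.search(user_agent or ""))
```

For example, looks_like_superlumin can be applied to each User-Agent field while scanning an access log to estimate how much traffic claims to come from this crawler.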

📊 Data Usage

Collected data is used exclusively for training Superluminal's large language models and retrieval-augmented generation systems. The company states that it does not sell data to third parties and that all content is processed in compliance with copyright laws, including the use of a fair-use filter that excludes paywalled or explicitly copyrighted material. The training datasets are not publicly released.

⚙️ Rate Limiting Policy

Although Superluminal Downloader is legitimate, its high request volume can impact server resources, especially on small sites. Therefore, webmasters are advised to rate-limit the bot using standard throttling mechanisms (e.g., 5 requests per second per IP) to ensure equitable resource allocation while still allowing the crawler to access content.
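One common way to enforce a per-IP limit such as the 5 requests per second mentioned above is a token bucket. The sketch below is illustrative only: the class names TokenBucket and PerIpLimiter are hypothetical, the burst capacity of 5 is an assumption, and a production setup would more likely rely on the rate-limiting features of the web server or CDN than on application code.

```python
import time


class TokenBucket:
    """Refills at `rate` tokens per second, holding at most `capacity`."""

    def __init__(self, rate, capacity, now):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.updated = now

    def allow(self, now):
        # Add tokens for the time elapsed since the last call, up to capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class PerIpLimiter:
    """Keeps one bucket per client IP (assumed default: 5 req/s, burst of 5)."""

    def __init__(self, rate=5.0, capacity=5.0, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.buckets = {}

    def allow(self, ip):
        now = self.clock()
        bucket = self.buckets.get(ip)
        if bucket is None:
            bucket = self.buckets[ip] = TokenBucket(self.rate, self.capacity, now)
        return bucket.allow(now)
```

A request handler would call allow with the client address and respond with HTTP 429 when it returns False. The buckets dictionary grows with every new address, so a long-running service would also need to evict idle entries.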

ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.