blt

Bot User-Agent: blt

🤖 Overview

BLT is a web crawler operated by BLT AI Inc., a company specializing in large-scale data collection for machine learning model training. The bot systematically indexes publicly available web pages and adds them to BLT’s proprietary dataset, which is used to train natural language processing and computer vision models.

🌐 Technical Behavior

BLT crawls at an average rate of 50 requests per second, using distributed IP ranges from ASN 39821 (BLT-AS). It speaks HTTP/1.1 and HTTP/2 and spaces its requests according to a configurable crawl delay. The bot respects Crawl-Delay directives in robots.txt and defaults to a 10-second delay when none is specified. It crawls breadth-first from seed URLs provided by its operators and skips URLs whose query strings exceed 1024 characters.
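The crawl-delay default and the query-string cutoff described above can be sketched in a few lines. This is only an illustration built on Python's standard `urllib.robotparser`, not BLT's actual implementation. The function names and the `blt` robots.txt token are assumptions, and it is also an assumption that the 1024-character limit counts the query string without the leading `?`.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

DEFAULT_CRAWL_DELAY_S = 10   # default delay stated in the text above
MAX_QUERY_LENGTH = 1024      # query-string cutoff stated in the text above


def effective_crawl_delay(robots_txt: str, user_agent: str = "blt") -> float:
    """Return the Crawl-Delay that applies to user_agent, or the default."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # crawl_delay() returns None when no matching directive exists.
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else float(DEFAULT_CRAWL_DELAY_S)


def should_skip_url(url: str) -> bool:
    """Skip URLs whose query string is longer than the cutoff (hypothetical helper)."""
    return len(urlsplit(url).query) > MAX_QUERY_LENGTH
```

A site operator could use the same logic to predict how a compliant crawler would pace itself against a given robots.txt file.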

📋 robots.txt Compliance

According to BLT’s official documentation, the bot fully honors Disallow directives, respects Allow overrides, and obeys User-agent: * rules. Public crawl logs confirm that BLT does not request paths disallowed in robots.txt.
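To see what these rules mean for a given path, a site operator can test a robots.txt file locally. The sketch below uses Python's standard `urllib.robotparser`. The sample rules, the paths, and the `is_allowed` helper are made up for illustration. Note that Python's parser applies rules in file order (first match wins), so the Allow override is listed before the broader Disallow. How BLT itself resolves overlapping rules is not stated in the text.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one Allow override inside a disallowed tree.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Allow: /private/press/
Disallow: /private/
"""


def is_allowed(path: str, robots_txt: str = SAMPLE_ROBOTS_TXT,
               user_agent: str = "blt") -> bool:
    """Return True if a compliant crawler may fetch path under robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


print(is_allowed("/private/press/release.html"))  # covered by the Allow override
print(is_allowed("/private/data.html"))           # covered by Disallow
print(is_allowed("/blog/"))                       # no rule matches
```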

🔍 Detection Indicators

The primary User-Agent string is Mozilla/5.0 (compatible; BLT/2.0; +https://blt.ai/bot). It also sends a custom header X-BLT-Version: 2.0 and a From header with the contact email [email protected]. Behavioral fingerprints include high request concurrency and adherence to rate limits.
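The header-based indicators above can be checked with a small helper. This is a minimal sketch: the `blt_signals` function and its result keys are hypothetical, and the regular expression assumes that only the version number in the User-Agent string changes between releases. Request headers are easy to spoof, so a match is a hint rather than proof. Comparing the source IP against the ASN named earlier would be a stronger check.

```python
import re
from typing import Mapping

# Matches the documented User-Agent token with any dotted version number.
BLT_UA_PATTERN = re.compile(
    r"\(compatible; BLT/\d+(?:\.\d+)*; \+https://blt\.ai/bot\)"
)


def blt_signals(headers: Mapping[str, str]) -> dict:
    """Report which documented BLT header indicators a request carries."""
    # Header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    return {
        "user_agent": bool(BLT_UA_PATTERN.search(normalized.get("user-agent", ""))),
        "version_header": "x-blt-version" in normalized,
    }
```

For example, a request carrying both the documented User-Agent and the X-BLT-Version header sets both flags, while an ordinary browser request sets neither.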

📊 Data Usage

Collected data is used exclusively for training BLT’s AI models, including large language models and image recognition systems. The company publishes a quarterly transparency report on its data sources, detailing the types of websites crawled.

⚙️ Rate Limiting Policy

Rate limiting is required because BLT’s crawl volume can exceed 100,000 requests per day per domain, which may degrade server performance. The recommended threshold is 500 requests per minute, which lets the bot operate without overwhelming origin servers while ensuring that legitimate traffic is not blocked.
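One way to enforce a per-minute threshold is a sliding-window counter, sketched below. The class name and its interface are illustrative and do not come from any particular product. For scale, 500 requests per minute sustained over a day is 500 × 60 × 24 = 720,000 requests, so a limit at this level mainly caps short bursts rather than the daily total.

```python
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_requests within any window_s-second window."""

    def __init__(self, max_requests: int = 500, window_s: float = 60.0):
        self.max_requests = max_requests
        self.window_s = window_s
        self._timestamps = deque()  # times of accepted requests, oldest first

    def allow(self, now: float) -> bool:
        """Record a request at time now (in seconds); return False if over the limit."""
        # Discard accepted requests that have aged out of the window.
        while self._timestamps and now - self._timestamps[0] >= self.window_s:
            self._timestamps.popleft()
        if len(self._timestamps) >= self.max_requests:
            return False
        self._timestamps.append(now)
        return True
```

In practice one limiter instance would be kept per client identity (for example per User-Agent or per source network), with `now` taken from a monotonic clock such as `time.monotonic()`.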


ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.