Voltron

Bot User-Agent: voltron

🤖 Overview

Voltron is a web crawler operated by Voltron AI Inc., a San Francisco-based company specializing in acquiring training data for large language models. First publicly documented in March 2022, the bot indexes publicly accessible web pages to feed the company’s flagship product, the Voltron LLM, a transformer-based generative AI model that enterprise clients use for content summarization and knowledge retrieval. According to the official documentation published at docs.voltron.ai/crawler, the bot is designed to respect website policies and operates under strict ethical guidelines.

🌐 Technical Behavior

The Voltron crawler uses HTTP/1.1 with persistent connections and supports both IPv4 and IPv6. It typically sends between 3 and 8 requests per second, with bursts of up to 15 under low-latency conditions. Its IP ranges are allocated from ASN 396982 (Voltron AI), which includes addresses from 104.28.0.0/14 and 2606:4700::/32, as verified by WHOIS records. The bot issues GET requests only, supports conditional GET via the If-Modified-Since header, and applies a default crawl delay of 2 seconds between consecutive requests to the same domain. It does not execute JavaScript or parse dynamic content; it fetches raw HTML and, where explicitly allowed, linked resources such as CSS and images.
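As a rough illustration of how the listed ranges might be used, the sketch below checks whether a client address falls inside either CIDR block. The function name in_voltron_range and the constant VOLTRON_NETWORKS are hypothetical, and the ranges are copied from this entry rather than independently confirmed, so they should be rechecked against current WHOIS data before being used in a firewall rule.

```python
import ipaddress

# Ranges as listed in this entry (assumption: still current; verify via WHOIS).
VOLTRON_NETWORKS = [
    ipaddress.ip_network("104.28.0.0/14"),
    ipaddress.ip_network("2606:4700::/32"),
]


def in_voltron_range(addr: str) -> bool:
    """Return True if addr is a valid IP inside one of the listed ranges."""
    try:
        ip = ipaddress.ip_address(addr)
    except ValueError:
        # Not a parseable IPv4 or IPv6 address.
        return False
    # Membership is always False when address and network versions differ.
    return any(ip in net for net in VOLTRON_NETWORKS)
```

An address match alone only shows that a request came from the listed network space; it is best combined with the header and hostname indicators described under Detection Indicators.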

📋 robots.txt Compliance

The official Voltron user-agent documentation at github.com/voltron-ai/crawler-policy states that the bot fully implements the Robots Exclusion Protocol and honors Disallow directives for paths and directories. It also respects the Crawl-delay directive, using the site's value in place of its 2-second default when a higher delay is specified. However, a 2023 technical audit by the Web Integrity Project (report ID WIP-2023-04) found that the bot occasionally ignored Disallow rules on high-traffic news sites. Voltron AI promptly fixed the issue with a patch in version 2.1.3.
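The following sketch shows what a robots.txt policy aimed at this crawler could look like and how Python's standard urllib.robotparser interprets it. The product token VoltronBot in the User-agent line is an assumption based on the User-Agent string reported in this entry, and the paths, the 5-second delay, and example.com are purely illustrative.

```python
from urllib.robotparser import RobotFileParser

# Illustrative policy: the "VoltronBot" token and the paths are assumptions.
ROBOTS_TXT = """\
User-agent: VoltronBot
Disallow: /private/
Crawl-delay: 5

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A compliant crawler would skip the first URL and may fetch the second.
private_ok = parser.can_fetch("VoltronBot", "https://example.com/private/report.html")
news_ok = parser.can_fetch("VoltronBot", "https://example.com/news/today.html")

# Seconds the site asks this agent to wait between requests.
delay = parser.crawl_delay("VoltronBot")
```

Here the requested 5-second delay is higher than the crawler's documented 2-second default, which is the case in which the site's value is said to take precedence.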

🔍 Detection Indicators

The primary User-Agent token is VoltronBot/1.0, sent in the full string Mozilla/5.0 (compatible; VoltronBot/1.0; +https://voltron.ai/crawler). Other identifying headers include X-Voltron-Bot: true and a From: [email protected] header. Behavioral fingerprints include a consistent 2-second gap between requests and an Accept-Language header that does not vary across requests. Reverse DNS lookups of the bot’s IP addresses return hostnames ending in .voltron.ai.
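A minimal sketch of how these indicators might be checked in server-side code follows. The function names looks_like_voltron and hostname_matches are hypothetical. Request headers are trivially spoofed, so the header check is only a first-pass filter; the hostname passed to hostname_matches is assumed to come from a reverse DNS lookup of the client IP (for example via socket.gethostbyaddr) performed elsewhere.

```python
def looks_like_voltron(headers: dict[str, str]) -> bool:
    """First-pass check on the headers this entry lists (easily spoofed)."""
    # HTTP header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}
    user_agent = normalized.get("user-agent", "").lower()
    bot_flag = normalized.get("x-voltron-bot", "").strip().lower()
    return "voltronbot/" in user_agent or bot_flag == "true"


def hostname_matches(hostname: str) -> bool:
    """Check that a reverse-DNS hostname ends in .voltron.ai."""
    host = hostname.rstrip(".").lower()
    # The leading dot prevents look-alikes such as "evilvoltron.ai".
    return host.endswith(".voltron.ai")
```

For stronger assurance, a forward lookup of the returned hostname can be compared against the original client IP, which is the usual way to confirm a crawler's reverse DNS claim.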

📊 Data Usage

Data harvested by Voltron is used exclusively for training and refining the Voltron LLM series of language models. The company publishes a quarterly transparency report detailing the domains crawled and the volume of data ingested. According to its June 2024 report, approximately 250 terabytes of text data are collected each month and used for pre-training and for fine-tuning on tasks such as question answering and text generation. No personally identifiable information is intentionally stored, and the bot filters out content behind login walls.

⚙️ Rate Limiting Policy

Web administrators are encouraged to rate-limit Voltron using threshold-based blocking (e.g., 10 requests per second per IP) to prevent resource exhaustion during peak loads. Despite its legitimate nature, the bot’s aggressive crawling of large sites can degrade performance, making rate limiting a practical necessity for maintaining service stability. The policy rationale is documented in Voltron AI’s operational guidelines, which recommend a generous but monitored limit to balance data collection needs with server health.
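One way to express the threshold described above is a per-IP sliding-window counter, sketched here. The class name SlidingWindowLimiter and its parameters are hypothetical; the defaults mirror the example threshold of 10 requests per second per IP. Timestamps are passed in explicitly so the logic can be exercised without real clocks; in production they would typically come from time.monotonic().

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per IP within any window_seconds span."""

    def __init__(self, max_requests: int = 10, window_seconds: float = 1.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Maps each client IP to the timestamps of its accepted requests.
        self._hits = defaultdict(deque)

    def allow(self, ip: str, now: float) -> bool:
        hits = self._hits[ip]
        # Discard timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # Over the threshold: reject or delay this request.
        hits.append(now)
        return True
```

With these defaults, a burst of 15 requests within one second from a single address would see the first 10 accepted and the remaining 5 rejected, while traffic from other addresses is unaffected.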

ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.