voyager
Bot User-Agent: voyager
🤖 Overview
Voyager is a web crawler operated by Voyager AI, a company specializing in advanced language model development. Its primary purpose is to systematically gather publicly available web content to train and improve Voyager’s generative AI models, including those used for search and content synthesis. The bot was first publicly acknowledged in early 2023, with official documentation published on Voyager’s developer site.
🌐 Technical Behavior
Voyager employs a distributed crawling architecture that sends requests from multiple IP addresses within the 204.15.0.0/16 range, as documented in its official IP list. The list is said to cover both IPv4 and IPv6 blocks, although 204.15.0.0/16 itself is an IPv4 CIDR range (not an ASN) and no IPv6 prefix is quoted here. The crawler waits 10 seconds between requests by default and honors the `Crawl-delay` directive in `robots.txt`. It identifies itself via the `From` header and uses HTTP/1.1 with TLS 1.2+ encryption. According to Voyager’s technical specifications, the bot limits concurrent connections to one per host to reduce server load. It also backs off when it receives an HTTP `503` response and honors the `Retry-After` header for rate limiting.
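As a rough illustration of how the documented range could be used on the server side, the sketch below checks whether a request's source address falls inside 204.15.0.0/16. The function name `in_documented_range` is hypothetical, and the range is taken from the text above rather than from a verified IP list, so treat it as an assumption.

```python
import ipaddress

# Assumption: the single range quoted in this profile. The operator's
# published IP list, if available, should be treated as authoritative.
VOYAGER_NETWORK = ipaddress.ip_network("204.15.0.0/16")


def in_documented_range(remote_addr: str) -> bool:
    """Return True if remote_addr parses as an IP inside the documented range."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Not a valid IPv4 or IPv6 literal.
        return False
    # An IPv6 address is never contained in an IPv4 network, so this
    # returns False for any IPv6 source address.
    return addr in VOYAGER_NETWORK
```

A range match on its own only shows where a request came from; it is usually combined with the User-Agent and reverse DNS indicators before a request is treated as genuine crawler traffic.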
📋 robots.txt Compliance
Voyager explicitly honors `Disallow` directives in `robots.txt`, as stated in its official policy at `https://voyager.ai/robots.txt`. The crawler also respects `Allow` overrides and will not crawl disallowed paths at any crawl depth. Independent testing by webmasters reportedly confirms this behavior: Voyager stops crawling a path within minutes of a directive change.
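To see how such rules would be evaluated, the following sketch parses a sample `robots.txt` with Python's standard `urllib.robotparser`. The `Voyager` user-agent token, the paths, and the `example.com` URLs are assumptions chosen for illustration; the crawler's own rule matching may differ from this parser's.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules. Python's parser applies the first matching rule in
# file order, so the Allow line is placed before the broader Disallow.
ROBOTS_TXT = """\
User-agent: Voyager
Allow: /private/press/
Disallow: /private/
Crawl-delay: 10

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

UA = "Voyager/1.0"
blocked = parser.can_fetch(UA, "https://example.com/private/report.html")
allowed_override = parser.can_fetch(UA, "https://example.com/private/press/kit.html")
allowed_other = parser.can_fetch(UA, "https://example.com/blog/")
delay = parser.crawl_delay(UA)

print(blocked, allowed_override, allowed_other, delay)
```

Under this parser the disallowed path is rejected, the `Allow` override and the unrelated path are permitted, and the crawl delay is read as 10 seconds.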
🔍 Detection Indicators
The primary User-Agent string is `Voyager/1.0`, with optional additional tokens such as `(Compatible; VoyagerBot; +https://voyager.ai/bot)`. The `User-Agent` header always contains the substring `Voyager`. Additionally, the bot sets the `Accept-Language` header to `en-US,en;q=0.9` and adds a `Via` header when passing through proxies. Reverse DNS lookups on crawler IPs often resolve to hostnames matching `*.crawl.voyager.ai`.
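The indicators above could be combined roughly as follows. This is a minimal sketch, not a vetted detection rule: the function names are hypothetical, the substring and hostname suffix come from this profile, and `verify_by_dns` performs a forward-confirmed reverse DNS check that requires network access and, as written, only handles IPv4.

```python
import socket

UA_TOKEN = "Voyager"               # substring reported in the User-Agent
RDNS_SUFFIX = ".crawl.voyager.ai"  # suffix reported for reverse DNS names


def claims_voyager(user_agent: str) -> bool:
    """True if the User-Agent contains the reported token (easily spoofed)."""
    return UA_TOKEN in user_agent


def hostname_matches(hostname: str) -> bool:
    """True if a reverse DNS name ends with the reported suffix."""
    return hostname.rstrip(".").lower().endswith(RDNS_SUFFIX)


def verify_by_dns(remote_addr: str) -> bool:
    """Forward-confirmed reverse DNS: PTR lookup, suffix check, then A lookup."""
    try:
        hostname, _, _ = socket.gethostbyaddr(remote_addr)
        if not hostname_matches(hostname):
            return False
        # Resolve the name forward and confirm it maps back to the same IP.
        _, _, addresses = socket.gethostbyname_ex(hostname)
    except OSError:
        # Covers failed reverse and forward lookups.
        return False
    return remote_addr in addresses
```

Because the User-Agent is trivial to forge, a match from `claims_voyager` is best treated as a claim that `verify_by_dns` then confirms or rejects.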
📊 Data Usage
Collected data is processed and used exclusively to train Voyager AI’s large language models, including next-generation conversational agents and knowledge retrieval systems. The company states that personal data is anonymized and not used for user profiling. Aggregated website content contributes to improving factuality, reasoning, and multilingual capabilities of the models.
⚙️ Rate Limiting Policy
Because Voyager can generate a high volume of requests across its distributed IP pool, webmasters commonly implement rate limiting with thresholds around 20 requests per minute per IP to prevent excessive load. The aim is to balance data collection against server resources without blocking a legitimate crawler outright.
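A threshold like this can be sketched as a sliding-window counter keyed by source IP. The class name `PerIpRateLimiter` and its parameters are hypothetical, and the default of 20 requests per 60 seconds simply mirrors the figure above; production setups more often enforce this at the proxy or firewall layer.

```python
import time
from collections import defaultdict, deque


class PerIpRateLimiter:
    """Sliding-window limiter: at most `limit` requests per window per IP."""

    def __init__(self, limit=20, window_seconds=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock  # injectable so the limiter can be tested
        self.hits = defaultdict(deque)  # ip -> timestamps of accepted requests

    def allow(self, ip: str) -> bool:
        now = self.clock()
        timestamps = self.hits[ip]
        # Drop timestamps that have aged out of the window.
        while timestamps and now - timestamps[0] >= self.window:
            timestamps.popleft()
        if len(timestamps) >= self.limit:
            return False  # caller would typically answer 429 or 503
        timestamps.append(now)
        return True
```

Rejected requests are not recorded, so a client that keeps retrying while limited does not extend its own lockout. Since the crawler is described as honoring `503` and `Retry-After`, answering a rejected request with that status and header is a natural pairing.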
ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.