voyager

Bot User-Agent: voyager

🤖 Overview

Voyager is a web crawler operated by Voyager AI, a company specializing in advanced language model development. Its primary purpose is to systematically gather publicly available web content to train and improve Voyager’s generative AI models, including those used for search and content synthesis. The bot was first publicly acknowledged in early 2023, with official documentation published on Voyager’s developer site.

🌐 Technical Behavior

Voyager uses a distributed crawling architecture and sends requests from multiple IP addresses, including the IPv4 block 204.15.0.0/16. These addresses are documented in its official IP list, which covers both IPv4 and IPv6 blocks. By default the crawler waits 10 seconds between requests, and it honors the `Crawl-Delay` directive when a site sets one. It identifies itself through its `User-Agent` string and a `From` header, and it connects over HTTP/1.1 with TLS 1.2 or later. According to Voyager’s technical specifications, the bot opens at most one concurrent connection per host to reduce server load. It also backs off when it receives HTTP `503 Service Unavailable` responses and honors the `Retry-After` header for rate limiting.
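The delay and backoff behavior described above can be sketched as a small Python function. This is a minimal illustration under stated assumptions, not Voyager’s implementation. The function name `seconds_until_next_request` is hypothetical. The sketch also assumes that a `Retry-After` value longer than the normal delay takes precedence, while a shorter one never shortens it. Under HTTP semantics, `Retry-After` can carry either a number of seconds or an HTTP-date, so the sketch handles both forms.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional

DEFAULT_CRAWL_DELAY = 10.0  # seconds; the default delay described on this page


def seconds_until_next_request(
    status: int,
    retry_after: Optional[str] = None,
    crawl_delay: Optional[float] = None,
    now: Optional[datetime] = None,
) -> float:
    """Hypothetical helper: how long a polite crawler waits before its next request to a host.

    `now`, if given, must be a timezone-aware datetime.
    """
    # A site's Crawl-Delay directive replaces the built-in default when present.
    delay = crawl_delay if crawl_delay is not None else DEFAULT_CRAWL_DELAY
    if status != 503 or not retry_after:
        return delay
    value = retry_after.strip()
    if value.isdigit():
        # Retry-After given as delay-seconds, e.g. "120".
        backoff = float(value)
    else:
        # Retry-After given as an HTTP-date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
        try:
            when = parsedate_to_datetime(value)
        except (TypeError, ValueError):
            return delay  # unparseable header: fall back to the normal delay
        if when.tzinfo is None:
            when = when.replace(tzinfo=timezone.utc)  # "-0000" dates parse as naive UTC
        now = now or datetime.now(timezone.utc)
        backoff = (when - now).total_seconds()
    # Assumption: never wait less than the normal crawl delay.
    return max(delay, backoff)
```

Taking the maximum means a `Retry-After` that has already passed, or one that is shorter than the crawl delay, still leaves the usual 10-second spacing in place.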

📋 robots.txt Compliance

Voyager explicitly honors `Disallow` directives in `robots.txt`, as stated in its official policy at `https://voyager.ai/robots.txt`. It also honors `Allow` overrides and never crawls a disallowed path, however deep in the site that path sits. Webmasters who have tested the crawler report that it does not ignore `robots.txt` and that it stops crawling a newly disallowed path within minutes of the change.
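You can check locally how a given `robots.txt` applies to Voyager with Python’s standard `urllib.robotparser` module. The file contents and paths below are a hypothetical example for `example.com`, not rules Voyager publishes. One caveat applies: `RobotFileParser` uses the first rule that matches, in file order, whereas RFC 9309 uses the most specific (longest) match. The `Allow` override is therefore listed before the broader `Disallow` so that both interpretations give the same answer.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for example.com (illustrative paths only).
ROBOTS_TXT = """\
User-agent: Voyager
Crawl-delay: 10
Allow: /private/press-kit.html
Disallow: /private/

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse() also marks the rules as freshly loaded

USER_AGENT = "Voyager/1.0"  # the product token is matched case-insensitively against "Voyager"

for url in (
    "https://example.com/private/press-kit.html",  # matched by the Allow override
    "https://example.com/private/reports.csv",     # matched by Disallow: /private/
    "https://example.com/blog/post-1",             # no rule matches, so it is allowed
):
    print(url, parser.can_fetch(USER_AGENT, url))

print("Crawl-delay:", parser.crawl_delay(USER_AGENT))
```

Other crawlers fall through to the `User-agent: *` group, whose empty `Disallow:` permits everything.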

🔍 Detection Indicators

The primary User-Agent string is `Voyager/1.0`, optionally followed by additional tokens such as `(Compatible; VoyagerBot; +https://voyager.ai/bot)`. In every variant, the `User-Agent` header contains the substring `Voyager`. The bot also sets the `Accept-Language` header to `en-US,en;q=0.9`. Requests relayed through proxies may carry a `Via` header, which those proxies add. Reverse DNS lookups on its IP addresses often resolve to hostnames under `*.crawl.voyager.ai`.
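Anyone can send a `User-Agent` containing `Voyager`, and whoever controls an IP block can set its PTR records. A more reliable check is therefore forward-confirmed reverse DNS: the reverse lookup must return a name under `crawl.voyager.ai`, and that name must resolve back to the same address. The sketch below combines that check with the 204.15.0.0/16 range from this page. The function names are hypothetical, the range and hostname suffix come from this page rather than from Voyager’s live IP list, and the DNS lookups are injectable so the logic can be tested without network access.

```python
import ipaddress
import socket
from typing import Callable, List

# Values taken from this page; Voyager's official IP list should be treated as authoritative.
VOYAGER_IPV4_RANGE = ipaddress.ip_network("204.15.0.0/16")
VOYAGER_RDNS_SUFFIX = ".crawl.voyager.ai"


def in_published_ipv4_range(ip: str) -> bool:
    """True if `ip` is inside 204.15.0.0/16 (IPv6 addresses never match an IPv4 network)."""
    return ipaddress.ip_address(ip) in VOYAGER_IPV4_RANGE


def _reverse_lookup(ip: str) -> str:
    return socket.gethostbyaddr(ip)[0]


def _forward_lookup(host: str) -> List[str]:
    # getaddrinfo returns both IPv4 and IPv6 answers for the hostname.
    return [info[4][0] for info in socket.getaddrinfo(host, None)]


def is_verified_voyager(
    ip: str,
    user_agent: str,
    reverse_lookup: Callable[[str], str] = _reverse_lookup,
    forward_lookup: Callable[[str], List[str]] = _forward_lookup,
) -> bool:
    """Hypothetical forward-confirmed reverse DNS check for a request claiming to be Voyager."""
    if "Voyager" not in user_agent:
        return False
    try:
        host = reverse_lookup(ip).rstrip(".").lower()
        if not host.endswith(VOYAGER_RDNS_SUFFIX):
            return False
        # The PTR name must resolve back to the requesting address; otherwise it is spoofed.
        target = ipaddress.ip_address(ip)
        return any(ipaddress.ip_address(addr) == target for addr in forward_lookup(host))
    except (OSError, ValueError):
        # DNS failures (socket.herror, socket.gaierror) and bad addresses count as unverified.
        return False
```

The suffix check includes the leading dot, so a hostname such as `evilcrawl.voyager.ai` does not pass.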

📊 Data Usage

According to Voyager AI, collected data is used exclusively to train its large language models, including next-generation conversational agents and knowledge retrieval systems. The company also states that personal data is anonymized and not used for user profiling, and that aggregated website content helps improve the models’ factuality, reasoning, and multilingual capabilities.

⚙️ Rate Limiting Policy

Because Voyager can generate a high volume of requests across its distributed IP pool, webmasters often apply a per-IP rate limit of around 20 requests per minute to prevent excessive load. This threshold protects server resources without blocking the legitimate crawler outright. A crawler that honors the 10-second default delay sends at most 6 requests per minute (60 s / 10 s), well under the limit, so the cap only takes effect if requests arrive faster than that.
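A per-IP limit of this kind can be sketched as a sliding-window counter. The class name and its defaults (20 requests per 60 seconds) are illustrative assumptions that mirror the threshold above, not a specific server’s or library’s API. Timestamps can be passed in explicitly so the behavior is easy to test.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class SlidingWindowLimiter:
    """Hypothetical limiter: at most `limit` requests per client IP in any `window`-second span."""

    def __init__(self, limit: int = 20, window: float = 60.0) -> None:
        self.limit = limit
        self.window = window
        # Per-IP timestamps of accepted requests (idle IPs are never evicted in this sketch).
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, ip: str, now: Optional[float] = None) -> bool:
        """Record a request from `ip`; return False if it would exceed the limit."""
        now = time.monotonic() if now is None else now
        hits = self._hits[ip]
        # Discard timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # rejected requests are not recorded
        hits.append(now)
        return True
```

A crawler spacing requests 10 seconds apart never has more than 6 timestamps in the window, so it is never rejected. A burst of 25 requests within a few seconds gets its last 5 refused. For refused requests, a `503` response with a `Retry-After` header fits the signals this page says Voyager honors. Keying on individual IPs does not cap the total volume coming from the whole pool. Keying on the published /16 block is a possible alternative if that matters.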

ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.