PerplexityBot
Bot User-Agent: perplexitybot
🤖 Overview
PerplexityBot is a web crawler operated by Perplexity AI, the company behind a conversational search and answer engine launched in 2022. The bot indexes publicly available web pages to power Perplexity’s real-time question-answering platform, which gives users summarized answers with citations to the crawled sources. According to Perplexity’s official documentation (perplexity.ai/docs/perplexitybot), the crawler was first announced in late 2023. It is designed to gather textual data both for answering queries as they arrive and for ongoing model improvement.
🌐 Technical Behavior
PerplexityBot operates as a standard HTTP/1.1 crawler with HTTP/2 support. It makes requests at a moderate rate, typically from IP addresses in Amazon Web Services (AWS) and Google Cloud Platform (GCP) ranges, although the exact IP list is not published. Its User-Agent string contains the product token PerplexityBot/1.0 (the full string is given under Detection Indicators), and it advertises gzip and Brotli support in its Accept-Encoding header. Its crawl rate is documented as less aggressive than that of major search engines: by default it waits 10 seconds between requests to a single host, and it honors Crawl-delay directives in robots.txt. It requests HTML and other text-based content and skips binary files such as images and videos. Site owners can target it individually in robots.txt with the PerplexityBot user-agent token.
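Because the crawler honors Crawl-delay and has its own user-agent token, a site owner can check how a given robots.txt applies to it before deploying. The sketch below uses Python's standard urllib.robotparser module. The robots.txt content, the example URLs, and the helper name check_rules are hypothetical. One caveat: this parser selects a group using only the text before the first "/" in the agent string. It must therefore be given the PerplexityBot token. If it is given the full Mozilla/5.0 User-Agent, the request silently falls through to the * group.

```python
from typing import Optional, Tuple
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: a dedicated group for PerplexityBot with a
# 10-second crawl delay, plus a catch-all group for every other crawler.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Crawl-delay: 10
Disallow: /drafts/

User-agent: *
Disallow: /admin/
"""


def check_rules(robots_txt: str, agent: str, url: str) -> Tuple[bool, Optional[int]]:
    """Return (may_fetch, crawl_delay_seconds) for `agent` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return parser.can_fetch(agent, url), parser.crawl_delay(agent)


if __name__ == "__main__":
    # Pass the product token, not the full Mozilla/5.0 User-Agent string.
    print(check_rules(ROBOTS_TXT, "PerplexityBot", "https://example.com/drafts/a"))  # (False, 10)
    print(check_rules(ROBOTS_TXT, "PerplexityBot", "https://example.com/blog/a"))    # (True, 10)
```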
📋 robots.txt Compliance
PerplexityBot is explicitly documented as honoring Disallow and Crawl-delay directives in robots.txt. Perplexity’s official support page states that the bot “fully respects the robots exclusion protocol.” Testing by third-party security researchers confirms that it does not bypass standard rules. However, it does not currently honor an Allow directive that re-opens a subpath beneath a disallowed parent path. A Disallow on the parent therefore blocks the entire subtree.
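Standard Allow handling and the behavior described above are easiest to compare side by side. The sketch below is illustrative and does not come from Perplexity's documentation. allowed_longest_match follows the RFC 9309 rule: the longest matching path wins, and Allow wins a tie. allowed_disallow_only ignores Allow lines entirely, which is the behavior the paragraph above attributes to PerplexityBot. Both helpers, the rule format, and the example paths are hypothetical. Wildcards (* and $) are omitted for brevity.

```python
from typing import List, Tuple

# One user-agent group's rules as (directive, path prefix) pairs. Hypothetical.
Rule = Tuple[str, str]
RULES: List[Rule] = [
    ("disallow", "/private/"),
    ("allow", "/private/press/"),
]


def allowed_longest_match(path: str, rules: List[Rule]) -> bool:
    """RFC 9309-style: the longest matching prefix decides; Allow wins a tie."""
    best_len, best_allow = -1, True  # no matching rule means the path is allowed
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # an empty rule path matches nothing
        is_allow = directive == "allow"
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len, best_allow = len(prefix), is_allow
    return best_allow


def allowed_disallow_only(path: str, rules: List[Rule]) -> bool:
    """Ignores Allow lines: any matching Disallow prefix blocks the path."""
    return not any(
        directive == "disallow" and prefix and path.startswith(prefix)
        for directive, prefix in rules
    )


if __name__ == "__main__":
    for p in ("/private/press/release.html", "/private/notes", "/blog/"):
        print(p, allowed_longest_match(p, RULES), allowed_disallow_only(p, RULES))
```

Under the example rules, /private/press/release.html is open to a crawler that applies RFC 9309 but stays blocked under Disallow-only evaluation. A site that relies on Allow to re-open a subpath should therefore not expect PerplexityBot to fetch it.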
🔍 Detection Indicators
The primary detection signal is the User-Agent string Mozilla/5.0 (compatible; PerplexityBot/1.0; +https://perplexity.ai/bot), which contains the secondary token PerplexityBot/1.0. Behavioral fingerprints include:

- a steady request rate (by default, one request per host every 10 seconds);
- no JavaScript rendering;
- a Referer header that is often set to https://perplexity.ai/.

The bot does not send browser headers such as Accept-Language or Sec-CH-UA, which distinguishes it from human traffic.
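The indicators above can be combined into a simple request classifier. This is a heuristic sketch, not a verification method. The User-Agent header is trivially spoofed, and no official IP list is published, so header checks can only flag inconsistencies. The function name classify_request and its three labels are hypothetical.

```python
from typing import Mapping

# Headers a typical browser sends that, per the indicators above, PerplexityBot omits.
BROWSER_ONLY_HEADERS = ("accept-language", "sec-ch-ua")


def classify_request(headers: Mapping[str, str]) -> str:
    """Label a request as 'perplexitybot', 'suspicious', or 'other' (heuristic only)."""
    # HTTP header names are case-insensitive, so normalize them first.
    h = {name.lower(): value for name, value in headers.items()}
    user_agent = h.get("user-agent", "").lower()
    if "perplexitybot" not in user_agent:
        return "other"
    # Claims to be PerplexityBot but carries browser-style headers: possibly spoofed.
    if any(name in h for name in BROWSER_ONLY_HEADERS):
        return "suspicious"
    return "perplexitybot"


if __name__ == "__main__":
    ua = "Mozilla/5.0 (compatible; PerplexityBot/1.0; +https://perplexity.ai/bot)"
    print(classify_request({"User-Agent": ua, "Accept-Encoding": "gzip, br"}))
    print(classify_request({"User-Agent": ua, "Accept-Language": "en-US"}))
```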
📊 Data Usage
Crawled content populates Perplexity’s search index, which Perplexity draws on to generate real-time answers with citations. Some data may also be used to train proprietary AI models that improve answer accuracy and citation quality. Perplexity’s privacy policy notes that publicly available data is processed. Website owners can opt out through robots.txt or through a removal form on Perplexity’s site (perplexity.ai/opt-out).
⚙️ Rate Limiting Policy
Rate limiting PerplexityBot is a sensible precaution, because even a moderate crawl rate can strain smaller sites if left unchecked. A threshold-based block, for example at 100 requests within 10 seconds, protects server performance. That threshold sits far above the documented default of one request per host every 10 seconds, so it still leaves room for legitimate indexing by Perplexity’s answer service.
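A threshold like the one above can be enforced with a sliding-window counter. The sketch below keeps per-client timestamps in memory. It rejects a request once the client has made `limit` requests within the last `window` seconds. The class name, the choice of client key (for example an IP address or User-Agent), and the defaults of 100 requests per 10 seconds are illustrative assumptions, not a Boteraser or Perplexity API.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class SlidingWindowLimiter:
    """Reject a client's request once it has made `limit` requests in `window` seconds."""

    def __init__(self, limit: int = 100, window: float = 10.0) -> None:
        self.limit = limit
        self.window = window
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[client]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # over the threshold: reject and do not record
        hits.append(now)
        return True
```

Rejected requests are not recorded, so a client regains access as soon as its older requests age out of the window. A crawler that keeps to the documented 10-second delay sends roughly one request per window and never approaches the limit. A production version would also evict idle clients, because this sketch keeps an entry for every key it has seen.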
ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information are the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.