Perplexity-User

Bot User-Agent: perplexity-user

🤖 Overview

Perplexity-User is a legitimate web crawler operated by Perplexity AI, a San Francisco-based company founded in 2022 that develops a conversational AI search engine powered by large language models. The bot's primary purpose is to fetch web pages and indexed content in real time so that Perplexity can generate answers with cited sources for user queries, acting as a “reading layer” over traditional search results. Perplexity AI states that the bot is used to “fetch and process web pages for our AI-powered answer engine” and is not used for proprietary model training, which differentiates it from other AI crawlers.

🌐 Technical Behavior

Perplexity-User performs both real-time fetching when a user asks a question and periodic re-crawling of popular domains to update its knowledge base. According to Perplexity's official documentation, the bot limits itself to about 1 request per second (1 QPS) per domain, although bursts of up to 3 requests per second may occur during complex queries. The crawler uses IPv4 addresses from a dynamic pool allocated to Perplexity AI; these ranges are frequently reported as belonging to Cloudflare or Amazon Web Services (AWS). The bot follows the HTTP/1.1 and HTTP/2 protocols, sends a User-Agent header containing “Perplexity-User”, and includes an Accept: text/html header. It does not transmit cookies or session data, and it respects the robots.txt Crawl-Delay directive, which specifies a delay in seconds. All requests originate from IPs that reverse-resolve to perplexity.ai, and in some instances the bot includes a From: header with a contact email address to help webmasters reach the operator.

📋 robots.txt Compliance

Perplexity AI states that Perplexity-User “respects robots.txt directives in the same way as other major search engine bots.” According to its documentation, the bot reads Disallow rules and enforces a Crawl-Delay if one is specified, but it does not support the more granular Allow override for sub-paths that some crawlers use. Webmasters can block the bot entirely by adding User-agent: Perplexity-User and Disallow: / to their robots.txt file. Perplexity AI also provides an opt-out form on its website for sites that do not wish to be indexed; in some edge cases this form takes precedence over robots.txt. Real-world testing by security researchers (e.g., Darknet Diaries blog, 2024) shows that the bot begins respecting new directives within a few seconds of fetching the robots.txt file, although delays of up to 30 seconds have been reported because Perplexity's backend caches the file.
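To check how a robots.txt file reads before deploying it, Python's standard `urllib.robotparser` can parse the rules locally. The sketch below uses an illustrative file (the paths and the 5-second delay are assumptions) that blocks Perplexity-User entirely and sets a crawl delay for other bots; it avoids Allow lines because the text above says the bot does not support them. This shows how a standards-following parser interprets the file, not how Perplexity's own crawler does. Note that `robotparser` matches on the product token, so pass "Perplexity-User" rather than the full Mozilla/5.0 string.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: block Perplexity-User completely, and give every
# other bot a crawl delay plus one disallowed directory.
ROBOTS_TXT = """\
User-agent: Perplexity-User
Disallow: /

User-agent: *
Crawl-delay: 5
Disallow: /admin/
"""


def build_parser(text: str) -> RobotFileParser:
    rp = RobotFileParser()
    rp.parse(text.splitlines())  # parse from a string instead of fetching a URL
    return rp


rp = build_parser(ROBOTS_TXT)
print(rp.can_fetch("Perplexity-User", "https://example.com/blog/post"))  # False
print(rp.can_fetch("ExampleBot", "https://example.com/blog/post"))       # True
print(rp.crawl_delay("ExampleBot"))                                      # 5
```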

🔍 Detection Indicators

The primary User-Agent string is Mozilla/5.0 (compatible; Perplexity-User/1.0; +https://www.perplexity.ai/robots.txt). A secondary string, Perplexity-User/1.0, is used in some sub-requests. Behavioural fingerprints include a request rate of about 1 QPS, no JavaScript execution, and a consistent Accept-Language: en-US,en;q=0.9 header. The bot does not spoof other user agents or change its IP between requests within the same session. Network administrators can monitor for incoming connections from AS396982 (reported as Perplexity AI's autonomous system) or from known Cloudflare-origin IPs whose reverse DNS includes perplexity.ai. An X-Forwarded-For header may be present when requests are routed through Perplexity's proxy layer, but its value is not static.
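The User-Agent strings above can be matched with a regular expression on the `Perplexity-User/<version>` product token. The Python sketch below is one possible approach; the pattern and the function name are assumptions based only on the two strings quoted here. Because any client can send this header, a match is a hint rather than proof and is best combined with an IP or reverse-DNS check.

```python
import re
from typing import Optional

# Assumed token format "Perplexity-User/<version>", based on the strings above.
# \b stops the pattern matching inside a longer word; the match is case-insensitive.
_PPLX_UA = re.compile(r"\bPerplexity-User/(\d+(?:\.\d+)*)", re.IGNORECASE)


def perplexity_user_version(user_agent: str) -> Optional[str]:
    """Return the Perplexity-User version if the header contains the token, else None."""
    match = _PPLX_UA.search(user_agent)
    return match.group(1) if match else None


full_ua = "Mozilla/5.0 (compatible; Perplexity-User/1.0; +https://www.perplexity.ai/robots.txt)"
print(perplexity_user_version(full_ua))                # 1.0
print(perplexity_user_version("Perplexity-User/1.0"))  # 1.0
```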

📊 Data Usage

Collected data, including the full text of fetched web pages, page titles, meta descriptions, and structured data such as schema.org markup, is used exclusively by Perplexity AI to generate real-time, citation-rich answers for end users. Perplexity AI explicitly states in its privacy policy (updated April 2024) that web-sourced content “is not used to train or fine-tune our language models.” Instead, the content is temporarily cached (for up to 30 days) to answer user queries and is discarded after that period unless the page is re-fetched. The company does not sell the data to third parties or use it for advertising, which distinguishes it from many other AI crawlers used for model training.

⚙️ Rate Limiting Policy

Web administrators often rate-limit Perplexity-User because its real-time fetching can produce sudden bursts of concurrent requests when many users query the same domain at once, which can overwhelm smaller servers. A threshold-based block (e.g., more than 5 requests per minute from the same IP) is recommended to preserve server resources while still allowing legitimate access for answer generation; Perplexity AI acknowledges that site owners may need to throttle exceptional volumes during high-traffic queries.
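The threshold rule above can be implemented as a per-IP sliding window. The Python sketch below is a minimal in-memory version; the class name is hypothetical, the defaults mirror the 5-requests-per-minute example, and a real deployment would more likely enforce the limit in a reverse proxy or a shared store. Timestamps are passed in explicitly so the logic is easy to test.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class SlidingWindowLimiter:
    """Hypothetical per-IP limiter: at most `limit` requests in any `window` seconds."""

    def __init__(self, limit: int = 5, window: float = 60.0) -> None:
        self.limit = limit
        self.window = window
        # Timestamps of accepted requests, oldest first, keyed by client IP.
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, ip: str, now: float) -> bool:
        hits = self._hits[ip]
        # Forget requests that have slid out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # over the threshold; the caller would reply with HTTP 429
        hits.append(now)
        return True


limiter = SlidingWindowLimiter()
print([limiter.allow("203.0.113.10", t) for t in range(6)])
# [True, True, True, True, True, False]
```

Rejected requests would typically be answered with HTTP 429 (Too Many Requests). At the roughly 1 QPS rate described earlier, a 5-per-minute limit would start rejecting the bot after its fifth request in any minute, so sites that want to keep serving Perplexity-User may prefer a higher limit.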


ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information are the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.