Perplexity-User
Bot User-Agent: Perplexity-User
Overview
Perplexity-User is a legitimate web crawler operated by Perplexity AI, a San Francisco-based company founded in 2022 that develops a conversational AI search engine powered by large language models. The bot's primary purpose is to fetch web pages and indexed content in real time so that Perplexity can answer user queries with cited sources, acting as a "reading layer" over traditional search results. Perplexity AI states that the bot is used to "fetch and process web pages for our AI-powered answer engine" and not to train proprietary models, which distinguishes it from other AI crawlers.
Technical Behavior
Perplexity-User fetches pages in real time when a user asks a question, and it also periodically re-crawls popular domains to refresh its knowledge base. According to Perplexity's official documentation, the bot limits itself to about 1 request per second (1 QPS) per domain, although bursts of up to 3 requests per second may occur during complex queries. Requests come from a dynamic pool of IPv4 addresses allocated to Perplexity AI, and these ranges are frequently reported as belonging to Cloudflare or Amazon Web Services (AWS). The bot uses HTTP/1.1 and HTTP/2, sends a User-Agent header containing "Perplexity-User", and includes an Accept: text/html header. It does not send cookies or session data, and it respects the robots.txt Crawl-delay directive (a value given in seconds). All requests originate from IPs that reverse-resolve to perplexity.ai. In some cases the bot also includes a From: header with a contact address for webmasters.
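The reverse-DNS claim above can be checked with forward-confirmed reverse DNS (FCrDNS). This is a minimal sketch that assumes, as reported above, that genuine Perplexity-User addresses have PTR records under perplexity.ai. The function name and the suffix rule are illustrative assumptions and do not come from a verification procedure published by Perplexity. The DNS lookups are injectable, so the logic can be exercised without network access.

```python
import socket
from typing import Callable, List, Tuple

# Hypothetical suffix rule: assumes genuine Perplexity-User addresses have
# PTR records under perplexity.ai, as the description above reports.
ALLOWED_DOMAIN = "perplexity.ai"

HostInfo = Tuple[str, List[str], List[str]]  # (hostname, aliases, addresses)


def is_verified_perplexity_ip(
    ip: str,
    reverse_lookup: Callable[[str], HostInfo] = socket.gethostbyaddr,
    forward_lookup: Callable[[str], HostInfo] = socket.gethostbyname_ex,
) -> bool:
    """Forward-confirmed reverse DNS (FCrDNS) check for an IPv4 address."""
    try:
        hostname, _aliases, _addrs = reverse_lookup(ip)
    except OSError:  # socket.herror / gaierror: no PTR record or lookup failure
        return False
    hostname = hostname.rstrip(".").lower()
    # Require an exact match or a true subdomain; "evilperplexity.ai" must fail.
    if hostname != ALLOWED_DOMAIN and not hostname.endswith("." + ALLOWED_DOMAIN):
        return False
    try:
        _name, _aliases, forward_addrs = forward_lookup(hostname)
    except OSError:
        return False
    # Anyone can publish a PTR record naming perplexity.ai for their own IP,
    # so the hostname must resolve back to the same address.
    return ip in forward_addrs
```

Calling `is_verified_perplexity_ip("203.0.113.7")` with the defaults performs live DNS lookups. In tests, pass stub functions for `reverse_lookup` and `forward_lookup` instead. The forward step matters because a PTR record is controlled by whoever owns the IP block, not by the domain it names.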
robots.txt Compliance
Perplexity AI states that Perplexity-User "respects robots.txt directives in the same way as other major search engine bots." Its official documentation says that the bot reads Disallow rules and enforces a Crawl-delay if one is specified. However, it does not support the more granular Allow override for sub-paths that some crawlers use. Webmasters can block the bot entirely by adding User-agent: Perplexity-User followed by Disallow: / to their robots.txt file. Perplexity AI also provides an opt-out form on its website for sites that do not wish to be indexed, and in some edge cases this form takes precedence over robots.txt. Real-world testing by security researchers (e.g., Darknet Diaries blog, 2024) shows that the bot applies directives within a few seconds of fetching the robots.txt file. Delays of up to 30 seconds have also been reported because Perplexity's backend caches the file.
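The blocking rule described above can be checked locally with Python's standard `urllib.robotparser`. This is a sketch. The robots.txt content, including the `Crawl-delay: 10` value and the `/private/` path, is illustrative. Also, `robotparser` applies its own matching rules, which may differ from the way Perplexity's fetcher interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: blocks Perplexity-User everywhere, and asks all
# other crawlers to wait 10 seconds between requests and skip /private/.
ROBOTS_TXT = """\
User-agent: Perplexity-User
Disallow: /

User-agent: *
Crawl-delay: 10
Disallow: /private/
"""


def build_parser(text: str) -> RobotFileParser:
    parser = RobotFileParser()
    parser.parse(text.splitlines())  # parse() also marks the rules as loaded
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    print(rp.can_fetch("Perplexity-User", "https://example.com/blog/post"))  # False
    print(rp.can_fetch("Googlebot", "https://example.com/blog/post"))        # True
    print(rp.crawl_delay("Googlebot"))                                        # 10
```

Because the `Perplexity-User` group contains only `Disallow: /`, it does not inherit the `Crawl-delay` from the `*` group. Under robots.txt group semantics, a crawler follows only the most specific group that matches it.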
Detection Indicators
The primary User-Agent string is Mozilla/5.0 (compatible; Perplexity-User/1.0; +https://www.perplexity.ai/robots.txt). A shorter string, Perplexity-User/1.0, is used in some sub-requests. Behavioural fingerprints include a request rate of about 1 QPS, no JavaScript execution, and a consistent Accept-Language: en-US,en;q=0.9 header. The bot does not spoof other user agents or change its IP between requests within the same session. Network administrators can monitor for incoming connections from AS396982 (Perplexity AI's autonomous system) or from known Cloudflare-origin IPs whose reverse DNS includes perplexity.ai. An X-Forwarded-For header may be present when requests are routed through Perplexity's proxy layer, but its value is not static.
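Here is a minimal sketch of how the User-Agent strings and the ~1 QPS fingerprint above might be checked against a web server access log. It assumes the common Apache/Nginx "combined" log format. The regular expressions and function names are illustrative. A User-Agent match on its own does not prove that the traffic is genuine, because the header is trivially spoofed.

```python
import re
from collections import Counter
from datetime import datetime

# Matches both the full and the short User-Agent forms described above,
# e.g. "Mozilla/5.0 (compatible; Perplexity-User/1.0; +https://...)" and
# "Perplexity-User/1.0". Case-insensitive to be lenient.
PERPLEXITY_UA = re.compile(r"\bPerplexity-User/\d+(?:\.\d+)*", re.IGNORECASE)

# Assumes the "combined" log format: ip ident user [time] "request" status size "referer" "ua"
COMBINED_LOG = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def is_perplexity_user(user_agent: str) -> bool:
    return PERPLEXITY_UA.search(user_agent) is not None


def peak_requests_per_second(log_lines):
    """Return (ip, second, count) for the busiest Perplexity-User second, or None."""
    counts = Counter()
    for line in log_lines:
        m = COMBINED_LOG.match(line)
        if not m or not is_perplexity_user(m.group("ua")):
            continue  # unparseable line or a different client
        ts = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        counts[(m.group("ip"), ts)] += 1
    if not counts:
        return None
    (ip, ts), n = counts.most_common(1)[0]
    return ip, ts, n
```

A peak that stays well above the reported 1 to 3 requests per second from a single IP is worth a closer look. It may mean many concurrent user queries, or it may mean a client that is only borrowing the User-Agent string.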
Data Usage
Collected data, including the full text of fetched web pages, page titles, meta descriptions, and structured data such as schema.org markup, is used exclusively by Perplexity AI to generate real-time, citation-rich answers for end users. Perplexity AI's privacy policy (updated April 2024) states that web-sourced content "is not used to train or fine-tune our language models." Instead, the content is cached temporarily (for up to 30 days) to answer user queries and is discarded after that period unless the page is re-fetched. The company does not sell the data to third parties or use it for advertising. This distinguishes it from many other AI crawlers, which collect data for model training.
Rate Limiting Policy
Web administrators rate-limit Perplexity-User because its real-time fetching can produce sudden bursts of concurrent requests when many users query the same domain at once, which can overwhelm smaller servers. A threshold-based block (e.g., more than 5 requests per minute from the same IP) is recommended to preserve server resources while still allowing legitimate access for answer generation. Perplexity AI acknowledges that site owners may need to throttle exceptional volumes during high-traffic queries.
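The threshold-based block described above could be implemented as a per-IP sliding-window limiter. This sketch assumes the example threshold of 5 requests per 60 seconds per IP and keeps its state in process memory. It leaves the HTTP response to the caller, which would typically send 429 Too Many Requests. The class name is hypothetical, and production setups usually enforce this kind of limit at the reverse proxy or CDN instead.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Optional


class SlidingWindowLimiter:
    """Per-IP sliding-window limiter: allow at most `limit` requests in any
    `window` seconds. Defaults mirror the example threshold above."""

    def __init__(self, limit: int = 5, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, ip: str, now: Optional[float] = None) -> bool:
        now = self.clock() if now is None else now
        hits = self._hits[ip]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # over the threshold: caller should reject (e.g. 429)
        hits.append(now)  # only accepted requests count toward the limit
        return True


if __name__ == "__main__":
    limiter = SlidingWindowLimiter()
    print([limiter.allow("203.0.113.7", now=float(t)) for t in range(6)])
    # [True, True, True, True, True, False]
    print(limiter.allow("203.0.113.7", now=60.0))  # True: the t=0 hit has expired
```

With the reported baseline of about 1 QPS (roughly 60 requests per minute), a cap of 5 per minute will throttle most sustained Perplexity-User traffic from a single IP. Raise `limit` or `window` if the goal is only to absorb bursts.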
ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.