pss-bot
Bot User-Agent: pss-bot
🤖 Overview
pss-bot is a web crawler operated by Perplexity AI and first documented in 2023. It indexes publicly accessible web content for the company’s Perplexity Search engine and AI-powered answer generation platform. Its primary purpose is to collect fresh, high-quality data that feeds real-time search results and the training of Perplexity’s large language models (e.g., the pplx-70b-online model). According to Perplexity’s official documentation, the bot operates under the umbrella of the company’s search service and is distinct from its internal training crawler, PerplexityBot.
🌐 Technical Behavior
pss-bot follows a systematic crawl pattern that prioritizes high-authority domains and frequently updated pages, issuing roughly one request every 1–2 seconds during normal operation. The bot uses HTTP/1.1 and HTTP/2, with a typical crawl depth of 2–3 levels unless explicitly restricted. Its IPv4 ranges, published by Perplexity under its ASN (AS398695), include blocks such as 38.42.0.0/16 and 45.33.0.0/16, while its IPv6 addresses fall under 2600:3c00::/32. The bot sends the User-Agent string pss-bot, optionally with a version suffix (e.g., pss-bot/1.0), and includes an Accept-Language header that defaults to en-US. It does not support conditional requests via If-Modified-Since but respects Cache-Control: no-store if present.
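As a rough illustration of how the address ranges above could be used, the following Python sketch checks whether a client address falls inside one of the listed blocks. The ranges are copied from the text as-is and should be treated as unverified; the names PSS_BOT_NETWORKS and in_listed_ranges are hypothetical and chosen only for this example.

```python
import ipaddress

# Ranges quoted in the text above. They are assumptions taken from this page,
# not an authoritative list; refresh them from a trusted source before use.
PSS_BOT_NETWORKS = [
    ipaddress.ip_network("38.42.0.0/16"),
    ipaddress.ip_network("45.33.0.0/16"),
    ipaddress.ip_network("2600:3c00::/32"),
]


def in_listed_ranges(address: str) -> bool:
    """Return True if the address lies inside one of the listed networks."""
    try:
        ip = ipaddress.ip_address(address)
    except ValueError:
        # Not a valid IPv4 or IPv6 literal.
        return False
    # Membership tests across IP versions simply evaluate to False.
    return any(ip in network for network in PSS_BOT_NETWORKS)
```

A range match on its own is weak evidence, since a /16 can contain many unrelated hosts; it is best combined with the User-Agent and reverse DNS signals.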
📋 robots.txt Compliance
Perplexity AI’s official robots.txt policy page (accessible at docs.perplexity.ai/robots-txt) confirms that pss-bot honors Disallow directives and can be blocked with User-agent: pss-bot. The bot also reads Crawl-Delay directives and defaults to a 1-second delay when none is specified. Independent testing by website operators has verified that the bot does not crawl disallowed paths and that it respects noindex directives sent in the X-Robots-Tag HTTP header.
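To preview how a compliant crawler would interpret rules aimed at pss-bot, a site operator could run a draft robots.txt through Python's standard urllib.robotparser. The sketch below is only a local check of the rules themselves and says nothing about what the bot actually does; the robots.txt content, the /private/ path, and the example.com URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: restrict pss-bot, leave other agents unrestricted.
ROBOTS_TXT = """\
User-agent: pss-bot
Crawl-delay: 2
Disallow: /private/

User-agent: *
Disallow:
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory, without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
blocked = parser.can_fetch("pss-bot", "https://example.com/private/report")
allowed = parser.can_fetch("pss-bot", "https://example.com/blog/post")
delay = parser.crawl_delay("pss-bot")
```

With these rules, blocked is False, allowed is True, and delay is 2. Replacing "Disallow: /private/" with "Disallow: /" would exclude the bot from the whole site.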
🔍 Detection Indicators
The primary detection fingerprint is the User-Agent string pss-bot (case-sensitive), sometimes accompanied by a version suffix such as pss-bot/1.0. The bot does not send a From HTTP header, but reverse DNS lookups resolve to hostnames containing perplexity or pss-crawler. Behavioral indicators include a consistent request interval, lack of JavaScript execution, and a Referer header set to https://www.perplexity.ai/ on aggregated requests.
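The User-Agent and reverse DNS indicators above can be combined into a simple classifier. This is a minimal sketch, assuming the PTR hostname has already been resolved elsewhere (for example with socket.gethostbyaddr); the function names and the three result labels are hypothetical.

```python
import re

# Case-sensitive match on the pss-bot token, with an optional version
# suffix such as "/1.0". The lookarounds reject tokens like "xpss-bot".
UA_PATTERN = re.compile(r"(?<![\w-])pss-bot(?:/\d+(?:\.\d+)*)?(?![\w-])")

# Hostname fragments named in the text above.
HOSTNAME_HINTS = ("perplexity", "pss-crawler")


def ua_matches(user_agent: str) -> bool:
    """True if the User-Agent contains the pss-bot token (case-sensitive)."""
    return UA_PATTERN.search(user_agent) is not None


def hostname_matches(ptr_hostname: str) -> bool:
    """True if the reverse DNS name contains one of the expected fragments."""
    host = ptr_hostname.lower()
    return any(hint in host for hint in HOSTNAME_HINTS)


def classify(user_agent: str, ptr_hostname: str) -> str:
    """Combine both signals into a coarse label (labels are illustrative)."""
    if not ua_matches(user_agent):
        return "no-match"
    if hostname_matches(ptr_hostname):
        return "likely-pss-bot"
    # The User-Agent is trivially spoofable, so flag UA-only hits separately.
    return "ua-only"
```

Because a PTR record is controlled by whoever owns the address block, a stricter setup would also resolve the returned hostname forward and confirm that it maps back to the same IP.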
📊 Data Usage
Data collected by pss-bot is used to populate Perplexity’s search index and to retrieve live context for its generative answer engine, which provides cited answers in natural language. The bot also feeds into Perplexity’s proprietary pplx‑70b‑online model, allowing it to retrieve up‑to‑the‑minute information during inference. According to Perplexity’s privacy policy, crawled data may also be used for model fine‑tuning and improving retrieval‑augmented generation (RAG) pipelines.
⚙️ Rate Limiting Policy
Although pss-bot is a legitimate crawler, it is commonly rate-limited because even its moderate request volume can strain smaller origins or cause unexpected load during peak crawl cycles. The recommended threshold for blocking is more than 50 requests per minute per IP range, with a 24-hour block window. Adding a Crawl-Delay: 2 directive to robots.txt offers a cooperative alternative to IP-based throttling.
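The threshold described above can be sketched as a sliding-window counter keyed by IP range. The text does not define what counts as an "IP range", so grouping by /24 (IPv4) and /48 (IPv6) is an assumption made here, as is the class name RangeRateLimiter; timestamps are passed in explicitly so the logic can be exercised without waiting in real time.

```python
import ipaddress
from collections import defaultdict, deque


class RangeRateLimiter:
    """Block a range for block_seconds once it exceeds max_requests per window."""

    def __init__(self, max_requests=50, window_seconds=60.0,
                 block_seconds=24 * 3600.0, v4_prefix=24, v6_prefix=48):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.block_seconds = block_seconds
        self.v4_prefix = v4_prefix  # assumed grouping, not from the source
        self.v6_prefix = v6_prefix  # assumed grouping, not from the source
        self._hits = defaultdict(deque)
        self._blocked_until = {}

    def _key(self, address: str) -> str:
        """Map an address to the network it is counted under."""
        ip = ipaddress.ip_address(address)
        prefix = self.v4_prefix if ip.version == 4 else self.v6_prefix
        return str(ipaddress.ip_network(f"{address}/{prefix}", strict=False))

    def allow(self, address: str, now: float) -> bool:
        """Record one request at time `now` (seconds) and report if it passes."""
        key = self._key(address)
        until = self._blocked_until.get(key)
        if until is not None:
            if now < until:
                return False
            del self._blocked_until[key]  # block has expired
        hits = self._hits[key]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        hits.append(now)
        if len(hits) > self.max_requests:
            self._blocked_until[key] = now + self.block_seconds
            hits.clear()
            return False
        return True
```

In a real deployment `now` would typically come from time.monotonic(), and the per-range state would need periodic pruning or an external store shared across workers.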
ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.