pita
Bot User-Agent: pita
🤖 Overview
PitaBot is a web crawler operated by Pita Inc. and first publicly documented in May 2022. It indexes publicly accessible web content to train the company's proprietary large language models (LLMs) and to populate its AI-powered search engine, Pita Search. According to the official documentation at pita.ai/crawler, the crawler collects text, metadata, and structural data, but not images or videos, to build a semantic index.
🌐 Technical Behavior
PitaBot uses a distributed crawling architecture whose IP addresses fall in the 198.51.100.0/24 range, as listed on pita.ai/crawler-ips. Each crawler IP sends between 5 and 10 requests per second, and the default Crawl-Delay is 2 seconds, which site owners can adjust via robots.txt. The crawler identifies itself with the User-Agent Mozilla/5.0 (compatible; PitaBot/1.0; +https://pita.ai/crawler) and includes a From header with [email protected] for contact. It supports HTTP/2 and IPv6, follows sitemaps, and respects nofollow attributes. The crawler's source code is partially open-sourced at github.com/pita-inc/crawler, where its behavior is documented.
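As a rough illustration of how the published range and User-Agent could be combined on the server side, the sketch below accepts a request as PitaBot only when both match. The function name `looks_like_pitabot` and the constants are hypothetical, and the range is hard-coded only for brevity. Note that 198.51.100.0/24 is also the TEST-NET-2 documentation block reserved by RFC 5737, so the value should be confirmed against pita.ai/crawler-ips before relying on it.

```python
import ipaddress

# Range quoted in the text above; in practice, load it from the published list.
PITABOT_NETWORK = ipaddress.ip_network("198.51.100.0/24")
# Product token taken from the documented User-Agent string.
PITABOT_TOKEN = "PitaBot/"


def looks_like_pitabot(remote_addr: str, user_agent: str) -> bool:
    """Return True only if both the source IP and the User-Agent match."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        return False  # unparseable address: treat as not verified
    # An IPv6 address is never "in" an IPv4 network, so this is False for IPv6.
    return addr in PITABOT_NETWORK and PITABOT_TOKEN in user_agent
```

Because the text lists only an IPv4 range, this sketch rejects every IPv6 client even though the crawler is said to support IPv6. A real check would need the IPv6 prefixes as well, if any are published.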
📋 robots.txt Compliance
According to Pita Inc.'s public policy, PitaBot honors robots.txt directives, including Disallow, Allow, and Crawl-Delay rules, and web server logs from major websites are cited as confirming this. It also respects X-Robots-Tag HTTP headers and does not use alternate User-Agents or IP spoofing to bypass restrictions. There are no known reports of intentional non-compliance.
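To see what such directives look like from the site owner's side, the following sketch feeds a sample robots.txt to Python's standard `urllib.robotparser` and asks what a compliant crawler named PitaBot would be allowed to do. The paths, the 5-second delay, and the example.com URLs are made up for illustration. The sketch shows how the rules are interpreted by this parser, not how PitaBot itself evaluates them.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules a site owner might publish for this crawler.
# Allow is listed before Disallow because robotparser applies the first
# matching rule; longest-match parsers reach the same result here.
ROBOTS_TXT = """\
User-agent: PitaBot
Crawl-delay: 5
Allow: /private/press/
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# robotparser matches on the product token, so pass "PitaBot" rather than
# the full "Mozilla/5.0 (compatible; ...)" User-Agent string.
blocked = parser.can_fetch("PitaBot", "https://example.com/private/data.html")
press = parser.can_fetch("PitaBot", "https://example.com/private/press/a.html")
blog = parser.can_fetch("PitaBot", "https://example.com/blog/")
delay = parser.crawl_delay("PitaBot")

print(blocked, press, blog, delay)  # False True True 5
```

A Crawl-delay entry like this one would replace the 2-second default mentioned above for crawlers that honor the directive.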
🔍 Detection Indicators
The primary detection method is the User-Agent string Mozilla/5.0 (compatible; PitaBot/1.0; +https://pita.ai/crawler). Additional identifying signals include the From header ([email protected]) and an optional Pita-Crawl-ID header containing a UUID per crawling session. Behavioral fingerprints include low concurrency (typically 1-2 concurrent connections per IP) and a preference for text/html content over other MIME types.
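The header signals listed above could be checked along these lines. This is a minimal sketch: the function name `pitabot_signals` and the returned keys are hypothetical, and the From header is tested only for presence because the contact address is not legible in the source. Header values are trivially spoofable, so a match here is a hint, not proof.

```python
import uuid
from typing import Mapping

# User-Agent string as documented in the text above.
EXPECTED_UA = "Mozilla/5.0 (compatible; PitaBot/1.0; +https://pita.ai/crawler)"


def pitabot_signals(headers: Mapping[str, str]) -> dict[str, bool]:
    """Report which documented header signals are present on a request."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}

    # Pita-Crawl-ID is described as optional and as containing a UUID.
    try:
        uuid.UUID(lowered.get("pita-crawl-id", ""))
        has_valid_crawl_id = True
    except ValueError:
        has_valid_crawl_id = False

    return {
        "user_agent": lowered.get("user-agent", "") == EXPECTED_UA,
        "from_header": "from" in lowered,
        "crawl_id": has_valid_crawl_id,
    }
```

Returning the individual signals, not a single verdict, leaves it to the caller to decide how much weight an optional header such as Pita-Crawl-ID should carry.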
📊 Data Usage
Data collected by PitaBot is used to train Pita Inc.'s language model, Pita-LLM
Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.