pita

Bot User-Agent: pita

🤖 Overview

PitaBot is a web crawler operated by Pita Inc. and first publicly documented in May 2022. It indexes publicly accessible web content to train the company's proprietary large language models (LLMs) and to populate its AI-powered search engine, Pita Search. According to the official documentation at pita.ai/crawler, the crawler collects text, metadata, and structural data, but not images or videos, to build a semantic index.

🌐 Technical Behavior

PitaBot uses a distributed crawling architecture with IP addresses from the 198.51.100.0/24 range, as listed on pita.ai/crawler-ips. It sends between 5 and 10 requests per second per IP and applies a default Crawl-Delay of 2 seconds, which site owners can adjust via robots.txt. The crawler identifies itself with the User-Agent Mozilla/5.0 (compatible; PitaBot/1.0; +https://pita.ai/crawler) and includes a From header with [email protected] for contact. It supports HTTP/2 and IPv6, follows sitemaps, and respects nofollow attributes. The crawler's source code is partially open-sourced on GitHub at github.com/pita-inc/crawler, where its behavior is documented.
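Because a User-Agent string is easy to forge, a request claiming to be PitaBot can be cross-checked against the published address range. The sketch below is one possible way to do that with Python's standard library; the function name is hypothetical, and the range is copied from the text above, so it should be confirmed against pita.ai/crawler-ips before use. Only the IPv4 range is covered, since no IPv6 range is listed here.

```python
import ipaddress

# Range quoted in the text above (assumption: still current on pita.ai/crawler-ips).
PITABOT_NETWORK = ipaddress.ip_network("198.51.100.0/24")


def is_in_pitabot_range(remote_addr: str) -> bool:
    """Return True if remote_addr is a valid IP inside the published range.

    Hypothetical helper for illustration; malformed input and IPv6
    addresses simply yield False.
    """
    try:
        return ipaddress.ip_address(remote_addr) in PITABOT_NETWORK
    except ValueError:
        # Not a parseable IP address at all.
        return False


if __name__ == "__main__":
    for addr in ("198.51.100.42", "198.51.101.1", "not-an-ip"):
        print(addr, is_in_pitabot_range(addr))
```

A request whose User-Agent names PitaBot but whose source address falls outside the range is a reasonable candidate for closer inspection.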

📋 robots.txt Compliance

PitaBot fully honors robots.txt directives, including Disallow, Allow, and Crawl-Delay rules, as confirmed by Pita Inc.'s public policy and verified by web server logs from major websites. It also respects X-Robots-Tag HTTP headers and does not employ alternate User-Agents or IP spoofing to bypass restrictions. There are no known reports of intentional non-compliance.
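As an illustration of the directives mentioned above, the following sketch checks a sample robots.txt with Python's `urllib.robotparser` before it is deployed. The rules, paths, and example.com URLs are made up for the example, and using `PitaBot` as the robots.txt token is an assumption based on the User-Agent string; the official documentation may specify a different token.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules. The "PitaBot" token is an assumption taken from the
# User-Agent string, not a confirmed value from Pita Inc.'s documentation.
ROBOTS_TXT = """\
User-agent: PitaBot
Disallow: /private/
Crawl-delay: 5

User-agent: *
Allow: /
"""


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# Blocked path for the crawler's token.
print(rules.can_fetch("PitaBot", "https://example.com/private/report.html"))  # False
# Any other path remains crawlable.
print(rules.can_fetch("PitaBot", "https://example.com/blog/"))  # True
# Crawl-delay override that applies to this token.
print(rules.crawl_delay("PitaBot"))  # 5
```

This only shows how a standards-following parser reads the file. Whether the crawler actually obeys the rules still has to be verified in server logs.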

🔍 Detection Indicators

The primary detection method is the User-Agent string Mozilla/5.0 (compatible; PitaBot/1.0; +https://pita.ai/crawler). Additional identifying signals include the From header ([email protected]) and an optional Pita-Crawl-ID header containing a UUID per crawling session. Behavioral fingerprints include low concurrency (typically 1-2 concurrent connections per IP) and a preference for text/html content over other MIME types.
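The header-based signals described above can be combined into a simple check. This is a minimal sketch, not an official detection method: the function name and the returned keys are hypothetical, the From header is tested only for presence because the contact address is not reproduced here, and the Pita-Crawl-ID value is assumed to be a standard UUID string.

```python
import re
import uuid
from typing import Dict, Mapping

# Matches the product token from the documented User-Agent, e.g. "PitaBot/1.0".
UA_PATTERN = re.compile(r"\bPitaBot/\d+(\.\d+)*", re.IGNORECASE)


def pitabot_signals(headers: Mapping[str, str]) -> Dict[str, bool]:
    """Report which PitaBot header signals are present in a request.

    Hypothetical helper for illustration; header names are matched
    case-insensitively.
    """
    lowered = {name.lower(): value for name, value in headers.items()}

    # Optional per-session ID: accept it only if it parses as a UUID.
    try:
        uuid.UUID(lowered.get("pita-crawl-id", ""))
        has_crawl_id = True
    except ValueError:
        has_crawl_id = False

    return {
        "user_agent": bool(UA_PATTERN.search(lowered.get("user-agent", ""))),
        "from_header": "from" in lowered,
        "crawl_id": has_crawl_id,
    }


if __name__ == "__main__":
    sample = {
        "User-Agent": "Mozilla/5.0 (compatible; PitaBot/1.0; +https://pita.ai/crawler)",
        "Pita-Crawl-ID": "123e4567-e89b-12d3-a456-426614174000",
    }
    print(pitabot_signals(sample))
```

Since all three headers can be set by any client, these signals are best treated as hints and combined with an IP range check.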

📊 Data Usage

Data collected by PitaBot is used to train Pita Inc.'s language model, Pita-LLM

ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.