patwebbot
Bot User-Agent: patwebbot
🤖 Overview
patwebbot is a web crawler operated by Pat Inc., a data analytics company that aggregates publicly accessible web content for use in AI training datasets and business intelligence products. First publicly documented in 2019, the bot serves Pat's proprietary web indexing pipeline, which feeds their PatAI machine learning platform. According to the official Pat documentation (https://pat-ai.com/crawler), patwebbot collects text, metadata, and structural data from web pages to improve natural language understanding models and market research tools. Although it is not a major search engine crawler, it is listed in major bot databases such as user-agents.net and BotScout.
🌐 Technical Behavior
patwebbot initiates crawls from IP ranges registered to Amazon Web Services (AWS) and Google Cloud Platform, primarily in the US East and EU West regions. Its crawl frequency is moderate, typically 1 to 5 requests per second per domain, with bursts of up to 10 requests per second during deep scans. The bot uses HTTP/1.1 with Keep-Alive connections and sends a User-Agent header of patwebbot/1.0 (sometimes with a version suffix). It follows links via both GET and HEAD requests, and respects ETag and Last-Modified headers to avoid re-downloading unchanged content. It does not support JavaScript rendering; it parses raw HTML and CSS only. Pat's official crawler documentation (https://pat-ai.com/crawler/tech) states that the bot uses a custom link extraction algorithm that prioritizes pages with high PageRank signals.
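Because the bot is described as honoring ETag and Last-Modified, a server that answers conditional requests correctly can avoid resending unchanged pages. The following is a minimal server-side sketch of that check, not Pat's implementation; the function name is_not_modified and its parameters are hypothetical, and it assumes the crawler sends standard If-None-Match or If-Modified-Since request headers.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping


def is_not_modified(request_headers: Mapping[str, str],
                    etag: str,
                    last_modified: datetime) -> bool:
    """Return True when a 304 Not Modified response would be appropriate.

    Hypothetical helper: `etag` is the resource's current entity tag
    (including quotes) and `last_modified` is a timezone-aware datetime.
    """
    headers = {name.lower(): value for name, value in request_headers.items()}

    # If-None-Match takes precedence over If-Modified-Since when both are sent.
    if_none_match = headers.get("if-none-match")
    if if_none_match is not None:
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        if "*" in candidates:
            return True
        # Weak comparison: ignore a leading "W/" on either side.
        current = etag[2:] if etag.startswith("W/") else etag
        return any(
            (tag[2:] if tag.startswith("W/") else tag) == current
            for tag in candidates
        )

    if_modified_since = headers.get("if-modified-since")
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            return False  # Unparseable date: serve the full response.
        if since.tzinfo is None:
            since = since.replace(tzinfo=timezone.utc)
        # HTTP dates have one-second resolution.
        return last_modified.replace(microsecond=0) <= since

    return False
```

When the function returns True, the server can reply with status 304 and an empty body instead of the full page.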
📋 robots.txt Compliance
patwebbot fully honors Disallow directives in robots.txt, according to Pat's published compliance policy (https://pat-ai.com/crawler/robots). The bot checks robots.txt at the beginning of each crawl session and caches the file for up to 24 hours. There are no public reports of patwebbot ignoring Disallow rules. However, it does not respect Crawl-Delay directives; its request pacing changes only when set explicitly in the bot's own crawler-level configuration.
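Since Disallow is the directive the bot is reported to honor, a robots.txt rule is the simplest way to keep it out of specific paths. The sketch below uses Python's standard urllib.robotparser to check how a compliant crawler would interpret such a file; the /private/ path, the example.com URLs, and the helper name is_allowed are illustrative assumptions, not values from Pat's documentation.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt: block patwebbot from /private/, allow everyone else everywhere.
ROBOTS_TXT = """\
User-agent: patwebbot
Disallow: /private/

User-agent: *
Disallow:
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a compliant crawler with this User-Agent may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/private/report.html", "/blog/post"):
        url = "https://example.com" + path
        print(path, is_allowed(ROBOTS_TXT, "patwebbot/1.0", url))
```

Because the file is reportedly cached for up to 24 hours, a newly added rule may not take effect until the next crawl session after the cache expires.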
🔍 Detection Indicators
The primary User-Agent string is patwebbot/1.0 (or patwebbot/2.0 as of 2024). Additional fingerprints include a From header set to [email protected] and an X-Pat-Crawl-ID custom header used for tracking. The bot also sends an Accept-Encoding: gzip header and a Connection: keep-alive header. Reverse DNS lookups for its IPs typically resolve to hostnames like patwebbot-*.compute.amazonaws.com.
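The indicators above can be combined into a simple request check. This is a rough sketch under stated assumptions: the function name match_signals and the signal labels are hypothetical, the reverse DNS hostname is expected to be resolved elsewhere and passed in, and the generic Accept-Encoding and Connection headers are skipped because most clients send them. Request headers are self-reported and can be spoofed, so a User-Agent match alone is weak evidence.

```python
import re
from fnmatch import fnmatchcase
from typing import List, Mapping, Optional

# Matches "patwebbot", optionally followed by a version such as "/1.0" or "/2.0".
UA_PATTERN = re.compile(r"patwebbot(?:/\d+(?:\.\d+)*)?", re.IGNORECASE)

# Hostname pattern quoted in the detection indicators.
RDNS_GLOB = "patwebbot-*.compute.amazonaws.com"


def match_signals(headers: Mapping[str, str],
                  reverse_dns: Optional[str] = None) -> List[str]:
    """Return the names of the patwebbot indicators present in a request."""
    lowered = {name.lower(): value for name, value in headers.items()}
    signals = []

    if UA_PATTERN.search(lowered.get("user-agent", "")):
        signals.append("user-agent")

    if "x-pat-crawl-id" in lowered:
        signals.append("x-pat-crawl-id")

    # The caller supplies the PTR hostname; no network lookup happens here.
    if reverse_dns and fnmatchcase(reverse_dns.lower().rstrip("."), RDNS_GLOB):
        signals.append("reverse-dns")

    return signals
```

A request that matches several signals at once, especially the reverse DNS pattern, is a stronger candidate than one that matches the User-Agent string only.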
📊 Data Usage
Collected data is used to train Pat's proprietary PatNLP language model, which powers their commercial AI text analytics suite. Additionally, the data feeds into a real-time trend dashboard for clients in marketing and finance. Pat states they do not sell raw crawled data, but aggregate features are used in their paid PatInsights product.
⚙️ Rate Limiting Policy
patwebbot is a legitimate crawler, but it is rate-limited because its aggressive burst behavior during deep crawls can degrade server performance in shared hosting environments. A threshold of 10 requests per second per IP is recommended before implementing temporary blocks, as documented in Pat's own rate-limit guidelines (https://pat-ai.com/crawler/rate-limits).
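One way to apply the 10 requests per second per IP threshold is a sliding-window counter. The class below is an illustrative sketch, not a recommended production configuration: the class name SlidingWindowLimiter, its parameters, and the choice of a sliding window (rather than, say, a token bucket) are assumptions, and timestamps are passed in explicitly so the behavior is easy to test.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `max_requests` per `window_seconds` for each client IP."""

    def __init__(self, max_requests: int = 10, window_seconds: float = 1.0) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, ip: str, now: float) -> bool:
        """Record a request at time `now` (seconds) and say whether to serve it."""
        hits = self._hits[ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # Over the threshold: candidate for a temporary block.
        hits.append(now)
        return True
```

In a live server, `now` would typically come from time.monotonic(), and a rejected request would receive a 429 Too Many Requests response rather than being silently dropped.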
ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.