tigerbot

Bot User-Agent: tigerbot

🤖 Overview

TigerBot is a web crawler operated by TigerBot Inc., first announced in 2023, designed to collect publicly available web content for training and improving TigerBot’s proprietary large language models. The bot scrapes text and metadata from websites to build a diverse training corpus, similar to other AI crawlers like GPTBot. According to the official TigerBot documentation, the bot prioritizes high-quality, well-structured content.

🌐 Technical Behavior

TigerBot crawls using a custom HTTP client with a default crawl rate of 5 requests per second per domain, reducing to 2 requests per second during peak hours. It identifies itself with a User-Agent string containing the token TigerBot/1.0 and the URL +https://tigerbot.com/bot; the full string is listed under Detection Indicators. The crawler follows internal link structures and respects sitemap.xml files. IP ranges are documented in the TigerBot GitHub repository (github.com/tigerbot/crawler) and are reported to include 203.0.113.0/24 and 198.51.100.0/24. Both of these blocks are reserved for documentation use by RFC 5737, so confirm the current list in the repository before building firewall rules on them. The bot supports HTTPS and HTTP/2, but does not execute JavaScript by default and focuses on static HTML content.
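As a rough illustration of how the listed IP ranges might be used, the sketch below checks whether a client address falls inside them. The two CIDR blocks are copied from the text above and should be treated as placeholders; the function name `in_listed_ranges` is hypothetical and not part of any TigerBot tooling.

```python
import ipaddress

# CIDR blocks quoted in this entry. Assumption: replace them with the
# list actually published by the operator before relying on this check.
LISTED_RANGES = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def in_listed_ranges(ip: str) -> bool:
    """Return True if `ip` parses as an address inside a listed block."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        # Malformed input is treated as "not in range" rather than an error.
        return False
    return any(addr in net for net in LISTED_RANGES)
```

A request whose User-Agent claims to be TigerBot but whose source address is outside the published ranges is a reasonable candidate for closer inspection, since User-Agent strings are trivially spoofed.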

📋 robots.txt Compliance

TigerBot honors standard robots.txt directives, including Disallow rules, as verified by independent testing in 2024. The bot checks robots.txt at the start of each crawl session and caches the file for 24 hours. According to the official documentation, TigerBot also respects the Crawl-delay directive, adjusting its request rate accordingly.
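To see how a Disallow rule and a Crawl-delay value would be interpreted for this bot, the sketch below feeds an example robots.txt to Python's standard `urllib.robotparser`. The rules, the `/private/` path, and the 10-second delay are illustrative assumptions, and the `tigerbot` product token is taken from the "Bot User-Agent" line of this entry. This shows how a standards-following parser reads the file, not how TigerBot itself is implemented.

```python
from urllib.robotparser import RobotFileParser

# Example rules only (assumption): block one directory for tigerbot and
# ask for a 10-second delay, while leaving other crawlers unrestricted.
ROBOTS_TXT = """\
User-agent: tigerbot
Disallow: /private/
Crawl-delay: 10

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def tigerbot_may_fetch(path: str) -> bool:
    """Return True if the example rules allow `tigerbot` to fetch `path`."""
    return parser.can_fetch("tigerbot", path)
```

To block the bot from the whole site, the `Disallow: /private/` line in the example would become `Disallow: /`.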

🔍 Detection Indicators

Primary User-Agent string: Mozilla/5.0 (compatible; TigerBot/1.0; +https://tigerbot.com/bot). Additional common patterns include TigerBot/2.0 for mobile-optimized crawls. Behavioral fingerprints include sequential requests at fixed intervals and the absence of a Referer header. The bot's requests also include a custom HTTP header, X-TigerBot-Crawl, set to true.
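The header-based indicators above could be combined into a simple check like the following. This is a minimal sketch: the function name `looks_like_tigerbot` and the regular expression are assumptions made for illustration, and a match only shows that a request claims to be TigerBot, since both headers can be forged.

```python
import re
from typing import Mapping

# Matches the TigerBot/<version> token anywhere in a User-Agent string,
# e.g. "TigerBot/1.0" or "TigerBot/2.0" (pattern is an assumption).
UA_PATTERN = re.compile(r"\bTigerBot/\d+(?:\.\d+)*", re.IGNORECASE)


def looks_like_tigerbot(headers: Mapping[str, str]) -> bool:
    """Return True if the request headers carry either TigerBot indicator."""
    # HTTP header names are case-insensitive, so normalize the keys first.
    normalized = {name.lower(): value for name, value in headers.items()}
    user_agent = normalized.get("user-agent", "")
    crawl_flag = normalized.get("x-tigerbot-crawl", "").strip().lower()
    return bool(UA_PATTERN.search(user_agent)) or crawl_flag == "true"
```

Pairing this with an IP range check and the behavioral signals (fixed request intervals, no Referer header) gives a stronger identification than headers alone.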

📊 Data Usage

Collected data is used exclusively to train and refine TigerBot's generative AI models, including text generation, summarization, and question-answering systems. Data is stored in a distributed database, processed using natural language processing pipelines, and integrated into the TigerBot platform. According to the company's privacy policy, no personally identifiable information is intentionally harvested, and website owners can request data removal via a web form.

⚙️ Rate Limiting Policy

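Because of its high crawl volume and potential impact on server resources, requests from TigerBot are rate-limited to prevent excessive load. The recommended threshold is 10 requests per minute per IP, with automatic blocking for any agent exceeding 100 requests per minute, as per best practices published on the TigerBot developer portal. For comparison, the bot's default crawl rate of 5 requests per second works out to 300 requests per minute, so an unthrottled crawl would exceed both limits.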
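One way to apply the two thresholds above is a per-IP sliding-window counter, sketched below. The class name, the "allow"/"throttle"/"block" labels, and the choice to keep a blocked IP blocked until the process restarts are assumptions for illustration; only the 10 and 100 requests-per-minute figures come from this entry.

```python
import time
from collections import defaultdict, deque

SOFT_LIMIT = 10        # recommended threshold, requests per minute per IP
HARD_LIMIT = 100       # automatic block above this many requests per minute
WINDOW_SECONDS = 60.0


class SlidingWindowLimiter:
    """Tracks request timestamps per IP over a rolling 60-second window."""

    def __init__(self):
        self._hits = defaultdict(deque)
        self._blocked = set()

    def check(self, ip, now=None):
        """Record one request and return 'allow', 'throttle', or 'block'."""
        now = time.monotonic() if now is None else now
        if ip in self._blocked:
            return "block"
        hits = self._hits[ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= WINDOW_SECONDS:
            hits.popleft()
        hits.append(now)
        if len(hits) > HARD_LIMIT:
            # Assumption: a block is permanent for the life of this object.
            self._blocked.add(ip)
            return "block"
        if len(hits) > SOFT_LIMIT:
            return "throttle"
        return "allow"
```

In a real deployment the same logic usually lives in the reverse proxy or WAF rather than in application code, and blocks normally expire after a cooldown period.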

ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information are the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.