jetbot

Bot User-Agent: jetbot

🤖 Overview

Jetbot is a web crawler operated by Jet AI Inc., a private AI research company founded in 2021, designed to collect publicly available web content for training large language models and improving natural language understanding systems. The bot was first observed in early 2023 and feeds data into Jet AI's proprietary J-2 language model, which powers applications including automated content summarization and conversational agents. According to Jet AI’s official documentation at https://jetbot.ai/about, the crawler's purpose is to ethically gather diverse textual data while respecting website owners' preferences.

🌐 Technical Behavior

Jetbot uses a multi-threaded asynchronous crawl engine that sends requests over HTTP/2, with a default delay of 2 seconds between requests to the same host (the delay is adjustable). Its published IP ranges belong to ASN 20473 (Jet AI’s own autonomous system) and include 192.0.2.0/24 and 198.51.100.0/24, as listed in Jet AI’s IP disclosure at https://jetbot.ai/ip-ranges. The bot fetches HTML and CSS files to understand page structure but does not download images or other multimedia content. It respects the Cache-Control header and schedules re-crawls based on the Last-Modified timestamp. Jetbot identifies itself through its User-Agent header and sends a From header containing an abuse contact email address, as documented on its policy page.
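
To check whether a request's source address falls inside the ranges listed above, a site can compare it against the published CIDR blocks. The sketch below uses Python's standard ipaddress module; the helper name is_listed_jetbot_ip is hypothetical. Because 192.0.2.0/24 and 198.51.100.0/24 are reserved by RFC 5737 for documentation, treat the hard-coded list as a placeholder and load the current ranges from https://jetbot.ai/ip-ranges in practice.

```python
import ipaddress

# Ranges as listed in this profile. Placeholders only: both are RFC 5737
# documentation blocks, so a real deployment should load Jet AI's
# published list instead of hard-coding it.
JETBOT_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_listed_jetbot_ip(addr: str) -> bool:
    """Return True if addr falls inside one of the listed Jetbot ranges.

    Hypothetical helper; an invalid address string raises ValueError.
    """
    ip = ipaddress.ip_address(addr)
    # Membership between mismatched IP versions (IPv6 vs IPv4) is simply False.
    return any(ip in net for net in JETBOT_RANGES)


print(is_listed_jetbot_ip("192.0.2.45"))   # True
print(is_listed_jetbot_ip("203.0.113.7"))  # False
```

A range match only shows that the address is on the list; combined with the User-Agent and From headers it gives stronger evidence than either signal alone.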

📋 robots.txt Compliance

According to Jet AI’s public statement at https://jetbot.ai/robots-policy, Jetbot fully honors Disallow directives in robots.txt and checks the file before every crawl session. It does not honor the Crawl-delay directive; instead, it applies its own rate limiting as described above. In tests reported by members of the webmaster community on a webmaster forum, Jetbot stopped crawling newly disallowed paths within 24 hours of a robots.txt update.
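
A site owner can check how a robots.txt group applies to Jetbot with Python's standard urllib.robotparser. In this sketch the robots.txt content and the example.com URLs are hypothetical; the jetbot token matches the product name in the crawler's User-Agent string. Based on the policy above, only the Disallow rule should be expected to take effect: the parser reports the Crawl-delay value, but Jetbot reportedly applies its own rate limiting instead.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block Jetbot from /private/, allow everyone else.
ROBOTS_TXT = """\
User-agent: jetbot
Disallow: /private/
Crawl-delay: 10

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Pass the product token ("Jetbot"), not the full Mozilla/5.0 UA string:
# robotparser matches only the part before the first "/".
print(parser.can_fetch("Jetbot", "https://example.com/private/page.html"))  # False
print(parser.can_fetch("Jetbot", "https://example.com/blog/post.html"))     # True

# Parsed, but per the policy above Jetbot does not obey this value.
print(parser.crawl_delay("Jetbot"))  # 10
```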

🔍 Detection Indicators

The primary User-Agent string is Mozilla/5.0 (compatible; Jetbot/1.0; +https://jetbot.ai/bot), though variants with other version numbers may appear. Behavioral fingerprints include a consistent interval of 2 seconds (the default crawl delay) between consecutive requests to the same domain, and the absence of an Accept-Encoding header because the bot processes raw text. Source IPs in ASN 20473 can be cross-referenced with Jet AI’s published list. The bot also sends an X-Jetbot-Crawl-ID header with every request, carrying a unique session identifier used for auditing.
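
The indicators above can be combined into a simple heuristic check. This is a sketch under stated assumptions: the function names, the 0.25-second pacing tolerance, and the crawl-ID value in the example are hypothetical choices. Because the crawl delay is adjustable, a pacing match alone is weak evidence, and headers can be spoofed, so the source IP should still be cross-referenced with Jet AI’s published list as noted above.

```python
from __future__ import annotations

import re
from statistics import median

# Documented UA token with any version number, e.g. "Jetbot/1.0".
JETBOT_UA = re.compile(r"\bJetbot/\d+(?:\.\d+)*", re.IGNORECASE)


def jetbot_signals(headers: dict[str, str]) -> list[str]:
    """List the Jetbot fingerprint signals present in one request's headers."""
    # HTTP header names are case-insensitive, so normalize them first.
    h = {name.lower(): value for name, value in headers.items()}
    signals = []
    if JETBOT_UA.search(h.get("user-agent", "")):
        signals.append("user-agent")
    if "x-jetbot-crawl-id" in h:
        signals.append("crawl-id-header")
    if "accept-encoding" not in h:
        signals.append("no-accept-encoding")
    return signals


def looks_like_default_pacing(timestamps: list[float],
                              target: float = 2.0,
                              tolerance: float = 0.25) -> bool:
    """True if the median gap between requests to one host is close to target seconds."""
    if len(timestamps) < 3:
        return False  # too few requests to judge pacing
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return abs(median(gaps) - target) <= tolerance


req = {
    "User-Agent": "Mozilla/5.0 (compatible; Jetbot/1.0; +https://jetbot.ai/bot)",
    "X-Jetbot-Crawl-ID": "example-session-id",  # hypothetical value
}
print(jetbot_signals(req))  # ['user-agent', 'crawl-id-header', 'no-accept-encoding']
print(looks_like_default_pacing([0.0, 2.0, 4.1, 6.0]))  # True
```

Using the median gap rather than requiring every gap to equal 2 seconds tolerates network jitter and the occasional delayed request.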

📊 Data Usage

Collected web content is used exclusively for training Jet AI’s language models, specifically the J-2 series, which are based on a transformer architecture with 175 billion parameters. Before it is stored in Jet AI’s secure data lakes, training data is filtered for personally identifiable information (PII) using automated redaction tools. The company states that no user-generated content from private or authenticated areas is retained, and that all data is used solely for research and product improvement, as described in its privacy policy at https://jetbot.ai/privacy.

⚙️ Rate Limiting Policy

Jetbot is subject to rate limiting because its aggressive concurrent crawling can consume significant server resources if left unchecked. A threshold-based blocking policy is recommended to protect application performance while still allowing legitimate access: for example, block a single IP that sends more than 100 requests per minute, or that keeps crawling after receiving a 503 (Service Unavailable) response.
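
A minimal sketch of this threshold policy, assuming request times are available per source IP as monotonically increasing seconds. The 100-requests-per-minute limit comes from the text; the 30-second back-off after a 503, the allowance of three early retries, and the ThresholdBlocker class itself are hypothetical. State is kept in memory, so a multi-process deployment would need shared storage.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 60.0     # sliding window for the request-rate check
MAX_REQUESTS = 100        # "more than 100 requests per minute" from the policy
COOLDOWN_SECONDS = 30.0   # hypothetical back-off expected after a 503
MAX_EARLY_RETRIES = 3     # hypothetical tolerance for ignoring that back-off


class ThresholdBlocker:
    """Per-IP threshold checks; an illustrative sketch only."""

    def __init__(self) -> None:
        self.hits = defaultdict(deque)          # ip -> request times inside the window
        self.cooldown_until = {}                # ip -> end of back-off after a 503
        self.early_retries = defaultdict(int)   # ip -> requests made during back-off (never reset here)

    def record_503(self, ip: str, now: float) -> None:
        """Call when a 503 response is sent to ip."""
        self.cooldown_until[ip] = now + COOLDOWN_SECONDS

    def should_block(self, ip: str, now: float) -> bool:
        """Register a request from ip at time now and decide whether to block it."""
        window = self.hits[ip]
        window.append(now)
        # Drop requests that have fallen out of the 60-second window.
        while now - window[0] >= WINDOW_SECONDS:
            window.popleft()
        # Count requests that arrive before the post-503 back-off has ended.
        if now < self.cooldown_until.get(ip, float("-inf")):
            self.early_retries[ip] += 1
        return (len(window) > MAX_REQUESTS
                or self.early_retries[ip] > MAX_EARLY_RETRIES)


blocker = ThresholdBlocker()
# 101 requests from one IP within a minute trip the rate threshold.
results = [blocker.should_block("198.51.100.9", t * 0.5) for t in range(101)]
print(results[99], results[100])  # False True

# A client that keeps requesting during the back-off is blocked on the fourth retry.
blocker.record_503("192.0.2.1", now=0.0)
print([blocker.should_block("192.0.2.1", now=float(t)) for t in range(1, 6)])
# [False, False, False, True, True]
```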

ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.