firebat
Bot User-Agent: firebat
🤖 Overview
Firebat is a web crawler operated by Firebat Inc., a data services company based in San Francisco. It was first publicly documented in a 2021 technical blog post at firebat.io/crawler. Its primary purpose is to aggregate publicly accessible web content, both for training proprietary large language models and for indexing into Firebat Search, an alternative search engine launched in 2022.
🌐 Technical Behavior
Firebat issues HTTP/1.1 and HTTP/2 requests from IP ranges 192.0.2.0/24 and 198.51.100.0/24, as listed in its official IP address documentation published at firebat.io/ips. The crawler sends 5–10 requests per second per domain with a maximum of 1,000 requests per day per site, and it respects Crawl-Delay directives in robots.txt by waiting the specified number of seconds between requests. It uses exponential backoff when receiving HTTP 429 (Too Many Requests) responses. The bot always fetches the site’s robots.txt before any other resource and stores a cached version for up to 24 hours.
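A minimal sketch of checking whether a client address falls inside the ranges listed above, using only the Python standard library. The function name `is_firebat_ip` and the constant `FIREBAT_RANGES` are illustrative assumptions, not part of any Firebat tooling. Both listed ranges also sit inside address blocks that RFC 5737 reserves for documentation, so they are best treated as placeholders until checked against the list at firebat.io/ips.

```python
import ipaddress

# Ranges copied from the text above; verify them against the operator's
# published list before relying on this check (assumption).
FIREBAT_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_firebat_ip(remote_addr: str) -> bool:
    """Return True if remote_addr parses as an IP inside a listed range."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Malformed addresses are never treated as the crawler.
        return False
    return any(addr in net for net in FIREBAT_RANGES)
```

A request that claims to be Firebat but arrives from outside these ranges is a reasonable candidate for closer inspection.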
📋 robots.txt Compliance
According to Firebat Inc.’s compliance statement at firebat.io/robots, the crawler fully honors both the Disallow and Allow directives in robots.txt. It also respects the nofollow value of the robots meta tag and rel="nofollow" attributes on links. Tests by third-party monitoring services (e.g., CrawlTest.com, 2022) confirmed that Firebat does not crawl pages explicitly disallowed under the Robots Exclusion Protocol.
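As an illustration of the directives described above, the sketch below parses a hypothetical robots.txt with Python's standard `urllib.robotparser` and asks how a compliant crawler would interpret it. The paths, the `example.com` host, the 10-second delay, and the use of `Firebat` as the robots.txt user-agent token are all assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keeps Firebat out of /private/ and asks it to
# wait 10 seconds between requests; every other bot is unrestricted.
ROBOTS_TXT = """\
User-agent: Firebat
Disallow: /private/
Crawl-delay: 10

User-agent: *
Disallow:
"""


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser without any network I/O."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)
# Path under the disallowed prefix.
print(rules.can_fetch("Firebat", "https://example.com/private/report.html"))
# Path with no matching rule.
print(rules.can_fetch("Firebat", "https://example.com/blog/post.html"))
# Delay requested for this user agent, in seconds.
print(rules.crawl_delay("Firebat"))
```

This only shows how a well-behaved parser reads the file. Whether a given crawler actually obeys it still has to be confirmed from server logs.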
🔍 Detection Indicators
The primary User-Agent string is Mozilla/5.0 (compatible; Firebat/1.0; +https://firebat.io/crawler). A secondary string, FirebatBot/1.0, is used for legacy compatibility. The bot also sends a custom HTTP header, X-Firebat-Crawler: 1, and consistently requests /robots.txt first, followed by a shallow crawl of linked resources. Unlike typical browser User-Agent strings, Firebat's contains no operating system token (e.g., “Linux”).
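The indicators above can be combined into a simple header check. This is a rough sketch, assuming request headers are available as a plain dictionary. The function name `looks_like_firebat` and the regular expression are illustrative, and the pattern deliberately accepts any version number rather than only 1.0.

```python
import re

# Matches "Firebat/<version>" and "FirebatBot/<version>" anywhere in the
# User-Agent string (pattern is an assumption based on the strings above).
FIREBAT_UA = re.compile(r"\bFirebat(?:Bot)?/\d+(?:\.\d+)*", re.IGNORECASE)


def looks_like_firebat(headers: dict) -> bool:
    """Return True if the headers carry a Firebat User-Agent or custom header."""
    # HTTP header names are case-insensitive, so normalize the keys first.
    normalized = {name.lower(): value for name, value in headers.items()}
    if FIREBAT_UA.search(normalized.get("user-agent", "")):
        return True
    return normalized.get("x-firebat-crawler", "").strip() == "1"
```

Headers are trivially spoofed, so a match here identifies a claim rather than a verified identity. Pairing it with a source IP check gives a stronger signal.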
📊 Data Usage
Collected web content is used to train Firebat’s large language models (documented on firebat.io/ai) and to build the index for Firebat Search. The company states that it strips personally identifiable information and does not store content behind login walls. Raw crawl data is retained for 30 days before anonymization, as noted in the company's privacy policy.
⚙️ Rate Limiting Policy
Rate limiting is recommended because Firebat’s high request frequency, while legitimate, can overwhelm small or under-resourced web servers. A threshold of 2,000 requests per 24 hours per IP is typically applied by administrators to prevent resource exhaustion while still allowing the bot to gather data for its indexing and AI training purposes.
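One way to apply a per-IP threshold like the one described above is a sliding-window counter. The sketch below is an in-memory, single-process illustration. The class name `SlidingWindowLimiter` and its parameters are assumptions, and a production deployment would more likely use the rate limiting features of a reverse proxy or WAF.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SlidingWindowLimiter:
    """Allow at most `limit` requests per IP within the trailing window."""

    def __init__(self, limit: int = 2000, window_seconds: float = 24 * 60 * 60):
        self.limit = limit
        self.window = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of accepted requests

    def allow(self, ip: str, now: Optional[float] = None) -> bool:
        """Record the request and return True, or return False if over the limit."""
        now = time.monotonic() if now is None else now
        hits = self._hits[ip]
        cutoff = now - self.window
        # Drop timestamps that have aged out of the window.
        while hits and hits[0] <= cutoff:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

A rejected request would typically receive an HTTP 429 response, which the crawler is described as handling with exponential backoff.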
ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.