informant Bot — Detection, Blocking & Technical Analysis


Bot User-Agent: informant

🤖 Overview

Informant is a web crawler operated by Informant, Inc., a data analytics company that provides AI-driven content aggregation services. Its purpose is to collect publicly available web content for training natural language processing models and generating real-time market intelligence for enterprise clients. The crawler feeds data into Informant’s proprietary platform, which supports trend analysis, automated reporting, and personalised news feeds.

🌐 Technical Behavior

Informant uses a distributed crawling system that sends requests from datacenter IPs announced by AS15169 (Google) and AS16509 (Amazon), though exact ranges are not publicly listed. It speaks HTTP/1.1 and HTTP/2 with gzip compression, and by default sends 10 requests per second per IP with a maximum of 100 concurrent connections. The crawler parses both raw HTML and structured data formats such as JSON-LD, Microdata, and RDFa, and adjusts its crawl rate dynamically based on server response times and Retry-After headers. Official documentation at informant.io/docs/crawler states that the bot does not revisit a page more than once per day unless a fresh content signal is detected.

📋 robots.txt Compliance

According to the robots.txt policy published on Informant’s website, the bot honors Disallow and Crawl-delay directives and supports Allow rules for granular access control. It requests /robots.txt before crawling a new host, and independent audits by webmaster communities have not reported any instance of the bot ignoring disallowed paths.
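As a rough illustration of how such rules could be written and checked before deployment, the sketch below feeds a hypothetical robots.txt to Python's standard urllib.robotparser. The paths and the 10-second delay are assumptions made up for the example, not values taken from Informant's documentation, and the parser applies the first matching rule, so the Allow line is placed before the broader Disallow line.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only: the paths and the delay
# are assumptions, not values published by Informant.
ROBOTS_TXT = """\
User-agent: informant
Allow: /private/press/
Disallow: /private/
Crawl-delay: 10
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
print(parser.can_fetch("informant", "/private/reports"))       # False
print(parser.can_fetch("informant", "/private/press/launch"))  # True
print(parser.can_fetch("informant", "/blog/"))                 # True
print(parser.crawl_delay("informant"))                         # 10
```

This only shows how a standards-following parser would read the file; whether a given crawler interprets the same rules identically has to be confirmed from server logs.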

🔍 Detection Indicators

The primary User-Agent string is Informant/1.0 (compatible; +https://informant.io/bot), which includes a link to the bot's documentation. Additional identifying headers include a custom X-Informant-Bot: true header and a From header containing [email protected]. Reverse DNS lookups on the bot's IPs typically resolve to hostnames ending in .crawler.informant.io. Because request headers are easy to spoof, the reverse DNS result is the stronger of these signals.
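The indicators above can be combined into a two-step check: first see whether a request claims to be Informant, then confirm the source IP with a forward-confirmed reverse DNS lookup. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the header and hostname values are the ones quoted above, and the default resolvers use the standard socket module (IPv4 only). The resolvers are injectable so the logic can be exercised without network access.

```python
import socket
from typing import Callable, List, Mapping

UA_TOKEN = "informant/"                # from the User-Agent string quoted above
RDNS_SUFFIX = ".crawler.informant.io"  # hostname suffix quoted above


def claims_informant(headers: Mapping[str, str]) -> bool:
    """True if the headers claim to be Informant (a claim, not proof)."""
    lowered = {k.lower(): v for k, v in headers.items()}
    ua = lowered.get("user-agent", "").lower()
    flag = lowered.get("x-informant-bot", "").strip().lower()
    return UA_TOKEN in ua or flag == "true"


def _reverse(ip: str) -> str:
    # PTR lookup: IP address -> primary hostname
    return socket.gethostbyaddr(ip)[0]


def _forward(host: str) -> List[str]:
    # A lookup: hostname -> list of IPv4 addresses
    return socket.gethostbyname_ex(host)[2]


def verify_ip(
    ip: str,
    reverse: Callable[[str], str] = _reverse,
    forward: Callable[[str], List[str]] = _forward,
) -> bool:
    """Forward-confirmed reverse DNS: PTR must match the suffix and
    the hostname must resolve back to the same IP."""
    try:
        host = reverse(ip).rstrip(".").lower()
        if not host.endswith(RDNS_SUFFIX):
            return False
        return ip in forward(host)
    except OSError:  # covers socket.herror and socket.gaierror
        return False
```

A request for which claims_informant is true but verify_ip is false is most likely another client borrowing the bot's identity, and can be handled separately from verified traffic.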

📊 Data Usage

Collected content is used to train Informant’s proprietary large language models, which power features such as entity extraction, summarisation, and personalised recommendation engines. Additionally, data feeds a real-time market intelligence dashboard used by e‑commerce and media companies to track competitor trends. Crawled pages are stored for up to 90 days before being purged or aggregated into anonymised datasets that are never sold to third parties.

⚙️ Rate Limiting Policy

Rate limiting is necessary because Informant’s default crawl rate of 10 requests per second per IP (600 per minute) can overwhelm smaller websites or shared hosting environments. A threshold of 100 requests per minute per IP is recommended, one sixth of the default rate; requests beyond it receive a temporary 429 response, which the bot respects by backing off for the period specified in the Retry-After header.
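One way to implement the threshold described above is a per-IP sliding window that returns 429 together with a Retry-After value once the limit is reached. This is a minimal in-memory sketch, not a production limiter: the class name is hypothetical, state lives in a single process, and the clock is injectable so the behavior can be checked deterministically.

```python
import math
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Tuple


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds per IP."""

    def __init__(
        self,
        limit: int = 100,
        window: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def check(self, ip: str) -> Tuple[int, int]:
        """Return (HTTP status, Retry-After seconds) for one request."""
        now = self.clock()
        hits = self._hits[ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            # Seconds until the oldest counted request leaves the window.
            retry_after = math.ceil(self.window - (now - hits[0]))
            return 429, max(retry_after, 1)
        hits.append(now)
        return 200, 0
```

A web handler would call check with the client IP and, on 429, send the second value in the Retry-After response header. Rejected requests are not counted toward the window, so a client that keeps retrying does not extend its own penalty.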


ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.
