Mata Hari

Bot User-Agent: mata-hari

🤖 Overview

The Mata Hari crawler is operated by Recorded Future, a leading threat intelligence company, and systematically indexes publicly accessible web content to feed the Recorded Future Intelligence Platform. Its primary purpose is to collect real-time data on cyber threats, vulnerabilities, and adversary infrastructure, enabling automated enrichment of the company's threat intelligence database. This bot is a legitimate, non-malicious agent utilized for defensive cybersecurity research.

🌐 Technical Behavior

Mata Hari issues synchronous HTTP/1.1 GET requests with a configurable crawl depth, typically following hyperlinks breadth-first. According to Recorded Future’s official documentation, the crawler originates from company-owned IP ranges, which are listed in its support portal under "Mata Hari IP Whitelist". It honors the Crawl-delay directive in robots.txt (a widely supported extension, not part of the core robots.txt standard) and can be throttled to avoid overwhelming servers. The bot sends ordinary Accept and User-Agent headers and does not emulate browser behavior, which distinguishes it from headless browsers. Its request rate peaks at a few requests per second per IP, though this varies with site response times.
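The breadth-first, depth-limited traversal described above can be illustrated with a short sketch. This is not Recorded Future’s implementation: the in-memory link graph, the function name bfs_crawl_order, and the crawl_delay parameter are hypothetical, and a real crawler would fetch and parse each page where the comment indicates.

```python
import time
from collections import deque
from typing import Dict, List


def bfs_crawl_order(links: Dict[str, List[str]], seed: str, max_depth: int,
                    crawl_delay: float = 0.0) -> List[str]:
    """Return pages in the order a breadth-first, depth-limited crawler visits them.

    `links` maps each page to the hyperlinks found on it (a hypothetical stand-in
    for fetching and parsing HTML). Pages deeper than `max_depth` are never visited.
    """
    seen = {seed}
    order: List[str] = []
    queue = deque([(seed, 0)])
    while queue:
        url, depth = queue.popleft()
        if order and crawl_delay > 0:
            time.sleep(crawl_delay)  # pause between consecutive requests
        order.append(url)  # a real crawler would send its HTTP GET here
        if depth >= max_depth:
            continue  # do not follow links beyond the configured depth
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return order


SITE = {"/": ["/a", "/b"], "/a": ["/c"], "/b": ["/c", "/d"], "/c": ["/e"]}
print(bfs_crawl_order(SITE, "/", max_depth=2))  # ['/', '/a', '/b', '/c', '/d']
```

Because the queue is first-in, first-out, every page at depth 1 is visited before any page at depth 2, and "/e" (depth 3) is never reached with max_depth=2.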

📋 robots.txt Compliance

Recorded Future explicitly states in its support documentation that Mata Hari honors Disallow directives in robots.txt. The bot also honors a Crawl-delay value if one is specified. Webmasters can block it entirely by adding a User-agent: mata-hari line (the hyphenated token listed above; robots.txt product tokens cannot contain spaces) followed by Disallow: / to their robots.txt, and Recorded Future confirms that such restrictions are respected.
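A site owner can check how a standards-following parser interprets such rules with Python’s standard urllib.robotparser module. The sketch assumes the token mata-hari from the Bot User-Agent line at the top of this page; the two robots.txt variants and the example.com URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks the crawler outright.
BLOCK_ALL = """\
User-agent: mata-hari
Disallow: /
"""

# Hypothetical robots.txt that only slows the crawler and hides one directory.
THROTTLE = """\
User-agent: mata-hari
Crawl-delay: 5
Disallow: /private/
"""


def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


blocked = load_rules(BLOCK_ALL)
print(blocked.can_fetch("mata-hari", "https://example.com/"))  # False

throttled = load_rules(THROTTLE)
print(throttled.can_fetch("mata-hari", "https://example.com/blog"))            # True
print(throttled.can_fetch("mata-hari", "https://example.com/private/report"))  # False
print(throttled.crawl_delay("mata-hari"))                                      # 5
```

Other crawlers are unaffected by these groups: with no User-agent: * group present, the parser allows them everything.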

🔍 Detection Indicators

The primary identifying header is the User-Agent string: Mata Hari (https://support.recordedfuture.com/hc/en-us/articles/360000821294-Mata-Hari-Web-Crawler). Behavioral fingerprints include a consistent, low-variance inter-request interval and lack of JavaScript execution. The bot’s IP ranges are documented by Recorded Future and can be used for server-side detection and logging.
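The three indicators above (User-Agent string, documented IP ranges, and near-constant request spacing) can be combined in a simple server-side check. In this sketch the IP ranges are placeholders taken from the RFC 5737 documentation blocks, not Recorded Future’s real ranges, and the coefficient-of-variation threshold of 0.1 is an arbitrary assumption; the function names are hypothetical.

```python
import ipaddress
import statistics
from typing import Dict, Sequence, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]

# PLACEHOLDER ranges (RFC 5737 documentation blocks), NOT Recorded Future's real ranges.
HYPOTHETICAL_RANGES = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def ua_matches(user_agent: str) -> bool:
    """Case-insensitive match on either spelling of the bot name."""
    ua = user_agent.lower()
    return "mata hari" in ua or "mata-hari" in ua


def ip_in_ranges(ip: str, ranges: Sequence[Network] = HYPOTHETICAL_RANGES) -> bool:
    """True if the client address falls inside any of the given networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in ranges)  # IPv4 vs IPv6 mismatches are simply False


def low_variance_intervals(timestamps: Sequence[float], max_cv: float = 0.1) -> bool:
    """True if gaps between sorted request timestamps are nearly constant.

    Uses the coefficient of variation (population std dev / mean) of the gaps.
    """
    if len(timestamps) < 3:
        return False  # too few requests to judge regularity
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    mean_gap = statistics.mean(gaps)
    if mean_gap <= 0:
        return False
    return statistics.pstdev(gaps) / mean_gap <= max_cv


def detection_signals(user_agent: str, ip: str, timestamps: Sequence[float]) -> Dict[str, bool]:
    """Report each indicator separately; a spoofed User-Agent will fail the IP check."""
    return {
        "user_agent": ua_matches(user_agent),
        "ip_range": ip_in_ranges(ip),
        "regular_timing": low_variance_intervals(timestamps),
    }


UA = "Mata Hari (https://support.recordedfuture.com/hc/en-us/articles/360000821294-Mata-Hari-Web-Crawler)"
print(detection_signals(UA, "203.0.113.7", [0.0, 1.0, 2.0, 3.0]))
# {'user_agent': True, 'ip_range': True, 'regular_timing': True}
```

Reporting the signals separately, rather than collapsing them into one verdict, lets a log pipeline flag requests that claim the Mata Hari User-Agent but arrive from outside the published ranges.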

📊 Data Usage

Collected data—including article text, metadata, and site structure—is stored and processed to generate threat intelligence alerts, adversary profiles, and vulnerability timelines within the Recorded Future platform. The information is not used for general-purpose AI training but specifically for security analytics, such as identifying emerging exploit mentions or leaked credentials. Recorded Future’s AI models leverage this data to produce predictive risk scores.

⚙️ Rate Limiting Policy

Rate limiting is recommended because the Mata Hari crawler, while legitimate, can generate sustained traffic that may degrade server performance for other users. A threshold-based limit (e.g., more than 10 requests per second from a single IP) blocks only abusive crawl patterns; since the crawler normally peaks at a few requests per second per IP, its normal operation stays below the threshold.
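One way to enforce a per-IP threshold is a sliding-window limiter. This is a minimal in-memory sketch under stated assumptions: the class name SlidingWindowLimiter, the default of 10 requests per 1-second window, and the choice not to count rejected requests are illustrative, and a production deployment would more likely enforce the limit in a reverse proxy or a shared store than in process memory.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `limit` requests per IP in any `window`-second span."""

    def __init__(self, limit: int = 10, window: float = 1.0) -> None:
        self.limit = limit
        self.window = window
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, ip: str, now: float) -> bool:
        hits = self._hits[ip]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # over the threshold; rejected requests are not recorded
        hits.append(now)
        return True


limiter = SlidingWindowLimiter()
decisions = [limiter.allow("203.0.113.7", i * 0.05) for i in range(11)]
print(decisions.count(True), decisions[-1])  # 10 False
```

With these defaults, a client sending a few requests per second, such as the crawler at its described peak, is never rejected, while an eleventh request inside one second is. Each IP has its own window, so one busy client does not affect others.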


ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.