multitext Bot — Detection, Blocking & Technical Analysis


Bot User-Agent: multitext

🤖 Overview

Multitext is a web crawler operated by Multitext Inc., a data analytics company headquartered in San Francisco, California. The bot collects publicly accessible web content to train large language models and to improve the company's multilingual search engine. According to the official bot documentation at https://multitext.com/bot-info, the crawler was first deployed in 2020 and is a legitimate, non-malicious agent that follows web standards such as robots.txt.

🌐 Technical Behavior

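The Multitext crawler uses a distributed architecture with IP addresses from the 203.0.113.0/24 range, as listed in the company's official IP registry at multitext.com/ips. Note that 203.0.113.0/24 is TEST-NET-3, a block reserved for documentation by RFC 5737, so check the registry for the ranges actually in use before writing firewall rules. The crawler sends HTTP GET requests at an average of 10 requests per second per IP, with bursts of up to 50 requests per second during peak indexing. It supports both HTTP/1.1 and HTTP/2 and includes the custom header X-Multitext-Bot: true. It fetches robots.txt at the start of each crawl session and caches it for 24 hours, and it uses the Last-Modified and ETag validators (sent back as If-Modified-Since and If-None-Match conditional requests) to avoid re-downloading unchanged pages.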
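Honoring ETag and Last-Modified is what lets a server answer a repeat fetch with 304 Not Modified instead of resending the page. The sketch below shows that server-side decision, assuming the precedence rules of RFC 9110: If-None-Match is evaluated first using weak comparison, and If-Modified-Since is ignored whenever If-None-Match is present. The function name should_send_304 and its parameters are hypothetical and not part of any Multitext or web-framework API.

```python
import re
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping

# Matches strong ("abc") and weak (W/"abc") entity tags inside an If-None-Match value.
_ETAG_RE = re.compile(r'(?:W/)?"[^"]*"')


def _opaque(tag: str) -> str:
    """Strip the weak indicator so tags can be compared with weak comparison."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def should_send_304(request_headers: Mapping[str, str],
                    current_etag: str,
                    last_modified: datetime) -> bool:
    """Return True if a conditional GET can be answered with 304 Not Modified.

    Hypothetical helper: current_etag is the quoted ETag the server would send,
    and last_modified is a timezone-aware datetime for the resource.
    """
    headers = {name.lower(): value for name, value in request_headers.items()}

    # If-None-Match takes precedence; If-Modified-Since is ignored when it is present.
    if_none_match = headers.get("if-none-match")
    if if_none_match is not None:
        if if_none_match.strip() == "*":
            return True
        sent_tags = {_opaque(tag) for tag in _ETAG_RE.findall(if_none_match)}
        return _opaque(current_etag) in sent_tags

    if_modified_since = headers.get("if-modified-since")
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            return False  # unparseable date: send the full response
        if since.tzinfo is None:
            since = since.replace(tzinfo=timezone.utc)
        # HTTP dates have one-second resolution, so drop sub-second precision.
        return last_modified.replace(microsecond=0) <= since

    return False
```

Because a validator-aware crawler sends these headers on every refetch, an unchanged page costs the server only a header exchange rather than a full body.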

📋 robots.txt Compliance

The company's policy at multitext.com/robots states that the Multitext bot fully complies with robots.txt directives, honoring both Disallow and Allow rules. Third-party tests (documented at example.com/multitext-compliance) report that it also respects the nonstandard Crawl-delay directive and applies a default delay of 5 seconds between requests when none is specified.
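A site owner can check how a compliant crawler should read a given robots.txt with Python's standard urllib.robotparser module. The sketch below assumes the User-Agent string listed on this page and uses the reported 5-second fallback when no Crawl-delay is set; the robots.txt rules, paths, and helper names are illustrative. Note that robotparser applies the first matching rule in a group rather than the longest match required by RFC 9309, so the narrower Allow line is placed before the broader Disallow.

```python
from urllib.robotparser import RobotFileParser

# User-Agent string as listed on this page.
MULTITEXT_UA = "Multitext/1.0 (compatible; +http://www.multitext.com/bot)"
# Fallback reported by third-party tests when robots.txt sets no Crawl-delay.
DEFAULT_DELAY_SECONDS = 5.0

# Illustrative robots.txt; the paths are hypothetical.
ROBOTS_TXT = """\
User-agent: multitext
Allow: /private/press/
Disallow: /private/
Crawl-delay: 10

User-agent: *
Disallow: /admin/
"""


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def effective_delay(parser: RobotFileParser, user_agent: str = MULTITEXT_UA) -> float:
    """Crawl-delay for this agent, or the 5-second default when none is set."""
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else DEFAULT_DELAY_SECONDS


rules = load_rules(ROBOTS_TXT)
for path in ("/private/press/launch.html", "/private/data.csv", "/admin/"):
    print(path, rules.can_fetch(MULTITEXT_UA, "https://example.org" + path))
print("delay:", effective_delay(rules))
```

With these rules the press page is allowed, /private/data.csv is blocked, and the delay is 10 seconds. /admin/ is allowed for Multitext because a crawler follows only the most specific group that names it, so the `User-agent: *` rules do not apply once a `multitext` group exists.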

🔍 Detection Indicators

The primary User-Agent string is Multitext/1.0 (compatible; +http://www.multitext.com/bot). Additional identifiers include a From header carrying a contact email address and an Accept-Language: en,* header. The crawler requests only the text/html and application/xhtml+xml MIME types and skips images and scripts unless robots.txt explicitly allows them. Because its IP ranges are published, a request that claims to be Multitext can be verified by checking that its source IP falls inside one of those ranges.
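The indicators above can be combined into a simple verification check: a request that claims to be Multitext (by User-Agent token or the X-Multitext-Bot header) is trusted only if its source IP is in a published range. This is a minimal sketch; the helper name classify_request and its three labels are hypothetical, and the range list holds only the 203.0.113.0/24 block quoted on this page, which is an RFC 5737 documentation range and should be replaced with the ranges currently published at multitext.com/ips.

```python
import ipaddress
import re
from typing import Mapping

# Ranges as listed on this page. 203.0.113.0/24 is RFC 5737 TEST-NET-3 (documentation only),
# so treat this list as a placeholder for the ranges published at multitext.com/ips.
PUBLISHED_RANGES = [ipaddress.ip_network("203.0.113.0/24")]

# Matches the "multitext" token anywhere in the User-Agent, e.g. "Multitext/1.0 (compatible; ...)".
UA_PATTERN = re.compile(r"\bmultitext\b", re.IGNORECASE)


def classify_request(remote_ip: str, headers: Mapping[str, str]) -> str:
    """Return "verified", "spoofed", or "other" for one incoming request.

    Hypothetical helper: a request claims to be Multitext if its User-Agent
    contains the token or it sends X-Multitext-Bot: true; the claim is only
    trusted when the source IP falls inside a published range.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    claims_multitext = (
        bool(UA_PATTERN.search(lowered.get("user-agent", "")))
        or lowered.get("x-multitext-bot", "").strip().lower() == "true"
    )
    if not claims_multitext:
        return "other"

    address = ipaddress.ip_address(remote_ip)
    # Membership is False (not an error) when the address and network IP versions differ.
    if any(address in network for network in PUBLISHED_RANGES):
        return "verified"
    return "spoofed"
```

Headers are trivially forged, which is why the sketch relies on the IP check and labels a matching User-Agent from an unlisted address as spoofed. Blocking only the "spoofed" class drops impersonators while keeping the real crawler; blocking "verified" as well blocks Multitext outright.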

📊 Data Usage

Data collected by Multitext is used to train the proprietary Multitext-LLM language model and to power the company's search index. The privacy policy at multitext.com/privacy states that personal information is anonymized, that data is not sold to third parties, and that it is used solely for internal research and product improvement.

⚙️ Rate Limiting Policy

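Although Multitext is a legitimate and well-behaved bot, its request volume can strain server resources: at the documented average of 10 requests per second per IP, a single crawler IP could issue up to 600 requests per minute, and up to 3,000 per minute during 50-request-per-second bursts. Rate limiting is therefore recommended, with a common threshold of 100 requests per minute per IP. Throttling at that level prevents resource exhaustion and preserves service quality for human users; even compliant crawlers benefit from it in production environments.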
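To make the 100-requests-per-minute threshold concrete, the sketch below keeps a sliding-window log of accepted request timestamps per IP. It is one of several common algorithms (token buckets and fixed windows are alternatives), and the class name, defaults, and injectable clock are assumptions for illustration. Under this policy, an IP sending continuously at the documented 10 requests per second for one minute would have 100 of its 600 requests accepted.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Per-IP sliding-window log: at most `limit` accepted requests in any `window_seconds`."""

    def __init__(self, limit: int = 100, window_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window_seconds
        self.clock = clock  # injectable so the behaviour can be tested without sleeping
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, ip: str) -> bool:
        """Record and accept the request if the IP is under its limit; otherwise reject it."""
        now = self.clock()
        hits = self._hits[ip]
        # Forget timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # a server would typically answer 429 Too Many Requests
        hits.append(now)
        return True
```

Rejected requests are not recorded, so a throttled client regains capacity as soon as its oldest accepted request ages out of the window. This in-memory version keeps an entry for every IP it has seen; a deployment behind several web servers would typically keep the counts in a shared store and evict idle IPs.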

ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information are the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.