MJ12bot: Detection, Blocking & Technical Analysis

Bot User-Agent: mj12bot

🤖 Overview

MJ12bot is a web crawler operated by Majestic (formerly Majestic-12 Ltd.), a UK-based SEO company founded in 2004 and headquartered in Birmingham. Its primary purpose is to build and maintain Majestic’s commercial link-intelligence database, which the company describes as the largest of its kind and exposes through Majestic Site Explorer and the Majestic Fresh Index. The bot discovers and recrawls billions of URLs to map the hyperlink structure of the public web. That data feeds Majestic’s proprietary Link Graph, Trust Flow, and Citation Flow metrics, which are used by SEO professionals, marketers, and academic researchers. The crawler has been active since at least 2006 and is documented on Majestic’s official website (majestic.com/crawler) and in its developer documentation.

🌐 Technical Behavior

MJ12bot performs both deep and wide crawling, following links from seed lists and sitemaps. Its request rate varies by site: historically it could be aggressive, but since 2018 Majestic has applied per-site rate limiting based on server response times. The bot uses HTTP/1.1 and supports both IPv4 and IPv6. Its IP addresses often resolve to ranges owned by Majestic’s hosting providers (e.g., 80.94.76.x, 81.24.106.x, and 212.227.x.x). It fetches robots.txt before every crawl session and respects Crawl-Delay directives. The crawler does not execute JavaScript or submit forms; it follows only static links and redirects. It also supports conditional GET requests with If-Modified-Since and ETag headers to reduce bandwidth consumption. Majestic is reported to publish its current IP ranges on its website and in its API documentation (docs.majestic.com/api).
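As a rough illustration of the IP matching described above, the sketch below checks whether a client address falls inside a set of ranges. The CIDR prefixes are an assumption derived from the example addresses in the text (80.94.76.x read as 80.94.76.0/24, and so on); they are not a verified list and should be replaced with whatever ranges Majestic currently publishes. The names EXAMPLE_RANGES and in_example_ranges are hypothetical.

```python
import ipaddress

# Illustrative prefixes only, derived from the example addresses in the text.
# They are NOT a verified list of MJ12bot ranges; substitute the current
# ranges published by Majestic before relying on this check.
EXAMPLE_RANGES = [
    ipaddress.ip_network("80.94.76.0/24"),
    ipaddress.ip_network("81.24.106.0/24"),
    ipaddress.ip_network("212.227.0.0/16"),
]


def in_example_ranges(ip: str) -> bool:
    """Return True if `ip` parses and lies inside one of EXAMPLE_RANGES."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        # Malformed input (for example a hostname) is treated as no match.
        return False
    return any(addr in net for net in EXAMPLE_RANGES)
```

A range match on its own is weak evidence, since hosting-provider ranges are shared with other customers; it is best combined with the User-Agent check.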

📋 robots.txt Compliance

MJ12bot is reported to honor robots.txt exclusions. Majestic’s official documentation states that the crawler adheres to the Robots Exclusion Protocol, including the Disallow directive and the Crawl-Delay setting, and many site operators have observed that the bot stops crawling paths that are explicitly disallowed. One caveat: because it is a link-indexing crawler, it may still request the root URL even when the entire site is disallowed, although it will not follow any links from that page. Reports on webmaster forums (e.g., WebmasterWorld) and the Majestic support pages are consistent with this behavior.
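To make the directives above concrete, here is a minimal sketch that parses a sample robots.txt with Python’s standard urllib.robotparser and reports how a compliant crawler identifying as MJ12bot would interpret it. The rule set, the /private/ path, and the helper name evaluate_rules are assumptions chosen for illustration; this shows how the rules read, not how Majestic’s crawler is implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: slow MJ12bot down and keep it out of /private/,
# while leaving every other crawler unrestricted.
ROBOTS_TXT = """\
User-agent: MJ12bot
Crawl-delay: 10
Disallow: /private/

User-agent: *
Disallow:
"""


def evaluate_rules(robots_txt: str, agent: str = "MJ12bot"):
    """Return (public allowed, private allowed, crawl delay) for `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return (
        parser.can_fetch(agent, "/blog/post"),
        parser.can_fetch(agent, "/private/report"),
        parser.crawl_delay(agent),
    )


public_ok, private_ok, delay = evaluate_rules(ROBOTS_TXT)
print(public_ok, private_ok, delay)
```

Running the same check against a site’s real robots.txt is a quick way to confirm that the rules say what was intended before waiting for the crawler’s next visit.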

🔍 Detection Indicators

The primary User-Agent string is MJ12bot/1.0, often seen in the form Mozilla/5.0 (compatible; MJ12bot/1.0; +http://majestic12.co.uk/bot.php). Some older variants may include MJ12bot/v1.4.5 or similar. The bot also sends a From header containing a contact email address. A secondary identifier is the request pattern: it typically fetches robots.txt first, then the root page, then follows links breadth-first, often with 15–30 seconds between requests unless the site’s Crawl-Delay directive specifies a different interval. Additionally, Majestic provides a bot verification tool that lets site owners confirm whether a specific IP belongs to MJ12bot by querying its API (api.majestic.com/api/command?app=CheckBot&ip=x.x.x.x).
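The User-Agent forms quoted above can be matched with a small regular expression. The sketch below is one possible pattern, assuming the token always appears as MJ12bot/ followed by an optional v and a dotted version number; the names MJ12BOT_UA and mj12bot_version are hypothetical.

```python
import re
from typing import Optional

# Matches "MJ12bot/1.0" as well as older forms such as "MJ12bot/v1.4.5".
# Case-insensitive because log pipelines sometimes lowercase the header.
MJ12BOT_UA = re.compile(r"\bMJ12bot/(?P<version>v?\d+(?:\.\d+)*)", re.IGNORECASE)


def mj12bot_version(user_agent: str) -> Optional[str]:
    """Return the version token if the header claims to be MJ12bot, else None."""
    match = MJ12BOT_UA.search(user_agent)
    return match.group("version") if match else None


sample = "Mozilla/5.0 (compatible; MJ12bot/1.0; +http://majestic12.co.uk/bot.php)"
print(mj12bot_version(sample))
```

Because any client can send this header, a match only shows that a request claims to be MJ12bot; pairing it with an IP-based check gives a stronger signal.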

📊 Data Usage

Collected data is used exclusively for Majestic’s commercial SEO analytics platform. The bot’s crawl data populates the Majestic Fresh Index (updated daily) and the Historic Index (archives backlinks for years). This information is sold to subscribers via the Majestic Site Explorer, which provides backlink profiles, anchor text analysis, trust metrics, and link graph visualizations. Majestic also uses the data for academic research partnerships and for internal product development. No personal or sensitive content is deliberately extracted; only publicly accessible page content and link structure are recorded. The company explicitly states that data is not used for AI model training beyond link analysis algorithms.

⚙️ Rate Limiting Policy

Site owners should consider rate-limiting MJ12bot: although Majestic now applies adaptive throttling, the bot’s historical default crawl speed could overwhelm smaller servers if left unchecked. A threshold such as 5 requests per minute, or Crawl-Delay: 10 in robots.txt, protects server stability while still allowing the bot to index the site for SEO visibility.
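A threshold like the 5 requests per minute mentioned above could be enforced with a sliding-window counter. The sketch below is a minimal in-memory version under stated assumptions: the class name SlidingWindowLimiter, the per-key layout, and passing the current time in explicitly are illustrative choices, and a production setup would more likely apply the limit at the web server or CDN layer.

```python
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` per `window_seconds` for each key."""

    def __init__(self, max_requests: int = 5, window_seconds: float = 60.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = {}  # key -> deque of timestamps of accepted requests

    def allow(self, key: str, now: float) -> bool:
        hits = self._hits.setdefault(key, deque())
        # Discard timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the limit; the caller might answer HTTP 429
        hits.append(now)
        return True


limiter = SlidingWindowLimiter()
decisions = [limiter.allow("mj12bot", float(t)) for t in range(6)]
print(decisions)
```

In a real handler the timestamp would typically come from time.monotonic(), and the key could be the matched bot name or the client IP.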

ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.
