masagool Bot — Detection, Blocking & Technical Analysis

Bot User-Agent: masagool

🤖 Overview

Masagool is a web crawler operated by Mojeek, a UK‑based independent search engine founded in 2004 that prioritises user privacy and does not track or profile users. Its primary purpose is to index publicly available web content to build Mojeek’s own search index, which is entirely independent of major search engines like Google or Bing. According to Mojeek’s official documentation (mojeek.com/about/technology/crawler), the bot was introduced to support the company’s mission of providing unbiased, privacy‑respecting search results.

🌐 Technical Behavior

The crawler identifies itself as MojeekBot (current version string MojeekBot/0.2) and appears under the alias Masagool in some server logs. It crawls breadth‑first with a default delay of 5 seconds between requests to the same domain; site owners can override this with the Crawl‑Delay directive in robots.txt (source: Mojeek’s bot policy page). Masagool supports HTTP/1.1 and HTTP/2, honours If‑Modified‑Since headers to reduce load, and sends the full User‑Agent string Mozilla/5.0 (compatible; MojeekBot/0.2; +https://www.mojeek.com/bot.html). Its IP ranges are published on Mojeek’s site as a JSON list; the subnets cited at the time of writing include 185.199.108.0/22 and 2a06:98c1:3120::/44, and should be checked against that list before being used in an allowlist. The bot does not execute JavaScript, so it reaches dynamically generated content only when that content is linked from static HTML.
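As a rough illustration of how a published range list could be used, the sketch below checks whether a client address falls inside a set of subnets. The two subnets are simply the ones quoted in the text, hard-coded for the example; the names `EXAMPLE_RANGES` and `ip_in_published_ranges` are hypothetical, and a real deployment would load the current list from the operator rather than rely on these values.

```python
import ipaddress

# Subnets quoted in the text above. They are illustrative only and are
# an assumption here; load the operator's current list in practice.
EXAMPLE_RANGES = [
    ipaddress.ip_network("185.199.108.0/22"),
    ipaddress.ip_network("2a06:98c1:3120::/44"),
]


def ip_in_published_ranges(ip, ranges=EXAMPLE_RANGES):
    """Return True if `ip` (IPv4 or IPv6 string) lies in any listed subnet."""
    addr = ipaddress.ip_address(ip)
    # Compare only against networks of the same IP version.
    return any(addr.version == net.version and addr in net for net in ranges)


if __name__ == "__main__":
    print(ip_in_published_ranges("185.199.109.7"))
    print(ip_in_published_ranges("203.0.113.5"))
```

An address check like this is best combined with the User‑Agent, since either signal alone is easy to get wrong or to spoof.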

📋 robots.txt Compliance

Mojeek’s crawler fully respects robots.txt directives, including Disallow, Allow, and Crawl‑Delay (source: mojeek.com/about/robots.txt). It also honours the X‑Robots‑Tag HTTP header and noindex meta tags. Evidence from community discussions and crawling logs confirms that Masagool does not ignore Disallow rules; it is a well‑behaved bot consistent with the Robots Exclusion Standard.
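To see how such directives would be interpreted, the following sketch parses a sample robots.txt with Python's standard `urllib.robotparser`. The rules, the `/private/` path, and the `example.com` URLs are made up for illustration, and the `MojeekBot` product token is assumed from the User‑Agent string quoted on this page. The parser shows how a standards-following client reads the file; it does not prove how the crawler itself behaves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: slow the crawler down and keep it out of /private/.
ROBOTS_TXT = """\
User-agent: MojeekBot
Crawl-delay: 10
Disallow: /private/
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Rules are matched against the product token, not the full User-Agent string.
blocked = parser.can_fetch("MojeekBot", "https://example.com/private/report.html")
allowed = parser.can_fetch("MojeekBot", "https://example.com/blog/")
delay = parser.crawl_delay("MojeekBot")

if __name__ == "__main__":
    print(blocked, allowed, delay)
```

Because the group names only `MojeekBot`, other crawlers are unaffected by these rules.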

🔍 Detection Indicators

The primary User‑Agent string is MojeekBot/0.2 (with the Mozilla prefix). Secondary indicators include the From header (sometimes set to [email protected]) and a reverse‑DNS hostname ending in .mojeek.com. Behavioural fingerprints include a steady rate of one request every 5 seconds per domain and a tendency to request robots.txt at the start of a crawl session. Logs may also show the bot requesting HTML pages with an Accept header of text/html,application/xhtml+xml.
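The indicators above can be combined into a simple log classifier. The sketch below is a minimal example rather than a production detector: the function name `looks_like_mojeekbot`, the regular expression, and the hostname `crawl-1.mojeek.com` are assumptions introduced for illustration. It takes the reverse‑DNS name as an input instead of resolving it, and a real check would also confirm that the hostname resolves back to the same IP.

```python
import re

# Matches the MojeekBot product token with a version, or the bare alias
# "masagool" that this page lists as the bot's User-Agent.
UA_PATTERN = re.compile(r"\bMojeekBot/\d+(?:\.\d+)*|\bmasagool\b", re.IGNORECASE)


def looks_like_mojeekbot(user_agent, reverse_dns=None):
    """Return True if the request matches the indicators described above.

    If `reverse_dns` is given, the hostname must also end in .mojeek.com,
    which guards against clients that merely copy the User-Agent string.
    """
    ua_match = bool(UA_PATTERN.search(user_agent or ""))
    if reverse_dns is None:
        return ua_match
    host = reverse_dns.rstrip(".").lower()
    return ua_match and host.endswith(".mojeek.com")


if __name__ == "__main__":
    ua = "Mozilla/5.0 (compatible; MojeekBot/0.2; +https://www.mojeek.com/bot.html)"
    print(looks_like_mojeekbot(ua, "crawl-1.mojeek.com"))  # hypothetical hostname
```

Requiring the leading dot in the suffix check means a lookalike domain such as `notmojeek.com` does not pass.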

📊 Data Usage

Data collected by Masagool is used exclusively to build and improve Mojeek’s search index, which is stored on the company’s own infrastructure and never shared with third parties. Mojeek states that it does not use crawled content for AI model training or any purpose other than search indexing (mojeek.com/privacy). The index is refreshed periodically, and the bot respects cache‑control headers to limit re‑crawling of unchanged pages.

⚙️ Rate Limiting Policy

Although Masagool is a legitimate, well‑behaved crawler, it can still strain smaller websites if left unchecked, so a standard rate limit of 10–20 requests per minute per IP is recommended as a safety measure against resource exhaustion. For reference, a crawler keeping to the default 5‑second delay sends about 12 requests per minute, which a limit at the upper end of that range leaves untouched. Threshold‑based blocking is justified because even compliant bots can inadvertently overwhelm servers with limited capacity, especially during the initial discovery phase.
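One way to apply a per‑IP threshold like this is a sliding‑window counter. The sketch below is a minimal in‑memory version, assuming a single process and the upper bound of 20 requests per minute as the default. The class name `SlidingWindowLimiter` and its parameters are hypothetical, and production setups more commonly enforce the limit at the web server or CDN layer.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` per `window_seconds` for each IP."""

    def __init__(self, max_requests=20, window_seconds=60.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of accepted requests

    def allow(self, ip, now=None):
        """Record a request from `ip`; return False if it exceeds the limit."""
        now = time.monotonic() if now is None else now
        hits = self._hits[ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


if __name__ == "__main__":
    limiter = SlidingWindowLimiter()
    # A crawler pacing itself at one request every 5 seconds stays under the limit.
    print(all(limiter.allow("192.0.2.10", now=t * 5.0) for t in range(50)))
```

The optional `now` argument exists only to make the behaviour easy to test with fixed timestamps.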

ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.
