Missigua Locator
Bot User-Agent: missigua-locator
🤖 Overview
The Missigua Locator is a web crawler operated by Migu, a digital media and content platform subsidiary of China Mobile, primarily used to aggregate and index publicly accessible web content for search and recommendation services within the Migu ecosystem. First observed in active crawling around 2019, it serves to feed Migu’s search engine, music streaming, and video platforms with structured metadata from third-party websites.
🌐 Technical Behavior
The Missigua Locator issues HTTP GET requests at a moderate crawl frequency, typically 1–3 requests per second per IP, though it can burst during initial site discovery. Requests originate from IPv4 addresses in Chinese ASNs, notably AS56040 (China Mobile) and AS9808 (Guangdong Mobile), and are occasionally routed through proxies. The crawler prefers standard HTML and plain text and focuses on static pages: it does not execute JavaScript or parse dynamically loaded content. It accepts cookies sent via Set-Cookie for the duration of a session but does not persist them across visits. User-Agent strings vary; common forms include Missigua Locator/1.0 and Mozilla/5.0 (compatible; Missigua Locator/1.0; +http://locator.migu.cn/).
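As a rough illustration of matching the User-Agent forms quoted above, the following Python sketch extracts the claimed version from a request's User-Agent string. The function name match_missigua and the regular expression are hypothetical; the pattern assumes that every variant contains a "Missigua Locator" (or "missigua-locator") token, which may not hold for all traffic. User-Agent strings are trivially spoofed, so a match is a hint rather than proof of origin.

```python
import re

# Assumption: every variant carries a "Missigua Locator" or "missigua-locator"
# token, optionally followed by "/<version>". This pattern is illustrative only.
MISSIGUA_UA = re.compile(r"missigua[ -]locator(?:/(\d+(?:\.\d+)*))?", re.IGNORECASE)


def match_missigua(user_agent):
    """Return the claimed version, "" if the token has no version, else None."""
    if not user_agent:
        return None
    found = MISSIGUA_UA.search(user_agent)
    if found is None:
        return None
    return found.group(1) or ""


if __name__ == "__main__":
    samples = [
        "Missigua Locator/1.0",
        "Mozilla/5.0 (compatible; Missigua Locator/1.0; +http://locator.migu.cn/)",
        "Mozilla/5.0 (X11; Linux x86_64)",
    ]
    for ua in samples:
        print(repr(ua), "->", repr(match_missigua(ua)))
```

Returning None for a non-match and an empty string for a version-less token keeps the two cases distinguishable for logging.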
📋 robots.txt Compliance
Based on observed behavior and limited official documentation (available on Migu’s developer portal at https://dev.migu.cn/), the Missigua Locator respects robots.txt directives, including Disallow and Crawl-delay. However, some webmasters report that new Disallow rules are not honored immediately, with compliance typically taking effect within 24–48 hours.
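To check how a set of Disallow and Crawl-delay rules would be interpreted before deploying them, one option is Python's standard urllib.robotparser module. The sketch below is only an approximation: the robots.txt content, the "Missigua Locator" user-agent token, and the example.com paths are assumptions made for illustration, and the crawler's own parser may match tokens differently from the standard library.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: the user-agent token and the paths are assumptions.
ROBOTS_TXT = """\
User-agent: Missigua Locator
Disallow: /private/
Crawl-delay: 10

User-agent: *
Disallow:
"""


def load_rules(text):
    """Parse robots.txt text with the standard-library parser."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)
agent = "Missigua Locator/1.0"

# Whether the standard-library parser would permit each fetch for this agent.
print(rules.can_fetch(agent, "https://example.com/private/report.html"))
print(rules.can_fetch(agent, "https://example.com/blog/"))
# The Crawl-delay value (in seconds) that applies to this agent.
print(rules.crawl_delay(agent))
```

Note that urllib.robotparser compares only the part of the agent string before the first "/", so the longer Mozilla/5.0 (compatible; ...) form would fall through to the wildcard entry in this check.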
🔍 Detection Indicators
Primary User-Agent strings include Missigua Locator/1.0 and Mozilla/5.0 (compatible; Missigua Locator/1.0). The Referer header is often empty or set to the target URL itself. Behavioral fingerprints include consistent spacing of 1–3 seconds between requests and, in many requests, the absence of Accept-Language and Accept-Encoding headers. An X-Originating-IP header may occasionally appear.
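The header-based indicators above can be combined into a simple score, sketched here in Python. The function fingerprint_score and its equal weighting are hypothetical and not taken from any published detection rule. Several of these signals, an empty Referer in particular, are common among ordinary clients, so the score is best treated as a prompt for closer inspection rather than grounds for blocking on its own.

```python
def fingerprint_score(headers, url=""):
    """Count how many of the four described header indicators a request shows.

    `headers` is a mapping of header name to value; `url` is the requested URL.
    The equal weighting of indicators is an assumption made for illustration.
    """
    h = {name.lower(): value for name, value in headers.items()}
    score = 0

    # Indicator 1: User-Agent contains the crawler's name token.
    if "missigua locator" in h.get("user-agent", "").lower():
        score += 1

    # Indicator 2: Referer is empty/absent or equals the requested URL.
    referer = h.get("referer", "")
    if referer == "" or referer == url:
        score += 1

    # Indicator 3: both Accept-Language and Accept-Encoding are missing.
    if "accept-language" not in h and "accept-encoding" not in h:
        score += 1

    # Indicator 4: an X-Originating-IP header is present.
    if "x-originating-ip" in h:
        score += 1

    return score


if __name__ == "__main__":
    request = {"User-Agent": "Missigua Locator/1.0"}
    print(fingerprint_score(request, "https://example.com/a"))
```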
📊 Data Usage
Collected data is used to build Migu’s content search index, recommend music and video assets, and improve internal analytics on content popularity. Metadata such as page titles, descriptions, and publication dates is extracted, but full-text content is rarely stored beyond temporary caching. According to publicly available statements in Migu’s privacy policy (see https://www.migu.cn/privacy), the data is not used for AI/ML training.
⚙️ Rate Limiting Policy
Rate limiting is recommended because, while legitimate, the Missigua Locator can generate sustained request patterns that degrade server performance, especially on smaller sites. Threshold-based blocking (e.g., 5 req/s per IP) is appropriate to prevent resource exhaustion while preserving access for genuine users.
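A threshold such as 5 requests per second per IP could be enforced with a sliding-window counter. The Python sketch below is a minimal in-memory illustration; the class name SlidingWindowLimiter and its parameters are hypothetical, and a production deployment would more likely rely on the rate-limiting features of the web server, CDN, or WAF already in use.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` per `window_seconds` for each client IP."""

    def __init__(self, max_requests=5, window_seconds=1.0):
        self.max_requests = max_requests
        self.window = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of allowed requests

    def allow(self, ip, now):
        """Return True if a request from `ip` at time `now` (seconds) is allowed."""
        hits = self._hits[ip]
        # Discard timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


if __name__ == "__main__":
    import time

    limiter = SlidingWindowLimiter(max_requests=5, window_seconds=1.0)
    # Seven back-to-back requests from one documentation-range IP address.
    for _ in range(7):
        print(limiter.allow("203.0.113.7", time.monotonic()))
```

Passing the timestamp in explicitly keeps the limiter deterministic and easy to test; only allowed requests are recorded, so a client that keeps hammering is not penalized beyond the window.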
ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.