metagloss
Bot User-Agent: metagloss
🤖 Overview
Metagloss is a web crawler operated by Metagloss Inc., a company that maintains a centralized repository of specialized glossaries, terminology databases, and domain-specific definition catalogs. Its primary purpose is to index publicly accessible glossary pages, technical documentation, and educational resources to feed into the Metagloss API and public website, which serve researchers, translators, and content managers seeking precise term definitions.
🌐 Technical Behavior
The crawler follows a breadth-first crawl strategy with a default delay of 3 seconds between requests. On large sites, however, it may burst to as many as 50 requests per minute, faster than that delay alone allows (a 3-second delay caps the rate at 20 requests per minute). It uses HTTP/1.1 with persistent connections and honors Cache-Control headers so that it does not index stale content. Its requests typically originate from an IP pool hosted on Amazon Web Services (AWS, ASN 16509) and occasionally from Google Cloud (ASN 15169), though the exact ranges are not publicly listed. Requests carry an Accept-Encoding: gzip header and a custom User-Agent that includes a contact email for feedback. Crawl depth is limited to 10 hops by default, and the bot skips binary file types such as PDFs and images unless they are explicitly included in the page markup.
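To make the crawl policy above concrete, here is a minimal Python sketch of a breadth-first crawl with a per-request delay, a hop limit, and a binary-file filter. It is not Metagloss code: the 3-second delay and 10-hop limit come from the description above, while the extension list, `bfs_crawl`, `is_binary`, and the injected `get_links`/`sleep` callables are hypothetical stand-ins so the logic runs without network access.

```python
from collections import deque
from typing import Callable, Iterable
from urllib.parse import urlparse

# Values from the profile above; treat them as assumptions, not Metagloss internals.
DEFAULT_DELAY_SECONDS = 3.0
MAX_DEPTH = 10
# Hypothetical skip list standing in for "binary file types like PDF and images".
BINARY_EXTENSIONS = (".pdf", ".png", ".jpg", ".jpeg", ".gif", ".webp", ".svg")


def is_binary(url: str) -> bool:
    """Guess from the path suffix whether a URL points at a binary file."""
    return urlparse(url).path.lower().endswith(BINARY_EXTENSIONS)


def bfs_crawl(
    seed: str,
    get_links: Callable[[str], Iterable[str]],
    sleep: Callable[[float], None],
    delay: float = DEFAULT_DELAY_SECONDS,
    max_depth: int = MAX_DEPTH,
) -> list[str]:
    """Visit pages breadth-first from `seed` and return URLs in visit order.

    `get_links` stands in for extracting the links of a fetched page, and
    `sleep` is injected so the politeness delay can be tested without waiting.
    """
    queue = deque([(seed, 0)])  # FIFO queue of (url, hops from seed)
    seen = {seed}
    order: list[str] = []
    while queue:
        url, depth = queue.popleft()
        if order:
            sleep(delay)  # pause between requests, not before the first one
        order.append(url)
        if depth >= max_depth:
            continue  # hop limit reached: do not expand this page's links
        for link in get_links(url):
            if link not in seen and not is_binary(link):
                seen.add(link)
                queue.append((link, depth + 1))
    return order


if __name__ == "__main__":
    # A toy three-page site; the image link is filtered out.
    site = {
        "https://example.com/": ["https://example.com/glossary", "https://example.com/logo.png"],
        "https://example.com/glossary": ["https://example.com/glossary/a", "https://example.com/"],
        "https://example.com/glossary/a": [],
    }
    waits: list[float] = []
    print(bfs_crawl("https://example.com/", lambda u: site.get(u, []), waits.append))
    print(waits)
```

The deque's first-in, first-out order is what makes the traversal breadth-first; a stack would make it depth-first. Marking URLs as seen when they are queued, rather than when they are visited, keeps a page linked from several places from being requested twice.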
📋 robots.txt Compliance
According to the official Metagloss bot policy published at metagloss.com/bot-policy, the crawler fully honors Disallow directives in robots.txt and does not access any path or resource they forbid. It also supports the Crawl-Delay directive, spacing its requests by the specified number of seconds. No violations of these rules have been documented.
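Site owners can check how a robots.txt file will be interpreted before deploying it. The sketch below uses Python's standard `urllib.robotparser` to parse a hypothetical file offline. It assumes Metagloss matches on the `Metagloss` product token from its User-Agent, which this profile does not state explicitly; the paths and the 5-second delay are illustrative only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt. Matching on the "Metagloss" token is an assumption
# based on the User-Agent strings in this profile.
ROBOTS_TXT = """\
User-agent: Metagloss
Disallow: /private/
Crawl-delay: 5

User-agent: *
Disallow: /admin/
"""

# User-Agent with the contact email omitted; only the product token matters here.
METAGLOSS_UA = "Metagloss/1.0 (https://metagloss.com)"


def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt content directly, without fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rules = load_rules(ROBOTS_TXT)
    for path in ("/glossary/terms", "/private/notes", "/admin/"):
        print(path, rules.can_fetch(METAGLOSS_UA, "https://example.com" + path))
    print("crawl delay:", rules.crawl_delay(METAGLOSS_UA))
```

Note that the parser, like the robots.txt standard, applies only the most specific matching group, so Metagloss does not inherit the `Disallow: /admin/` rule from the `*` group in this file. Paths that should be closed to every crawler need to be repeated inside the Metagloss group.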
🔍 Detection Indicators
The primary User-Agent string is Metagloss/1.0 (https://metagloss.com; [email protected]); an alternative string, MetaglossBot/1.0, appears in older logs. Behavioral fingerprints include sequential request patterns without randomized delays and a From header carrying the contact email on every request. Some administrators also report a missing Referer header and little variation in the User-Agent version number across requests.
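A minimal Python sketch for spotting these User-Agent strings in access logs follows. It assumes logs in the common Apache/Nginx combined format, where the Referer and User-Agent are the last two quoted fields; the regular expression and the `is_metagloss` and `metagloss_hits` helpers are illustrative, not something Metagloss publishes. Because any client can send this header, a match shows only what a request claims to be.

```python
import re
from typing import Iterable, Iterator

# Matches "Metagloss/1.0" and the older "MetaglossBot/1.0" named in this profile.
METAGLOSS_UA_RE = re.compile(r"\bMetagloss(?:Bot)?/\d+(?:\.\d+)*")
QUOTED_FIELD_RE = re.compile(r'"([^"]*)"')


def is_metagloss(user_agent: str) -> bool:
    """Return True if the User-Agent claims to be Metagloss (claims can be spoofed)."""
    return METAGLOSS_UA_RE.search(user_agent) is not None


def metagloss_hits(log_lines: Iterable[str]) -> Iterator[tuple[str, bool]]:
    """Yield (line, referer_missing) for combined-format lines from Metagloss.

    In the combined log format the last two quoted fields are the Referer and
    the User-Agent; a Referer logged as "-" usually means the header was absent.
    """
    for line in log_lines:
        fields = QUOTED_FIELD_RE.findall(line)
        if len(fields) >= 2 and is_metagloss(fields[-1]):
            yield line, fields[-2] == "-"


if __name__ == "__main__":
    sample = [
        '203.0.113.7 - - [01/Jan/2026:00:00:00 +0000] "GET /glossary HTTP/1.1" 200 512 "-" "Metagloss/1.0 (https://metagloss.com)"',
    ]
    for hit, no_referer in metagloss_hits(sample):
        print(no_referer, hit)
```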
📊 Data Usage
Collected glossary terms, definitions, and contextual usage examples are indexed, normalized, and cross-referenced within the Metagloss platform. The data powers the company’s AI‑assisted translation tool, API‑based term lookup for enterprise clients, and a public-facing search engine that returns curated definitions from multiple authoritative sources. No user‑generated content is permanently stored beyond the snippet required for term extraction.
⚙️ Rate Limiting Policy
Although Metagloss is a legitimate crawler, administrators of shared web servers may still rate-limit it to prevent excessive bandwidth consumption. A threshold of 10 requests per 30 seconds is recommended; exceeding it triggers a temporary block, and the bot may resume after a 60-second cooldown, which keeps usage fair for all crawlers.
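The suggested policy can be expressed as a small sliding-window limiter. This Python sketch is one hypothetical way a server might enforce it, not a configuration that Metagloss or any particular web server ships; the class name and the choice to clear the request history when a block starts are assumptions.

```python
from collections import deque


class CooldownRateLimiter:
    """Sliding-window limiter: at most `limit` requests per `window` seconds.

    A request beyond the limit blocks the client for `cooldown` seconds, after
    which it may resume. Timestamps are passed in so the logic is easy to test.
    """

    def __init__(self, limit: int = 10, window: float = 30.0, cooldown: float = 60.0):
        self.limit = limit
        self.window = window
        self.cooldown = cooldown
        self.requests: deque[float] = deque()  # timestamps inside the window
        self.blocked_until = 0.0

    def allow(self, now: float) -> bool:
        if now < self.blocked_until:
            return False  # still cooling down
        # Drop timestamps that have slid out of the window.
        while self.requests and now - self.requests[0] >= self.window:
            self.requests.popleft()
        if len(self.requests) >= self.limit:
            self.blocked_until = now + self.cooldown
            self.requests.clear()  # start fresh once the cooldown ends
            return False
        self.requests.append(now)
        return True


if __name__ == "__main__":
    limiter = CooldownRateLimiter()
    print([limiter.allow(float(t)) for t in range(12)])
    print(limiter.allow(70.0))
```

At the 3-second default delay, the crawler sends exactly 10 requests per 30 seconds and never trips this limiter, because the oldest request leaves the window just as the next one arrives. A burst at 50 requests per minute (25 per 30 seconds) would be blocked.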
ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information are the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.