memacbot

Bot User-Agent: memacbot

🤖 Overview

memacbot is a web crawler operated by Memac Search, a search engine and data aggregation platform based in Brazil (memac.com). First publicly identified in 2019, the crawler indexes web content for Memac's search results and feeds a proprietary knowledge graph used for semantic discovery. According to Memac’s official documentation (archived at memac.com/robots.txt comments), the bot is designed to respect site owners’ wishes while collecting publicly available text and metadata.

🌐 Technical Behavior

The crawler uses HTTP/1.1 and HTTP/2 and sends requests from dynamic IPv4 ranges located primarily in Brazil (e.g., 177.54.0.0/16 and 191.0.0.0/8), with secondary ranges in the United States and Europe. According to Memac’s engineering team (via their GitHub repository at github.com/memac/crawler), the bot averages 50–150 requests per website per day but can burst to 500 requests during initial discovery phases. It follows 301 and 302 redirects and respects noindex meta tags, but it does not automatically parse JavaScript-heavy content unless a sitemap indicates dynamic URLs. Their official crawler policy page states that the crawl interval is configurable, at 10–30 seconds between requests.
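As a rough illustration of how the example ranges above could be used in log analysis, the following sketch checks whether a client address falls inside them. The two networks are only the examples quoted in this section, not an authoritative or complete list, and the function name `in_example_ranges` is hypothetical.

```python
import ipaddress

# Example ranges quoted in the section above. They are illustrative only:
# the text describes the bot's address space as dynamic, so treat a match
# as a hint rather than proof of origin.
EXAMPLE_RANGES = [
    ipaddress.ip_network("177.54.0.0/16"),
    ipaddress.ip_network("191.0.0.0/8"),
]


def in_example_ranges(ip: str) -> bool:
    """Return True if `ip` falls inside one of the example networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in EXAMPLE_RANGES)
```

Because these ranges are broad and shared with unrelated traffic, a match on its own says little; it is more useful combined with the User-Agent and header checks.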

📋 robots.txt Compliance

According to the robots.txt guidelines published at memac.com/crawler-policy, memacbot fully supports Disallow directives and also honors Crawl-delay directives when present. Tests conducted by WebmasterWorld (2021) confirmed that the bot delays requests by exactly the specified number of seconds. However, Memac’s documentation notes that the crawler may ignore extremely long delays (more than 60 seconds) to maintain indexing freshness.
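To preview how a rule set aimed at memacbot would be interpreted, a robots.txt draft can be parsed with Python's standard `urllib.robotparser`. This is a minimal sketch: the paths, the 20-second delay, and the example.com URLs are assumptions chosen for illustration, and the standard-library parser only approximates how memacbot itself reads the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt draft. The delay stays below the 60-second
# ceiling that, per the section above, the crawler may otherwise ignore.
ROBOTS_TXT = """\
User-agent: memacbot
Crawl-delay: 20
Disallow: /private/

User-agent: *
Disallow:
"""


def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)
blocked = rules.can_fetch("memacbot", "https://example.com/private/report.html")
allowed = rules.can_fetch("memacbot", "https://example.com/blog/post.html")
delay = rules.crawl_delay("memacbot")
```

Here `blocked` evaluates to `False`, `allowed` to `True`, and `delay` to `20`, which matches the intent of the draft before it is deployed.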

🔍 Detection Indicators

The primary User-Agent token is memacbot/1.0, typically sent within a full string such as Mozilla/5.0 (compatible; memacbot/1.0; +https://memac.com/bot). Additional identifying headers include X-Memac-Bot: true and a custom From field containing the crawler’s support email ([email protected]). The bot also sends an Accept-Language: pt-BR,en;q=0.9 header in most requests, as verified by third-party crawler logs.
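The indicators above can be combined into a simple header check. The sketch below is one possible approach, not a definitive detector: the function name `looks_like_memacbot` and the choice to accept either signal are assumptions, and because request headers are trivially spoofed, a match only shows that a client claims to be memacbot.

```python
import re

# Matches the product token described above, e.g. "memacbot/1.0".
UA_PATTERN = re.compile(r"\bmemacbot/\d+(?:\.\d+)*", re.IGNORECASE)


def looks_like_memacbot(headers: dict) -> bool:
    """Return True if the request headers carry a memacbot indicator.

    `headers` maps header names to values; names are compared
    case-insensitively, as HTTP header names are not case-sensitive.
    """
    normalized = {name.lower(): value for name, value in headers.items()}
    ua_match = bool(UA_PATTERN.search(normalized.get("user-agent", "")))
    flag_match = normalized.get("x-memac-bot", "").strip().lower() == "true"
    return ua_match or flag_match
```

For example, passing `{"User-Agent": "Mozilla/5.0 (compatible; memacbot/1.0; +https://memac.com/bot)"}` returns `True`, while a typical browser User-Agent with no `X-Memac-Bot` header returns `False`.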

📊 Data Usage

Collected data (page titles, body text, metadata, and internal links) is used to populate Memac Search’s index, which powers both the public search engine and a private API for analytics clients. Memac also uses the data to train a lightweight NLP model for entity extraction, as described in their technical report (memac.com/research/entity-extraction-2022). No raw content is sold to third parties.

⚙️ Rate Limiting Policy

Site operators rate-limit memacbot because its burst indexing can overwhelm legacy servers (especially those in South America). The recommended policy is to apply threshold-based blocking (e.g., 200 requests per minute) to prevent resource exhaustion while allowing the bot to complete its legitimate indexing cycle.
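A threshold like the one above can be enforced with a sliding-window counter. The following is a minimal in-memory sketch assuming the 200-requests-per-minute example from this section; the class name `SlidingWindowLimiter` is hypothetical, and a production setup would more likely rely on the rate-limiting features of a web server, CDN, or WAF.

```python
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` within any `window_seconds` span."""

    def __init__(self, max_requests: int = 200, window_seconds: float = 60.0) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._timestamps = deque()  # times of accepted requests, oldest first

    def allow(self, now: float) -> bool:
        """Record a request at time `now` (in seconds); return False if over the limit."""
        cutoff = now - self.window_seconds
        # Discard accepted requests that have aged out of the window.
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        if len(self._timestamps) >= self.max_requests:
            return False
        self._timestamps.append(now)
        return True
```

In practice one limiter instance would be kept per client (for example, keyed by IP address or User-Agent), with `time.monotonic()` supplying `now`. Passing the timestamp explicitly keeps the logic easy to test.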

ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.