myengines-us-bot

Bot User-Agent: myengines-us-bot

🤖 Overview

myengines-us-bot is a web crawler operated by MyEngines Inc., a US-based search engine aggregator that combines results from multiple sources into a unified search interface. The bot was first documented in 2019. Its primary purpose is to index publicly accessible web pages for the MyEngines search index, which serves users through the main website and a browser extension. According to the official MyEngines crawler documentation at myengines.com/crawler, the bot is designed to operate with low latency and minimal server impact.

🌐 Technical Behavior

The crawler makes requests over both HTTP/1.1 and HTTP/2, typically sending a User-Agent string with a version identifier such as "myengines-us-bot/1.0". It performs a mix of full page retrievals (GET) and lightweight checks (HEAD) to validate URLs. Request frequency is throttled dynamically based on server response times, with a default maximum of 5 requests per second per host, as specified in the MyEngines crawling policy. IP ranges are allocated from the MyEngines-owned ASN (AS206264) and may include blocks from cloud providers such as AWS (us-east-1 region). The bot respects the Crawl-Delay directive in robots.txt and backs off when it encounters 429 or 503 status codes.
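The throttling behavior described above can be modeled with a small sketch. This is not MyEngines' implementation: the class name `HostThrottle`, the doubling backoff, and the 60-second cap are assumptions chosen for illustration. Only the 5 requests per second ceiling, the Crawl-Delay handling, and the 429/503 triggers come from the description.

```python
from dataclasses import dataclass

# Status codes the bot is described as backing off on.
BACKOFF_STATUSES = {429, 503}


@dataclass
class HostThrottle:
    """Hypothetical per-host delay model for a polite crawler."""

    max_rps: float = 5.0      # documented default ceiling per host
    crawl_delay: float = 0.0  # seconds, taken from robots.txt if present
    max_delay: float = 60.0   # assumed cap on backoff, not documented
    penalty: float = 0.0      # current backoff delay in seconds

    def base_delay(self) -> float:
        # The stricter of the rate ceiling and the robots.txt Crawl-Delay.
        return max(1.0 / self.max_rps, self.crawl_delay)

    def record(self, status: int) -> float:
        """Record a response status and return the wait before the next request."""
        if status in BACKOFF_STATUSES:
            # Assumed policy: double the delay on each consecutive 429/503.
            current = max(self.base_delay(), self.penalty)
            self.penalty = min(self.max_delay, current * 2)
        else:
            self.penalty = 0.0
        return max(self.base_delay(), self.penalty)
```

With the defaults, the base delay is 0.2 seconds between requests to one host; a `crawl_delay` of 2 raises it to 2 seconds, and each consecutive 429 or 503 doubles the wait until a normal response resets it.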

📋 robots.txt Compliance

MyEngines publicly states that myengines-us-bot fully adheres to the Robots Exclusion Standard, including Disallow directives and Crawl-Delay rules. Their official policy page (myengines.com/robots) explicitly instructs webmasters on how to manage the bot's access, and the bot has been observed honoring per-path restrictions in production environments.
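As an illustration of the per-path and Crawl-Delay rules mentioned above, the following sketch uses Python's standard `urllib.robotparser` to check how a robots.txt file would apply to this user agent. The robots.txt content, the `/private/` path, the 10-second delay, and the example.com URLs are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: restrict this bot on one path and slow it down,
# while leaving all other crawlers unrestricted.
ROBOTS_TXT = """\
User-agent: myengines-us-bot
Crawl-delay: 10
Disallow: /private/

User-agent: *
Disallow:
"""

BOT = "myengines-us-bot/1.0"

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Per-path restriction: blocked under /private/, allowed elsewhere.
blocked = parser.can_fetch(BOT, "https://example.com/private/report.html")
allowed = parser.can_fetch(BOT, "https://example.com/blog/post.html")

# Crawl-delay declared for this user agent, in seconds.
delay = parser.crawl_delay(BOT)

print(blocked, allowed, delay)
```

Running a check like this before deploying a robots.txt change is a quick way to confirm that the rules match the intended paths for this specific user agent.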

🔍 Detection Indicators

The primary User-Agent string is "myengines-us-bot/1.0" (older versions sometimes send "MyEngines/1.0"). A secondary identifying header, From, is often set to "[email protected]". The bot may also send an X-MyEngines-Crawl header with a value of "true". Behavioral fingerprints include a consistent gap of at least 200 milliseconds between consecutive requests to the same domain and a preference for crawling sitemap.xml files first.
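The header indicators above could be checked with a sketch like the one below. The function name `classify_request` and the returned keys are hypothetical, and the regular expression is only an assumption about how the version suffix varies. Because request headers are trivially spoofed, a match is a hint rather than proof of origin.

```python
import re

# Matches the current and the older User-Agent forms followed by a version.
UA_PATTERN = re.compile(r"^(myengines-us-bot|MyEngines)/\d+(\.\d+)*", re.IGNORECASE)


def classify_request(headers: dict[str, str]) -> dict[str, bool]:
    """Report which of the documented header indicators a request carries."""
    # HTTP header names are case-insensitive, so normalize them first.
    h = {name.lower(): value for name, value in headers.items()}
    return {
        "ua_match": bool(UA_PATTERN.match(h.get("user-agent", ""))),
        "has_from": bool(h.get("from", "").strip()),
        "crawl_header": h.get("x-myengines-crawl", "").strip().lower() == "true",
    }


signals = classify_request({
    "User-Agent": "myengines-us-bot/1.0",
    "X-MyEngines-Crawl": "true",
})
print(signals)
```

In practice these signals would be combined with source IP checks against the ASN and with the timing fingerprints described above before a request is treated as genuine.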

📊 Data Usage

Collected content is exclusively used to build and maintain the MyEngines search index, which aggregates results from its own crawl alongside data from partner engines. The company explicitly states that no collected data is used for AI training, model development, or any behavioral profiling of users or site visitors. All data is stored temporarily for indexing and is refreshed based on crawl schedules defined in sitemaps.

⚙️ Rate Limiting Policy

Although the bot is legitimate and well-behaved, security systems still rate-limit it, because crawling a large site can generate enough requests to degrade performance for other users. Threshold-based blocking is a prudent way to enforce the documented crawl-delay limits and prevent unintended server overload.
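One common way to implement threshold-based blocking is a sliding-window counter per client. The sketch below is a generic example, not the mechanism used by any particular security product; the class name `SlidingWindowLimiter` is hypothetical, and the default threshold of 5 requests per 1-second window simply mirrors the crawler's documented per-host ceiling.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per client within a rolling time window."""

    def __init__(self, max_requests: int = 5, window_seconds: float = 1.0):
        self.max_requests = max_requests
        self.window = window_seconds
        self._hits = defaultdict(deque)  # client id -> timestamps of allowed hits

    def allow(self, client: str, now: float) -> bool:
        hits = self._hits[client]
        # Discard timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over threshold: caller would typically answer 429
        hits.append(now)
        return True


limiter = SlidingWindowLimiter()
# Six requests 100 ms apart: the sixth exceeds 5 requests per second.
burst = [limiter.allow("myengines-us-bot", i / 10) for i in range(6)]
print(burst)
```

Passing the timestamp in explicitly keeps the limiter easy to test; a production version would typically read a monotonic clock and key clients by IP address or verified bot identity.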


ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.