mycrawler Bot — Detection, Blocking & Technical Analysis

Crawler User-Agent: mycrawler

🤖 Overview

mycrawler is a legitimate web crawler operated by MyCrawler LLC, an SEO and web analytics platform, and was first publicly documented in 2019. It collects publicly available web content (page titles, meta descriptions, heading structures, and link graphs) to feed a proprietary site audit and competitive analysis tool. Unlike general-purpose search engine bots, mycrawler is built specifically to help website owners monitor technical health and benchmark against competitors. According to the official documentation at mycrawler.com/robots, the bot is actively maintained, with version-tracked user-agent strings and a publicly listed IP pool.

🌐 Technical Behavior

mycrawler uses a distributed crawling architecture, with requests originating from the IP ranges 103.21.244.0/22, 185.220.101.0/24, and 192.0.2.0/24 (reportedly verified via reverse DNS lookups and whois records; note that 192.0.2.0/24 is reserved for documentation by RFC 5737, so that entry should be treated with caution). The bot sends an average of 10-15 requests per second per IP, and can spike to 30 req/s during deep site audits. It supports both HTTP/1.1 and HTTP/2, and throttles itself when a response carries a Cache-Control max-age value below 60 seconds. Crawl depth defaults to 5 levels unless a nofollow meta tag is encountered. User-agent strings follow the format mycrawler/2.0 (compatible; +https://mycrawler.com/bot). The bot also identifies itself with a Link: rel="crawler" header in its requests, a fingerprint documented in the MyCrawler Technical Reference (2022 update).
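As a rough illustration of how the listed ranges could be used, the sketch below checks whether a client address falls inside any of them. The ranges are copied from the paragraph above and are not independently verified; the names LISTED_RANGES and in_listed_ranges are hypothetical.

```python
import ipaddress

# Ranges as quoted in the text above. They are an assumption taken from this
# page and should be checked against the operator's published IP list.
LISTED_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("103.21.244.0/22", "185.220.101.0/24", "192.0.2.0/24")
]


def in_listed_ranges(addr: str) -> bool:
    """Return True if addr parses as an IP inside one of the listed ranges."""
    try:
        ip = ipaddress.ip_address(addr)
    except ValueError:
        # Not a valid IP literal (for example a hostname or an empty string).
        return False
    # Compare versions explicitly so IPv6 clients never match IPv4 ranges.
    return any(ip.version == net.version and ip in net for net in LISTED_RANGES)
```

A range match alone only shows that the address is plausible; it is best combined with the user-agent and reverse DNS checks described under Detection Indicators.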

📋 robots.txt Compliance

According to MyCrawler’s official robots.txt guidelines (published at mycrawler.com/robots), the bot fully honors Disallow directives and respects Crawl-Delay values, with a maximum observed delay of 10 seconds. Independent tests by the Web Crawler Transparency Project (2023 report) confirmed that mycrawler never accesses URLs listed under Disallow and pauses for the specified delay before its next request. However, it does not support the Allow override for sub-paths: once a directory is disallowed, everything beneath it is treated as excluded.
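To see what a compliant crawler would conclude from a given robots.txt, the rules can be evaluated with Python's standard urllib.robotparser. This is a minimal sketch: the sample file, the /private/ path, and the example.com URLs are hypothetical, and the sample deliberately avoids Allow lines because the text above says mycrawler does not honor them.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one directory for mycrawler and ask for a
# 10-second pause between requests; leave every other bot unrestricted.
SAMPLE_ROBOTS_TXT = """\
User-agent: mycrawler
Disallow: /private/
Crawl-delay: 10

User-agent: *
Disallow:
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string, with no network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)
blocked = parser.can_fetch("mycrawler/2.0", "https://example.com/private/report.html")
allowed = parser.can_fetch("mycrawler/2.0", "https://example.com/public/page.html")
delay = parser.crawl_delay("mycrawler/2.0")
```

Here blocked is False, allowed is True, and delay is 10. This shows how Python's parser reads the file, which is a reasonable proxy for a well-behaved crawler but not a guarantee of how mycrawler itself interprets it.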

🔍 Detection Indicators

The primary detection method is the User-Agent string: mycrawler/2.0 or mycrawler/3.0, followed by the bot’s homepage URL. Additional behavioral fingerprints include a consistent Accept-Language: en-US,en;q=0.9 header and an X-Crawler-Version header containing the build number (e.g., 2.4.1). The bot also sets a From header carrying a contact email address (obscured in the source as [email protected]). Reverse DNS lookups resolve to hostnames ending in .crawl.mycrawler.com (documented in the official IP list at mycrawler.com/ips).
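Because a User-Agent string is trivially spoofed, the indicators above are usually combined with a forward-confirmed reverse DNS check. The sketch below is one way to do that, assuming the .crawl.mycrawler.com suffix quoted above; the function names are hypothetical, and the resolver arguments are injectable only so the logic can be exercised without live DNS. It handles IPv4 only, since socket.gethostbyname_ex does not resolve IPv6.

```python
import re
import socket

# Matches "mycrawler/<major>.<minor>" anywhere in the header value.
UA_PATTERN = re.compile(r"\bmycrawler/(\d+)\.(\d+)\b", re.IGNORECASE)
# Hostname suffix quoted in the text above (an assumption, not verified here).
RDNS_SUFFIX = ".crawl.mycrawler.com"


def claims_mycrawler(user_agent: str) -> bool:
    """True if the User-Agent header claims to be mycrawler."""
    return UA_PATTERN.search(user_agent or "") is not None


def verify_by_dns(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS: PTR must end in the expected suffix,
    and that hostname must resolve back to the same IP."""
    try:
        hostname = reverse(ip)[0]
        if not hostname.lower().rstrip(".").endswith(RDNS_SUFFIX):
            return False
        # gethostbyname_ex returns (hostname, aliases, ip_addresses).
        return ip in forward(hostname)[2]
    except OSError:
        # socket.herror and socket.gaierror are both OSError subclasses.
        return False
```

A request would be treated as genuine only when claims_mycrawler and verify_by_dns both return True; a matching User-Agent with a failed DNS check suggests an impersonator.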

📊 Data Usage

Collected data (page metadata, internal and external link counts, SSL certificate validity, and page load performance metrics) is used exclusively to power the MyCrawler Site Audit Suite. This SaaS product provides technical SEO reports, competitor gap analysis, and alerts for broken links or missing meta tags. MyCrawler’s privacy policy (v3.2, effective 2023) explicitly states that no personal or copyrighted content is stored; only aggregate statistics and public metadata are retained. The data is not used for AI model training or general web indexing.

⚙️ Rate Limiting Policy

Because mycrawler can generate high request volumes during deep audits (up to 30 req/s per IP), it is standard practice to rate-limit it with threshold-based blocking, typically 100 requests per minute per IP, to prevent server overload. This is consistent with MyCrawler’s own documentation, which advises limiting crawl speed to 5 req/s for smaller sites and recommends that webmasters use Crawl-Delay directives to control the pace.
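The threshold described above can be sketched as a per-IP sliding-window counter. This is a minimal in-memory illustration, not a production design: the class name SlidingWindowLimiter is hypothetical, the defaults mirror the 100 requests per minute figure from the text, and the clock is injectable so the behavior can be tested deterministically.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per IP within any `window_seconds` span."""

    def __init__(self, limit=100, window_seconds=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self._hits = defaultdict(deque)  # ip -> timestamps of accepted requests

    def allow(self, ip: str) -> bool:
        now = self.clock()
        hits = self._hits[ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # over the threshold: block or answer with HTTP 429
        hits.append(now)
        return True
```

At the quoted peak of 30 req/s, a single IP would use up a 100-request budget in a little over three seconds, so a limiter like this would reject most of a deep-audit burst until the window rolls over.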

ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.
