RocketCrawler
Crawler User-Agent: rocketcrawler
🤖 Overview
RocketCrawler is a web crawler operated by RocketReach, a sales intelligence and lead generation company headquartered in San Francisco, California. The bot systematically scans publicly accessible web pages, including company websites, social media profiles, and professional directories, to extract email addresses, phone numbers, LinkedIn URLs, and other contact data. RocketReach documents the crawler in its support pages and privacy policy, which state that its sole purpose is to populate RocketReach’s proprietary database of over 700 million professional profiles used by sales teams, recruiters, and marketers.
🌐 Technical Behavior
RocketCrawler performs deep, recursive crawls of the domains it targets, often requesting multiple pages per second from a single origin IP. It sends standard HTTP GET requests, typically without cookies or session identifiers, and its User-Agent string is configurable. To reduce redundant downloads, the crawler respects Cache-Control and ETag headers, although it does not support If-Modified-Since in all cases. RocketReach publishes its crawler IP ranges in a dedicated file at rocketreach.co/spiders.txt; the list currently includes addresses from Amazon Web Services (AWS) and Google Cloud Platform (GCP). The documented default crawl delay is 1 second between requests, unless a site specifies a longer delay via Crawl-Delay in robots.txt. According to RocketReach’s technical documentation, the crawler prioritizes pages whose paths suggest contact information (e.g., “/about”, “/team”, “/contact”) and skips binary files, JavaScript, and CSS.
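A request that claims to be RocketCrawler can be cross-checked against a published IP list. The following is a minimal Python sketch of that check, not a definitive implementation. It assumes the list is a set of CIDR blocks, one per line, which this page does not confirm. The ranges shown are placeholders from the documentation-reserved address space, and the names parse_ranges and ip_in_ranges are hypothetical.

```python
import ipaddress

# Placeholder CIDR blocks from documentation-reserved space (RFC 5737).
# These are NOT real crawler ranges; substitute the operator's published list.
PUBLISHED_RANGES = ["192.0.2.0/24", "198.51.100.0/25"]


def parse_ranges(lines):
    """Turn text lines into network objects, skipping blanks and '#' comments.

    Assumes one CIDR block per line; the real file format may differ.
    """
    networks = []
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if line:
            networks.append(ipaddress.ip_network(line, strict=False))
    return networks


def ip_in_ranges(ip, networks):
    """Return True if the client IP falls inside any of the given networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)
```

A client whose User-Agent names the crawler but whose address falls outside every listed range may be spoofing the identity.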
📋 robots.txt Compliance
Official documentation from RocketReach states that RocketCrawler fully honors the robots.txt standard, including Disallow directives, Allow overrides, and custom Crawl-Delay settings. The company provides a public policy at rocketreach.co/robots-txt-compliance confirming that site owners can block the bot entirely by adding two lines to robots.txt: “User-agent: RocketCrawler” followed by “Disallow: /”. Reports from webmasters on forums such as Reddit and WebmasterWorld corroborate that the bot stops crawling restricted paths within 24 hours of a robots.txt update.
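Before deploying a rule set, it can help to confirm how a standards-following parser reads it. The sketch below uses Python’s standard urllib.robotparser module to evaluate a sample robots.txt. The sample rules, the example.com URLs, and the helper name load_rules are illustrative assumptions. The result shows how Python’s parser interprets the rules, not how RocketCrawler itself behaves.

```python
from urllib.robotparser import RobotFileParser

# Sample rules (illustrative): block RocketCrawler everywhere except /press/,
# and ask for a 10-second delay. Python's parser applies the first matching
# rule in file order, so the Allow line is placed before the broad Disallow.
ROBOTS_TXT = """\
User-agent: RocketCrawler
Allow: /press/
Disallow: /
Crawl-delay: 10
"""


def load_rules(robots_txt):
    """Parse robots.txt text and return a ready-to-query parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)
```

Calling rules.can_fetch("RocketCrawler", url) then reports whether a given URL is permitted, and rules.crawl_delay("RocketCrawler") returns the requested delay. To block the bot entirely, drop the Allow and Crawl-delay lines so that only the User-agent and Disallow: / lines remain.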
🔍 Detection Indicators
The primary User-Agent string is RocketCrawler/1.0 (https://rocketreach.co/contact-us), though variations such as RocketReach/1.0 or rocketcrawler may appear. Behavioral indicators include high request rates (often 5–10 requests per second), deep path traversal beyond the home page, and a tendency to request Cloudflare’s /cdn-cgi/l/email-protection paths. The crawler does not set any custom HTTP headers, but its IPs belong to AS14618 (AWS) or AS15169 (GCP). Server logs frequently show sequential requests for /team, /about, /contact, and /jobs within seconds of each other.
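These indicators can be turned into simple log checks. The following Python sketch matches the User-Agent variants listed above, ignoring case, and counts how many of the contact-style paths a client requested. The function names and the choice to match on the substrings “rocketcrawler” and “rocketreach” are assumptions made for illustration. Because a User-Agent string is easy to spoof, a match is a hint rather than proof.

```python
import re

# Case-insensitive match for the User-Agent variants named in this section.
UA_PATTERN = re.compile(r"\b(rocketcrawler|rocketreach)\b", re.IGNORECASE)

# Paths that this section says tend to be requested in quick succession.
CONTACT_PATHS = {"/team", "/about", "/contact", "/jobs"}


def is_rocketcrawler_ua(user_agent):
    """Return True if the User-Agent header names the crawler or its operator."""
    return bool(UA_PATTERN.search(user_agent or ""))


def contact_path_hits(paths):
    """Count the distinct contact-style paths present in a client's requests.

    Trailing slashes are ignored, so '/about/' counts as '/about'.
    """
    normalized = {(p.rstrip("/") or "/") for p in paths}
    return len(CONTACT_PATHS & normalized)
```

A client that matches the User-Agent pattern and also requests most of the contact-style paths within a few seconds fits the profile described above more closely than one that matches on the header alone.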
📊 Data Usage
Collected data feeds directly into RocketReach’s sales intelligence platform, where it is enriched with social media information and used to provide verified email addresses, phone numbers, and professional titles to paying subscribers. The company states it does not scrape personal data from non-public pages or password-protected areas. According to RocketReach’s privacy policy, the aggregated data is used for B2B lead generation, market research, and recruitment outreach.
⚙️ Rate Limiting Policy
RocketCrawler is rate-limited because its aggressive crawl patterns can degrade server performance, especially on shared hosting or small websites. Threshold-based blocking (e.g., allowing at most 5 requests per second per IP and rejecting the rest) is justified because the crawler collects data in a delay-tolerant way and does not need real-time access, so a lower request rate does not harm its effectiveness.
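One way to express such a threshold is a per-IP sliding-window counter. The Python sketch below illustrates the idea under stated assumptions and does not describe how Boteraser or any particular product enforces limits. The class name SlidingWindowLimiter is hypothetical, and the default of 5 requests per 1-second window is taken from the example figure above.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per window_seconds for each client IP."""

    def __init__(self, max_requests=5, window_seconds=1.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Maps each IP to the timestamps of its recently accepted requests.
        self._hits = defaultdict(deque)

    def allow(self, ip, now):
        """Return True and record the request if the IP is under its limit.

        `now` is a timestamp in seconds, e.g. from time.monotonic().
        """
        hits = self._hits[ip]
        # Discard timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the threshold: reject this request
        hits.append(now)
        return True
```

Passing the timestamp in explicitly keeps the limiter easy to test. In a live server the caller would typically supply time.monotonic() and answer rejected requests with HTTP 429.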
ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.