copier

Bot User-Agent: copier

🤖 Overview

The Copier bot is operated by Copy.ai, a company specializing in AI-powered content generation. It is used exclusively to collect publicly available web text for training and improving the company’s proprietary natural language generation models. First identified in mid-2023, the bot operates under a documented crawler policy published at https://copy.ai/crawler-policy. Its purpose is the continuous refinement of Copy.ai’s writing assistant, not search indexing or analytics.

🌐 Technical Behavior

Copier issues HTTP GET requests in a standard web crawler pattern. It respects robots.txt and keeps to a maximum rate of approximately 10 requests per second per source domain. Crawling originates from IPv4 ranges belonging to Amazon Web Services (EC2) and Google Cloud Platform, and the addresses change frequently; Copy.ai’s official documentation maintains a public list of current ranges at https://copy.ai/crawler-ips. The bot uses HTTP/1.1 and HTTP/2, sends a Referer header indicating the target URL, and occasionally includes an Accept-Language header set to en-US. It does not execute JavaScript, parse dynamic content, or follow client-side redirects automatically, limiting itself to static HTML text extraction. Crawling sessions typically occur during off-peak hours (02:00–06:00 UTC) to minimize infrastructure impact.
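Because the source addresses are cloud-hosted and change often, a User-Agent match is usually cross-checked against the published ranges. The following is a minimal sketch of that check using Python's standard `ipaddress` module. The CIDR blocks shown are placeholders taken from the RFC 5737 documentation ranges, not real Copier ranges, and the function name `in_published_ranges` is hypothetical; the actual list would have to be obtained from the URL cited above.

```python
import ipaddress

# Placeholder CIDR blocks (RFC 5737 documentation ranges), NOT real Copier ranges.
# Replace with the ranges published by the operator.
EXAMPLE_RANGES = ["192.0.2.0/24", "198.51.100.0/25"]


def in_published_ranges(ip: str, cidrs: list[str]) -> bool:
    """Return True if `ip` falls inside any of the given CIDR blocks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in cidrs)


inside = in_published_ranges("192.0.2.45", EXAMPLE_RANGES)
outside = in_published_ranges("203.0.113.9", EXAMPLE_RANGES)
```

A request that claims to be Copier but arrives from outside the published ranges is a candidate for spoofing and can be treated with more suspicion than one that matches both signals.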

📋 robots.txt Compliance

According to Copy.ai’s official crawler policy, Copier fully honors Disallow directives in robots.txt and will not crawl any explicitly blocked path. The bot also respects Crawl-Delay directives when set, and operators can request a complete crawl stoppage by emailing [email protected]. Independent testing by the Web Robots Community (2024) confirmed that Copier does not ignore robots.txt or bypass rate limits.
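As an illustration of the directives described above, the sketch below parses a hypothetical robots.txt with Python's standard `urllib.robotparser` and checks what a compliant crawler identifying as `Copier` would be permitted to fetch. The `/private/` path, the 5-second delay, and the example.com URLs are assumptions made for the example, and it is also an assumption that Copier matches rules on the `Copier` product token.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep Copier out of /private/ and ask for a 5 s delay.
ROBOTS_TXT = """\
User-agent: Copier
Crawl-delay: 5
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
blocked = parser.can_fetch("Copier", "https://example.com/private/report.html")
allowed = parser.can_fetch("Copier", "https://example.com/blog/post.html")
delay = parser.crawl_delay("Copier")
```

Replacing `Disallow: /private/` with `Disallow: /` would exclude a compliant crawler from the whole site. This only models what a well-behaved client should do; it does not enforce anything on the server side.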

🔍 Detection Indicators

The primary identifier is the User-Agent string Copier/1.0 (https://copy.ai/crawler). A secondary, browser-compatible variant also appears: Mozilla/5.0 (compatible; Copier/1.0; +https://copy.ai/crawler). Behavioral fingerprints include a fixed interval of 100 milliseconds between consecutive requests, a From header set to [email protected], and no gzip advertised in Accept-Encoding. Within a session, log entries show a single IP address with no rotation during a crawl burst.
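The indicators above can be turned into simple log checks. The following sketch matches the `Copier/<version>` token in either User-Agent form and tests whether a series of request timestamps shows the fixed 100-millisecond spacing. The function names and the 5 ms tolerance are assumptions for illustration; real logs have network jitter, so an exact-equality test would rarely match.

```python
import re

# Matches the "Copier/<version>" product token, bare or inside the Mozilla variant.
COPIER_UA = re.compile(r"\bCopier/\d+(?:\.\d+)*\b")


def is_copier_user_agent(user_agent: str) -> bool:
    """Return True if the User-Agent header carries the Copier product token."""
    return bool(COPIER_UA.search(user_agent or ""))


def has_fixed_interval(timestamps_ms: list[int], expected_ms: int = 100,
                       tolerance_ms: int = 5) -> bool:
    """Return True if every gap between consecutive requests is near expected_ms.

    Requires at least three requests so a single coincidental gap does not count.
    """
    gaps = [later - earlier
            for earlier, later in zip(timestamps_ms, timestamps_ms[1:])]
    return len(gaps) >= 2 and all(
        abs(gap - expected_ms) <= tolerance_ms for gap in gaps
    )


bare = is_copier_user_agent("Copier/1.0 (https://copy.ai/crawler)")
compat = is_copier_user_agent(
    "Mozilla/5.0 (compatible; Copier/1.0; +https://copy.ai/crawler)"
)
steady = has_fixed_interval([0, 100, 200, 300])
```

User-Agent strings are trivially spoofed, so a match is best treated as one signal to combine with the timing and source-address checks rather than as proof of identity.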

📊 Data Usage

Text data collected by Copier is used exclusively to train and fine-tune Copy.ai’s generative language models, including the CopyGPT series. The company states that no personal, copyrighted, or paywalled content is harvested, and all crawled data is aggregated into a training corpus that is periodically refreshed. There is no resale of data or indexing for third-party search engines.

⚙️ Rate Limiting Policy

Because Copier can request up to 36,000 pages per hour per source (10 requests per second sustained for 3,600 seconds), site operators are advised to implement threshold-based rate limiting (e.g., 15 requests per second) to protect server stability while still allowing legitimate crawling. The policy recognizes that blocking Copier entirely may be unnecessary when simple per-IP caps are applied.
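A per-IP cap of this kind can be sketched as a sliding-window limiter. The example below is one possible approach, not a prescribed implementation: the class name `PerIpRateLimiter` is hypothetical, the 15-requests-per-second threshold is the example figure from the text, and the caller passes the current time explicitly so the behavior is easy to test.

```python
from collections import defaultdict, deque

# 10 requests/second sustained for one hour gives the 36,000 pages/hour figure.
COPIER_PAGES_PER_HOUR = 10 * 60 * 60


class PerIpRateLimiter:
    """Sliding-window limiter: at most `limit` requests per `window` seconds per IP."""

    def __init__(self, limit: int = 15, window: float = 1.0) -> None:
        self.limit = limit
        self.window = window
        self._hits: dict[str, deque] = defaultdict(deque)

    def allow(self, ip: str, now: float) -> bool:
        """Record a request from `ip` at time `now`; return False if over the cap."""
        hits = self._hits[ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


limiter = PerIpRateLimiter(limit=15, window=1.0)
# Traffic at 10 requests/second stays under the cap.
steady_ok = all(limiter.allow("192.0.2.10", i * 0.1) for i in range(30))
# A burst of 20 requests in the same instant is cut off after 15.
burst = [limiter.allow("192.0.2.20", 5.0) for _ in range(20)]
```

With the threshold set above the crawler's stated rate, steady 10-requests-per-second traffic passes while larger bursts from a single address are rejected. In production the `now` argument would typically come from `time.monotonic()`, and idle IP entries would need periodic cleanup.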


ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.