anthropic-ai
Bot User-Agent: anthropic-ai
🤖 Overview
anthropic-ai is a web crawler operated by Anthropic, the AI research company that develops the Claude family of large language models. It collects publicly accessible web content for training and improving Anthropic’s models, with a stated focus on safety research. The collected data feeds Anthropic’s model training pipeline, and the crawler’s behavior is documented on Anthropic’s official support website at support.anthropic.com.
🌐 Technical Behavior
The anthropic-ai crawler uses an HTTP client with the user agent string “Anthropic/1.0 (https://support.anthropic.com/en/articles/8896512-web-crawler-information)”. It also uses a secondary string, “Claude-Web/1.0”, for certain tasks. The crawler issues requests one at a time with a default crawl delay of 10 seconds, which site owners can change with the Crawl-Delay directive in robots.txt. All requests use HTTPS and originate from IP ranges registered to Anthropic and published in the official docs at docs.anthropic.com. The crawler does not execute JavaScript or render pages; it extracts text from the raw HTML and ignores images and scripts.
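To make "extracts text from raw HTML, ignoring images and scripts" concrete, the following is a minimal sketch using only Python's standard library. It is an illustration of the general technique, not Anthropic's implementation; the names TextExtractor and extract_text are hypothetical, and skipping style blocks in addition to scripts is an assumption made here.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text and skips the contents of script/style elements."""

    SKIP_TAGS = {"script", "style"}  # assumption: style is skipped as well as script

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # Tags such as <img> carry no text data, so they contribute nothing.
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            text = data.strip()
            if text:
                self.chunks.append(text)


def extract_text(html: str) -> str:
    """Return the text content of an HTML document without running any scripts."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

Because nothing is rendered or executed, content that a page injects with JavaScript would not appear in the output of a parser like this.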
📋 robots.txt Compliance
Anthropic states that the anthropic-ai crawler respects robots.txt directives, including both Disallow and Allow rules. The company provides sample robots.txt snippets using the user agent token “Anthropic/1.0” to block the crawler. Webmasters report compliance, and Anthropic’s documentation confirms the crawler also honors the X-Robots-Tag HTTP header.
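A site owner can check what a given robots.txt would tell a crawler before deploying it. The sketch below uses Python's standard urllib.robotparser for that check. The robots.txt content is a hypothetical example, and the token "anthropic-ai" is taken from this page's Bot User-Agent field as an assumption; confirm the exact token against Anthropic's own documentation. Python's parser only approximates how any particular crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one crawler token, allow everyone else.
# The token "anthropic-ai" is an assumption based on this page's header.
ROBOTS_TXT = """\
User-agent: anthropic-ai
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("anthropic-ai", "SomeOtherBot"):
        print(agent, is_allowed(ROBOTS_TXT, agent, "https://example.com/page"))
```

Running the same check for each user agent token listed on this page shows whether a rule actually matches the intended crawler.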
🔍 Detection Indicators
The primary user agent string is “Anthropic/1.0 (https://support.anthropic.com/en/articles/8896512-web-crawler-information)”. The crawler sets standard HTTP headers such as Accept: text/html,application/xhtml+xml and omits a custom referrer. Its request pattern is consistent, and the user agent may also appear as “Claude-Web/1.0”. Anthropic publishes its crawler IP ranges in JSON format for integration into monitoring tools.
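The indicators above can be combined in a log filter: match the user agent first, then confirm the source address against the published ranges, since a user agent string alone is easy to spoof. This is a minimal sketch under stated assumptions. The function names are hypothetical, and EXAMPLE_RANGES uses reserved documentation addresses as placeholders, not Anthropic's real ranges; substitute the ranges from the published JSON.

```python
import ipaddress

# User agent markers quoted on this page, lowercased for matching.
UA_MARKERS = ("anthropic/1.0", "claude-web/1.0", "anthropic-ai")

# PLACEHOLDER ranges (reserved documentation blocks), not real crawler ranges.
EXAMPLE_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def ua_matches(user_agent: str) -> bool:
    """True if the user agent contains one of the known markers."""
    ua = user_agent.lower()
    return any(marker in ua for marker in UA_MARKERS)


def ip_in_ranges(ip: str, ranges) -> bool:
    """True if ip falls inside any of the given networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in ranges)


def classify(user_agent: str, ip: str, ranges=EXAMPLE_RANGES) -> str:
    """Label a request as 'verified', 'unverified', or 'other'."""
    if not ua_matches(user_agent):
        return "other"
    return "verified" if ip_in_ranges(ip, ranges) else "unverified"
```

A request labeled "unverified" claims a crawler user agent but comes from an address outside the configured ranges, which is worth a closer look in monitoring.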
📊 Data Usage
Data collected by the anthropic-ai crawler is used exclusively for training Anthropic’s AI models, including pre-training corpora and fine-tuning datasets from publicly accessible web pages. Anthropic respects robots.txt opt-outs and does not use content from blocked sites. The collected text improves model accuracy, safety, and alignment with human values as part of Anthropic’s responsible AI commitment.
⚙️ Rate Limiting Policy
Rate limiting keeps the anthropic-ai crawler from overloading servers; the default crawl delay of 10 seconds acts as a throttle. Webmasters may increase this delay via robots.txt, which lets legitimate AI data collection continue while safeguarding server performance.
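To see what delay a robots.txt file requests, the value can be read with Python's standard urllib.robotparser. The sketch below falls back to the 10-second default quoted on this page when no directive is present. The function names and the "anthropic-ai" token are assumptions for illustration, and the code does not describe how the crawler itself schedules requests.

```python
from urllib.robotparser import RobotFileParser

DEFAULT_DELAY_SECONDS = 10  # default crawl delay quoted on this page


def effective_crawl_delay(robots_txt: str, user_agent: str = "anthropic-ai") -> int:
    """Return the Crawl-delay for user_agent, or the default if none is set."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay(user_agent)
    return DEFAULT_DELAY_SECONDS if delay is None else int(delay)


def earliest_request_times(start: float, count: int, delay: int) -> list[float]:
    """Earliest permitted time of each request when requests are spaced by delay."""
    return [start + i * delay for i in range(count)]
```

With a delay of 30 seconds, for instance, three sequential requests starting at time 0 would be spaced at 0, 30, and 60 seconds.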
ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.