custo
Bot User-Agent: custo
🤖 Overview
Custo is an automated web crawler operated by Custo AI, a company that specializes in large-scale data extraction for machine learning model training, as documented in its official GitHub repository (https://github.com/custo-ai/crawler). The bot is designed to collect publicly available web content for training the company's proprietary natural language processing models.
🌐 Technical Behavior
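The crawler uses a distributed architecture with IP addresses in the 192.0.2.0/24 range, according to WHOIS records (192.0.2.0/24 is the TEST-NET-1 block reserved for documentation by RFC 5737, so confirm this range against current WHOIS data before blocking it). It sends HTTP requests with a User-Agent string containing CustoBot/1.0 and supports HTTP/2 and gzip compression. Crawl rates average 10 requests per second per IP, with bursts of up to 50 requests per second. The bot respects ETag and Last-Modified validators, sending conditional requests so that unchanged pages can be answered with 304 Not Modified, which reduces server load.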
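Because the crawler honors ETag and Last-Modified, a server can save bandwidth by answering its repeat requests with 304 Not Modified. The sketch below shows that decision in Python. The function name conditional_status and its arguments are hypothetical, the If-None-Match parsing is simplified (it assumes entity tags contain no commas), and nothing here comes from Custo's own documentation.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def _opaque_tag(tag: str) -> str:
    """Strip the weak-validator prefix so W/"x" and "x" compare equal."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def conditional_status(request_headers: dict[str, str], etag: str,
                       last_modified: datetime) -> int:
    """Return 304 if the client's cached copy is still current, else 200.

    etag is the quoted entity tag of the current resource (e.g. '"v42"');
    last_modified must be a timezone-aware datetime.
    """
    # HTTP header names are case-insensitive.
    headers = {k.lower(): v for k, v in request_headers.items()}

    # If-None-Match takes precedence over If-Modified-Since.
    if_none_match = headers.get("if-none-match")
    if if_none_match is not None:
        # Simplified split: assumes no commas inside entity tags.
        candidates = [t.strip() for t in if_none_match.split(",")]
        if "*" in candidates:
            return 304
        current = _opaque_tag(etag)
        return 304 if any(_opaque_tag(t) == current for t in candidates) else 200

    if_modified_since = headers.get("if-modified-since")
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            return 200  # unparseable date: ignore the header
        if since.tzinfo is None:
            since = since.replace(tzinfo=timezone.utc)
        # HTTP dates have one-second resolution, so drop microseconds.
        if last_modified.replace(microsecond=0) <= since:
            return 304
    return 200
```

In practice a web framework or caching reverse proxy usually performs this check; the sketch only makes the precedence rule (If-None-Match before If-Modified-Since) explicit.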
📋 robots.txt Compliance
Custo honors Disallow and Crawl-delay directives in robots.txt, according to tests conducted by webmasters. However, because the crawler is distributed, some requests may temporarily still arrive from individual IPs that are not yet compliant. The official documentation includes a dedicated page on robots.txt handling.
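To check how a robots.txt file would apply to this crawler, Python's standard urllib.robotparser can evaluate Disallow and Crawl-delay rules. This is a reference parser, not Custo's own implementation, and the rules below are a hypothetical example. The profile does not state which User-agent token Custo matches in robots.txt, so using "CustoBot" is an assumption.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep CustoBot out of /private/ and ask for a
# 5-second delay between its requests; all other bots are unrestricted.
ROBOTS_TXT = """\
User-agent: CustoBot
Disallow: /private/
Crawl-delay: 5

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse() also marks the rules as loaded

# robotparser compares the text before the first "/" of the name you pass,
# so pass the bot token rather than the full Mozilla/5.0 string.
blocked = parser.can_fetch("CustoBot", "https://example.com/private/report.html")
allowed = parser.can_fetch("CustoBot", "https://example.com/blog/")
delay = parser.crawl_delay("CustoBot")

print(blocked, allowed, delay)  # prints: False True 5
```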
🔍 Detection Indicators
Primary User-Agent: Mozilla/5.0 (compatible; CustoBot/1.0; +https://custo.ai/crawler). The bot also sends a Custo-Crawl: yes request header, and it ignores X-Robots-Tag: noindex response headers when collecting training data. Reverse DNS hostnames for its IPs contain crawl.custo.ai.
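The indicators above can be combined into a simple check. The helpers below are hypothetical, not an official Custo verification tool, and they assume the User-Agent token and crawl.custo.ai domain listed in this profile are accurate. Header checks are easy to spoof, so forward-confirmed reverse DNS (reverse-resolve the IP, require a hostname under crawl.custo.ai, then forward-resolve that hostname and confirm it maps back to the same IP) is the stronger signal. Because socket.gethostbyname_ex returns only IPv4 addresses, this sketch handles IPv4 clients only.

```python
import socket

# Indicators quoted in this profile; treat them as unverified assumptions.
CUSTO_UA_TOKEN = "CustoBot/1.0"
CUSTO_RDNS_DOMAIN = "crawl.custo.ai"


def looks_like_custobot(headers: dict[str, str]) -> bool:
    """Header-only check: UA token or a Custo-Crawl: yes request header."""
    # HTTP header names are case-insensitive, so normalise them first.
    normalized = {k.lower(): v for k, v in headers.items()}
    user_agent = normalized.get("user-agent", "")
    custo_crawl = normalized.get("custo-crawl", "").strip().lower()
    return CUSTO_UA_TOKEN in user_agent or custo_crawl == "yes"


def hostname_matches(hostname: str, domain: str = CUSTO_RDNS_DOMAIN) -> bool:
    """True if hostname equals domain or is a subdomain of it."""
    hostname = hostname.rstrip(".").lower()
    return hostname == domain or hostname.endswith("." + domain)


def forward_confirmed_rdns(ip: str) -> bool:
    """Reverse-resolve ip, check the domain, then forward-resolve to confirm."""
    try:
        hostname, _aliases, _addrs = socket.gethostbyaddr(ip)
        if not hostname_matches(hostname):
            return False
        _name, _aliases, addresses = socket.gethostbyname_ex(hostname)
    except OSError:  # covers socket.herror and socket.gaierror
        return False
    return ip in addresses
```

The suffix check requires a leading dot so that look-alike names such as evilcrawl.custo.ai or crawl.custo.ai.example.com do not pass.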
📊 Data Usage
Collected data is processed to train the Custo LLM, which powers the company's AI assistant and content generation tools. Data may also be used to build custom search indexes for enterprise clients.
⚙️ Rate Limiting Policy
Rate limiting is advised because the bot's high request volume can impact server performance. Recommended threshold: 200 requests per minute per IP, with a cooldown period of 10 seconds if exceeded.
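The recommended policy can be sketched as a per-IP sliding-window limiter. PerIpRateLimiter is a hypothetical class whose defaults use this profile's figures: at most 200 accepted requests per 60-second window per IP, and a 10-second block once that limit is exceeded. How the cooldown interacts with the window is an assumption: here accepted requests keep counting against the window during and after the cooldown, so the 200-per-minute ceiling still holds.

```python
from collections import defaultdict, deque


class PerIpRateLimiter:
    """Allow at most `limit` requests per `window` seconds per IP; an IP that
    exceeds the limit is rejected for at least `cooldown` seconds."""

    def __init__(self, limit: int = 200, window: float = 60.0,
                 cooldown: float = 10.0) -> None:
        self.limit = limit
        self.window = window
        self.cooldown = cooldown
        # ip -> timestamps of accepted requests (idle IPs are never evicted
        # in this sketch; a production version would prune them).
        self._hits: defaultdict[str, deque[float]] = defaultdict(deque)
        self._blocked_until: dict[str, float] = {}

    def allow(self, ip: str, now: float) -> bool:
        """Return True if the request from `ip` at time `now` is accepted."""
        if now < self._blocked_until.get(ip, 0.0):
            return False  # still cooling down
        hits = self._hits[ip]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            self._blocked_until[ip] = now + self.cooldown
            return False
        hits.append(now)
        return True
```

In a request handler you would call limiter.allow(client_ip, time.monotonic()) and respond with 429 Too Many Requests when it returns False.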
ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information are the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.