custo

Bot User-Agent: custo

🤖 Overview

Custo is an automated web crawler operated by Custo AI, a company specializing in large-scale data extraction for machine learning model training, according to its official GitHub repository (https://github.com/custo-ai/crawler). The bot collects publicly available web content to train proprietary natural language processing models.

🌐 Technical Behavior

The crawler runs on a distributed architecture, with source IP addresses in the 192.0.2.0/24 range (per WHOIS records). It identifies itself with the product token CustoBot/1.0 in its User-Agent string and supports HTTP/2 and gzip compression. Crawl rates average 10 requests per second per IP, with bursts of up to 50 requests per second. The bot honors ETag and Last-Modified headers to reduce server load.
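
As a rough illustration of how the IP range above could be used in log analysis, the sketch below checks whether an address falls inside a list of networks. The names `CUSTO_NETWORKS` and `in_custo_range` are hypothetical. Note that 192.0.2.0/24 is the block reserved for documentation by RFC 5737, so the list is best treated as a placeholder to be replaced with ranges observed in your own traffic.

```python
import ipaddress

# Assumption: the range quoted in this profile. 192.0.2.0/24 is reserved for
# documentation (RFC 5737), so swap in ranges seen in your own logs.
CUSTO_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def in_custo_range(ip: str) -> bool:
    """Return True if `ip` parses as an address inside any listed network."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        # Malformed log entries are treated as "not in range".
        return False
    return any(addr in net for net in CUSTO_NETWORKS)
```

A range match alone says nothing about who sent the request; it is only useful in combination with the other indicators on this page.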

📋 robots.txt Compliance

Custo honors Disallow and Crawl-delay directives in robots.txt, according to tests reported by webmasters. Because the crawler is distributed, however, some IPs may briefly keep sending requests that violate newly published rules. The official documentation includes a dedicated page on robots.txt handling.
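
Before relying on the compliance described above, it can help to confirm that your own robots.txt says what you intend. The sketch below uses Python's standard `urllib.robotparser` to evaluate a sample rule set. The `CustoBot` token in the `User-agent` line, the `/private/` path, and the 10-second delay are assumptions for illustration; check the operator's documentation for the token it actually matches.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: slow CustoBot down and keep it out of /private/,
# while leaving every other crawler unrestricted.
ROBOTS_TXT = """\
User-agent: CustoBot
Crawl-delay: 10
Disallow: /private/

User-agent: *
Disallow:
"""


def parse_rules(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = parse_rules(ROBOTS_TXT)
```

With these rules, `rules.can_fetch("CustoBot", "https://example.com/private/report.html")` returns `False`, and `rules.crawl_delay("CustoBot")` returns `10`. This only shows how a standards-following parser reads the file, not how Custo itself behaves.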

🔍 Detection Indicators

Primary User-Agent: Mozilla/5.0 (compatible; CustoBot/1.0; +https://custo.ai/crawler). Additional request header: Custo-Crawl: yes. The crawler ignores X-Robots-Tag: noindex when collecting training data. Reverse DNS lookups for crawler IPs return hostnames under crawl.custo.ai.
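
The indicators above can be combined into a simple check. The sketch below is one possible approach, not the operator's or Boteraser's method: `claims_custo` looks at the request headers, and `rdns_confirms` performs a forward-confirmed reverse DNS lookup so that a spoofed User-Agent alone is not trusted. Both function names are hypothetical, and the resolver arguments are injectable so the logic can be exercised without network access.

```python
import socket
from typing import Mapping


def claims_custo(headers: Mapping[str, str]) -> bool:
    """True if the request headers self-identify as the Custo crawler."""
    h = {k.lower(): v for k, v in headers.items()}
    ua = h.get("user-agent", "").lower()
    flag = h.get("custo-crawl", "").strip().lower()
    return "custobot/" in ua or flag == "yes"


def rdns_confirms(ip: str, suffix: str = "crawl.custo.ai",
                  reverse=socket.gethostbyaddr,
                  forward=socket.gethostbyname_ex) -> bool:
    """Forward-confirmed reverse DNS: PTR must sit under `suffix` and
    the PTR hostname must resolve back to the same IP."""
    try:
        host = reverse(ip)[0].rstrip(".").lower()
        # Match the suffix on a label boundary so that hostnames such as
        # "crawl.custo.ai.evil.example" are rejected.
        if host != suffix and not host.endswith("." + suffix):
            return False
        return ip in forward(host)[2]
    except OSError:
        # Covers socket.herror / socket.gaierror (no PTR, lookup failure).
        return False
```

A request for which `claims_custo` is true but `rdns_confirms` is false is likely an impersonator and can be handled separately from verified crawler traffic.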

📊 Data Usage

Collected data is processed to train the Custo LLM, which powers the company's AI assistant and content generation tools. Data may also be used to build custom search indexes for enterprise clients.

⚙️ Rate Limiting Policy

Rate limiting is advised because the bot's high request volume can degrade server performance. Recommended threshold: 200 requests per minute per IP, followed by a 10-second cooldown once the threshold is exceeded.
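
One way to read the recommendation above is as a sliding-window limit with a temporary block. The sketch below assumes that interpretation: each IP may make `limit` requests in any `window` seconds, and an IP that goes over is refused for `cooldown` seconds. The class name `PerIpLimiter` and its parameters are hypothetical, and the state lives in process memory, so a multi-server deployment would need a shared store instead.

```python
import time
from collections import defaultdict, deque


class PerIpLimiter:
    """Sliding-window request limiter with a cooldown after a violation."""

    def __init__(self, limit: int = 200, window: float = 60.0,
                 cooldown: float = 10.0):
        self.limit = limit
        self.window = window
        self.cooldown = cooldown
        self._hits = defaultdict(deque)   # ip -> timestamps of allowed hits
        self._blocked_until = {}          # ip -> time the cooldown ends

    def allow(self, ip: str, now: float = None) -> bool:
        """Return True if the request should be served, False to reject."""
        now = time.monotonic() if now is None else now
        if now < self._blocked_until.get(ip, float("-inf")):
            return False                  # still cooling down
        hits = self._hits[ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            self._blocked_until[ip] = now + self.cooldown
            return False
        hits.append(now)
        return True
```

In a request handler, a `False` result would typically map to an HTTP 429 response. The `now` argument exists mainly so the behavior can be tested with fixed timestamps.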

ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.