Sottopop
Bot User-Agent: sottopop
🤖 Overview
Sottopop is a web crawler operated by Sotto, a company specializing in AI summarization and knowledge graph construction. First documented in early 2023, it indexes public web content to train Sotto's proprietary LLMs. Official documentation at docs.sotto.ai confirms its legitimacy.
🌐 Technical Behavior
Sottopop performs a breadth-first crawl at 2 requests per second and respects ETag and Last-Modified headers. Its IP ranges (198.51.100.0/24 and 203.0.113.0/24) are published on GitHub (github.com/sotto/crawler-ips). It sends the User-Agent Mozilla/5.0 (compatible; Sottopop/1.0; +https://sotto.ai/bot) along with a From header containing a contact email address. It does not execute JavaScript or use cookies and fetches only static HTML. The header X-Sotto-Crawl: 1 identifies its requests.
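As a rough illustration of the request pattern described above, here is a minimal sketch that assumes a crawler stores each page's ETag and Last-Modified values and replays them as conditional-GET validators, spacing requests to stay at 2 per second. `CachedValidators`, `build_headers`, `min_interval`, and `is_not_modified` are hypothetical names, not Sotto's code, and the From header is omitted.

```python
from dataclasses import dataclass
from typing import Optional

# Identifiers as described on this page (not taken from Sotto's code).
USER_AGENT = "Mozilla/5.0 (compatible; Sottopop/1.0; +https://sotto.ai/bot)"
REQUESTS_PER_SECOND = 2.0


@dataclass
class CachedValidators:
    """Hypothetical cache record: validators saved from an earlier 200 response."""
    etag: Optional[str] = None
    last_modified: Optional[str] = None


def build_headers(cached: Optional[CachedValidators] = None) -> dict[str, str]:
    """Request headers for a (conditional) GET carrying the identifiers listed above."""
    headers = {"User-Agent": USER_AGENT, "X-Sotto-Crawl": "1"}
    if cached is not None:
        # The server may answer 304 Not Modified if these validators still match.
        if cached.etag:
            headers["If-None-Match"] = cached.etag
        if cached.last_modified:
            headers["If-Modified-Since"] = cached.last_modified
    return headers


def min_interval(rps: float = REQUESTS_PER_SECOND) -> float:
    """Minimum spacing in seconds between requests to stay at or below `rps`."""
    return 1.0 / rps


def is_not_modified(status: int) -> bool:
    """A 304 response means the cached copy is still current and can be reused."""
    return status == 304
```

At 2 requests per second, `min_interval()` works out to 0.5 seconds between fetches.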
📋 robots.txt Compliance
Sottopop fully honors robots.txt, as stated at sotto.ai/robots.txt-policy. It respects both Disallow and Crawl-delay directives and caches each domain's robots.txt. Community reports on WebmasterWorld confirm this compliance.
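To see how a compliant crawler would read a site's rules, the sketch below feeds a hypothetical robots.txt to Python's standard `urllib.robotparser`. It models a generic robots.txt reader, not Sotto's implementation; the file contents and example.com URLs are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a site owner might publish.
ROBOTS_TXT = """\
User-agent: Sottopop
Crawl-delay: 10
Disallow: /private/

User-agent: *
Disallow: /admin/
"""


def load_rules(text: str) -> RobotFileParser:
    parser = RobotFileParser()
    # parse() takes an iterable of lines; a blank line ends a user-agent group.
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)
# Match on the product token: RobotFileParser compares only the text
# before the first "/", so the full "Mozilla/5.0 ..." string would not match.
print(rules.can_fetch("Sottopop", "https://example.com/private/report.html"))  # False
print(rules.can_fetch("Sottopop", "https://example.com/blog/post"))  # True
print(rules.crawl_delay("Sottopop"))  # 10
```

Because a bot follows only the most specific group that names it, Sottopop in this example may still fetch /admin/: that path is disallowed only in the `*` group.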
🔍 Detection Indicators
The primary indicator is the User-Agent string given above. Behaviorally, it fetches pages in sequential order: /robots.txt first, then the site root, then internal links. Its IPs reverse-resolve to hostnames under *.crawler.sotto.ai, and its requests carry the custom header X-Sotto-Crawl: 1.
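The indicators above can be combined into a simple request check. This is a minimal sketch: the function names and the "verified"/"unverified" labels are hypothetical, and the ranges and hostname suffix are copied from this page. Both prefixes as printed fall within the RFC 5737 documentation blocks (TEST-NET-2 and TEST-NET-3), so confirm them against github.com/sotto/crawler-ips before relying on them. `forward_confirmed_rdns` uses the standard `socket` module and handles IPv4 only.

```python
import ipaddress
import socket
from typing import Mapping, Optional

# Indicators as listed on this page; re-check them before enforcing anything.
PUBLISHED_RANGES = [
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("203.0.113.0/24"),
]
UA_TOKEN = "Sottopop/"
RDNS_SUFFIX = ".crawler.sotto.ai"


def ua_matches(user_agent: str) -> bool:
    return UA_TOKEN in user_agent


def has_crawl_header(headers: Mapping[str, str]) -> bool:
    # HTTP header names are case-insensitive, so normalize before looking up.
    lowered = {name.lower(): value for name, value in headers.items()}
    return lowered.get("x-sotto-crawl", "").strip() == "1"


def ip_in_published_ranges(ip: str) -> bool:
    addr = ipaddress.ip_address(ip)
    # An IPv6 address is never inside an IPv4 network, so it simply yields False.
    return any(addr in net for net in PUBLISHED_RANGES)


def rdns_matches(hostname: str) -> bool:
    return hostname.lower().rstrip(".").endswith(RDNS_SUFFIX)


def forward_confirmed_rdns(ip: str) -> bool:
    """Reverse-resolve ip, check the suffix, then confirm the name maps back to ip."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)
        _, _, forward_ips = socket.gethostbyname_ex(hostname)
    except OSError:
        return False
    return rdns_matches(hostname) and ip in forward_ips


def follows_fetch_order(paths: list[str]) -> bool:
    """True if a session opens with /robots.txt and then the site root."""
    return len(paths) >= 2 and paths[0] == "/robots.txt" and paths[1] == "/"


def classify(user_agent: str, headers: Mapping[str, str], ip: str,
             hostname: Optional[str] = None) -> str:
    claims_sottopop = ua_matches(user_agent) or has_crawl_header(headers)
    if not claims_sottopop:
        return "not-sottopop"
    # Identity claims are easy to spoof; require a network-level match as well.
    if ip_in_published_ranges(ip) or (hostname is not None and rdns_matches(hostname)):
        return "verified"
    return "unverified"
```

Any client can send the Sottopop User-Agent or the X-Sotto-Crawl header, which is why the sketch only reports "verified" when the source IP or a forward-confirmed hostname also matches.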
📊 Data Usage
Crawled data is used to train Sotto-Summarize and Sotto-KnowledgeGraph (see sotto.ai/privacy). Personally identifiable information (PII) is anonymized within 48 hours. The data was also used in arXiv:2309.12345.
⚙️ Rate Limiting Policy
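Sottopop is often rate-limited because 2 requests per second (7,200 requests per hour) can overwhelm small sites. The recommended threshold is 1,000 requests per hour per IP, roughly 0.28 requests per second, to preserve server resources.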
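The threshold can be enforced in several ways. The sketch below uses a per-IP sliding-window log, one common choice (a token bucket would also work). `SlidingWindowLimiter` and its parameters are hypothetical, and timestamps are passed in explicitly so the behavior is deterministic.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Per-IP limiter: at most `limit` accepted requests in any `window` seconds."""

    def __init__(self, limit: int = 1000, window: float = 3600.0) -> None:
        self.limit = limit
        self.window = window
        # Timestamps of accepted requests, oldest first, keyed by client IP.
        self._hits = defaultdict(deque)

    def allow(self, ip: str, now: float) -> bool:
        hits = self._hits[ip]
        # Forget requests that are now at least `window` seconds old.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) < self.limit:
            hits.append(now)
            return True
        # Rejected requests are not recorded, so they do not extend the block.
        return False


# Simulate one hour of Sottopop's stated rate: one request every 0.5 seconds.
limiter = SlidingWindowLimiter()
accepted = sum(limiter.allow("198.51.100.7", i * 0.5) for i in range(7200))
print(accepted)  # 1000
```

At 2 requests per second, the limiter accepts the first 1,000 requests (the first 500 seconds) and rejects the rest until the oldest accepted timestamps age out of the one-hour window.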
ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information are the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.