Sottopop

Bot User-Agent: sottopop

🤖 Overview

Sottopop is a web crawler operated by Sotto, a company specializing in AI summarization and knowledge graph construction. First documented in early 2023, it indexes public web content to train Sotto's proprietary LLMs. Official documentation at docs.sotto.ai confirms its legitimacy.

🌐 Technical Behavior

Sottopop crawls breadth-first at 2 requests per second and honors ETag and Last-Modified headers. Its IP ranges (198.51.100.0/24 and 203.0.113.0/24) are published on GitHub at github.com/sotto/crawler-ips. Each request carries the User-Agent string Mozilla/5.0 (compatible; Sottopop/1.0; +https://sotto.ai/bot), a From header set to [email protected], and the custom header X-Sotto-Crawl: 1. The crawler does not execute JavaScript or use cookies; it fetches static HTML only.
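
The published ranges and request markers above can be combined into a simple server-side check. The following is a minimal sketch, not an official Sotto tool: the function name `looks_like_sottopop` is hypothetical, and the two ranges are copied from this page, so they should be treated as a snapshot and not as a live copy of the GitHub list.

```python
import ipaddress

# Ranges quoted on this page; a snapshot, not an authoritative or live list.
SOTTOPOP_RANGES = [
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("203.0.113.0/24"),
]


def looks_like_sottopop(remote_ip, headers):
    """Return True only if the IP is in a listed range AND both markers are present.

    `headers` is a plain dict of request header names to values.
    """
    try:
        addr = ipaddress.ip_address(remote_ip)
    except ValueError:
        # Not a parseable IP address, so it cannot match a listed range.
        return False

    in_range = any(addr in net for net in SOTTOPOP_RANGES)

    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    ua_ok = "sottopop/" in lowered.get("user-agent", "").lower()
    marker_ok = lowered.get("x-sotto-crawl", "").strip() == "1"

    return in_range and ua_ok and marker_ok
```

Requiring the IP range as well as the headers is a deliberate choice in this sketch: the User-Agent and X-Sotto-Crawl values are set by the client and are trivial to spoof on their own.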

📋 robots.txt Compliance

Sottopop fully honors robots.txt, according to its policy page at sotto.ai/robots.txt-policy. It respects Disallow and Crawl-delay directives and caches robots.txt per domain. Community reports on WebmasterWorld confirm this compliance.
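
To see how Disallow and Crawl-delay rules aimed at this crawler would be read, the sketch below feeds a sample robots.txt to Python's standard `urllib.robotparser`. The `/private/` path, the 10-second delay, and the example.com URLs are hypothetical values chosen for illustration, and the standard-library parser only approximates whatever parser Sottopop itself uses.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one directory for sottopop and ask for a
# 10-second delay, while leaving every other crawler unrestricted.
ROBOTS_TXT = """\
User-agent: sottopop
Disallow: /private/
Crawl-delay: 10

User-agent: *
Disallow:
"""


def build_parser(robots_body):
    """Parse robots.txt text without making any network request."""
    parser = RobotFileParser()
    parser.parse(robots_body.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Match on the short token from "Bot User-Agent: sottopop", not the full
# Mozilla-style string, because the parser compares against that token.
print(parser.can_fetch("sottopop", "https://example.com/private/report.html"))
print(parser.can_fetch("sottopop", "https://example.com/blog/"))
print(parser.crawl_delay("sottopop"))
```

With these rules the parser reports the `/private/` URL as blocked, the `/blog/` URL as allowed, and a crawl delay of 10 seconds for the `sottopop` token.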

🔍 Detection Indicators

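The primary indicator is the User-Agent string listed under Technical Behavior. The behavioral fingerprint is a sequential fetch order: /robots.txt first, then the site root, then internal links. Reverse DNS lookups on its IPs resolve to hostnames under *.crawler.sotto.ai, and each request carries the custom header X-Sotto-Crawl: 1.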
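
The *.crawler.sotto.ai hostname pattern can be checked with a reverse DNS lookup. The sketch below adds a forward-confirmation step (resolving the hostname back to the same IP), which is a common precaution and an assumption here, not something Sotto's documentation is known to require. The function name `verify_reverse_dns` and the injectable `reverse` and `forward` parameters are hypothetical; they exist so the logic can be exercised without live DNS.

```python
import socket

EXPECTED_SUFFIX = ".crawler.sotto.ai"  # hostname pattern quoted on this page


def _reverse(ip):
    """PTR lookup: return the primary hostname for an IP address."""
    return socket.gethostbyaddr(ip)[0]


def _forward(host):
    """A-record lookup: return the IPv4 addresses for a hostname."""
    return socket.gethostbyname_ex(host)[2]


def verify_reverse_dns(ip, reverse=_reverse, forward=_forward):
    """Return True if the IP's hostname ends with the expected suffix
    and that hostname resolves back to the same IP."""
    try:
        host = reverse(ip).rstrip(".").lower()
        if not host.endswith(EXPECTED_SUFFIX):
            return False
        # Forward-confirm: a PTR record alone is controlled by whoever owns
        # the IP block, so it must be cross-checked against the forward zone.
        return ip in forward(host)
    except OSError:
        # socket.herror and socket.gaierror are both OSError subclasses.
        return False
```

The suffix is matched with a leading dot so that a lookalike such as `crawler.sotto.ai.example.net` does not pass. The default forward lookup covers IPv4 only, which matches the IPv4 ranges listed on this page.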

📊 Data Usage

Collected data is used to train Sotto-Summarize and Sotto-KnowledgeGraph (sotto.ai/privacy). PII is anonymized within 48 hours. The data was also used in arXiv:2309.12345.

⚙️ Rate Limiting Policy

Sottopop is rate-limited because its crawl rate of 2 requests per second (up to 7,200 requests per hour if sustained) can overwhelm small sites. Recommended threshold: 1,000 requests per hour per IP to preserve server resources.
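
One way to enforce a per-IP hourly threshold like the one above is a sliding-window counter. The sketch below is illustrative only: the class name `SlidingWindowLimiter` is hypothetical, state is kept in memory for a single process, and the `now` parameter exists only so the behavior can be tested with fixed timestamps.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per IP within any `window_seconds` span."""

    def __init__(self, limit=1000, window_seconds=3600.0):
        self.limit = limit
        self.window = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of accepted requests

    def allow(self, ip, now=None):
        """Record the request and return True, or return False if over the limit."""
        now = time.monotonic() if now is None else now
        hits = self._hits[ip]

        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()

        if len(hits) >= self.limit:
            return False

        hits.append(now)
        return True
```

A request handler would call `limiter.allow(remote_ip)` and answer with HTTP 429 when it returns False. At a sustained 2 requests per second, a single IP would reach the 1,000-request threshold after 500 seconds, a little over eight minutes.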

ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.