scooter
Bot User-Agent: scooter
🤖 Overview
Scooter is a web crawler operated by the Scooter Project, an open-source academic initiative launched in 2020 to construct a distributed search engine for information retrieval research. Its primary purpose is to index publicly accessible web content to support experiments in ranking algorithms and natural language processing. The collected data feeds into the Scooter search engine prototype, a testbed for novel retrieval techniques.
🌐 Technical Behavior
The crawler follows a polite, rate-limited strategy with a default crawl delay of 10 seconds per host. It communicates over HTTP/1.1 and HTTP/2, originating from AWS us-east-1 IP ranges such as 3.80.0.0/12. To avoid redundant downloads it issues conditional requests based on ETag and If-Modified-Since headers, and it supports gzip compression. Traversal is breadth-first from a curated seed list of open-access domains. The request rate is capped at 10 requests per minute per domain; at the default 10-second delay, a single host sees roughly 6 requests per minute, which is within that cap.
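The per-host delay and conditional-request behavior described above can be illustrated with a minimal sketch. This is not Scooter's actual code: the names `HostScheduler` and `conditional_headers` are hypothetical, and the only values taken from this page are the 10-second default delay and gzip support. Sending a stored ETag back in `If-None-Match` is standard HTTP behavior, assumed here rather than confirmed for Scooter.

```python
from urllib.parse import urlsplit

CRAWL_DELAY_SECONDS = 10.0  # default per-host delay stated on this page


class HostScheduler:
    """Hypothetical helper: tracks when each host may be requested again."""

    def __init__(self, delay=CRAWL_DELAY_SECONDS):
        self.delay = delay
        self._next_allowed = {}  # hostname -> earliest allowed timestamp (seconds)

    def wait_time(self, url, now):
        """Seconds to wait before fetching `url` if the clock reads `now`."""
        host = urlsplit(url).hostname
        return max(0.0, self._next_allowed.get(host, 0.0) - now)

    def record_fetch(self, url, now):
        """Note that `url` was fetched at `now`, pushing the host's next slot out."""
        host = urlsplit(url).hostname
        self._next_allowed[host] = now + self.delay


def conditional_headers(etag=None, last_modified=None):
    """Build request headers that let the server answer 304 Not Modified."""
    headers = {"Accept-Encoding": "gzip"}
    if etag:
        headers["If-None-Match"] = etag  # validator from a previous ETag response header
    if last_modified:
        headers["If-Modified-Since"] = last_modified  # value of a previous Last-Modified
    return headers
```

Passing the clock value in as `now` keeps the sketch deterministic; a real crawler would likely read a monotonic clock and sleep for the returned wait time.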
📋 robots.txt Compliance
According to the official GitHub repository (github.com/scooterproject/scraper), Scooter strictly adheres to the Robots Exclusion Protocol. It reads robots.txt at the start of each crawl, honors all Disallow directives, and adjusts its delay based on Crawl-Delay directives. It also respects X-Robots-Tag HTTP headers.
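Site operators who want to check how a compliant crawler would read their rules can do so with Python's standard `urllib.robotparser`. The sketch below is illustrative only: the robots.txt content, the `Scooter` token in the `User-agent` line, and the `example.org` URLs are assumptions made for the example, not values published by the Scooter Project.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the "Scooter" user-agent token is an assumption.
ROBOTS_TXT = """\
User-agent: Scooter
Disallow: /private/
Crawl-delay: 20

User-agent: *
Disallow:
"""


def build_parser(text):
    """Parse robots.txt text without making a network request."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
ua = "Scooter/1.0 (compatible; +http://scooterproject.org/bot)"

blocked = parser.can_fetch(ua, "https://example.org/private/report.html")
allowed = parser.can_fetch(ua, "https://example.org/papers/index.html")
delay = parser.crawl_delay(ua)  # value of the Crawl-delay line that applies to this agent
```

Here `blocked` is `False`, `allowed` is `True`, and `delay` is `20`, which is the kind of per-site override of the default delay that the compliance description implies.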
🔍 Detection Indicators
The primary User-Agent string is "Scooter/1.0 (compatible; +http://scooterproject.org/bot)". Alternatives include "ScooterBot/1.0" and "Scooter-Crawler/1.0". A custom X-Scooter-Crawl header carries a version number and session ID. Source IP addresses resolve to AWS domains, and the bot does not rotate addresses aggressively.
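A rough way to combine these indicators in log analysis is sketched below. It assumes the three User-Agent strings listed above and treats 3.80.0.0/12 (the example range mentioned under Technical Behavior) as the only known source range; both the regular expression and the `classify` function are hypothetical. User-Agent strings are trivially spoofed, so a match on the string alone is labeled as a claim rather than a confirmation.

```python
import ipaddress
import re

# Matches "Scooter/1.0 ...", "ScooterBot/1.0" and "Scooter-Crawler/1.0".
SCOOTER_UA = re.compile(r"^Scooter(?:Bot|-Crawler)?/\d+(?:\.\d+)*")

# Example range from this page; the crawler's full range list is not given here.
EXAMPLE_RANGE = ipaddress.ip_network("3.80.0.0/12")


def classify(user_agent, remote_ip):
    """Return a coarse label based on the User-Agent and the source address."""
    ua_match = bool(SCOOTER_UA.match(user_agent))
    ip_match = ipaddress.ip_address(remote_ip) in EXAMPLE_RANGE
    if ua_match and ip_match:
        return "likely-scooter"
    if ua_match:
        return "claims-scooter"  # UA matches, but the address is outside the example range
    return "other"
```

For instance, `classify("ScooterBot/1.0", "3.85.1.2")` yields `"likely-scooter"`, while the same User-Agent from `203.0.113.7` yields `"claims-scooter"`.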
📊 Data Usage
Collected data is used exclusively for academic research and development of the Scooter search engine. Content is stored in a non-commercial index and used to train retrieval models, improve ranking algorithms, and study web crawling techniques. Periodic crawl statistics are published for reproducibility.
⚙️ Rate Limiting Policy
Although legitimate, Scooter can generate significant traffic if unthrottled. Administrators should apply rate limits per IP or session to prevent resource exhaustion while allowing the bot to index important academic content, as its data supports open research.
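One way to apply a per-IP limit is a sliding-window counter, sketched below. The class name `SlidingWindowLimiter` is hypothetical, and the default of 10 requests per 60 seconds simply mirrors the crawler's stated per-domain cap; an appropriate threshold depends on the server's capacity.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` per client within any `window_seconds` span."""

    def __init__(self, max_requests=10, window_seconds=60.0):
        self.max_requests = max_requests
        self.window = window_seconds
        self._hits = defaultdict(deque)  # client IP -> timestamps of accepted requests

    def allow(self, client_ip, now):
        """Return True and record the request if the client is under its limit."""
        hits = self._hits[client_ip]
        # Discard timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the limit; a server might answer 429 here
        hits.append(now)
        return True
```

Rejected requests are not recorded, so a client that backs off regains capacity as soon as its oldest accepted request leaves the window. Keying the dictionary on a session ID instead of an IP address gives the per-session variant.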
ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.