sherlock Bot — Detection, Blocking & Technical Analysis

Bot User-Agent: sherlock

🤖 Overview

Sherlock is an open-source automated username search tool maintained by the Sherlock Project community on GitHub. It is designed to check the existence of a given username across hundreds of social media and web platforms. The tool is widely used by security researchers, journalists, and law enforcement for legitimate OSINT (Open Source Intelligence) investigations. It is not a search engine crawler but a targeted automated agent. The project repository at github.com/sherlock-project/sherlock has over 50,000 stars and is actively developed with regular updates.

🌐 Technical Behavior

Sherlock sends concurrent HTTP GET requests to a predefined list of over 400 social networks (e.g., Twitter, GitHub, Instagram, Reddit) defined in its JSON site data file. By default, it runs up to 20 concurrent workers from a thread pool (via the requests-futures library), but this can be tuned. Requests originate from the user's public IP address, so there is no fixed source range. The default User-Agent string is "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36". It may also include an "X-Requested-With: XMLHttpRequest" header. The tool follows HTTP redirects and handles rate limiting (HTTP 429) by pausing and retrying after a configurable delay. It determines username existence primarily by HTTP status code: 200 indicates found, 404 indicates not found. For sites that return 200 even for missing profiles, it instead looks for an error message in the response body or inspects the final response URL.
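The fan-out and status-code rule described above can be sketched roughly as follows. This is a minimal illustration, not Sherlock's actual code: the `SITES` table, `classify`, `check_username`, and `fake_fetch` are hypothetical names, the worker pool is modeled with the standard-library `ThreadPoolExecutor` as an assumption, and the HTTP call is replaced by an injected function so the example runs offline.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

# Hypothetical URL templates; the real tool ships its own, much longer site list.
SITES: Dict[str, str] = {
    "ExampleA": "https://a.example/{}",
    "ExampleB": "https://b.example/{}",
    "ExampleC": "https://c.example/{}",
}


def classify(status: int) -> str:
    """Map an HTTP status code to a result using the 200/404 rule."""
    if status == 200:
        return "found"
    if status == 404:
        return "not found"
    if status == 429:
        return "rate limited"
    return "unknown"


def check_username(
    username: str,
    fetch_status: Callable[[str], int],
    max_workers: int = 20,
) -> Dict[str, str]:
    """Probe every site concurrently and classify each response."""
    urls = {name: template.format(username) for name, template in SITES.items()}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with site names.
        statuses = pool.map(fetch_status, urls.values())
        return {name: classify(code) for name, code in zip(urls, statuses)}


def fake_fetch(url: str) -> int:
    """Stand-in for a real HTTP GET; returns a canned status per host."""
    if url.startswith("https://a.example/"):
        return 200
    if url.startswith("https://b.example/"):
        return 404
    return 429


if __name__ == "__main__":
    print(check_username("alice", fake_fetch))
```

In a real deployment `fake_fetch` would be replaced by a function that issues the GET request and returns the response status; sites that answer 200 for missing profiles would need a different check than the one shown here.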

📋 robots.txt Compliance

Sherlock does not parse or respect robots.txt restrictions because it targets specific user profile endpoints rather than crawling entire sites. Most of the queried platforms already apply their own rate limits and blocking at the application layer. The tool's official documentation advises users to respect each site's terms of service and to implement appropriate delays to avoid being blocked.

🔍 Detection Indicators

The primary indicator is a burst of rapid, concurrent HTTP GET requests to username-check URLs (e.g., "/username") on many social media sites from the same IP within a few seconds. The User-Agent is a generic Chrome string, so it is not distinctive on its own, but the inclusion of the "X-Requested-With: XMLHttpRequest" header is a common fingerprint. Additionally, the tool sends an "Accept-Language: en-US,en;q=0.9" header and does not request images or other page resources, which makes the traffic pattern distinct from normal browsing.
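As a rough illustration, the header and resource indicators above could be combined into a simple score. This is a heuristic sketch under stated assumptions, not a tested detection rule: `sherlock_like_score` is a hypothetical helper, the equal weighting is arbitrary, and whether a client fetched page assets is assumed to be known from the server's own logs.

```python
from typing import Mapping


def sherlock_like_score(headers: Mapping[str, str], fetched_assets: bool) -> int:
    """Count how many of the described indicators a request matches (0 to 3)."""
    # HTTP header names are case-insensitive, so normalize them first.
    normalized = {key.lower(): value for key, value in headers.items()}
    score = 0

    # Indicator 1: the XMLHttpRequest marker on a top-level profile request.
    if normalized.get("x-requested-with", "").lower() == "xmlhttprequest":
        score += 1

    # Indicator 2: the exact Accept-Language value quoted above.
    if normalized.get("accept-language", "").replace(" ", "") == "en-US,en;q=0.9":
        score += 1

    # Indicator 3: the client never requested images, CSS, or scripts.
    if not fetched_assets:
        score += 1

    return score


if __name__ == "__main__":
    suspicious = {
        "X-Requested-With": "XMLHttpRequest",
        "Accept-Language": "en-US,en;q=0.9",
    }
    print(sherlock_like_score(suspicious, fetched_assets=False))
```

Each indicator also appears in legitimate traffic, so a score like this is better suited to flagging requests for closer inspection than to blocking outright.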

📊 Data Usage

Data collected includes the existence status of a username on each platform, associated profile URLs, and sometimes response page titles or status codes. This data is used for OSINT analysis, digital footprint mapping, impersonation detection, and investigative reporting. Results are typically written to plain-text, CSV, or XLSX files. The data is not used for AI training or search indexing.

⚙️ Rate Limiting Policy

Rate limiting Sherlock traffic is warranted because its concurrent request pattern can generate dozens of requests per second from a single IP, resembling a low-grade denial-of-service attack. Threshold-based blocking (e.g., rejecting clients that exceed 10 requests per second to profile endpoints) is justified to protect backend infrastructure and ensure equal access for all users.
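A threshold of this kind can be sketched with a per-IP sliding window. The example below is a minimal in-memory illustration that assumes the limit of 10 requests per second mentioned above; `SlidingWindowLimiter` is a hypothetical class, and the caller passes in timestamps so the behavior is deterministic.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class SlidingWindowLimiter:
    """Allow at most max_requests per IP within any window_seconds span."""

    def __init__(self, max_requests: int = 10, window_seconds: float = 1.0) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, ip: str, now: float) -> bool:
        """Return True if the request at time `now` is within the limit."""
        hits = self._hits[ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


if __name__ == "__main__":
    limiter = SlidingWindowLimiter()
    decisions = [limiter.allow("203.0.113.7", 0.01 * i) for i in range(12)]
    print(decisions.count(True), decisions.count(False))
```

Rejected requests are not recorded here, so a client that keeps retrying is readmitted as soon as its earlier requests age out of the window; a production setup would more likely keep this state in a shared store and answer rejected requests with HTTP 429.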

ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.