dibot
Bot User-Agent: dibot
🤖 Overview
dibot is the primary web crawler operated by DuckDuckGo, the privacy-focused search engine. Officially named DuckDuckBot, this bot automatically discovers and indexes publicly accessible web pages to populate DuckDuckGo’s search results, instant answers, and zero-click information boxes. DuckDuckGo has publicly documented its crawler on its help pages and in its User-Agent policy at duckduckgo.com/duckduckbot, confirming its sole purpose is search indexing without any tracking or profiling of users.
🌐 Technical Behavior
dibot crawls using standard HTTP/1.1 and HTTPS protocols, sending GET requests with an Accept-Encoding: gzip header to reduce bandwidth usage. The bot obeys a Crawl-Delay directive in robots.txt when present, adjusting its request frequency accordingly. Based on analysis of server logs and DuckDuckGo’s published infrastructure, the crawler originates from IP addresses within ASN 36635 (DuckDuckGo Inc.), with common ranges such as 50.17.211.0/24 and 72.32.146.0/24. Requests typically include a From header containing the email [email protected] for administrative contact. The bot also supports If-Modified-Since and If-None-Match headers for efficient re-crawling, and it respects the noindex meta tag on individual pages. DuckDuckGo’s official documentation states the bot may crawl multiple pages per second but is designed to be rate-limited automatically based on server response times.
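The IP ranges quoted above can be checked against a request's source address. The following is a minimal sketch using Python's standard `ipaddress` module; the two networks are copied from the paragraph above and are not independently verified, and the name `in_cited_ranges` is hypothetical.

```python
import ipaddress

# Ranges as quoted in the text above. Treat them as illustrative only:
# crawler address space can change, so confirm against current sources.
CITED_RANGES = [
    ipaddress.ip_network("50.17.211.0/24"),
    ipaddress.ip_network("72.32.146.0/24"),
]


def in_cited_ranges(ip: str) -> bool:
    """Return True if `ip` falls inside one of the cited networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in CITED_RANGES)


if __name__ == "__main__":
    for candidate in ("50.17.211.7", "72.32.146.200", "8.8.8.8"):
        print(candidate, in_cited_ranges(candidate))
```

An address match on its own is weak evidence, so it is usually combined with the User-Agent and header checks described in the Detection Indicators section.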
📋 robots.txt Compliance
dibot fully complies with the Robots Exclusion Standard as documented by DuckDuckGo. It honors all Disallow directives in robots.txt and respects per-path exclusions. DuckDuckGo’s help page explicitly states that webmasters can block the bot entirely by adding a Disallow: / rule for the DuckDuckBot user-agent token. There are no known instances of the bot ignoring robots.txt rules, and the company maintains a transparent policy for managing crawl access.
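To see how a `Disallow: /` rule for the DuckDuckBot token is interpreted, the sketch below feeds a sample robots.txt to Python's standard `urllib.robotparser`. The file contents, the `example.com` URLs, and the helper name `can_fetch` are assumptions made for illustration; this shows how a standards-following parser reads the rules, not how the crawler itself is implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block DuckDuckBot everywhere,
# and block every other crawler from /private/ only.
ROBOTS_TXT = """\
User-agent: DuckDuckBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def can_fetch(agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Parse robots_txt and report whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    print(can_fetch("DuckDuckBot", "https://example.com/page"))
    print(can_fetch("OtherBot", "https://example.com/page"))
    print(can_fetch("OtherBot", "https://example.com/private/report"))
```

Running a check like this before deploying a robots.txt change helps confirm that a rule blocks the intended token without affecting other crawlers.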
🔍 Detection Indicators
The most reliable detection indicator is the User-Agent string: DuckDuckBot/1.0 or DuckDuckBot/1.1 (without a Mozilla prefix). Some legacy versions may appear as DuckDuckBot/1.0; +http://duckduckgo.com/duckduckbot.html. The bot also sends a From header with [email protected]. Behavioral fingerprints include a high share of 304 Not Modified responses, which follows from the bot's use of conditional requests, and the absence of typical browser headers such as Cookie or a non-default Accept-Language.
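The User-Agent forms listed above can be matched with a short regular expression. This is a sketch only: the pattern is derived from the strings quoted in this section, and `looks_like_documented_ua` is a hypothetical helper name. A User-Agent header is trivially spoofed, so a match should be treated as a hint rather than proof.

```python
import re

# Matches "DuckDuckBot/1.0" or "DuckDuckBot/1.1" at the start of the header,
# i.e. without a Mozilla prefix, optionally followed by "; +http://...".
UA_PATTERN = re.compile(r"^DuckDuckBot/1\.[01]\b")


def looks_like_documented_ua(user_agent: str) -> bool:
    """Return True if the header matches the forms quoted in this section."""
    return bool(UA_PATTERN.match(user_agent.strip()))


if __name__ == "__main__":
    samples = [
        "DuckDuckBot/1.1",
        "DuckDuckBot/1.0; +http://duckduckgo.com/duckduckbot.html",
        "Mozilla/5.0 (compatible; DuckDuckBot/1.1)",
    ]
    for ua in samples:
        print(repr(ua), looks_like_documented_ua(ua))
```

The anchored pattern deliberately rejects strings that embed the token after a Mozilla prefix, since this section describes the bot as sending it without one.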
📊 Data Usage
All data collected by dibot is exclusively used to build and maintain DuckDuckGo’s search index, which powers anonymous search results, instant answers (e.g., weather, definitions), and rich snippets. Unlike many search engines, DuckDuckGo explicitly states it does not use crawled content for AI training or profiling of individual users. The index is refreshed periodically, and crawled pages are stored in a privacy-preserving manner without tracking user behavior.
⚙️ Rate Limiting Policy
dibot is assigned a rate-limiting policy because a high default crawl rate can overload under-resourced servers, especially when the bot is crawling large sites. Threshold-based throttling (e.g., capping requests per second per IP) is a standard practice that protects site performance while still allowing legitimate indexing by DuckDuckGo.
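A per-IP request threshold of the kind mentioned above can be sketched as a sliding-window counter. The class name `SlidingWindowLimiter`, the limit of 5 requests per second, and the sample address are all assumptions chosen for illustration; a production setup would more likely rely on the rate-limiting features of the web server or CDN.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` per `window_seconds` for each client IP."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of accepted requests

    def allow(self, ip: str, now: float) -> bool:
        """Record a request at time `now` (seconds) and report if it is allowed."""
        hits = self._hits[ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the threshold: caller may answer 429
        hits.append(now)
        return True


if __name__ == "__main__":
    limiter = SlidingWindowLimiter(max_requests=5, window_seconds=1.0)
    # Ten requests from one address, 100 ms apart.
    for i in range(10):
        print(i, limiter.allow("50.17.211.7", now=i * 0.1))
```

Passing the timestamp in explicitly keeps the limiter deterministic and easy to test; a real handler would supply `time.monotonic()`. Rejected requests are not recorded, so a throttled client regains access as soon as its earlier requests age out of the window.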
ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.