bdfetch Bot — Detection, Blocking & Technical Analysis

Bot User-Agent: bdfetch

🤖 Overview

bdfetch is a legitimate web crawler operated by Baidu, Inc., the Chinese search engine company, as part of its Baiduspider family of automated agents. Its primary purpose is to fetch and index web content so that Baidu’s search results remain comprehensive and up‑to‑date. According to Baidu’s official webmaster documentation (available at https://ziyuan.baidu.com/), bdfetch is used specifically for large‑scale page retrieval tasks and feeds data into Baidu’s main search index as well as subsidiary products such as Baidu News and Baidu Baike.

🌐 Technical Behavior

The bdfetch crawler typically sends HTTP GET requests at a moderate but steady rate, often a few requests per second or up to several hundred per minute, depending on the site’s server response times. It uses standard HTTP/1.1 and HTTP/2 and sends Accept-Language and Accept-Encoding headers with its requests. Its originating IP addresses are attributed to AS4134 (ChinaNet, the China Telecom backbone rather than a Baidu‑owned network) and AS55902 (listed as Baidu), with a documented range that includes prefixes such as 220.181.108.0/24 and 111.13.101.0/24. The crawler follows links recursively but respects noindex meta tags and nofollow attributes on hyperlinks, as confirmed by Baidu’s webmaster guidelines (see https://ziyuan.baidu.com/wiki/166). It also caches DNS records aggressively, and it re‑crawls pages at a frequency determined by a site’s update rate and link‑authority signals.
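As a rough illustration of how the prefixes quoted above could be used, the following Python sketch checks whether a client IP falls inside them. The list holds only the two example prefixes from this page and should not be treated as complete or current; the names EXAMPLE_PREFIXES and in_listed_prefixes are hypothetical.

```python
import ipaddress

# Only the two prefixes quoted in the text; an assumption-laden sample,
# not an authoritative or complete list of Baidu address space.
EXAMPLE_PREFIXES = [
    ipaddress.ip_network("220.181.108.0/24"),
    ipaddress.ip_network("111.13.101.0/24"),
]


def in_listed_prefixes(ip: str) -> bool:
    """Return True if `ip` parses as an address inside one of the example prefixes."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        # Malformed input is treated as "not in range" rather than raising.
        return False
    return any(addr in net for net in EXAMPLE_PREFIXES)
```

A prefix match alone is weak evidence, since published ranges change over time; it is probably best combined with the reverse DNS check described under Detection Indicators.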

📋 robots.txt Compliance

Based on Baidu’s published documentation and real‑world testing, bdfetch fully supports the robots.txt exclusion standard. It fetches the file before crawling a host and obeys both Disallow and Allow path directives in the group that matches its user‑agent token. The bot is covered by the Baiduspider token in robots.txt, but individual fetch sub‑agents such as bdfetch may also be targeted with a separate line like User‑agent: bdfetch. Baidu recommends that webmasters use the Baiduspider token to manage access for all of its crawlers.
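To make the two targeting options concrete, here is a small Python sketch that feeds a sample robots.txt to the standard library's urllib.robotparser and asks what each token may fetch. The rules and paths are invented for illustration, and the result reflects how Python's parser interprets the file, which may differ in detail from Baidu's own matching.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: restrict Baiduspider to public paths, block bdfetch entirely.
ROBOTS_TXT = """\
User-agent: Baiduspider
Disallow: /private/

User-agent: bdfetch
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
print(parser.can_fetch("Baiduspider", "/private/report.html"))  # False
print(parser.can_fetch("Baiduspider", "/blog/post.html"))       # True
print(parser.can_fetch("bdfetch", "/blog/post.html"))           # False
```

Testing rules offline like this before deploying them can catch a group that accidentally blocks or admits more than intended.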

🔍 Detection Indicators

The primary detection indicator is the User‑Agent string, which typically appears as Mozilla/5.0 (compatible; bdfetch; +http://www.baidu.com/search/spider.html). Additional behavioral fingerprints include a Referer header that often contains the source page URL and a tendency to avoid requesting JavaScript or CSS resources. The bot also sends a custom From header in some cases, though this is not guaranteed. Reverse DNS lookups on its IPs usually resolve to hostnames ending in .baidu.com or .baidu.jp (for international crawls).
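Because a User‑Agent string can be set by any client, the header is usually paired with a forward‑confirmed reverse DNS check. The Python sketch below shows one way to combine the two indicators described above. The function names are hypothetical, the hostname suffixes are the ones quoted in this section, and verify_ip performs live DNS lookups (IPv4 only), so its result depends on the resolver at run time.

```python
import re
import socket

# Matches the bdfetch token anywhere in a User-Agent string.
BDFETCH_UA = re.compile(r"\bbdfetch\b", re.IGNORECASE)
# Hostname suffixes quoted in the text above.
TRUSTED_SUFFIXES = (".baidu.com", ".baidu.jp")


def ua_claims_bdfetch(user_agent: str) -> bool:
    """True if the User-Agent header claims to be bdfetch (easily spoofed)."""
    return bool(BDFETCH_UA.search(user_agent or ""))


def hostname_is_baidu(hostname: str) -> bool:
    """True if the hostname ends in one of the trusted suffixes."""
    host = hostname.rstrip(".").lower()
    return host.endswith(TRUSTED_SUFFIXES)


def verify_ip(ip: str) -> bool:
    """Forward-confirmed reverse DNS: PTR lookup, suffix check, then resolve back."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)
        if not hostname_is_baidu(hostname):
            return False
        _, _, addresses = socket.gethostbyname_ex(hostname)
    except OSError:
        # Covers failed reverse and forward lookups alike.
        return False
    return ip in addresses
```

The forward lookup matters: anyone who controls the reverse zone for their own IP can publish a PTR record that ends in .baidu.com, but they cannot make the real baidu.com zone resolve that name back to their address.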

📊 Data Usage

Data collected by bdfetch is used exclusively for building and updating Baidu Search’s index. The crawled text, metadata, and content structure are processed into searchable tokens that power Baidu’s organic search results, news aggregation, and knowledge graph entries. Baidu states in its privacy policy that fetched data is not used for commercial purposes beyond search functionality and is not shared with third parties.

⚙️ Rate Limiting Policy

Because bdfetch can generate significant traffic, especially on high‑authority sites, webmasters often implement rate limiting to protect server resources. A threshold of, for example, 10 requests per second per IP is a reasonable policy. Baidu’s crawler is designed to back off when it receives an HTTP 429 status code and to retry after the period given in the Retry-After header, in line with best practices for legitimate crawlers.
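One common way to enforce such a threshold is a per‑IP token bucket. The Python sketch below is a minimal in‑memory version, assuming the example limit of 10 requests per second; PerIpLimiter and its check method are hypothetical names, and a production setup would more likely rely on the web server's or CDN's built‑in rate limiting.

```python
import math


class PerIpLimiter:
    """Token bucket per client IP; returns an HTTP status and response headers."""

    def __init__(self, rate: float = 10.0, burst: float = 10.0):
        self.rate = rate      # tokens refilled per second
        self.burst = burst    # maximum bucket size
        self.buckets = {}     # ip -> (tokens, last_update_timestamp)

    def check(self, ip: str, now: float):
        tokens, updated = self.buckets.get(ip, (self.burst, now))
        # Refill in proportion to elapsed time, capped at the burst size.
        elapsed = max(0.0, now - updated)
        tokens = min(self.burst, tokens + elapsed * self.rate)
        if tokens >= 1.0:
            self.buckets[ip] = (tokens - 1.0, now)
            return 200, {}
        self.buckets[ip] = (tokens, now)
        # Retry-After is expressed in whole seconds, so round up to at least 1.
        wait = (1.0 - tokens) / self.rate
        return 429, {"Retry-After": str(max(1, math.ceil(wait)))}
```

Passing the timestamp in explicitly (for example from time.monotonic()) keeps the limiter deterministic and easy to test. Whether a given crawler actually honors the Retry-After value is worth confirming in server logs rather than assuming.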

ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.
