netshift

Bot User-Agent: netshift

🤖 Overview

Netshift is a legitimate web crawler operated by Beijing Netshift Technology Co., Ltd., primarily used to index Chinese-language web content for its proprietary search engine and AI training datasets. According to official documentation from the company’s website (netshift.com) and references in Chinese webmaster forums, the bot collects publicly accessible web pages to improve search relevance and natural language processing models for Chinese markets.

🌐 Technical Behavior

The Netshift crawler sends GET requests over HTTP/1.1 and HTTPS with a configurable crawl delay. Server logs from multiple webmasters show that Netshift typically issues between 10 and 50 requests per minute per IP, with bursts of up to 100 requests per minute during initial indexing. IP ranges published in the official documentation include 45.56.64.0/24, 103.235.64.0/22, and 203.205.128.0/17, which are described as registered through APNIC, the regional internet registry for the Asia-Pacific region. The crawler sends an Accept-Encoding header and prefers gzip compression. It does not support cookies or JavaScript rendering and fetches only static HTML content.
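As a rough illustration, a request's source address could be checked against the ranges listed above with Python's standard `ipaddress` module. This is only a sketch: the CIDR blocks are copied from the text as reported, not independently verified, and the names `REPORTED_RANGES` and `in_reported_ranges` are hypothetical.

```python
import ipaddress

# CIDR blocks as listed in the text above (reported, not independently verified).
REPORTED_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("45.56.64.0/24", "103.235.64.0/22", "203.205.128.0/17")
]


def in_reported_ranges(addr: str) -> bool:
    """Return True if addr parses as an IP inside one of the reported ranges."""
    try:
        ip = ipaddress.ip_address(addr)
    except ValueError:
        # Not a valid IP literal (e.g. a hostname or garbage from a log line).
        return False
    return any(ip in net for net in REPORTED_RANGES)
```

A match here only shows that the address falls in a published block; it says nothing about whether the ranges are current.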

📋 robots.txt Compliance

Based on testing by independent researchers and statements from Netshift’s support forum, the crawler honors Disallow directives in robots.txt files, provided the file is correctly formatted and publicly accessible. However, some webmasters reported occasional failures to respect Disallow on subdomains; Netshift acknowledged the bug and reported a fix in September 2024. There are no known CVEs or security advisories related to the crawler’s robots.txt handling.
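A minimal sketch of how a Disallow rule aimed at this crawler could be checked locally, using Python's standard `urllib.robotparser`. The robots.txt content, the example.com URLs, and the helper name `is_allowed` are illustrative assumptions; the `netshift` token is taken from the Bot User-Agent field above. Because robots.txt applies per host, each subdomain needs to serve its own file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep netshift out of /private/, allow everyone else everywhere.
ROBOTS_TXT = """\
User-agent: netshift
Disallow: /private/

User-agent: *
Disallow:
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

For example, `is_allowed(ROBOTS_TXT, "netshift", "https://example.com/private/a")` evaluates the first group, while any other agent falls through to the `*` group. This only tests the rules as written; it cannot confirm what the crawler actually does.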

🔍 Detection Indicators

The primary User-Agent string is Mozilla/5.0 (compatible; Netshift/1.0; +https://netshift.com/crawler). A secondary string, Netshift-WebCrawler/1.0, is used for less frequent visits. Behavioral fingerprints include a fixed request header, X-Netshift-Crawl: 1, and a tendency to request robots.txt before any other URL. The crawler does not spoof any browser identifiers.
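The indicators above could be matched in application code roughly as follows. This is a sketch under assumptions: the regular expression and the function name `looks_like_netshift` are hypothetical, and the pattern is written to cover only the two User-Agent strings and the header quoted above. A User-Agent match alone is easy to forge, so it is best treated as a hint rather than proof.

```python
import re

# Matches "Netshift/1.0" and "Netshift-WebCrawler/1.0" (any dotted version number).
NETSHIFT_UA = re.compile(r"\bNetshift(?:-WebCrawler)?/\d+(?:\.\d+)*", re.IGNORECASE)


def looks_like_netshift(headers: dict) -> bool:
    """Return True if the request headers carry a Netshift indicator."""
    # HTTP header names are case-insensitive, so normalize the keys first.
    lowered = {name.lower(): value for name, value in headers.items()}
    user_agent = lowered.get("user-agent", "")
    crawl_flag = lowered.get("x-netshift-crawl", "").strip()
    return bool(NETSHIFT_UA.search(user_agent)) or crawl_flag == "1"
```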

📊 Data Usage

Collected data is used exclusively for Netshift’s search engine (netshift.com/search) and for training the company’s large language models under the “Netshift NLP” project. Public documentation states that no personal data is intentionally collected, and all content is treated as publicly available web data. The company offers a data removal portal at netshift.com/opt-out.

⚙️ Rate Limiting Policy

Rate limiting is recommended for Netshift because its bursty request pattern can overwhelm under-resourced servers. The advised policy is to throttle traffic above 50 requests per minute per IP and to block automatically above 150 requests per minute. This prevents service degradation while still allowing legitimate indexing.
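One way to express these two thresholds is a per-IP sliding window over the last 60 seconds. The sketch below is an in-memory illustration only; the class name `SlidingWindowLimiter`, the three return labels, and the choice of a sliding window (rather than a fixed window or token bucket) are assumptions, not something the source prescribes.

```python
import time
from collections import defaultdict, deque

THROTTLE_PER_MIN = 50   # above this, slow the client down
BLOCK_PER_MIN = 150     # above this, reject outright
WINDOW_SECONDS = 60.0


class SlidingWindowLimiter:
    """Counts requests per IP over the trailing 60 seconds."""

    def __init__(self):
        self._hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def check(self, ip, now=None):
        """Record one request and return 'allow', 'throttle', or 'block'."""
        now = time.monotonic() if now is None else now
        hits = self._hits[ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= WINDOW_SECONDS:
            hits.popleft()
        hits.append(now)
        if len(hits) > BLOCK_PER_MIN:
            return "block"
        if len(hits) > THROTTLE_PER_MIN:
            return "throttle"
        return "allow"
```

In a real deployment the same logic would more likely live in a reverse proxy or shared store, since this version keeps its state in one process and never evicts idle IPs.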

ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.