coccoc
Bot User-Agent: coccoc
🤖 Overview
Coc Coc Bot is operated by the Vietnamese technology company Coc Coc, which has run one of the most widely used search engines in Vietnam since its launch in 2013. The bot gathers publicly available web content to populate the Coc Coc search index, which serves over 25 million monthly active users, primarily in Vietnam. According to its official documentation, the crawler exists solely to improve search relevance and provide localized results for Vietnamese-language queries.
🌐 Technical Behavior
The Coc Coc bot uses a custom crawler written in C++ and Python. Its reported crawl frequency averages one request every 5–10 seconds per domain, though bursts of up to 20 requests per minute have been observed during initial indexing. It primarily uses HTTP/1.1 and HTTPS over IPv4. Based on available netblocks (e.g., 103.199.160.0/24, 27.78.0.0/16), its IP ranges originate from Vietnamese ISPs such as VNPT and Viettel, with some ranges in Singapore and the United States. The bot sends the If-Modified-Since request header and honors the Last-Modified response header to avoid re-crawling unchanged content. According to Coc Coc’s official webmaster guidelines, the crawler can render JavaScript through a headless Chromium instance for modern single-page applications, but it prioritizes static HTML for efficiency.
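To illustrate the conditional-request behavior described above, here is a minimal server-side sketch of deciding when a 304 Not Modified response is appropriate. The function name is_not_modified and the way the resource timestamp is supplied are assumptions made for illustration; they are not taken from Coc Coc's documentation or from any particular web framework.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def is_not_modified(if_modified_since: Optional[str], last_modified: datetime) -> bool:
    """Return True when a 304 Not Modified response is appropriate.

    if_modified_since: raw value of the If-Modified-Since request header.
    last_modified: timezone-aware modification time of the resource.
    """
    if not if_modified_since:
        return False
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        # Unparseable header: fall back to sending the full response.
        return False
    if since.tzinfo is None:
        # Treat a date without a usable zone as UTC.
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds first.
    return last_modified.replace(microsecond=0) <= since


if __name__ == "__main__":
    resource_time = datetime(2015, 10, 21, 7, 28, 0, tzinfo=timezone.utc)
    print(is_not_modified("Wed, 21 Oct 2015 07:28:00 GMT", resource_time))  # True
    print(is_not_modified("Tue, 20 Oct 2015 07:28:00 GMT", resource_time))  # False
```

Serving an accurate Last-Modified header and answering with 304 when nothing has changed lets a crawler that behaves this way skip the response body entirely.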
📋 robots.txt Compliance
Coc Coc Bot fully honors Disallow directives in robots.txt, as documented on the Coc Coc for Webmasters portal. The user-agent token to use in robots.txt is coccoc (case-insensitive). Empirical tests by third-party SEO analysts confirm that the bot stops crawling any URL path or directory that is explicitly blocked, and that it respects Crawl-Delay directives when the delay is set to at least 1 second. Failure to comply would risk removal from the Coc Coc index per their published policies.
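As a quick way to sanity-check a rule set before deploying it, the sketch below parses a sample robots.txt with Python's standard urllib.robotparser and asks how it applies to the coccoc token. The paths, the 5-second delay, and the example.com URLs are placeholders chosen for illustration. The parser shows how a standards-following crawler would read the file; it does not prove what Coc Coc Bot itself does.

```python
from urllib.robotparser import RobotFileParser

# Sample rules: block one directory for coccoc and ask for a 5-second delay.
# The path and the delay are illustrative values, not recommendations.
ROBOTS_TXT = """\
User-agent: coccoc
Disallow: /private/
Crawl-delay: 5

User-agent: *
Disallow:
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    print(rules.can_fetch("coccoc", "https://example.com/private/report.html"))  # False
    print(rules.can_fetch("coccoc", "https://example.com/index.html"))           # True
    print(rules.crawl_delay("coccoc"))                                           # 5
```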
🔍 Detection Indicators
The bot is identified by the User-Agent string Mozilla/5.0 (compatible; coccocbot/1.0; +http://help.coccoc.com/en/webmaster/crawler/1.0/). Note that “coccocbot” is the exact token sent in HTTP headers. Additional fingerprints include the absence of Accept-Encoding: br, which common browsers send (the bot typically accepts only gzip and deflate), and a consistent From header field set to [email protected]. Reverse DNS lookups often resolve to hostnames ending in .coccoc.vn or .coccoc.com.
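The indicators above can be combined into a simple check, sketched here under some assumptions. The function names are hypothetical. The forward-confirmation step (resolving the hostname back to the original IP) is a common general practice for verifying crawlers and is added here as an assumption; the source only mentions that reverse DNS often resolves to the two suffixes. Because a User-Agent string is trivial to spoof, the sketch treats it as a claim to verify rather than as proof.

```python
import re
import socket

# Token from the User-Agent string quoted above.
UA_TOKEN = re.compile(r"\bcoccocbot\b", re.IGNORECASE)
# Hostname suffixes mentioned above for reverse DNS results.
TRUSTED_SUFFIXES = (".coccoc.vn", ".coccoc.com")


def claims_coccoc(user_agent: str) -> bool:
    """True if the User-Agent header contains the coccocbot token."""
    return bool(UA_TOKEN.search(user_agent or ""))


def hostname_is_trusted(hostname: str) -> bool:
    """True if the hostname ends with one of the expected suffixes."""
    return hostname.rstrip(".").lower().endswith(TRUSTED_SUFFIXES)


def verify_ip(ip: str) -> bool:
    """Reverse-resolve the IP, check the suffix, then confirm the forward lookup.

    Performs live DNS queries; returns False on any lookup failure.
    """
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)
    except OSError:
        return False
    if not hostname_is_trusted(hostname):
        return False
    try:
        _, _, addresses = socket.gethostbyname_ex(hostname)
    except OSError:
        return False
    return ip in addresses
```

A request would be treated as genuine only when claims_coccoc(user_agent) and verify_ip(remote_ip) both hold. Since the source says reverse DNS "often" resolves to these suffixes, a failed lookup is a signal to weigh rather than definitive evidence of spoofing.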
📊 Data Usage
Collected data feeds the Coc Coc search index, which powers both web search and verticals such as images, news, and video search for the Vietnamese market. The bot does not use the data for AI training or large language model development; Coc Coc has publicly stated that its crawler is exclusively for indexing and ranking web pages. Page content, meta tags, and structured data (e.g., schema.org) are extracted to populate search snippets and knowledge panels.
⚙️ Rate Limiting Policy
Rate limiting is applied to Coc Coc Bot because its crawl rate can spike when it discovers new domains, potentially overwhelming smaller servers. The recommended policy is to allow up to 3 requests per second per IP and to apply threshold-based blocking if the bot exceeds 100 requests per minute. Volumes above that threshold suggest a possible misconfiguration or duplicated bot instances, even when the requests still respect legitimate caching rules.
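One way to express the per-minute threshold is a sliding-window counter keyed by client IP. The sketch below is a minimal in-memory illustration, assuming a single-process server. The class name SlidingWindowLimiter and its parameters are hypothetical, and the defaults simply mirror the 100 requests per minute figure from the policy above.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class SlidingWindowLimiter:
    """Allow at most max_requests per IP within any window_seconds span."""

    def __init__(self, max_requests: int = 100, window_seconds: float = 60.0) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Timestamps of accepted requests, oldest first, per client IP.
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, ip: str, now: float) -> bool:
        """Record a request at time `now` (seconds) and report if it is allowed."""
        hits = self._hits[ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the threshold; rejected requests are not recorded
        hits.append(now)
        return True


if __name__ == "__main__":
    import time

    limiter = SlidingWindowLimiter()
    print(limiter.allow("203.0.113.7", time.monotonic()))  # True for the first request
```

Passing the clock value in explicitly keeps the limiter easy to test. In production the same logic is usually delegated to a reverse proxy or a shared store so that counts survive restarts and span multiple workers.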
ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.