Cliqzbot
Bot User-Agent: cliqzbot
🤖 Overview
Cliqzbot is a web crawler operated by Cliqz GmbH, a German technology company headquartered in Munich that developed a privacy-focused browser and search engine. The bot was first publicly documented in 2015 and was designed to index publicly accessible web content for Cliqz’s search engine, which emphasized user privacy by avoiding tracking and personalization. Cliqz was majority-owned by Hubert Burda Media and received a minority investment from Mozilla in 2016. The company discontinued its consumer products in 2020, but the crawler remains documented in historical records and may still be active for archival or internal purposes. The bot’s primary function was to feed data into Cliqz’s search index, which was built on a proprietary algorithm that used anonymous browsing signals to improve relevance without logging user identities.
🌐 Technical Behavior
Cliqzbot performs HTTP GET requests to fetch web pages and follows hyperlinks using a breadth-first crawl strategy. According to the official documentation archived at cliqz.com/en/company/cliqzbot, the bot sends a User-Agent header of Cliqzbot/1.0 (+http://cliqz.com/company/cliqzbot) and typically honors Crawl-Delay values of 5 seconds or more set in robots.txt. The crawler operates from IP ranges registered to Cliqz GmbH, which are predominantly assigned to German data centers (e.g., subnets within AS20773). Request frequency is moderate on average, although bursts of several hundred requests per minute can occur during active indexing. The bot does not appear to render JavaScript and relies solely on static HTML parsing. It identifies itself via the User-Agent header and sometimes includes a From header carrying the contact email address documented on its official page. No CVE entries or security advisories name Cliqzbot as a source of attacks.
📋 robots.txt Compliance
Based on archived copies of Cliqz’s official documentation and third-party reports on webmaster forums (e.g., the WebmasterWorld thread from 2016), Cliqzbot is reported to honor both Disallow and Crawl-Delay directives in robots.txt. The bot checks the file before each crawl session and obeys all explicit exclusions. No deliberate violations have been reported, and webmasters have noted that the bot stops crawling once a Disallow: / directive is in place. This compliance is consistent with Cliqz GmbH’s stated privacy focus and good-neighbor policy.
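As a rough illustration of how such directives apply, the sketch below uses Python's standard urllib.robotparser to evaluate a robots.txt file against the Cliqzbot User-Agent. The robots.txt content, the example.com URLs, and the /private/ path are hypothetical. The parser shows how a standards-following crawler would read the rules, not how Cliqzbot itself is implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: Cliqzbot
Crawl-delay: 10
Disallow: /private/

User-agent: *
Disallow:
"""

# User-Agent string as quoted in this article.
UA = "Cliqzbot/1.0 (+http://cliqz.com/company/cliqzbot)"


def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# The disallowed path is refused, other paths are permitted.
print(rules.can_fetch(UA, "https://example.com/private/page.html"))
print(rules.can_fetch(UA, "https://example.com/index.html"))

# The delay (in seconds) that applies to this User-Agent.
print(rules.crawl_delay(UA))
```

Testing a rule set this way before deploying it can catch typos in directive names or paths that would otherwise leave the crawler unrestricted.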
🔍 Detection Indicators
The primary detection indicator is the User-Agent string: Cliqzbot/1.0 (+http://cliqz.com/company/cliqzbot). Some variations append a version number or contact email. Behavioral fingerprints include a consistent crawl interval that respects the site’s Crawl-Delay, and the absence of JavaScript processing. The bot does not use any special request headers beyond standard HTTP fields. Webmasters can identify it through server logs by filtering for the string “Cliqzbot”. The official page at cliqz.com/company/cliqzbot (now redirecting to a generic page after the shutdown) remains the canonical source for verification.
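The log filtering described above could look roughly like the following sketch. It assumes the common "combined" access log format, in which the User-Agent is the last quoted field. The sample lines and IP addresses are placeholders from documentation ranges, not real Cliqz traffic. Because a User-Agent string can be spoofed, a match is an indicator, not proof of origin.

```python
import re

# Case-insensitive match on the token this article says to filter for.
CLIQZBOT_RE = re.compile(r"cliqzbot", re.IGNORECASE)

# Assumption: combined log format, where the User-Agent is the final quoted field.
UA_FIELD_RE = re.compile(r'"([^"]*)"\s*$')


def is_cliqzbot(log_line: str) -> bool:
    """Return True if the line's User-Agent field mentions Cliqzbot."""
    match = UA_FIELD_RE.search(log_line)
    return bool(match and CLIQZBOT_RE.search(match.group(1)))


# Placeholder log lines, for illustration only.
sample_lines = [
    '203.0.113.7 - - [12/Mar/2016:10:01:22 +0000] "GET / HTTP/1.1" 200 5120 '
    '"-" "Cliqzbot/1.0 (+http://cliqz.com/company/cliqzbot)"',
    '198.51.100.4 - - [12/Mar/2016:10:01:23 +0000] "GET /about HTTP/1.1" 200 2048 '
    '"-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

hits = [line for line in sample_lines if is_cliqzbot(line)]
print(len(hits))
```

Matching only the User-Agent field avoids false positives from request paths or referrers that happen to contain the word.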
📊 Data Usage
Data collected by Cliqzbot is used exclusively to build and update the Cliqz search index. The search engine aimed to provide relevant results without storing user profiles or search histories, relying instead on anonymous crowd-sourced signals. Content such as page text, titles, and metadata is extracted and stored in a distributed index. No evidence suggests the data is used for AI training, advertising, or third-party analytics. The bot does not collect personal information from web pages beyond what is publicly visible.
⚙️ Rate Limiting Policy
Cliqzbot is rate-limited because its bursty crawl patterns, while generally well-behaved, can still overwhelm smaller servers if left unchecked. Threshold-based blocking is intended to prevent resource exhaustion while still allowing legitimate, privacy-focused search indexing to proceed. Webmasters are advised to set a Crawl-Delay of 10 seconds, and to trigger temporary blocks when request volume from Cliqz IP ranges exceeds 50 requests per minute.
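One possible way to enforce the 50-requests-per-minute threshold is a sliding-window counter, sketched below. The class name, the client key, and the choice of a sliding window are assumptions made for illustration; the article does not specify which mechanism is used. Timestamps are passed in explicitly so the behavior is easy to test.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per client within a rolling time window."""

    def __init__(self, max_requests: int = 50, window_seconds: float = 60.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client key -> timestamps of allowed requests

    def allow(self, client_key: str, now: float) -> bool:
        hits = self._hits[client_key]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the threshold: block this request
        hits.append(now)
        return True


limiter = SlidingWindowLimiter()

# Hypothetical burst: 51 requests within about five seconds from one client.
burst = [limiter.allow("cliqzbot", i * 0.1) for i in range(51)]
print(burst.count(True), burst[-1])

# Once the window has passed, the same client is allowed again.
print(limiter.allow("cliqzbot", 61.0))
```

Because old timestamps expire on their own, the block is temporary by construction: a client that slows down regains access without manual intervention.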
ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.