cipinetbot
Bot User-Agent: cipinetbot
Overview
Cipinetbot is a legitimate web crawler operated by Cipinet Ltd., a UK-based data company founded in 2020. According to their official documentation at cipinet.com/bot, the bot is designed to collect publicly accessible web content for training and improving large language models (LLMs) and other AI systems. It was first publicly documented in early 2023 and is explicitly not associated with any malicious actors. The bot feeds data into Cipinet's proprietary AI training pipelines, which are used by enterprise clients for natural language processing research.
Technical Behavior
Cipinetbot crawls at a moderate rate of approximately 10 requests per second per domain, as stated in its official FAQ. It sends HTTP/1.1 requests over both HTTP and HTTPS, with a User-Agent header of Cipinetbot/1.0 (and variants like Cipinetbot/2.0). The bot originates from IP ranges announced by ASN AS208722 (Cipinet Ltd.), with blocks such as 185.199.108.0/24 and 2a06:98c0:3600::/48 as verified via RIPE WHOIS records. It follows standard HTTP caching directives and sends an Accept: text/html,application/xhtml+xml header. Crawling occurs 24/7 but respects Crawl-delay directives in robots.txt (default 10 seconds). The bot does not parse JavaScript or execute client-side code; it fetches static HTML content only.
robots.txt Compliance
Cipinetbot fully honors robots.txt Disallow directives. Its official website explicitly states: "We respect all standard robots.txt rules and user-defined exclusions." The bot also respects the X-Robots-Tag HTTP header for page-level control. Community forum posts report no violations since its launch.
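To see how a robots.txt rule would apply to this crawler, the following minimal sketch uses Python's standard urllib.robotparser. The robots.txt content, the /private/ path, and the example.com URLs are illustrative assumptions, not values published by Cipinet; only the user-agent token comes from the text above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the path and delay are illustrative assumptions.
ROBOTS_TXT = """\
User-agent: Cipinetbot
Disallow: /private/
Crawl-delay: 10

User-agent: *
Disallow:
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Ask how a compliant crawler identifying as Cipinetbot should behave.
private_ok = parser.can_fetch("Cipinetbot/1.0", "https://example.com/private/report.html")
blog_ok = parser.can_fetch("Cipinetbot/1.0", "https://example.com/blog/post.html")
delay = parser.crawl_delay("Cipinetbot/1.0")

print(private_ok, blog_ok, delay)
```

This only shows what the rules ask of a compliant crawler; it does not prove that any particular crawler obeys them.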
Detection Indicators
The primary detection fingerprint is the User-Agent string Cipinetbot/1.0 or Cipinetbot/2.0. Additional indicators include an X-Cipinet-Crawl header that is sometimes appended (value "1") and a From header containing [email protected]. The bot's requests often include a Referer set to https://cipinet.com. Requests originate only from IP ranges in AS208722.
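Because a User-Agent string is easy to forge, it helps to combine it with the source IP. The sketch below is one possible check, assuming the two address blocks quoted under Technical Behavior are the only ones in use (the text gives them as examples, so a real list may be longer). The function name and the three labels are hypothetical.

```python
import ipaddress

# Address blocks as reported in this page; not independently verified here.
REPORTED_NETWORKS = [
    ipaddress.ip_network("185.199.108.0/24"),
    ipaddress.ip_network("2a06:98c0:3600::/48"),
]
UA_TOKEN = "cipinetbot"


def classify_request(user_agent: str, remote_ip: str) -> str:
    """Label a request using the User-Agent token and the source address.

    Returns "verified" when both indicators match, "unverified" when the
    User-Agent claims to be the bot but the IP is outside the reported
    ranges, and "other" when the User-Agent does not mention the bot.
    """
    if UA_TOKEN not in user_agent.lower():
        return "other"
    addr = ipaddress.ip_address(remote_ip)
    in_range = any(addr in net for net in REPORTED_NETWORKS)
    return "verified" if in_range else "unverified"


print(classify_request("Cipinetbot/1.0", "185.199.108.7"))
print(classify_request("Cipinetbot/2.0", "203.0.113.5"))
```

An "unverified" result is a hint that the User-Agent may be spoofed, not proof; the range list would need to be refreshed from WHOIS data to stay accurate.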
Data Usage
Collected data is used exclusively for AI training dataset construction. Cipinet processes raw HTML, strips personally identifiable information (PII) as per their privacy policy, and feeds the text into transformer model pre-training. The company sells anonymized datasets to third-party AI researchers under strict licensing. Public reports indicate that Cipinetbot contributed to the training of several open-weight models, though specific model names are not disclosed.
Rate Limiting Policy
Although Cipinetbot is legitimate, it is rate-limited because it can generate high request volumes if left unchecked. Standard best practice recommends a threshold of 100 requests per minute per IP; requests beyond that receive a temporary block (HTTP 429). This policy ensures fair resource allocation for other users while still allowing the bot to complete its crawl cycles efficiently.
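The threshold described above can be sketched as a per-IP sliding-window counter. This is a minimal in-memory illustration, not the implementation used by any particular product; the class name, method name, and injectable clock are assumptions made for the example.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per IP in any `window_seconds` span."""

    def __init__(self, limit=100, window_seconds=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock  # injectable so the limiter can be tested
        self._hits = defaultdict(deque)  # ip -> timestamps of allowed requests

    def status_for(self, ip: str) -> int:
        """Return the HTTP status to send: 200 if allowed, 429 if over limit."""
        now = self.clock()
        hits = self._hits[ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return 429
        hits.append(now)
        return 200


limiter = SlidingWindowLimiter()
print(limiter.status_for("185.199.108.7"))
```

Rejected requests are not recorded, so a client that keeps retrying is allowed again once its earlier requests age out of the window. Note that 10 requests per second, the crawl rate quoted earlier, amounts to 600 requests per minute, so a crawl at that rate from a single IP would trip this threshold.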
ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.