Ecxi

Bot User-Agent: ecxi

🤖 Overview

Ecxi is a legitimate web crawler operated by Ecxi Ltd, a UK-based company that specializes in large-scale data acquisition for machine learning and natural language processing research. Its primary purpose is to collect publicly available web content to train and improve proprietary AI models, as documented on the official Ecxi Crawler page at ecxi.ai/crawler. The bot was first publicly disclosed in 2022 and has been cited in academic papers as a source for training datasets used in text generation and semantic understanding tasks.

๐ŸŒ Technical Behavior

Ecxi uses a distributed crawling architecture with IP addresses drawn from major cloud providers, including AWS, Google Cloud, and Microsoft Azure. According to the official Ecxi documentation, the default delay between requests is 2 seconds, but the rate is variable and can spike to 10 or more requests per second during peak data collection windows. The bot speaks HTTP/1.1 and HTTP/2, and it tends to crawl a site exhaustively, often requesting pages in breadth-first order. It identifies itself with the User-Agent string Ecxi/1.0 and sends an additional header, X-Ecxi-Client: crawler. Its IP ranges are published weekly at ecxi.ai/ip-ranges.txt and include subnets from 54.37.0.0/16 and 35.186.0.0/16. Ecxi respects the Cache-Control header and will not re-crawl pages marked no-cache within a 24-hour window.
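The identification signals described above (source IP range, User-Agent, and the X-Ecxi-Client header) can be combined into a single check. The following is a minimal sketch, not an official verification procedure. It assumes the published ranges file lists one CIDR block per line, which the text does not confirm, and it uses only the two example subnets named above. The function names parse_ranges and looks_like_ecxi are hypothetical.

```python
import ipaddress

# Assumption: only the two subnets named in the text are used here.
# A real deployment would load the full list from the published ranges file.
EXAMPLE_RANGES = ["54.37.0.0/16", "35.186.0.0/16"]


def parse_ranges(lines):
    """Parse CIDR strings, skipping blank lines and '#' comments.

    Assumption: one CIDR block per line; the real file format is not
    described in the source.
    """
    networks = []
    for line in lines:
        entry = line.split("#", 1)[0].strip()
        if entry:
            networks.append(ipaddress.ip_network(entry, strict=False))
    return networks


def looks_like_ecxi(remote_ip, user_agent, client_header, networks):
    """Return True only if all three documented signals agree."""
    addr = ipaddress.ip_address(remote_ip)
    in_range = any(addr in net for net in networks)
    ua_matches = user_agent.strip().lower().startswith("ecxi/")
    header_matches = (client_header or "").strip().lower() == "crawler"
    return in_range and ua_matches and header_matches


if __name__ == "__main__":
    nets = parse_ranges(EXAMPLE_RANGES)
    print(looks_like_ecxi("54.37.10.5", "Ecxi/1.0", "crawler", nets))
    print(looks_like_ecxi("8.8.8.8", "Ecxi/1.0", "crawler", nets))
```

Requiring all three signals is a deliberately strict choice: a User-Agent string alone is trivial to spoof, and cloud provider subnets are shared with many unrelated tenants.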

📋 robots.txt Compliance

Ecxi fully complies with the Robots Exclusion Protocol, as stated in its crawler policy at ecxi.ai/robots. It reads and applies Disallow, Crawl-delay, and Allow directives as specified. Independent testing by the Web Robots Database project on GitHub confirmed that Ecxi does not access blocked paths even when robots.txt contains no User-agent: Ecxi line, which presumably means it also honors rules addressed to a general group such as User-agent: *. Including an explicit User-agent: Ecxi group is still recommended for clarity.
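To see how a compliant crawler would interpret a given robots.txt, the rules can be evaluated locally with Python's standard urllib.robotparser module. This is an illustrative sketch only: the robots.txt contents, the example.com URLs, and the helper build_parser are assumptions made for the example, and the standard-library parser is a stand-in for Ecxi's own parser, which is not public in the source.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with a dedicated Ecxi group and a wildcard group.
ROBOTS_WITH_ECXI_GROUP = """\
User-agent: Ecxi
Disallow: /private/
Crawl-delay: 5

User-agent: *
Disallow: /admin/
"""

# Hypothetical robots.txt with no Ecxi group at all.
ROBOTS_WILDCARD_ONLY = """\
User-agent: *
Disallow: /admin/
"""


def build_parser(text):
    """Build a parser from robots.txt text without any network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    named = build_parser(ROBOTS_WITH_ECXI_GROUP)
    print(named.can_fetch("Ecxi/1.0", "https://example.com/private/report.html"))
    print(named.can_fetch("Ecxi/1.0", "https://example.com/blog/post.html"))
    print(named.crawl_delay("Ecxi/1.0"))

    wildcard = build_parser(ROBOTS_WILDCARD_ONLY)
    print(wildcard.can_fetch("Ecxi/1.0", "https://example.com/admin/users"))
```

One design point worth noting: in this parser a named group takes precedence over the wildcard group, so once a User-agent: Ecxi group exists, any path that should stay blocked for Ecxi has to be listed in that group as well.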

๐Ÿ” Detection Indicators

The definitive identification string is User-Agent: Ecxi/1.0, but variants such as Ecxi/0.9 and Ecxi/2.0 have also been observed in the wild. Behavioral fingerprints include a consistent pattern of requesting robots.txt before any other page, then crawling pages sequentially with a predictable HTTP Referer chain. The bot does not execute JavaScript, so it never issues the script-initiated requests that some JavaScript libraries mark with an X-Requested-With header. Security researchers at Shodan have noted that Ecxi often arrives from IP addresses whose hostnames follow the pattern ecxi-crawler-[region].ecxi.io.
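The indicators above lend themselves to simple log-analysis helpers. The sketch below is one possible approach and rests on several assumptions: the [region] part of the hostname is taken to consist of letters, digits, and hyphens, the version is taken to be two dot-separated numbers, and all function names are hypothetical.

```python
import re

# Matches the Ecxi product token with a major.minor version, e.g. Ecxi/1.0.
UA_PATTERN = re.compile(r"\bEcxi/(\d+)\.(\d+)\b")

# Assumption: the region label contains only letters, digits, and hyphens.
HOST_PATTERN = re.compile(r"^ecxi-crawler-[a-z0-9-]+\.ecxi\.io$", re.IGNORECASE)


def parse_ecxi_version(user_agent):
    """Return (major, minor) if the User-Agent names Ecxi, else None."""
    match = UA_PATTERN.search(user_agent)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def hostname_matches(hostname):
    """Check a reverse-DNS hostname against the reported naming pattern."""
    return HOST_PATTERN.match(hostname.rstrip(".")) is not None


def fetched_robots_first(paths):
    """True if the first path requested in a session was /robots.txt."""
    return bool(paths) and paths[0].split("?", 1)[0] == "/robots.txt"


if __name__ == "__main__":
    print(parse_ecxi_version("Ecxi/2.0"))
    print(hostname_matches("ecxi-crawler-eu-west.ecxi.io"))
    print(fetched_robots_first(["/robots.txt", "/", "/about"]))
```

In practice the hostname would come from a reverse DNS lookup on the client IP, for example with socket.gethostbyaddr, and should be confirmed with a forward lookup, since reverse DNS records can be set to arbitrary values by whoever controls the IP block.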

📊 Data Usage

Collected data is used exclusively for training Ecxi's proprietary Neural Language Models, as described in their technical report "ECXI-2024: A Large-Scale Web Corpus for Language Understanding" available on arxiv.org/abs/2403.12345. The company also publishes aggregated anonymized statistics about its crawl frequency and data volume. No data is sold or shared with third parties, and all personally identifiable information is stripped through automated redaction pipelines.

โš™๏ธ Rate Limiting Policy

Although Ecxi is a legitimate agent, its aggressive crawl speed can degrade server performance for smaller websites. The recommended policy is to allow up to 50 requests per minute per IP and to block an IP temporarily, for 10 minutes, once it exceeds that threshold. This policy is consistent with best practices published by Cloudflare and OWASP for managing high-volume crawlers without disrupting normal user traffic.
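The policy above (50 requests per minute per IP, then a 10-minute block) can be sketched as a small in-memory sliding-window limiter. This is an illustrative implementation under stated assumptions, not a production design: the class name IpRateLimiter and its parameters are hypothetical, state lives in a single process, and the clock is injectable purely so the behavior can be tested without waiting.

```python
import time
from collections import defaultdict, deque


class IpRateLimiter:
    """Sliding-window limiter with a temporary block on overflow."""

    def __init__(self, limit=50, window=60.0, block_for=600.0,
                 clock=time.monotonic):
        self.limit = limit          # requests allowed per window
        self.window = window        # window length in seconds
        self.block_for = block_for  # block duration in seconds
        self.clock = clock
        self._hits = defaultdict(deque)  # ip -> timestamps of recent requests
        self._blocked_until = {}         # ip -> time the block expires

    def allow(self, ip):
        """Return True if the request may proceed, False if it is rejected."""
        now = self.clock()

        until = self._blocked_until.get(ip)
        if until is not None:
            if now < until:
                return False
            del self._blocked_until[ip]  # block has expired

        hits = self._hits[ip]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()

        if len(hits) >= self.limit:
            # Threshold exceeded: start a temporary block for this IP.
            self._blocked_until[ip] = now + self.block_for
            hits.clear()
            return False

        hits.append(now)
        return True


if __name__ == "__main__":
    limiter = IpRateLimiter()
    results = [limiter.allow("54.37.10.5") for _ in range(51)]
    print(results.count(True), results.count(False))
```

A sliding window is used here rather than fixed one-minute buckets so that a burst straddling a bucket boundary cannot briefly double the allowed rate. For multi-server deployments the same logic would need a shared store, which is outside the scope of this sketch.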

ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.