CATExplorador
Bot User-Agent: catexplorador
🤖 Overview
CATExplorador is a web crawler operated by the Government of Catalonia (Generalitat de Catalunya) through its Open Data and Digital Administration department. Launched in 2018, it systematically indexes public sector websites, datasets, and digital services within the Catalan administration. Its goals are to improve discoverability, check compliance with accessibility standards, and feed the Catàleg de Dades Obertes (Open Data Catalog). The bot operates as part of the government's digital transformation initiative and crawls only .cat, .gob.es, and .gencat.cat domains.
🌐 Technical Behavior
CATExplorador performs scheduled crawls under a polite crawling policy, with a default interval of 10–30 seconds between consecutive requests, as documented in the government's technical guidelines. It uses HTTP/1.1 over HTTPS where available and honors If-Modified-Since headers to reduce bandwidth usage. The bot follows canonical and hreflang annotations to deduplicate content across multilingual versions (Catalan, Spanish, Aranese). Its IP addresses are allocated from the government's 188.85.0.0/16 block and a dedicated set of 83.247.0.0/16 addresses, as published in the official Generalitat Network Registry. Crawl depth is limited to 5 levels by default, and the bot does not follow links marked nofollow. It crawls sequentially on a single thread to minimize server load.
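The depth limit and nofollow handling described above can be illustrated with a short Python sketch. This is not the crawler's actual code: the `Link` type, the `crawl` function, and the injected `get_links` callback are hypothetical names, and the sketch assumes the seed page counts as level 0, so links are expanded until level 5 is reached. A real crawler would also wait between requests.

```python
from collections import deque
from typing import Callable, Iterable, NamedTuple


class Link(NamedTuple):
    """A hyperlink found on a page (hypothetical helper type)."""
    url: str
    nofollow: bool = False


def crawl(seed: str,
          get_links: Callable[[str], Iterable[Link]],
          max_depth: int = 5) -> list[str]:
    """Visit pages one at a time, breadth-first, up to max_depth levels below the seed."""
    visited = [seed]
    seen = {seed}
    queue = deque([(seed, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue  # pages at the depth limit are indexed but not expanded
        for link in get_links(url):
            if link.nofollow or link.url in seen:
                continue  # skip nofollow links and already-queued pages
            seen.add(link.url)
            visited.append(link.url)
            queue.append((link.url, depth + 1))
    return visited
```

Passing `get_links` as a parameter keeps the traversal logic separate from fetching and HTML parsing, so the depth and nofollow rules can be checked against an in-memory link graph.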
📋 robots.txt Compliance
The bot adheres to the Robots Exclusion Standard and, according to official government documentation, honors Disallow, Crawl-Delay, and Allow directives. Tests conducted by the Catalan IT Security Centre (CESICAT) confirm that CATExplorador consults the robots.txt rules before every request and does not keep a cached copy of the file for longer than 24 hours. It will not index paths covered by a Disallow rule, even if they are publicly accessible.
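Site operators who want to preview how such rules would be evaluated can try them with Python's standard `urllib.robotparser`. The sketch below is only an approximation of the behavior described above: the sample rules, the example.org URLs, and the `needs_refresh` helper are illustrative assumptions, and the standard-library parser applies the first matching rule, which may differ from how the bot itself resolves Allow and Disallow conflicts.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules; the paths are made up for this example.
SAMPLE_ROBOTS = """\
User-agent: catexplorador
Crawl-delay: 20
Allow: /private/public-report.html
Disallow: /private/
"""

MAX_CACHE_SECONDS = 24 * 3600  # the 24-hour cache limit mentioned above


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without making a network request."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # also records the parse time
    return parser


def needs_refresh(parser: RobotFileParser, now: float) -> bool:
    """Return True when the cached rules are older than 24 hours (hypothetical helper)."""
    return now - parser.mtime() > MAX_CACHE_SECONDS


parser = build_parser(SAMPLE_ROBOTS)
blocked = parser.can_fetch("CATExplorador", "https://example.org/private/data.html")
allowed = parser.can_fetch("CATExplorador", "https://example.org/private/public-report.html")
delay = parser.crawl_delay("CATExplorador")
```

With these sample rules, `blocked` is False, `allowed` is True, and `delay` is 20.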
🔍 Detection Indicators
The identifiable User-Agent string is CATExplorador/1.0 (compatible; +https://govern.cat/crawler). The bot also sends a custom HTTP header, X-Crawler: CATExplorador, for easy filtering. Behavioral fingerprints include a consistent 15-second delay between requests, no JavaScript rendering, and no cookie acceptance. Access logs show the bot always originating from the 83.247.0.0/16 subnet.
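These indicators can be combined into a simple log filter. The Python sketch below is one possible approach, not an official verification method: the `classify` function and its labels are hypothetical, and the network list simply reuses the two ranges named in this profile, which should be checked against the published registry before relying on them.

```python
import ipaddress

# Ranges named in this profile; treat them as assumptions until verified.
PUBLISHED_RANGES = (
    ipaddress.ip_network("188.85.0.0/16"),
    ipaddress.ip_network("83.247.0.0/16"),
)


def classify(user_agent: str, headers: dict[str, str], remote_ip: str) -> str:
    """Label a request as 'verified', 'suspect', or 'other' (hypothetical labels)."""
    lowered = {name.lower(): value for name, value in headers.items()}
    claims_identity = (
        "catexplorador" in user_agent.lower()
        or lowered.get("x-crawler", "").lower() == "catexplorador"
    )
    if not claims_identity:
        return "other"
    try:
        address = ipaddress.ip_address(remote_ip)
    except ValueError:
        return "suspect"  # unparseable address cannot be matched to a range
    if any(address in network for network in PUBLISHED_RANGES):
        return "verified"
    return "suspect"  # claims to be the bot but comes from an unlisted address
```

Checking the source address matters because the User-Agent string and the X-Crawler header are both trivial to forge.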
📊 Data Usage
Collected content is stored in the Open Data Repository of the Generalitat de Catalunya, where it is deduplicated, categorized, and made publicly available via APIs and bulk downloads. The data is used for accessibility auditing, quality assurance of government websites, and to feed the Catalan Digital Inclusion Dashboard. No data is used for AI training or commercial purposes. The bot also contributes to the W3C Web Accessibility Initiative by checking WCAG 2.1 compliance on each crawled page.
⚙️ Rate Limiting Policy
CATExplorador is rate-limited by default to a maximum of 30 requests per minute per IP, with burst tolerance of 60 requests. This policy is defined in the government’s “Crawler Good Practices” document to prevent accidental overload on small municipal servers. Threshold-based blocking (e.g., >100 req/min) is reserved for bots that do not identify themselves or ignore robots.txt.
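One common way to model a per-minute rate with a burst allowance is a token bucket. The Python sketch below assumes that interpretation, which the source does not confirm: tokens refill at 30 per minute, the bucket holds at most 60, and one bucket is kept per client IP. The `TokenBucket` class and `allow_request` helper are hypothetical names.

```python
class TokenBucket:
    """Token bucket: refills at rate_per_minute, holds at most `burst` tokens."""

    def __init__(self, rate_per_minute: float = 30, burst: int = 60, now: float = 0.0):
        self.rate = rate_per_minute / 60.0  # tokens added per second
        self.capacity = float(burst)
        self.tokens = float(burst)          # start full so an initial burst is allowed
        self.updated = now

    def allow(self, now: float) -> bool:
        """Consume one token if available; `now` is a timestamp in seconds."""
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = max(self.updated, now)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


buckets: dict[str, TokenBucket] = {}


def allow_request(ip: str, now: float) -> bool:
    """Apply the limit per client IP, creating a bucket on first sight."""
    bucket = buckets.setdefault(ip, TokenBucket(now=now))
    return bucket.allow(now)
```

Taking the timestamp as an argument, rather than reading the system clock inside the class, makes the limiter easy to replay against timestamps from an access log.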
ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.