arquivo.pt
Bot User-Agent: arquivo-pt
🤖 Overview
The Arquivo.pt crawler is operated by the Foundation for Science and Technology (FCT) of Portugal, as part of the national web archive project. Its primary purpose is to systematically collect and preserve publicly accessible web content from the .pt domain and Portuguese‑related sites, maintaining a historical repository accessible at https://arquivo.pt.
🌐 Technical Behaviour
The crawler uses a custom HTTP client that speaks HTTP/1.1 over both plain HTTP and HTTPS, and it follows the Robots Exclusion Protocol, including Crawl-delay directives. Crawl frequency is generally moderate and obeys any specified delay, although the request rate may increase during large archiving campaigns. IP addresses originate from FCT's allocated blocks, typically within the 193.136.0.0/16 and 194.65.0.0/16 ranges. The crawler identifies itself with the User-Agent string Mozilla/5.0 (compatible; Arquivo.pt/1.0; +https://arquivo.pt/crawler.html) and also as ArquivoWebCrawler/1.0.
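The address ranges above can be turned into a simple membership check. The following is a minimal sketch using Python's standard `ipaddress` module. The two CIDR blocks are the ones quoted in the text, which describes them only as typical, so a match is a hint rather than proof. The function name `in_reported_fct_range` is hypothetical.

```python
import ipaddress

# CIDR blocks quoted in the text as typical sources; not an authoritative list.
REPORTED_FCT_NETWORKS = [
    ipaddress.ip_network("193.136.0.0/16"),
    ipaddress.ip_network("194.65.0.0/16"),
]


def in_reported_fct_range(ip: str) -> bool:
    """Return True if `ip` parses and falls inside one of the reported blocks."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        # Malformed input is treated as "not in range" rather than an error.
        return False
    return any(addr in net for net in REPORTED_FCT_NETWORKS)


if __name__ == "__main__":
    for candidate in ("193.136.1.1", "8.8.8.8"):
        print(candidate, in_reported_fct_range(candidate))
```

A check like this is best combined with the User-Agent, since neither signal is conclusive on its own.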
📋 robots.txt Compliance
According to the official documentation at https://arquivo.pt/crawler.html, the crawler fully respects Disallow directives and the Crawl-delay rule. It also supports the extended Archive-It crawl-delay syntax. No instance of the crawler ignoring robots.txt has been documented. Site owners can block the bot entirely by adding a `User-agent: Arquivo.pt` group containing `Disallow: /` to their robots.txt.
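To see how such rules would be interpreted, a robots.txt draft can be checked locally before deployment. The sketch below uses Python's standard `urllib.robotparser`. The `Arquivo.pt` token comes from the text; the paths, the delay of 10 seconds, and the `example.com` URLs are illustrative assumptions. Note that this shows how Python's parser reads the file, which may differ in detail from how the crawler itself matches rules.

```python
from urllib.robotparser import RobotFileParser

# Illustrative policy: slow the crawler down and keep it out of /private/.
THROTTLE_RULES = """\
User-agent: Arquivo.pt
Crawl-delay: 10
Disallow: /private/

User-agent: *
Disallow:
"""

# Illustrative policy: block the crawler from the whole site.
BLOCK_RULES = """\
User-agent: Arquivo.pt
Disallow: /
"""


def parse_robots(text: str) -> RobotFileParser:
    """Parse robots.txt content held in a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    throttled = parse_robots(THROTTLE_RULES)
    print(throttled.can_fetch("Arquivo.pt", "https://example.com/private/a.html"))
    print(throttled.can_fetch("Arquivo.pt", "https://example.com/index.html"))
    print(throttled.crawl_delay("Arquivo.pt"))

    blocked = parse_robots(BLOCK_RULES)
    print(blocked.can_fetch("Arquivo.pt", "https://example.com/index.html"))
```

The product token alone is passed to `can_fetch`, because Python's parser matches on the part of the agent string before the first `/`.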
🔍 Detection Indicators
The primary User-Agent string is Mozilla/5.0 (compatible; Arquivo.pt/1.0; +https://arquivo.pt/crawler.html); a secondary pattern is Mozilla/5.0 (compatible; ArquivoWebCrawler/1.0). Behaviourally, requests come from Portuguese IP ranges and often include an Accept: text/html header. The bot does not send a Referer header and typically uses the GET method for every resource.
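These indicators could be combined into a small request classifier. The following is a sketch only: the regular expression is derived from the two User-Agent strings quoted above, and the function names are hypothetical. User-Agent values are trivially spoofed, so a match should be treated as a claim to verify, not as an identity.

```python
import re
from typing import Mapping

# Matches the product tokens from the two quoted User-Agent strings,
# followed by a version number (e.g. "Arquivo.pt/1.0").
ARQUIVO_UA = re.compile(r"\b(?:Arquivo\.pt|ArquivoWebCrawler)/\d", re.IGNORECASE)


def claims_arquivo(user_agent: str) -> bool:
    """True if the User-Agent carries one of the Arquivo.pt product tokens."""
    return bool(ARQUIVO_UA.search(user_agent))


def matches_described_behaviour(method: str, headers: Mapping[str, str]) -> bool:
    """True if the request is a GET without a Referer, as the text describes."""
    names = {name.lower() for name in headers}
    return method.upper() == "GET" and "referer" not in names


if __name__ == "__main__":
    ua = "Mozilla/5.0 (compatible; Arquivo.pt/1.0; +https://arquivo.pt/crawler.html)"
    request_headers = {"User-Agent": ua, "Accept": "text/html"}
    print(claims_arquivo(ua), matches_described_behaviour("GET", request_headers))
```

A request that claims the Arquivo.pt agent but sends a Referer or uses another method does not fit the described profile and may deserve closer inspection.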
📊 Data Usage
All crawled data is stored in the Arquivo.pt repository for long‑term preservation and is made publicly available via the Wayback‑style interface at https://arquivo.pt. The archive supports historical research, cultural heritage preservation, and legal deposit requirements mandated by Portuguese law.
⚙️ Rate Limiting Policy
Although Arquivo.pt is a legitimate archiving agent, its campaigns can temporarily generate high request volumes. Rate limiting with threshold-based blocking is recommended: it protects server resources without permanently barring the crawler, whose access is essential to the archive's mission.
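One way to read "threshold-based blocking" is a sliding-window counter with a temporary block that expires on its own. The sketch below illustrates that idea; it is not the policy of any particular product. The class name, the limits, and the block duration are hypothetical and would need tuning per site.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow up to `max_requests` per `window_seconds`, then block temporarily."""

    def __init__(self, max_requests: int, window_seconds: float, block_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.block_seconds = block_seconds
        self._hits = defaultdict(deque)   # client -> timestamps of allowed requests
        self._blocked_until = {}          # client -> time at which the block ends

    def allow(self, client: str, now: float) -> bool:
        """Return True if the request at time `now` (seconds) should be served."""
        if now < self._blocked_until.get(client, 0.0):
            return False
        hits = self._hits[client]
        # Discard timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            # Threshold exceeded: block for a fixed period, then start fresh.
            self._blocked_until[client] = now + self.block_seconds
            hits.clear()
            return False
        hits.append(now)
        return True


if __name__ == "__main__":
    limiter = SlidingWindowLimiter(max_requests=3, window_seconds=10, block_seconds=30)
    for t in (0, 1, 2, 3, 20, 33):
        print(t, limiter.allow("arquivo-pt", now=float(t)))
```

Because the block lapses after `block_seconds`, the crawler regains access once a campaign's burst has passed, which matches the goal of throttling without a permanent ban.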
ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.