kulturarw3

Bot User-Agent: kulturarw3

🤖 Overview

Kulturarw3 is a web crawler operated by the National Library of Sweden (Kungliga biblioteket) as part of the Kulturarw3 project, initiated in 1996 to systematically archive Swedish web content for cultural heritage preservation. The crawler collects publicly accessible websites under the .se domain and other Swedish-related sites, feeding data into the library's digital archive, which is available for research and historical study. Official documentation from the library confirms its non‑commercial, public service mission, distinct from commercial search engines.

🌐 Technical Behavior

Kulturarw3 performs broad, deep crawls, following links recursively with a default delay of 10 seconds between requests to avoid overloading servers. It uses HTTP/1.1 with persistent connections and respects robots.txt directives, including `Crawl‑delay`. The crawler operates from IP addresses owned by the National Library of Sweden (WHOIS records list subnet 130.242.0.0/16, among others) and typically identifies itself with the User‑Agent string kulturarw3 or kulturarw3/1.0. Requests carry either a `From` header containing the library’s contact email ([email protected]) or a `User‑Agent` string followed by a reference to the legal notice. Crawling uses only ports 80 and 443, and the crawler does not follow redirects to domains outside its scope without explicit permission.

📋 robots.txt Compliance

Kulturarw3 fully adheres to the Robots Exclusion Standard, as documented by the National Library’s own technical guidelines. It honors all Disallow directives and respects custom `Crawl‑delay` settings. The library explicitly asks site owners to use robots.txt to control access, and the crawler will not revisit a URL that has been disallowed within the same crawl session.
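To illustrate how a site owner might check what a robots.txt file tells this crawler, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the `example.se` host, and the 20‑second delay are hypothetical; the sketch only shows how the rules would be interpreted by a standards‑following parser, not how Kulturarw3 itself is implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: slow the archive crawler down and keep it
# out of /private/, while leaving all other bots unrestricted.
ROBOTS_TXT = """\
User-agent: kulturarw3
Crawl-delay: 20
Disallow: /private/

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The parser compares the token before "/" in the User-Agent,
# so "kulturarw3/1.0" falls under the "kulturarw3" group.
agent = "kulturarw3/1.0"
blocked = parser.can_fetch(agent, "https://example.se/private/page.html")
allowed = parser.can_fetch(agent, "https://example.se/index.html")
delay = parser.crawl_delay(agent)

print(blocked, allowed, delay)
```

Running the same check against a site's real robots.txt is a quick way to confirm that a `Disallow` or `Crawl‑delay` rule targets the intended User‑Agent group before relying on it.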

🔍 Detection Indicators

The primary User‑Agent string is kulturarw3 (case‑insensitive) or kulturarw3/1.0, sometimes with a comment such as (compatible; KB Kulturarw3; http://www.kb.se/kulturarw3). A `From:` header carrying [email protected] is frequently present. Reverse DNS lookups for crawling IPs resolve to hostnames under kb.se (e.g., crawl‑proxy1.kb.se). The bot shows no other header anomalies; it is intentionally transparent.
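The indicators above can be combined into a simple check. The following is a sketch under stated assumptions: the function names are illustrative, the User‑Agent match is a plain case‑insensitive token search, and `verify_ip` performs a forward‑confirmed reverse DNS lookup (IPv4 only) on the assumption that genuine crawler hosts resolve under kb.se as described. A User‑Agent match alone is easy to spoof, so the DNS step is what carries weight.

```python
import re
import socket

# Case-insensitive match on the crawler token anywhere in the header.
UA_PATTERN = re.compile(r"\bkulturarw3\b", re.IGNORECASE)


def looks_like_kulturarw3(user_agent: str) -> bool:
    """Return True if the User-Agent header mentions kulturarw3."""
    return bool(UA_PATTERN.search(user_agent or ""))


def hostname_is_under_kb_se(hostname: str) -> bool:
    """Return True only for kb.se itself or a true subdomain of it."""
    host = hostname.rstrip(".").lower()
    return host == "kb.se" or host.endswith(".kb.se")


def verify_ip(ip: str) -> bool:
    """Forward-confirmed reverse DNS: IP -> hostname -> same IP (IPv4 only)."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)
    except OSError:
        return False  # no PTR record or lookup failure
    if not hostname_is_under_kb_se(hostname):
        return False
    try:
        _, _, addresses = socket.gethostbyname_ex(hostname)
    except OSError:
        return False
    return ip in addresses
```

A request would be treated as genuine only when both `looks_like_kulturarw3(ua)` and `verify_ip(client_ip)` hold; the suffix check rejects look‑alike names such as `crawl-proxy1.kb.se.evil.example`.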

📊 Data Usage

Collected data is stored in the National Library’s digital archive (Kulturarw3 archive) for long‑term preservation, enabling researchers to study Swedish web history from 1996 onward. The archive is publicly accessible on‑site at the library and partially online via the web archive interface at webarchive.kb.se. No data is sold or used for AI training or commercial analytics; it is strictly for cultural heritage and academic research.

⚙️ Rate Limiting Policy

Although Kulturarw3 is a legitimate, non‑malicious crawler, its extensive and persistent crawling can strain server resources if left unthrottled. Rate limiting is recommended to protect application performance, with threshold‑based blocking applied only when the bot exceeds a reasonable request rate (e.g., more than 10 requests per second from a single IP), which is far above what its built‑in 10‑second delay should produce.
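One way to express the threshold described above is a per‑IP sliding‑window counter. This is a minimal in‑memory sketch, not a production rate limiter: the class name `SlidingWindowLimiter` and its parameters are hypothetical, the default of 10 requests per 1‑second window mirrors the example threshold, and timestamps are passed in explicitly so the behavior is easy to reason about.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per client IP within window_seconds."""

    def __init__(self, max_requests: int = 10, window_seconds: float = 1.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of allowed requests

    def allow(self, ip: str, now: float) -> bool:
        hits = self._hits[ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the threshold: reject or block this request
        hits.append(now)
        return True


# Eleven requests 10 ms apart from one (documentation-range) IP:
limiter = SlidingWindowLimiter()
results = [limiter.allow("192.0.2.10", i * 0.01) for i in range(11)]
later = limiter.allow("192.0.2.10", 1.5)
```

In this run the first ten requests pass, the eleventh is rejected, and a request at 1.5 seconds passes again because the earlier timestamps have left the window. In a live server, `now` would typically come from `time.monotonic()`.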

ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.