bnf fr_bot

Bot User-Agent: bnf-fr-bot

🤖 Overview

bnf fr_bot is a web crawler operated by the Bibliothèque nationale de France (BnF), the French National Library, as part of its legal deposit and web archiving mandate under French law (Code du patrimoine articles L132-2 and L132-3). Its primary purpose is to collect and preserve French cultural heritage on the web, indexing websites with a .fr domain or hosted in France for long-term archival in the BnF's digital collections. The bot feeds data into the BnF's web archive, which is accessible to researchers and the public through services like the Archives de l'Internet and the Mémoire du Web portal.

🌐 Technical Behavior

The bnf fr_bot crawls at a moderate rate. It typically sends requests during French business hours (08:00–18:00 CET) and waits 10 to 30 seconds between successive requests to the same host, as documented in its official crawl policy on the BnF website. It fetches over HTTP/1.1 and HTTPS, respects Content-Length headers, and downloads full HTML pages, CSS, JavaScript, images, and PDFs so that archived pages can be reproduced accurately. Its IP addresses are allocated from the French academic network (Renater), in autonomous systems AS2200 and AS3201, with specific blocks such as 193.52.0.0/14 and 195.221.0.0/16. The bot issues only standard GET requests, so it does not reach deep-web or dynamically generated content that requires other interaction.
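The address blocks quoted above can be checked programmatically. The following is a minimal sketch using Python's standard ipaddress module; the function name in_claimed_ranges is hypothetical, and the two CIDR blocks are copied from the text as-is, so they should be treated as unverified and not as an authoritative allow-list.

```python
import ipaddress

# CIDR blocks exactly as quoted in the text above (assumption: unverified).
BNF_CLAIMED_NETWORKS = [
    ipaddress.ip_network("193.52.0.0/14"),
    ipaddress.ip_network("195.221.0.0/16"),
]


def in_claimed_ranges(ip: str) -> bool:
    """Return True if `ip` falls inside one of the quoted blocks."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        # Not a parseable IP address at all.
        return False
    return any(addr in net for net in BNF_CLAIMED_NETWORKS)


if __name__ == "__main__":
    for candidate in ("193.54.1.1", "195.221.10.5", "8.8.8.8"):
        print(candidate, in_claimed_ranges(candidate))
```

A check like this is most useful when combined with the User-Agent string, since either signal alone can be misleading.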

📋 robots.txt Compliance

The bnf fr_bot fully honors robots.txt directives: it follows the Robots Exclusion Protocol and respects Crawl-delay instructions when they are set. BnF's official documentation states that webmasters can block the bot entirely with Disallow: / in robots.txt, though the institution encourages allowing access for cultural preservation. No violations or workarounds have been documented.
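To see how a Disallow: / rule would apply, the sketch below feeds a sample robots.txt to Python's standard urllib.robotparser. The token bnf-fr-bot is taken from the User-Agent line at the top of this page; whether the crawler matches on exactly that token is an assumption, and example.fr, the second rule group, and the helper build_parser are purely illustrative.

```python
from urllib.robotparser import RobotFileParser

# Assumption: the robots.txt token is the one shown at the top of this page.
BOT_TOKEN = "bnf-fr-bot"

# Illustrative robots.txt: block the BnF crawler, throttle everyone else.
ROBOTS_TXT = """\
User-agent: bnf-fr-bot
Disallow: /

User-agent: *
Crawl-delay: 10
Disallow: /private/
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
url = "https://example.fr/index.html"

blocked = not parser.can_fetch(BOT_TOKEN, url)       # rule group for the bot
others_ok = parser.can_fetch("SomeOtherBot", url)    # falls back to "*"
others_delay = parser.crawl_delay("SomeOtherBot")    # Crawl-delay from "*"

if __name__ == "__main__":
    print("bot blocked:", blocked)
    print("other crawlers allowed:", others_ok, "delay:", others_delay)
```

Removing the first rule group, or replacing its Disallow: / with a Crawl-delay line, would leave the bot allowed but throttled.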

🔍 Detection Indicators

The full User-Agent string is Mozilla/5.0 (compatible; bnf fr_bot/1.0; +https://www.bnf.fr/fr/collecte-et-preservation-du-web), and the identifying token within it is the bnf fr_bot/1.0 portion. Additional signals include a From header set to [email protected] and an Accept-Language header that is usually fr-FR,fr;q=0.9,en;q=0.8. The bot does not vary its fingerprint across requests, which makes it straightforward to detect through log analysis.
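For log analysis, a pattern match on the User-Agent token is usually enough for a first pass. The sketch below is one possible approach, not an official detection rule: the regular expression and the function bnf_bot_version are hypothetical, and the pattern deliberately tolerates a space, underscore, or hyphen between the name parts because this page shows both "bnf fr_bot" and "bnf-fr-bot".

```python
import re
from typing import Optional

# Hypothetical pattern: accepts "bnf fr_bot", "bnf-fr-bot", "bnf_fr_bot",
# optionally followed by "/<version>".
BNF_UA_PATTERN = re.compile(
    r"bnf[ _-]?fr[ _-]?bot(?:/(?P<version>[\d.]+))?",
    re.IGNORECASE,
)


def bnf_bot_version(user_agent: str) -> Optional[str]:
    """Return the version string if the UA names the bot, else None.

    An empty string means the bot token matched without a version.
    """
    match = BNF_UA_PATTERN.search(user_agent)
    if match is None:
        return None
    return match.group("version") or ""


if __name__ == "__main__":
    ua = ("Mozilla/5.0 (compatible; bnf fr_bot/1.0; "
          "+https://www.bnf.fr/fr/collecte-et-preservation-du-web)")
    print(bnf_bot_version(ua))
```

A User-Agent header is trivially spoofable, so a match here indicates only that a request claims to be the bot.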

📊 Data Usage

Collected data is used exclusively for legal deposit and web archiving at the BnF, stored in the SPAR (Système de Préservation et d'Archivage Réparti) infrastructure. Snapshots are made publicly available through the Archives de l'Internet service, allowing researchers to study the evolution of French websites. No data is used for commercial or AI training purposes, distinguishing it from many modern crawlers.

⚙️ Rate Limiting Policy

Rate limits for bnf fr_bot can be set with its low request frequency in mind: at 2–6 requests per minute per IP, it stays well below typical abuse thresholds. Administrators may still impose stricter limits if the bot's requests interfere with server performance, since large-scale archival sessions can generate significant load when crawling entire site trees.
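As an illustration of a per-IP limit sized to the figures above, here is a minimal sliding-window sketch. The class SlidingWindowLimiter is hypothetical, the default of 6 requests per 60 seconds simply mirrors the upper bound quoted in the text, and timestamps are passed in explicitly so the logic can be followed without a real clock.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` per `window_seconds` for each IP."""

    def __init__(self, max_requests: int = 6, window_seconds: float = 60.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Per-IP queue of timestamps for requests that were allowed.
        self._hits = defaultdict(deque)

    def allow(self, ip: str, now: float) -> bool:
        hits = self._hits[ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


if __name__ == "__main__":
    limiter = SlidingWindowLimiter()
    ip = "193.52.0.1"  # illustrative address inside a block quoted earlier
    for t in (0, 10, 20, 30, 40, 50, 55, 60):
        print(t, limiter.allow(ip, t))
```

With these defaults, a client spacing requests 10 seconds apart stays within the limit, while a seventh request inside the same minute is refused until the oldest one ages out.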

ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.