aisiid
Bot User-Agent: aisiid
🤖 Overview
The aisiid crawler is operated by the UK AI Security Institute (AISI), a government-backed research body established in 2023 to assess risks from advanced artificial intelligence systems. This bot collects publicly available web content specifically to support the institute's AI safety evaluations, including red-teaming datasets and model capability benchmarks, as documented on the official AISI website (aisi.gov.uk/crawler).
🌐 Technical Behavior
The aisiid crawler uses a distributed architecture with IP addresses drawn from UK government cloud infrastructure, primarily the AWS eu-west-2 and Azure UK South regions. According to the institute's official policy page, the bot sends approximately 50 requests per second per IP, peaking at 200 requests per second during initial crawls of new sites. It follows sitemap.xml directives and prioritizes high-value domains such as academic repositories, technical documentation sites, and code-sharing platforms like GitHub. The crawler operates over HTTPS only, using TLS 1.3, and sends an Accept-Encoding: gzip header. It respects Cache-Control headers. It does not fetch binary files larger than 10 MB, or files of any type larger than 50 MB.
📋 robots.txt Compliance
According to the AISI crawler policy published at aisi.gov.uk/crawler-policy, the aisiid bot fully honors Disallow directives in robots.txt, including wildcards and path-specific exclusions. The institute explicitly states that it will cease crawling any path listed in a site’s robots.txt within 5 minutes of encountering the rule and will cache the decision for 24 hours. The crawler also respects Allow overrides and Crawl-Delay directives when present.
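Site operators who want to see how these directives would be read for the aisiid user agent can check a robots.txt file locally. The following is a minimal sketch using Python's standard urllib.robotparser module; the robots.txt contents, the example.com URLs, and the paths are hypothetical, and the result shows how this particular parser interprets the rules, not how the aisiid crawler itself is implemented. The standard-library parser matches path prefixes in file order, so the Allow line is placed before the broader Disallow line, and wildcard patterns would need a different parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only.
ROBOTS_TXT = """\
User-agent: aisiid
Allow: /private/press/
Disallow: /private/
Crawl-delay: 10

User-agent: *
Disallow:
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Path-specific exclusion: /private/ is disallowed for aisiid.
blocked = parser.can_fetch("aisiid", "https://example.com/private/report.html")

# Allow override: the more specific Allow rule is listed first and wins.
allowed = parser.can_fetch("aisiid", "https://example.com/private/press/release.html")

# Paths not mentioned in the aisiid group remain fetchable.
open_path = parser.can_fetch("aisiid", "https://example.com/docs/")

# Crawl-delay value declared for aisiid, in seconds.
delay = parser.crawl_delay("aisiid")

print(blocked, allowed, open_path, delay)
```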
🔍 Detection Indicators
The primary User-Agent string for aisiid is aisiid/1.0 (+https://www.aisi.gov.uk/crawler-info). A variant aisiid/2.0 (compatible) was introduced in early 2025 with updated headers. Behavioral fingerprints include requests originating from UK government-owned ASNs (AS5083, AS12390), a fixed crawl delay of 1 second between pages on the same domain, and the inclusion of the custom header X-AISI-Crawl-Request: 1. The bot also sends a From header with a contact email ([email protected]).
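The indicators above can be combined into a simple request check. The sketch below is an illustration under stated assumptions: the function name aisiid_indicators and the returned labels are hypothetical, the regular expression is inferred from the two User-Agent strings listed above, and the ASN is expected to come from a separate IP-to-ASN lookup that is not shown. Request headers are easy to spoof, so a match is a hint and not proof of origin.

```python
import re
from typing import List, Mapping, Optional

# Matches "aisiid/1.0 (...)" and "aisiid/2.0 (compatible)"; pattern is an assumption.
AISIID_UA = re.compile(r"^aisiid/\d+\.\d+\b", re.IGNORECASE)

# ASNs listed in the section above.
LISTED_ASNS = {5083, 12390}


def aisiid_indicators(headers: Mapping[str, str], asn: Optional[int] = None) -> List[str]:
    """Return the names of the listed indicators that a request matches."""
    # HTTP header names are case-insensitive, so normalize them first.
    h = {name.lower(): value.strip() for name, value in headers.items()}
    found: List[str] = []
    if AISIID_UA.match(h.get("user-agent", "")):
        found.append("user-agent")
    if h.get("x-aisi-crawl-request") == "1":
        found.append("x-aisi-crawl-request")
    if h.get("from"):
        found.append("from")
    if asn in LISTED_ASNS:
        found.append("asn")
    return found


crawler_request = {
    "User-Agent": "aisiid/1.0 (+https://www.aisi.gov.uk/crawler-info)",
    "X-AISI-Crawl-Request": "1",
}
browser_request = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64)"}

print(aisiid_indicators(crawler_request, asn=5083))
print(aisiid_indicators(browser_request))
```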
📊 Data Usage
The collected data is used internally by the UK AI Security Institute to build training and evaluation datasets for AI safety research, including the development of red-teaming benchmarks and model alignment tests. The institute publishes aggregated metrics and some de-identified datasets on their public GitHub repository (github.com/ai-safety-institute/evaluation-datasets) and in periodic transparency reports. Data is retained for a maximum of 90 days before processing and anonymization.
⚙️ Rate Limiting Policy
Although aisiid is a legitimate, non-malicious crawler operated by a government research body, web applications still rate-limit it to prevent excessive load and keep resources fairly allocated among all visitors. The rationale is that even well-behaved crawlers can overwhelm fragile or unoptimized endpoints when given unrestricted access. Threshold-based blocking (e.g., 500 requests per minute per IP) therefore protects the application's stability, and it also helps the crawler's own performance by keeping it clear of rate-limit errors and backoff.
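One common way to enforce a per-IP threshold like the one mentioned above is a sliding-window counter. The sketch below is a minimal in-memory illustration, not a description of any particular product's implementation: the class name SlidingWindowLimiter and its parameters are hypothetical, the defaults mirror the 500 requests per minute example, and timestamps are passed in explicitly so the behavior is easy to follow.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `limit` requests per IP within any `window_seconds` span."""

    def __init__(self, limit: int = 500, window_seconds: float = 60.0) -> None:
        self.limit = limit
        self.window = window_seconds
        # Timestamps of accepted requests, oldest first, keyed by client IP.
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, ip: str, now: float) -> bool:
        hits = self._hits[ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # over the threshold: reject this request
        hits.append(now)
        return True


# Small limit so the behavior is visible; 203.0.113.7 is a documentation address.
limiter = SlidingWindowLimiter(limit=3, window_seconds=60.0)
results = [limiter.allow("203.0.113.7", t) for t in (0.0, 1.0, 2.0, 3.0, 61.0)]
print(results)
```

In this run the fourth request is rejected because three requests already fall inside the window, and the request at 61 seconds is accepted again once the two oldest timestamps have expired. In production, `now` would typically come from time.monotonic(), and the per-IP state would need eviction or a shared store.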
ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information are the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.