govbot

Bot User-Agent: govbot

🤖 Overview

The govbot is a legitimate web crawler operated by the UK Government Digital Service (GDS), first publicly documented in a 2015 blog post on the gov.uk design community platform. Its primary purpose is to automatically scan and audit .gov.uk websites for accessibility compliance, broken links, and digital service standards as defined by the Government Service Design Manual. The data collected feeds into internal monitoring dashboards used by GDS and individual government departments to maintain quality across all public-facing digital services.

🌐 Technical Behavior

govbot performs weekly deep crawls and daily quick scans across all authorised .gov.uk subdomains, with a default rate of one request every 10 seconds to avoid overloading servers. According to GDS documentation published in their GitHub repositories (specifically the alphagov organisation), the crawler uses HTTP/1.1 with Keep-Alive connections. It honours ETag and Last-Modified validators by sending conditional requests, which minimises bandwidth consumption. Its IP addresses fall within UK public sector network ranges (typically 80.169.0.0/16 and 212.250.0.0/16), but GDS recommends checking the official ip-list file in their repository for the exact ranges. Requests carry a consistent User-Agent string and standard headers, and no unusual HTTP methods or parameter tampering have been observed.
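
Headers can be forged, so a site operator may want to confirm that a request claiming to be govbot really comes from a GDS address. The sketch below uses Python's standard ipaddress module. It hard-codes the two "typical" ranges quoted above purely as placeholders. GDS points to the ip-list file in their repository as the authoritative source, so GOVBOT_RANGES and ip_in_govbot_ranges are hypothetical names that you would need to populate from that file.

```python
import ipaddress

# Placeholder ranges taken from the "typical" values on this page (assumption);
# replace them with the ranges published in GDS's official ip-list file.
GOVBOT_RANGES = [
    ipaddress.ip_network("80.169.0.0/16"),
    ipaddress.ip_network("212.250.0.0/16"),
]


def ip_in_govbot_ranges(ip: str) -> bool:
    """Return True if `ip` lies inside one of the configured ranges."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        return False  # not a valid IPv4 or IPv6 address
    # Membership is simply False when address and network versions differ.
    return any(addr in net for net in GOVBOT_RANGES)


if __name__ == "__main__":
    print(ip_in_govbot_ranges("80.169.12.34"))
    print(ip_in_govbot_ranges("203.0.113.7"))
```

An IP match combined with the header indicators gives a stronger signal than either check alone.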

📋 robots.txt Compliance

GDS has explicitly stated in their robots.txt guidance for public sector websites that govbot fully honours Disallow and Allow directives as defined in the standard robots.txt protocol. The official documentation notes that a .gov.uk site wishing to exclude govbot from certain paths can add an entry consisting of the line "User-agent: govbot" followed by "Disallow: /admin/" on the next line, and this will be respected without exception. The behaviour is implemented in the crawler's open-source library, published on GitHub under the MIT licence.
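
You can check how a standard parser reads such an entry with Python's urllib.robotparser module. This is a sketch under assumptions: the robots.txt content and the example.gov.uk host are illustrative. The code shows how the standard library interprets the rule. It is not govbot's own implementation.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: govbot is excluded from /admin/, everyone else allowed.
ROBOTS_TXT = """\
User-agent: govbot
Disallow: /admin/

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# robotparser matches on the product token before "/", so "govbot/2.0"
# selects the "User-agent: govbot" group.
for path in ("/admin/settings", "/services/apply"):
    allowed = parser.can_fetch("govbot/2.0", f"https://example.gov.uk{path}")
    print(path, "allowed" if allowed else "disallowed")
```

Note that Disallow rules are prefix matches, so "/admin/" blocks every path beneath it but not "/administration".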

πŸ” Detection Indicators

The primary detection method is the User-Agent string: govbot/1.0 (or govbot/2.0 for newer versions). Additional identifying headers include From: [email protected] and a Request-Id header containing a UUID traceable to GDS infrastructure. The crawler does not vary its User-Agent or impersonate browsers. Its behaviour is consistent and polite, with delays between requests and no concurrent requests from a single IP.
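
The indicators above can be combined into a simple header check. The function looks_like_govbot is hypothetical. It assumes the header names described on this page (User-Agent, From, Request-Id). It only checks that a From header is present, not which address the header contains. Every header can be spoofed, so treat a positive result as a hint to confirm with an IP check, not as proof of identity.

```python
import re
import uuid

# Matches a govbot/1.0 or govbot/2.0 token anywhere in the User-Agent value.
GOVBOT_UA = re.compile(r"\bgovbot/(?:1\.0|2\.0)\b", re.IGNORECASE)


def looks_like_govbot(headers: dict[str, str]) -> bool:
    """Heuristic check of the header indicators only (hypothetical helper)."""
    # Header names are case-insensitive, so normalise them first.
    h = {name.lower(): value for name, value in headers.items()}
    if not GOVBOT_UA.search(h.get("user-agent", "")):
        return False
    # A From header should be present; its address is not validated here.
    if not h.get("from"):
        return False
    # Request-Id is described as carrying a UUID; reject anything else.
    try:
        uuid.UUID(h.get("request-id", ""))
    except ValueError:
        return False
    return True
```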

📊 Data Usage

Collected data is used exclusively for quality assurance and accessibility auditing of UK government digital services. The results populate internal dashboards that highlight broken links, missing alt attributes, colour contrast failures, and non-compliant HTML. The aggregated, anonymised data also feeds into the Government Digital Service annual report on digital service performance. No data is sold to third parties or used for AI training purposes.
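
The page does not describe how govbot performs its checks. As an illustration of one issue named above, missing alt attributes, the sketch below uses Python's standard html.parser module. The class MissingAltFinder and the function find_missing_alt are hypothetical names. The code flags only img tags with no alt attribute at all. An empty alt="" is treated as deliberate, which is the usual convention for decorative images.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Collects the src of every <img> tag that lacks an alt attribute."""

    def __init__(self) -> None:
        super().__init__()
        self.missing: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        # Self-closing <img ... /> also arrives here via handle_startendtag.
        if tag == "img" and "alt" not in {name for name, _ in attrs}:
            self.missing.append(dict(attrs).get("src") or "(no src)")


def find_missing_alt(html: str) -> list[str]:
    finder = MissingAltFinder()
    finder.feed(html)
    finder.close()
    return finder.missing
```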

βš™οΈ Rate Limiting Policy

Although govbot is a well-behaved crawler, rate limiting it in production environments is still reasonable. Its weekly deep crawls generate sustained traffic even at the polite rate of one request every 10 seconds. A threshold-based block is a sensible precaution against future misconfigurations or unexpected behaviour after updates. For example, you could block a single IP that exceeds 200 requests per minute, which is well above govbot's stated rate of about 6 requests per minute.
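
A per-IP sliding-window counter is one common way to implement such a threshold. The sketch below is an in-process illustration only. The class name SlidingWindowLimiter is hypothetical, and the defaults simply mirror the 200-requests-per-minute example above. A real deployment would more likely enforce this in a reverse proxy, WAF, or shared store.

```python
from __future__ import annotations

import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Per-IP sliding-window limiter: at most `limit` requests per `window` s."""

    def __init__(self, limit: int = 200, window: float = 60.0) -> None:
        self.limit = limit
        self.window = window
        # Timestamps of accepted requests, keyed by client IP.
        self._hits: dict[str, deque[float]] = defaultdict(deque)

    def allow(self, ip: str, now: float | None = None) -> bool:
        """Return True and record the request if `ip` is under the limit."""
        now = time.monotonic() if now is None else now
        hits = self._hits[ip]
        # Discard timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # threshold reached: block without recording
        hits.append(now)
        return True
```

At one request every 10 seconds, an IP accumulates about 6 hits per window, so only a misbehaving client reaches the threshold. Blocked requests are not recorded, so a client regains access as soon as its earlier requests age out of the window.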

ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.