gbplugin

Bot User-Agent: gbplugin

🤖 Overview

gbplugin is a web crawler operated by GB Plugin Ltd., a private company registered in the UK. It was first documented in early 2022 on the company's website, gbplugin.com. Its primary purpose is to collect publicly available plugin metadata, version strings, and dependency information from web applications to populate the GB Plugin Directory, a commercial registry that developers use to audit plugin compatibility and security. The bot is not associated with any major search engine or AI training pipeline; it crawls only plugin repositories, WordPress sites, and other CMS platforms.

🌐 Technical Behavior

The gbplugin crawler identifies itself with the User-Agent string Mozilla/5.0 (compatible; gbplugin/1.0; +https://gbplugin.com/bot). Its request rate varies: typically one request every 2–5 seconds per IP, with bursts of up to 10 requests per second during initial deep crawls. It uses HTTP/1.1 with keep-alive and sends Accept-Encoding: gzip, so it accepts compressed responses. Its IP addresses come from a dedicated /24 block provided by GB Plugin Ltd.'s upstream ISP (ASN 20473, Vultr Holdings), with fallback IPs from AWS EC2 (us-east-1 region). The crawler fetches only HTML pages and plain-text files with the extensions .xml, .json, .txt, .md, and .yml; it does not download binary files such as images and archives. All requests include a custom header, X-GBPlugin-Client: true, to aid server-side detection.
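The identifiers described above can be checked server-side. The following is a minimal sketch, assuming request headers are available as a plain dictionary; the function name `is_gbplugin_request` is hypothetical, and allowing the version number to vary is an assumption, since only gbplugin/1.0 is documented. Both headers are trivially spoofable, so a match indicates a self-declared gbplugin request, not a verified one.

```python
import re

# Matches the "gbplugin/<version>" token from the documented User-Agent string.
# Accepting versions other than 1.0 is an assumption.
GBPLUGIN_UA = re.compile(r"\bgbplugin/\d+(?:\.\d+)*", re.IGNORECASE)


def is_gbplugin_request(headers: dict[str, str]) -> bool:
    """Return True if the headers carry either documented gbplugin identifier."""
    # HTTP header names are case-insensitive, so normalize them before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    user_agent = normalized.get("user-agent", "")
    client_flag = normalized.get("x-gbplugin-client", "").strip().lower()
    return bool(GBPLUGIN_UA.search(user_agent)) or client_flag == "true"
```

For example, `is_gbplugin_request({"User-Agent": "Mozilla/5.0 (compatible; gbplugin/1.0; +https://gbplugin.com/bot)"})` evaluates to `True`, while an ordinary browser User-Agent without the custom header evaluates to `False`.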

📋 robots.txt Compliance

GB Plugin Ltd. states in its bot documentation that gbplugin fully honors robots.txt Disallow directives, including wildcard patterns and rules addressed specifically to its user agent (e.g., Disallow: /wp-admin/). The crawler requests robots.txt before its first crawl of a new site and caches it per subdomain for up to 24 hours. However, independent tests by the WebCrawler Compliance Project (GitHub: webcrawler-audit-2023) reported that gbplugin occasionally ignores Crawl-delay directives when its queued jobs exceed rate limits, which produces temporary bursts of requests.
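A crawler-specific rule set can be sanity-checked before deployment. The sketch below uses Python's standard `urllib.robotparser` to evaluate a sample robots.txt against the gbplugin user agent. The rules and the example.com URLs are illustrative assumptions, not published recommendations. The sample avoids wildcard patterns because the standard-library parser matches paths by prefix only, so it cannot confirm how gbplugin itself interprets wildcards.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules for a site that wants gbplugin kept out of admin and
# plugin directories; the paths and delay value are assumptions.
ROBOTS_TXT = """\
User-agent: gbplugin
Disallow: /wp-admin/
Disallow: /wp-content/plugins/
Crawl-delay: 10
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Evaluate the rules exactly as a compliant crawler named "gbplugin" would.
admin_allowed = parser.can_fetch("gbplugin", "https://example.com/wp-admin/options.php")
blog_allowed = parser.can_fetch("gbplugin", "https://example.com/blog/")
delay_seconds = parser.crawl_delay("gbplugin")

print(admin_allowed, blog_allowed, delay_seconds)
```

Given the reported lapses in Crawl-delay compliance, a rule like this is best treated as a request to the crawler and not as an enforcement mechanism.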

🔍 Detection Indicators

The primary identifier is the gbplugin/1.0 token in the User-Agent string, which follows the Mozilla/5.0 prefix. The bot also sends a unique cookie, gbp_sessid, containing a 32-character hexadecimal session ID. Behavioral fingerprints include a request for /robots.txt before crawling each new domain, followed by a sequential scan of common plugin paths such as /wp-content/plugins/ and /sites/all/modules/. Server logs show a characteristic pattern: high volumes of 404 Not Found responses for non-existent plugin directories within short intervals. Additionally, when proxied, the bot sets the X-Forwarded-For header to an address from its own private IP range.
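The 404 pattern described above can be detected from parsed access logs. The following is a rough sketch, assuming log entries have already been parsed into `(unix_timestamp, ip, path, status)` tuples sorted by time. The function name `find_plugin_scanners` and the default thresholds (more than 20 such 404s within 60 seconds) are hypothetical starting points, not measured values.

```python
from collections import defaultdict, deque

# Plugin directories named in the behavioral fingerprint above.
PLUGIN_PREFIXES = ("/wp-content/plugins/", "/sites/all/modules/")


def find_plugin_scanners(events, max_404s=20, window_seconds=60):
    """Return the IPs that produced more than max_404s plugin-path 404s
    within any sliding window of window_seconds.

    events: iterable of (unix_timestamp, ip, path, status), sorted by time.
    """
    recent = defaultdict(deque)  # ip -> timestamps of recent matching 404s
    flagged = set()
    for timestamp, ip, path, status in events:
        if status != 404 or not path.startswith(PLUGIN_PREFIXES):
            continue
        hits = recent[ip]
        hits.append(timestamp)
        # Drop hits that have fallen out of the sliding window.
        while hits and timestamp - hits[0] >= window_seconds:
            hits.popleft()
        if len(hits) > max_404s:
            flagged.add(ip)
    return flagged
```

This heuristic flags any client that probes plugin paths this way, so a flagged IP should still be cross-checked against the User-Agent and header indicators before it is attributed to gbplugin.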

📊 Data Usage

Collected plugin metadata is stored in a searchable index used by developers to detect outdated or vulnerable plugins via the GB Plugin Directory website and a REST API (docs at gbplugin.com/api/v1). The data is also aggregated into monthly Plugin Health Reports sold to enterprise clients for risk assessment. According to their privacy policy (gbplugin.com/privacy), no personally identifiable information is retained; only public plugin names, versions, and last-updated timestamps are stored. The company explicitly states that data is not used for AI/ML training or advertising.

⚙️ Rate Limiting Policy

gbplugin is rate-limited because its aggressive bursts can overwhelm shared hosting environments, and its legitimate purpose, plugin auditing, does not require real-time access to most sites. Threshold-based blocking (e.g., rejecting an IP that exceeds 50 requests per minute) is recommended: it prevents resource exhaustion while still allowing the periodic updates the directory needs, since the directory scans each site at most once per week.
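One way to apply such a threshold is a per-IP sliding-window counter. The sketch below is an in-memory illustration only; the class name `SlidingWindowLimiter` is hypothetical, and the defaults mirror the example threshold of 50 requests per minute. A production deployment would more likely enforce this at the web server, CDN, or firewall layer.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per IP within any window_seconds span."""

    def __init__(self, max_requests=50, window_seconds=60.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of allowed requests

    def allow(self, ip, now=None):
        """Return True if the request may proceed, False if it should be blocked."""
        # The optional `now` argument exists so the logic can be tested
        # without waiting on a real clock.
        now = time.monotonic() if now is None else now
        hits = self._hits[ip]
        # Forget requests that are older than the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

A request handler would call `limiter.allow(client_ip)` and answer with 429 Too Many Requests when it returns `False`. Because only allowed requests are recorded, a client that keeps sending requests while blocked regains access as soon as its earlier requests age out of the window.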

ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.