webminer
Bot User-Agent: webminer
🤖 Overview
WebMiner is a legitimate web crawler operated by WebMiner Inc., a data analytics company founded in 2018. According to its official documentation at webminer.ai, the bot indexes publicly accessible web pages to build structured datasets for business intelligence, market research, and AI model training. It is not affiliated with any threat actor and is explicitly designed for ethical data collection under strict rate‑limiting policies.
🌐 Technical Behavior
WebMiner performs depth‑first traversal of web pages, starting from seed URLs listed in its public database. It sends requests at an average rate of 10 requests per second per IP, using IPv4 ranges 192.0.2.0/24 and 198.51.100.0/24 (documented on its status page). It uses HTTP/2 and respects Cache‑Control headers to avoid overloading servers. The crawler identifies itself via the User‑Agent string Mozilla/5.0 (compatible; WebMiner/2.1; +https://webminer.ai/bot) and includes a From header with a contact email. It also sends an X‑WebMiner‑Crawl‑ID header so that server administrators can track individual crawl sessions.
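As a rough illustration of how an administrator might cross-check the two identifiers described above, the sketch below compares a request's source address and User-Agent against the values quoted in this entry. The function names and the three labels are hypothetical, chosen only for this example. Note that both listed ranges fall inside the blocks reserved for documentation by RFC 5737, so they are best treated as placeholders until confirmed against the operator's status page.

```python
import ipaddress

# Values copied from the entry above; verify them against the operator's
# status page before relying on them.
DOCUMENTED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]
UA_TOKEN = "webminer/"  # compared case-insensitively


def is_documented_source(ip: str) -> bool:
    """Return True if ip parses and falls inside one of the listed ranges."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        return False
    return any(addr in net for net in DOCUMENTED_RANGES)


def claims_webminer(user_agent: str) -> bool:
    """Return True if the User-Agent contains the WebMiner product token."""
    return UA_TOKEN in user_agent.lower()


def classify(ip: str, user_agent: str) -> str:
    """Hypothetical three-way label for a single request."""
    if not claims_webminer(user_agent):
        return "other"
    if is_documented_source(ip):
        return "matches-documented-range"
    # The User-Agent claims WebMiner but the address is outside the listed ranges.
    return "possible-spoof"


if __name__ == "__main__":
    ua = "Mozilla/5.0 (compatible; WebMiner/2.1; +https://webminer.ai/bot)"
    print(classify("192.0.2.17", ua))
    print(classify("203.0.113.9", ua))
```

A User-Agent string is trivially forged, which is why the sketch treats the address check as the stronger of the two signals.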
📋 robots.txt Compliance
WebMiner strictly honors robots.txt directives as confirmed by its public compliance report (webminer.ai/robots). It reads the file at every crawl start and caches it for at most one hour. Disallowed paths are never requested, and the bot also respects Crawl‑Delay directives, pausing between requests as specified. No violation incidents have been reported in industry forums.
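To see what a compliant crawler would be expected to do with a given robots.txt, the following sketch feeds a sample file to Python's standard `urllib.robotparser`. The robots.txt content, the `example.com` URLs, and the short `WebMiner` agent token are assumptions made for illustration; the entry does not state which token the bot matches on.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a site owner might publish.
ROBOTS_TXT = """\
User-agent: WebMiner
Disallow: /private/
Crawl-delay: 5

User-agent: *
Disallow:
"""

AGENT = "WebMiner"  # assumed robots.txt token


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, url: str) -> bool:
    """Return True if the rules permit AGENT to request url."""
    return parser.can_fetch(AGENT, url)


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    print(may_fetch(rules, "https://example.com/private/report.html"))
    print(may_fetch(rules, "https://example.com/blog/post.html"))
    print(rules.crawl_delay(AGENT))
```

Comparing these answers against access logs is one way to check the compliance claim independently: any logged request from the bot to a path where `may_fetch` returns False would contradict it.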
🔍 Detection Indicators
Primary detection indicators include the WebMiner/2.1 token in the User‑Agent string and the presence of the X‑WebMiner‑Crawl‑ID header with a UUID value. Behavioral fingerprints include a consistent request interval of exactly 100 ms when no Crawl‑Delay is set, and a preference for HTML pages over binary files (images, PDFs). The bot also sends an Accept: text/html,application/xhtml+xml header.
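The header-based indicators above can be checked mechanically. The sketch below is a minimal example, assuming request headers are available as a plain dictionary; the function names and indicator labels are hypothetical, and the behavioral fingerprints (request interval, content preference) are not covered here.

```python
import re
import uuid

# Product token as quoted in the entry above.
UA_PATTERN = re.compile(r"\bWebMiner/2\.1\b")
ACCEPT_PREFIX = "text/html,application/xhtml+xml"


def is_uuid(value: str) -> bool:
    """Return True if value parses as a UUID.

    uuid.UUID is lenient (it also accepts braces or a urn:uuid: prefix),
    which is acceptable for a coarse indicator.
    """
    try:
        uuid.UUID(value)
        return True
    except (ValueError, TypeError, AttributeError):
        return False


def matched_indicators(headers: dict) -> list:
    """Return the names of the header indicators present in a request."""
    # HTTP header names are case-insensitive, so normalize the keys first.
    h = {k.lower(): v for k, v in headers.items()}
    found = []
    if UA_PATTERN.search(h.get("user-agent", "")):
        found.append("user-agent")
    if is_uuid(h.get("x-webminer-crawl-id", "")):
        found.append("crawl-id")
    if h.get("accept", "").replace(" ", "").startswith(ACCEPT_PREFIX):
        found.append("accept")
    return found


if __name__ == "__main__":
    sample = {
        "User-Agent": "Mozilla/5.0 (compatible; WebMiner/2.1; +https://webminer.ai/bot)",
        "X-WebMiner-Crawl-ID": "123e4567-e89b-12d3-a456-426614174000",
        "Accept": "text/html,application/xhtml+xml",
    }
    print(matched_indicators(sample))
```

Returning the list of matches rather than a single yes/no lets the caller decide how many indicators are enough, since any one header on its own is easy to imitate.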
📊 Data Usage
Collected data is used to train WebMiner’s proprietary NLP models for sentiment analysis and entity extraction, as well as to update its public market‑research database (WebMiner Analytics). No personal or copyrighted content is retained beyond a 30‑day raw cache, after which only aggregated statistics are stored. The company’s privacy policy (webminer.ai/privacy) details data anonymization.
⚙️ Rate Limiting Policy
WebMiner is rate‑limited because its high request density can impact server performance for sites with limited capacity. A threshold‑based block (e.g., >50 requests per second from a single IP) is implemented to protect origin servers while allowing the bot to complete its indexing within acceptable load boundaries.
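One common way to implement a per-IP threshold of this kind is a sliding-window counter. The sketch below is an illustrative assumption, not the mechanism actually used: the class name is hypothetical, the limit of 50 requests per one-second window is taken from the example figure above, and timestamps are passed in explicitly so the behavior is easy to reason about.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per IP within any window_seconds span."""

    def __init__(self, max_requests: int = 50, window_seconds: float = 1.0):
        self.max_requests = max_requests
        self.window = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of accepted requests

    def allow(self, ip: str, now: float) -> bool:
        """Return True if the request at time now is within the limit."""
        hits = self._hits[ip]
        cutoff = now - self.window
        # Discard timestamps that have aged out of the window.
        while hits and hits[0] <= cutoff:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the threshold: block this request
        hits.append(now)
        return True


if __name__ == "__main__":
    limiter = SlidingWindowLimiter()
    # A crawler pacing itself at one request every 100 ms stays under the limit.
    paced = [limiter.allow("192.0.2.17", i * 0.1) for i in range(100)]
    print(all(paced))
    # A burst of 60 requests at the same instant gets cut off after 50.
    burst = [limiter.allow("198.51.100.5", 0.0) for _ in range(60)]
    print(burst.count(True), burst.count(False))
```

In this sketch, blocked requests are not recorded, so a client that backs off regains access as soon as its earlier requests age out of the window.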
ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.