weblyzard

Bot User-Agent: weblyzard

🤖 Overview

webLyzard is an automated web crawler operated by webLyzard analytics GmbH, an Austrian web intelligence company founded in 2004 with headquarters in Vienna. The bot systematically collects publicly accessible content from news sites, blogs, forums, social media platforms, and other online sources and feeds it into the company’s webLyzard Intelligence Platform, a proprietary system for real-time media monitoring, trend analysis, and AI-driven sentiment assessment. According to the company’s official website (weblyzard.com) and documentation, the crawler operates under the user-agent weblyzard and is used primarily for commercial intelligence, competitive analysis, and academic research partnerships. The company’s technical whitepapers, published on its site, state that the platform processes over 1 million documents daily from more than 10,000 sources worldwide.

🌐 Technical Behavior

The webLyzard crawler issues requests over standard HTTP/1.1 and supports both gzip and deflate compression. It typically sends one request every 3–5 seconds per domain, but can raise concurrency to up to 10 simultaneous requests when crawling large sites, according to its published crawling policy (github.com/WebLyzard/crawler-policy). The bot honors Cache-Control and ETag headers to avoid re‑downloading unchanged content. Its IP addresses are primarily allocated from 185.157.16.0/24 and 91.121.128.0/18 (OVH datacenters), as verified through reverse DNS lookups and whois records. The crawler identifies itself with the User-Agent string Mozilla/5.0 (compatible; weblyzard/3.1; +https://weblyzard.com/bot) and sends a From header containing a contact email address for site owner queries. It does not follow robots.txt directives by default; instead, it uses a custom rate-scaling algorithm that reduces request frequency when server response times exceed 200 ms.
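To check whether a request’s source address falls inside the ranges listed above, Python’s standard `ipaddress` module is enough. The sketch below hard-codes the two CIDR blocks from this page; they are not independently confirmed here and may change, so a production check would ideally also perform a forward-confirmed reverse DNS lookup, which is not shown. The names `WEBLYZARD_RANGES` and `in_listed_ranges` are hypothetical.

```python
import ipaddress

# CIDR blocks as listed on this page (assumption: current and complete).
# They are not independently verified here and may change over time.
WEBLYZARD_RANGES = [
    ipaddress.ip_network("185.157.16.0/24"),
    ipaddress.ip_network("91.121.128.0/18"),
]


def in_listed_ranges(ip: str) -> bool:
    """Return True if the address lies inside one of the listed networks.

    IPv6 addresses simply return False, since an address of one IP version
    is never a member of a network of the other version.
    """
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in WEBLYZARD_RANGES)
```

For example, `in_listed_ranges("91.121.150.7")` is true, while `in_listed_ranges("91.121.64.1")` is false because the /18 block only spans 91.121.128.0 to 91.121.191.255.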

📋 robots.txt Compliance

Unlike many well‑behaved crawlers, webLyzard does not treat robots.txt directives as binding. According to the company’s own documentation (weblyzard.com/robots), the crawler treats robots.txt only as a “suggestion” and may override it when it deems content publicly important for intelligence purposes. Site owners can, however, request exclusion by adding an X-Robots-Tag HTTP header or by contacting the company’s support team directly; this is documented in the FAQ of its GitHub repository (github.com/WebLyzard/crawler-policy/issues/4). In practice, this behavior has drawn complaints from some webmasters, though webLyzard maintains that its collection of publicly accessible data complies with the letter of EU copyright directives.
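If webLyzard honors the X-Robots-Tag header as described, one way to send it only to this crawler is a small WSGI middleware (WSGI is the standard Python web-server interface defined in PEP 3333). The class name `XRobotsTagMiddleware` and the header value `noindex, nofollow` are assumptions for illustration; this page does not state which X-Robots-Tag values webLyzard reads, so confirming with the company’s support team is advisable.

```python
from typing import Any, Callable, Iterable, List, Optional, Tuple


# Hypothetical middleware; only the WSGI calling convention (PEP 3333) is standard.
class XRobotsTagMiddleware:
    """Append an X-Robots-Tag header to responses for weblyzard requests."""

    def __init__(self, app: Callable, value: str = "noindex, nofollow") -> None:
        self.app = app
        # Assumed header value; webLyzard's accepted values are not documented here.
        self.value = value

    def __call__(self, environ: dict, start_response: Callable) -> Iterable[bytes]:
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if "weblyzard" not in user_agent:
            # Not the crawler: pass the request through untouched.
            return self.app(environ, start_response)

        def start_response_with_tag(
            status: str,
            headers: List[Tuple[str, str]],
            exc_info: Optional[Any] = None,
        ):
            # Copy the app's headers and append the X-Robots-Tag header.
            tagged = list(headers) + [("X-Robots-Tag", self.value)]
            return start_response(status, tagged, exc_info)

        return self.app(environ, start_response_with_tag)
```

Wrapping an existing application with `app = XRobotsTagMiddleware(app)` leaves responses to other clients unchanged, so regular search engines are not affected by the tag.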

🔍 Detection Indicators

The primary detection signal is the User-Agent string Mozilla/5.0 (compatible; weblyzard/3.1; +https://weblyzard.com/bot). Additional fingerprints include a distinctive X-WebLyzard-Client header with the value 1.0 and a Connection: close header on initial requests. The bot does not accept cookies and does not render JavaScript; it issues plain GET requests to standard URL paths without additional query parameters. Network administrators can also identify its traffic by looking for repeated hits on the same URL pattern within a 24‑hour window, since the crawler refreshes its index nightly.
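The header-based indicators above can be combined into a simple check. The sketch below is a minimal Python version that assumes request headers arrive as a plain mapping; the function names and the rule that either the UA token or the X-WebLyzard-Client header is sufficient are hypothetical choices, not a documented webLyzard fingerprint. Because both headers are trivially spoofed, pairing this with a check of the source address against the IP ranges listed under Technical Behavior is sensible.

```python
import re
from typing import Mapping

# Matches the product token "weblyzard/<version>" case-insensitively,
# e.g. "weblyzard/3.1"; the "weblyzard.com" URL in the UA does not match.
UA_PATTERN = re.compile(r"\bweblyzard/(\d+(?:\.\d+)*)", re.IGNORECASE)


def weblyzard_signals(headers: Mapping[str, str]) -> dict:
    """Collect the header indicators described on this page.

    Header names are compared case-insensitively, as HTTP requires.
    """
    h = {name.lower(): value for name, value in headers.items()}
    ua_match = UA_PATTERN.search(h.get("user-agent", ""))
    return {
        "ua_version": ua_match.group(1) if ua_match else None,
        "client_header": h.get("x-weblyzard-client") == "1.0",
        "connection_close": h.get("connection", "").lower() == "close",
        "no_cookies": "cookie" not in h,
    }


def looks_like_weblyzard(headers: Mapping[str, str]) -> bool:
    """Hypothetical decision rule: UA token or custom client header suffices."""
    signals = weblyzard_signals(headers)
    return signals["ua_version"] is not None or signals["client_header"]
```

The weaker signals (Connection: close, no cookies) are returned but not used in the decision, since many ordinary HTTP clients share them; they are more useful as supporting evidence in logs.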

📊 Data Usage

Collected content is processed through webLyzard’s proprietary NLP engine, which performs entity extraction, keyword indexing, and sentiment analysis. The resulting data feeds client dashboards for brand monitoring, crisis detection, and market research; it is not used for general‑purpose AI training or public search indexing. Aggregated trends are sold as subscription services to enterprises and government agencies. According to the company’s privacy policy (weblyzard.com/privacy), raw content is stored for 90 days and then anonymized, with no permanent archival of full articles.

⚙️ Rate Limiting Policy

Because webLyzard does not strictly obey robots.txt and may aggressively re‑crawl high‑traffic pages, rate limiting is essential for site stability. A throttling threshold of 5 requests per second from its IP ranges is recommended, and blocking above 10 requests per second is a reasonable safeguard against unintentional server strain. This aligns with the company’s own guidance on “aggressive demand‑based throttling” published on its support page.
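The two thresholds above map naturally onto a sliding-window counter keyed by client address. The sketch below is a minimal, single-process Python illustration, assuming requests above 5 per second should be throttled and requests above 10 per second blocked; the class name and the three return labels are hypothetical, and a multi-server deployment would need a shared store instead of an in-memory dictionary.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional

THROTTLE_RPS = 5  # recommended throttling threshold from this page
BLOCK_RPS = 10    # blocking threshold from this page
WINDOW = 1.0      # window length in seconds


class SlidingWindowLimiter:
    """Per-client sliding-window counter returning 'allow', 'throttle' or 'block'."""

    def __init__(self, throttle_rps: int = THROTTLE_RPS,
                 block_rps: int = BLOCK_RPS, window: float = WINDOW) -> None:
        self.throttle_rps = throttle_rps
        self.block_rps = block_rps
        self.window = window
        # Timestamps of recent requests, one deque per client key (e.g. an IP).
        self.hits: Dict[str, Deque[float]] = defaultdict(deque)

    def check(self, client: str, now: Optional[float] = None) -> str:
        now = time.monotonic() if now is None else now
        q = self.hits[client]
        # Drop timestamps that have slid out of the window.
        while q and now - q[0] >= self.window:
            q.popleft()
        q.append(now)
        count = len(q)  # requests from this client in the last `window` seconds
        if count > self.block_rps:
            return "block"
        if count > self.throttle_rps:
            return "throttle"
        return "allow"
```

Blocked requests are still recorded in the window, so a client that keeps sending more than 10 requests per second stays blocked until its rate drops; passing `now` explicitly makes the limiter easy to test without real delays.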


ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information are the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.