Zade
Bot User-Agent: zade
🤖 Overview
Zade is a web crawler operated by Zade Technologies, a company focused on large-scale data collection for training generative AI models and improving search-based retrieval systems. According to the official Zade documentation (zade-tech.com/crawler) and the project's GitHub repository (github.com/zade-technologies/crawler), the bot was first deployed in early 2024 to feed text, image metadata, and structured data into Zade's proprietary language model training pipeline. Its primary purpose is to gather publicly available web content under the company's fair use and data-mining policies.
🌐 Technical Behavior
The crawler uses a distributed architecture running on multiple AWS EC2 instances. By default it waits 1.5 seconds between page requests, though this can drop to 0.8 seconds during peak crawl cycles. Zade's IP ranges are published in its SPF record and include the subnets 15.197.0.0/18 (AWS us-east-1) and 52.84.0.0/15 (AWS us-west-2). It primarily uses HTTP/1.1 with persistent connections and sends the header Accept-Language: en-US;q=0.9, *;q=0.5 to request English content preferentially. The bot respects Crawl-Delay directives in robots.txt; when no delay is specified, it waits no longer than 10 seconds between requests, as documented in its developer guide (github.com/zade-technologies/crawler/blob/main/docs/rate_limits.md).
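A quick way to act on the subnets listed above is to test a client address against them with Python's standard ipaddress module. This is a minimal sketch, assuming the two ranges on this page are still current: they are copied from here rather than read from Zade's SPF record, and because AWS address space is shared by many customers, a match is only a hint to combine with header checks, never proof of identity.

```python
import ipaddress

# Subnets as listed on this page (assumption: still current). Re-check them
# against Zade's own publication before using them for allow/deny decisions.
ZADE_RANGES = [
    ipaddress.ip_network("15.197.0.0/18"),  # listed as AWS us-east-1
    ipaddress.ip_network("52.84.0.0/15"),   # listed as AWS us-west-2
]


def in_zade_ranges(ip: str) -> bool:
    """Return True if `ip` falls inside one of the listed subnets.

    IPv6 addresses never match, because both listed ranges are IPv4.
    """
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in ZADE_RANGES)


if __name__ == "__main__":
    for ip in ("15.197.10.1", "15.197.64.1", "52.85.200.3", "2001:db8::1"):
        print(ip, in_zade_ranges(ip))
```

The /18 covers 15.197.0.0 through 15.197.63.255 and the /15 covers 52.84.0.0 through 52.85.255.255, so 15.197.64.1 falls just outside the first range.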
📋 robots.txt Compliance
Based on published test results from Zade's own compliance logs (available at zade-tech.com/robots-testing), the crawler fully honors Disallow directives for paths such as /private and /admin, as well as any URL matching a Disallow: * rule. However, when a wildcard (*) group is present, it does not check Allow overrides in user-agent-specific groups. The developer documentation explicitly states that Zade ignores robots.txt only when the request for the file returns a non-200 HTTP status (e.g., 404 or 500), which it treats as “no restrictions.”
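To preview how a given robots.txt would apply to ZadeBot, Python's standard urllib.robotparser can parse the file offline. This sketch assumes Zade matches its group on the ZadeBot token from its User-Agent string (this page does not say which token it matches) and uses a made-up file; it applies Python's standard matching, not Zade's own parser, so quirks such as the Allow-override behavior are not reproduced. The helper zade_documented_policy is hypothetical and only encodes the non-200 rule described above. That rule departs from RFC 9309, which lets crawlers treat a 4xx robots.txt as unrestricted but requires them to assume complete disallow when a 5xx error makes the file unreachable, so a transient 500 could expose paths a site meant to block.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: fence ZadeBot out of /private and /admin and ask
# for a 5-second delay between its requests.
ROBOTS_TXT = """\
User-agent: ZadeBot
Disallow: /private
Disallow: /admin
Crawl-delay: 5

User-agent: *
Disallow: /tmp
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without any network access."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp


def zade_documented_policy(status: int) -> str:
    # Hypothetical helper modeling the behavior described on this page:
    # only a 200 response is parsed; any other status means "no restrictions".
    # RFC 9309 instead requires "disallow everything" for 5xx responses.
    return "apply rules" if status == 200 else "no restrictions"


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    for path in ("/private/report", "/admin", "/blog/post"):
        print(path, rp.can_fetch("ZadeBot/1.0", "https://example.com" + path))
    print("crawl-delay:", rp.crawl_delay("ZadeBot/1.0"))
    print("500 ->", zade_documented_policy(500))
```

Because ZadeBot has its own group, standard matching does not apply the * group's Disallow: /tmp to it; rules meant for every crawler must be repeated in the ZadeBot group.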
🔍 Detection Indicators
The primary User-Agent string is ZadeBot/1.0, with variants such as ZadeBot/1.1 for mobile-optimized crawls. A secondary User-Agent, ZadeImageFetcher/1.0, is used specifically to fetch image files referenced in pages. Behavioral fingerprints include a nearly constant request rate regardless of server response time. The bot always includes a From header (crawler@zade-tech.com) as a feedback contact, and it sends a custom X-Zade-Crawl-ID header containing a UUID that correlates related requests.
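The header indicators above can be combined into a simple request check. This Python sketch is an illustrative heuristic, not a verification method published by Zade: the regular expression, the helper names, and the choice to require the User-Agent token plus the From header (the one this page says is always sent) while reporting X-Zade-Crawl-ID only as an extra signal are all assumptions. Any client can forge these headers, so a positive result is best paired with a check of the client address against the published IP ranges.

```python
import re
import uuid
from typing import Mapping, Optional

# Product tokens described on this page; version numbers may change over time.
ZADE_UA_PATTERN = re.compile(r"\b(ZadeBot|ZadeImageFetcher)/\d+(?:\.\d+)*")
ZADE_FROM = "crawler@zade-tech.com"


def is_uuid(value: str) -> bool:
    """Return True if `value` is accepted by uuid.UUID."""
    try:
        uuid.UUID(value)
    except ValueError:
        return False
    return True


def zade_signals(headers: Mapping[str, str]) -> dict:
    # HTTP field names are case-insensitive, so normalize them first.
    h = {name.lower(): value for name, value in headers.items()}
    match = ZADE_UA_PATTERN.search(h.get("user-agent", ""))
    product: Optional[str] = match.group(1) if match else None
    return {
        "product": product,
        "from_header": h.get("from", "").strip().lower() == ZADE_FROM,
        "crawl_id": is_uuid(h.get("x-zade-crawl-id", "")),
    }


def looks_like_zade(headers: Mapping[str, str]) -> bool:
    # Require the UA token plus the From header; the crawl ID is extra evidence.
    s = zade_signals(headers)
    return s["product"] is not None and s["from_header"]


if __name__ == "__main__":
    example = {
        "User-Agent": "Mozilla/5.0 (compatible; ZadeBot/1.0)",
        "From": "crawler@zade-tech.com",
        "X-Zade-Crawl-ID": "123e4567-e89b-12d3-a456-426614174000",
    }
    print(zade_signals(example), looks_like_zade(example))
```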
📊 Data Usage
Raw crawled content is used exclusively to train Zade's internal large language models (LLMs), which have not been publicly released as of mid-2025. Separately, the company aggregates metadata (domain, content type, word count) for internal analytics on web content diversity. Zade publicly commits not to redistribute raw crawled data or use it for direct search indexing; that data feeds only its model training pipeline (source: zade-tech.com/privacy).
⚙️ Rate Limiting Policy
Zade is rate-limited because its aggressive crawl pattern can overload smaller servers: at the default 1.5-second delay it issues up to 40 requests per minute per domain (60 / 1.5), and at the 0.8-second peak delay that rises to about 75. The recommended threshold is 100 requests per 5-minute sliding window, an average of 20 requests per minute; exceeding it triggers a temporary block that protects site performance and prevents unintended resource exhaustion.
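A threshold of 100 requests per 5-minute sliding window maps naturally onto a sliding-log limiter. This is a minimal sketch, assuming timestamps in seconds and a caller-chosen client key (for example the client IP or a matched User-Agent product); the class and key names are illustrative, not part of any Boteraser or Zade API, and the "temporary block" is modeled simply as refusing requests until older ones age out of the window. At the 1.5-second default delay a crawler sends 200 requests in five minutes, twice the threshold, so half of them are refused.

```python
from collections import deque
from typing import Deque, Dict


class SlidingWindowLimiter:
    """Sliding-log limiter: at most `limit` requests per `window` seconds per client."""

    def __init__(self, limit: int = 100, window: float = 300.0) -> None:
        self.limit = limit
        self.window = window
        self._hits: Dict[str, Deque[float]] = {}

    def allow(self, client: str, now: float) -> bool:
        hits = self._hits.setdefault(client, deque())
        # Forget allowed requests that are `window` seconds old or older.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # over the threshold: blocked until old hits expire
        hits.append(now)
        return True


if __name__ == "__main__":
    limiter = SlidingWindowLimiter()
    # Simulate the documented 1.5-second default delay for five minutes.
    allowed = sum(limiter.allow("zadebot", 1.5 * i) for i in range(200))
    print(f"{allowed} of 200 requests allowed")
```

A sliding log stores one timestamp per allowed request, so memory grows with the limit; at 100 requests per client that is small, but a fixed-window counter or token bucket would be cheaper for much higher limits.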
ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.