YandexAdditional

Bot User-Agent: yandexadditional

🤖 Overview

YandexAdditional is a supplementary web crawler operated by Yandex LLC, the Russian multinational technology company. It complements the primary YandexBot by indexing rarely updated or niche content that the main crawl cycle misses. Announced in Yandex's official webmaster documentation (https://yandex.com/support/webmaster/bot-list.html), the bot discovers additional URLs such as archived blog posts, old PDFs, and dynamically generated pages that change infrequently. Where YandexBot targets high-priority pages, YandexAdditional runs with a lower crawl budget and a longer revisit interval, feeding Yandex's main search index and its “Yandex.Additional” vertical (a feature for deep content discovery). It is a legitimate agent with no known association with malicious activity, and its behavior is documented in Yandex's public bot policies.

🌐 Technical Behavior

YandexAdditional uses a polite, throttled crawling strategy over HTTP/1.1 and HTTP/2, sending GET requests at no more than 10 requests per second per IP, per Yandex's official guidelines (https://yandex.com/support/webmaster/bot-load.html). Its IP ranges belong to Yandex's public ASN, AS13238 (Yandex LLC), with subnets such as 5.45.192.0/18, 93.158.134.0/23, and 95.108.128.0/17 listed in the company's SPF records and whois databases (source: RIPE Labs, Yandex's official IP list). The crawler prioritizes URLs with low change frequency (e.g., Last-Modified headers older than 90 days) and avoids real-time or high-churn content. To reduce redundant transfers it sends conditional requests (If-Modified-Since and, where an ETag is known, If-None-Match) and accepts 304 Not Modified responses; it also frequently requests gzip compression. Requests carry a custom X-Yandex-Crawler: Additional header, which can be used for fine-grained filtering alongside the User-Agent string.
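As a rough illustration of filtering by source address, the sketch below checks a client IP against the three subnets quoted above. The subnet list is copied from this page and is not a complete or authoritative Yandex range list, and the function name `in_listed_subnets` is hypothetical.

```python
import ipaddress

# Subnets quoted in the text above; illustrative only, not an authoritative list.
LISTED_SUBNETS = [
    ipaddress.ip_network("5.45.192.0/18"),
    ipaddress.ip_network("93.158.134.0/23"),
    ipaddress.ip_network("95.108.128.0/17"),
]


def in_listed_subnets(client_ip: str) -> bool:
    """Return True if client_ip parses as an IP and falls in a listed subnet."""
    try:
        addr = ipaddress.ip_address(client_ip)
    except ValueError:
        # Malformed addresses are never treated as a match.
        return False
    return any(addr in net for net in LISTED_SUBNETS)
```

An address check like this is best combined with the User-Agent and header signals rather than used alone, since a published range list can change over time.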

📋 robots.txt Compliance

YandexAdditional respects robots.txt directives, as required by Yandex's own crawling policy (https://yandex.com/support/webmaster/robotstxt.html). It interprets standard `Disallow` and `Allow` rules and also supports the `Crawl-delay` directive with a minimum delay of 10 seconds. Yandex's official blog and forum posts indicate that the bot never ignores `Disallow` lines, even for legacy paths. However, it does not apply rule groups addressed to generic bots (such as `User-agent: *`); it matches only its own YandexAdditional name, so site owners must add a dedicated group (e.g., `User-agent: YandexAdditional`) to their robots.txt to control it.
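A dedicated rule group can be sanity-checked locally before deployment. The sketch below uses Python's standard `urllib.robotparser` as a stand-in for the crawler's own parser, which is an assumption: unlike the behavior described above, `urllib.robotparser` falls back to the `*` group when no dedicated group exists. The `/archive/` path and `example.com` host are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with a dedicated group for YandexAdditional.
ROBOTS_TXT = """\
User-agent: YandexAdditional
Disallow: /archive/

User-agent: *
Disallow:
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
# Pass the bare product token, not the full User-Agent string.
print(parser.can_fetch("YandexAdditional", "https://example.com/archive/2012/post.html"))
print(parser.can_fetch("YandexAdditional", "https://example.com/blog/"))
print(parser.can_fetch("YandexBot", "https://example.com/archive/2012/post.html"))
```

Only the first check should be refused here: the dedicated group blocks `/archive/` for YandexAdditional, while other agents fall under the permissive `*` group.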

🔍 Detection Indicators

The primary User-Agent string is Mozilla/5.0 (compatible; YandexAdditional/3.0; +http://yandex.com/bots). A mobile variant also exists: Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Mobile/15E148 Safari/604.1 (compatible; YandexAdditional/3.0; +http://yandex.com/bots). Behavioral fingerprints include a low request rate (≤2 req/s on initial contact), no JavaScript rendering, and the X-Yandex-Crawler: Additional HTTP header. The bot also sends a From header with the email address [email protected] (documented on Yandex's contact page). In log analysis, it can be distinguished from YandexBot by the higher proportion of 304 (Not Modified) responses it receives.
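The header-based indicators above can be checked with a small classifier. This is a minimal sketch, assuming request headers arrive as a plain mapping; the function name `claims_yandex_additional` and the returned keys are hypothetical. A match only shows what a request claims to be, since both headers are trivially spoofable.

```python
import re
from typing import Dict, Mapping

# Matches the "(compatible; YandexAdditional/<version>; +http://yandex.com/bots)"
# token shared by the desktop and mobile User-Agent strings quoted above.
UA_TOKEN = re.compile(
    r"\(compatible; YandexAdditional/\d+(?:\.\d+)*; \+http://yandex\.com/bots\)"
)


def claims_yandex_additional(headers: Mapping[str, str]) -> Dict[str, bool]:
    """Report which YandexAdditional indicators a request's headers present."""
    # Header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}
    user_agent = normalized.get("user-agent", "")
    crawler = normalized.get("x-yandex-crawler", "")
    return {
        "ua_match": bool(UA_TOKEN.search(user_agent)),
        "crawler_header": crawler.strip().lower() == "additional",
    }
```

Returning the two signals separately lets a filter decide how to treat requests that present only one of them.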

📊 Data Usage

Collected data is used exclusively to augment the Yandex Search index, specifically to populate the “Yandex.Additional” feature that surfaces older or peripheral content (e.g., decade-old forum threads, archived manuals). The bot does not feed data into AI training pipelines; Yandex's training crawler is a separate entity named YandexGPTBot. Instead, YandexAdditional's results are used to improve recall for long-tail queries and to maintain Yandex's “freshness” guarantees for static content. All data is stored in Yandex's Russia-based data centers under the company's privacy policy (https://yandex.com/legal/confidential/).

⚙️ Rate Limiting Policy

This bot is rate-limited because even a slow crawl pattern can accumulate significant load over long periods if left unchecked: it may re-crawl millions of old pages weekly. Threshold-based blocking (e.g., more than 50 requests per minute from a single IP) prevents unintended resource exhaustion while still allowing the bot to index important secondary content.
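The threshold above can be sketched as a per-IP sliding-window counter. This is an illustrative implementation, not a description of any particular product's internals; the class name `SlidingWindowLimiter` and its parameters are hypothetical, with defaults taken from the 50-requests-per-minute example.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class SlidingWindowLimiter:
    """Allow at most max_requests per IP within any window_seconds span."""

    def __init__(self, max_requests: int = 50, window_seconds: float = 60.0) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Timestamps of accepted requests, oldest first, keyed by client IP.
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, ip: str, now: float) -> bool:
        """Record a request at time `now` (seconds) and say whether it passes."""
        hits = self._hits[ip]
        cutoff = now - self.window_seconds
        # Drop timestamps that have aged out of the window.
        while hits and hits[0] <= cutoff:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the threshold: block or delay this request
        hits.append(now)
        return True
```

In a server, `now` would typically come from `time.monotonic()`; taking it as a parameter keeps the limiter deterministic and easy to test. Rejected requests are not recorded, so a blocked client regains access once its accepted requests age out of the window.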


ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.