indexer
Indexer User-Agent: indexer
🤖 Overview
Indexer is a web crawler operated by Yandex, the Russian multinational technology company, as documented on their official Webmaster platform (https://yandex.com/support/webmaster/robot-workings/). Its primary purpose is to discover, fetch, and index publicly accessible web pages for the Yandex Search engine, Yandex.Images, Yandex.Video, and other Yandex services. First deployed in the early 2000s, Indexer has evolved to support modern web technologies including AJAX, JavaScript rendering (via a headless Chromium engine), and mobile-first indexing, ensuring comprehensive coverage of contemporary sites.
🌐 Technical Behavior
Indexer operates on a distributed crawling architecture with an average request rate of approximately 10 requests per second per IP address, though bursts of up to 50 requests per second have been observed during deep crawls. It communicates primarily over HTTP/1.1 and HTTPS, supporting both IPv4 and IPv6. Verified IP ranges are published by Yandex and include 77.88.0.0/18, 93.158.134.0/24, and 5.45.192.0/18 (source: Yandex Webmaster help page on IP addresses). The crawler respects the Crawl-Delay directive in robots.txt and identifies itself with a custom X-Yandex-From HTTP header. It also fetches sitemaps from sitemap.xml and honors nofollow and noindex meta tags.
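A request that claims to come from Indexer can be checked against the IP ranges quoted above. The following is a minimal sketch that uses only Python's standard library. The constant QUOTED_RANGES and the function in_quoted_ranges are illustrative names, and the three ranges are copied from this profile rather than from a live Yandex list, so they should be refreshed before being relied on.

```python
import ipaddress

# CIDR ranges as quoted in this profile (an assumption: they may be
# incomplete or out of date compared with Yandex's published list).
QUOTED_RANGES = [
    ipaddress.ip_network("77.88.0.0/18"),
    ipaddress.ip_network("93.158.134.0/24"),
    ipaddress.ip_network("5.45.192.0/18"),
]


def in_quoted_ranges(addr: str) -> bool:
    """Return True if addr is a valid IP inside one of the quoted ranges."""
    try:
        ip = ipaddress.ip_address(addr)
    except ValueError:
        # Not a parseable IP address at all.
        return False
    return any(ip in net for net in QUOTED_RANGES)
```

An address outside these ranges is not proof of spoofing, since the list here is partial; it only means the claim could not be confirmed from this profile's data.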
📋 robots.txt Compliance
According to Yandex’s official documentation, Indexer honors Disallow directives in robots.txt, including pattern-based exclusions that use * (any sequence of characters) and $ (end of the URL). Yandex provides a robots.txt testing tool in its Webmaster interface to verify compliance. One exception applies: the crawler may ignore Disallow for certain Yandex services (e.g., Yandex.Maps) when the data is required for those specific products, but this is rare and documented.
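To illustrate how pattern-based Disallow rules behave, the sketch below translates a rule containing * and $ into a regular expression and tests a URL path against it. It follows the commonly used wildcard semantics (* matches any run of characters, a trailing $ anchors the end of the URL) and is not Yandex's own implementation. The function name rule_matches is hypothetical.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Check whether a robots.txt path pattern with * and $ matches path."""
    if not pattern:
        # An empty Disallow value blocks nothing.
        return False
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with ".*" for each "*".
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += r"\Z"  # the rule must match through the end of the URL
    # re.match anchors at the start, matching robots.txt prefix semantics.
    return re.match(regex, path) is not None
```

For example, the rule /*.pdf$ matches /docs/a.pdf but not /docs/a.pdf?x=1, while the same rule without the trailing $ matches both.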
🔍 Detection Indicators
The primary User-Agent string is YandexBot/3.0 (compatible; YandexBot; +http://yandex.com/bots). For specialized indexing tasks, the string YandexIndexer/1.0 is used instead. Additional identifying headers include Accept-Language: ru-RU and From: [email protected]. Behavioral fingerprints include a typical inter-request interval of 100–200 ms and a preference for HTML pages over images, except when the crawl targets image search.
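The User-Agent indicators above can be turned into a simple first-pass check. This is a sketch under stated assumptions: the tokens are taken from the two strings quoted in this profile, the function name looks_like_indexer is illustrative, and a header match alone is easy to spoof, so it is best combined with an IP check.

```python
from typing import Mapping

# Lowercased tokens drawn from the User-Agent strings quoted in this profile.
UA_TOKENS = ("yandexbot", "yandexindexer")


def looks_like_indexer(headers: Mapping[str, str]) -> bool:
    """Return True if the User-Agent header contains a known token."""
    user_agent = ""
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "user-agent":
            user_agent = value.lower()
    return any(token in user_agent for token in UA_TOKENS)
```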
📊 Data Usage
Data collected by Indexer is used primarily for populating Yandex Search results, including web, image, video, and news verticals. It also feeds into Yandex’s AI training pipelines for projects such as the Alice voice assistant (Yandex’s equivalent of Siri) and Yandex Translate’s neural machine translation models. Metadata like page titles and descriptions are reused in Yandex.Direct (advertising) and Yandex.Market (e-commerce) after processing.
⚙️ Rate Limiting Policy
Indexer is rate-limited because its aggressive crawl behavior (up to 50 req/s during bursts) can overwhelm under-provisioned web servers if left unchecked. A threshold-based rate limit of 100 requests per 10 seconds per IP is recommended to protect server resources while still allowing legitimate indexing activity. That threshold works out to 10 req/s, which matches the crawler's average rate, so steady crawling passes while 50 req/s bursts are throttled. Yandex itself advises webmasters to use the Crawl-Delay directive to control frequency.
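The recommended threshold can be sketched as a per-IP sliding-window counter. The class below is a minimal in-memory illustration, not a production limiter. The name SlidingWindowLimiter and its parameters are hypothetical, and the injectable clock exists only to make the behavior easy to test.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each IP."""

    def __init__(self, limit=100, window=10.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.hits = defaultdict(deque)  # IP -> timestamps of allowed requests

    def allow(self, ip: str) -> bool:
        now = self.clock()
        timestamps = self.hits[ip]
        # Drop timestamps that have aged out of the window.
        while timestamps and now - timestamps[0] >= self.window:
            timestamps.popleft()
        if len(timestamps) >= self.limit:
            return False  # over the threshold: reject or delay the request
        timestamps.append(now)
        return True
```

With the defaults, a client sending a steady 10 req/s stays at the limit without exceeding it, while a 50 req/s burst is cut off after the first 100 requests in any 10-second span.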
ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.