yoono

Bot User-Agent: yoono

🤖 Overview

Yoono is a legitimate web crawler operated by Yoono Inc., a South Korean AI research company, first documented in early 2023. Its primary purpose is to collect publicly accessible text and image data to train Yoono’s proprietary large language models (LLMs) and multimodal AI systems.

🌐 Technical Behavior

Yoono crawls using a distributed architecture with IP ranges primarily in the 45.134.212.0/24 and 103.235.46.0/24 blocks, sourced from Korean ISPs. It sends requests at a variable rate of 2–5 per second, with bursts up to 10 per second during peak indexing windows. The crawler uses HTTP/1.1 and HTTP/2, and it fetches robots.txt before each crawl session, respecting Crawl-delay directives. It also respects noindex meta tags and X-Robots-Tag headers for selective exclusion. According to Yoono’s technical documentation (published at docs.yoono.ai/crawler), the bot employs a breadth-first crawl strategy, prioritizing pages with high outlink density.
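As a rough illustration of how the reported IP ranges could be used when reviewing logs, the sketch below checks whether a client address falls inside either block. The two CIDR blocks are taken from this entry as reported and are not independently verified; the function name `in_reported_yoono_range` is hypothetical.

```python
import ipaddress

# CIDR blocks as listed in this entry; treat them as reported, not verified.
REPORTED_YOONO_NETWORKS = [
    ipaddress.ip_network("45.134.212.0/24"),
    ipaddress.ip_network("103.235.46.0/24"),
]


def in_reported_yoono_range(ip: str) -> bool:
    """Return True if `ip` parses as an address inside one of the reported blocks."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        # Malformed input (e.g. a hostname or an empty log field) is not a match.
        return False
    return any(addr in net for net in REPORTED_YOONO_NETWORKS)
```

An IP match alone is weak evidence, since published ranges can change; it is best combined with the User-Agent and header indicators described in this entry.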

📋 robots.txt Compliance

Yoono states in its public crawler policy (yoono.ai/robots) that it honors all Disallow directives in robots.txt. Independent testing by the Robots Exclusion Working Group in 2023 confirmed that Yoono parses the file correctly and respects Crawl-delay instructions. A small number of community reports on GitHub (issue #47 in the robots-tests repository) noted occasional delays in refreshing the cached robots.txt after site changes; the company patched this in version 2.1 in June 2023.
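To preview how a compliant crawler would read a given rule set, the sketch below parses a sample robots.txt with Python's standard `urllib.robotparser`. The `yoono` token in the `User-agent` line is an assumption based on the "Bot User-Agent" field at the top of this entry, and the sample paths and the 5-second delay are purely illustrative; this shows how the standard-library parser interprets the rules, not how Yoono's own parser works.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules; the "yoono" token and the paths are assumptions.
SAMPLE_ROBOTS_TXT = """\
User-agent: yoono
Crawl-delay: 5
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text from a string instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)
blocked = parser.can_fetch("YoonoBot", "https://example.com/private/report.html")
allowed = parser.can_fetch("YoonoBot", "https://example.com/blog/post.html")
delay = parser.crawl_delay("YoonoBot")
```

With these sample rules, `blocked` is `False`, `allowed` is `True`, and `delay` is `5`.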

🔍 Detection Indicators

The primary User-Agent string is Mozilla/5.0 (compatible; YoonoBot/2.0; +https://yoono.ai/bot). Additional strings include Yoono-Image/1.0 for image crawls and Yoono-News/1.0 for news feeds. The bot sends a custom HTTP header X-Yoono-Crawler: true and includes a From header carrying a contact email address. Log entries show the bot always appends ?yoono=1 to URLs during re-crawls for freshness tracking.
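The indicators above could be combined into a simple log or request check, sketched below under the assumption that the strings listed in this entry are accurate. The function name `yoono_signals` and the signal labels are hypothetical. All of these values are set by the client and are easy to spoof, so a match suggests rather than proves that a request came from Yoono.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Product tokens listed in this entry, each followed by a version number.
UA_PATTERN = re.compile(r"\b(YoonoBot|Yoono-Image|Yoono-News)/\d", re.IGNORECASE)


def yoono_signals(headers: dict, url: str) -> list:
    """Return the names of the reported Yoono indicators present in a request."""
    # HTTP header names are case-insensitive, so normalize them first.
    h = {name.lower(): value for name, value in headers.items()}
    signals = []
    if UA_PATTERN.search(h.get("user-agent", "")):
        signals.append("user-agent")
    if h.get("x-yoono-crawler", "").strip().lower() == "true":
        signals.append("x-yoono-crawler")
    # Re-crawl marker reported in this entry: the query parameter yoono=1.
    if parse_qs(urlsplit(url).query).get("yoono") == ["1"]:
        signals.append("recrawl-param")
    return signals
```

For example, a request carrying the primary User-Agent string, the `X-Yoono-Crawler: true` header, and a URL of `/page?yoono=1` yields all three labels, while an ordinary browser request yields an empty list.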

📊 Data Usage

Collected data is used for training Yoono’s LLMs and multimodal models, as detailed in their privacy policy (yoono.ai/privacy). The company states that data is not sold to third parties and is retained for a maximum of 18 months. Yoono also uses the data for internal search relevance testing on their experimental search engine, YoonoSearch.

⚙️ Rate Limiting Policy

We rate-limit Yoono because its bursts of up to 10 requests per second can slow moderate-sized sites, and because bursty traffic is difficult to cache effectively. Our threshold-based blocking is intended to keep the bot productive without overwhelming server resources.
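One common way to implement threshold-based limiting is a sliding-window counter, sketched below. This is a generic illustration, not a description of the actual mechanism used here; the class name `SlidingWindowLimiter` and the limit of 5 requests per second are hypothetical values chosen so that the reported 10-per-second bursts would be trimmed.

```python
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` within any `window_seconds` span."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._timestamps = deque()  # times of recently allowed requests

    def allow(self, now: float) -> bool:
        """Record and allow a request at time `now`, or reject it if over the limit."""
        # Drop timestamps that have aged out of the window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) < self.max_requests:
            self._timestamps.append(now)
            return True
        return False


# Hypothetical threshold: 5 requests per second per crawler.
limiter = SlidingWindowLimiter(max_requests=5, window_seconds=1.0)
# Simulate a burst of 10 requests arriving within the same second.
burst_times = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
burst_results = [limiter.allow(t) for t in burst_times]
```

In this simulation the first five requests are allowed and the remaining five are rejected; a request arriving at `1.0` would be allowed again because the oldest timestamp has left the window. Passing `now` explicitly keeps the sketch deterministic; a real deployment would typically supply `time.monotonic()`.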

ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.