zagrebin
Bot User-Agent: zagrebin
🤖 Overview
Zagrebin is a legitimate web crawler operated by Outbrain Inc., a content discovery and recommendation platform that acquired Zemanta in 2017. Its primary purpose is to scan publicly accessible web content for the Outbrain recommendation engine, which surfaces related articles, videos, and sponsored links on publisher sites. According to Outbrain’s official crawler documentation (https://www.outbrain.com/legal/crawlers/), Zagrebin collects metadata (such as article headlines, images, and summary text) to build a content graph that powers personalized recommendations for millions of users.
🌐 Technical Behavior
Zagrebin employs a breadth-first crawling strategy, typically requesting pages at a moderate rate of 1–5 requests per second to avoid overloading origin servers. It uses standard HTTP/1.1 and HTTPS protocols and can follow redirects. Outbrain’s published IP ranges are not static; the crawler originates from a set of dynamically allocated IP addresses belonging to Outbrain’s cloud infrastructure, primarily AWS regions (e.g., us-east-1, eu-west-1). Crawl depth is usually limited to two levels from the root path, and it prioritizes pages with Open Graph or meta tags that indicate shareable content. The bot respects the `Last-Modified` and `ETag` headers to reduce redundant fetches. Outbrain’s documentation notes that Zagrebin does not execute JavaScript, meaning it relies solely on the raw HTML and server-side rendered content.
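The `Last-Modified` and `ETag` behavior described above implies that an origin server can answer repeat fetches with `304 Not Modified`. The following is a minimal server-side sketch, assuming the crawler sends standard `If-None-Match` and `If-Modified-Since` request headers (the source does not confirm which it sends). The function name `is_not_modified` and its parameters are hypothetical, chosen for illustration only.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def _strip_weak(tag: str) -> str:
    """Drop the weak-validator prefix so W/"abc" compares equal to "abc"."""
    return tag[2:] if tag.startswith("W/") else tag


def is_not_modified(
    etag: str,
    last_modified: datetime,
    if_none_match: Optional[str] = None,
    if_modified_since: Optional[str] = None,
) -> bool:
    """Return True when a 304 response would be appropriate (hypothetical helper)."""
    # If-None-Match takes precedence over If-Modified-Since when both are present.
    if if_none_match is not None:
        candidates = [t.strip() for t in if_none_match.split(",")]
        if "*" in candidates:
            return True
        return _strip_weak(etag) in {_strip_weak(t) for t in candidates}
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
            # HTTP dates have one-second resolution, so ignore microseconds.
            return last_modified.replace(microsecond=0) <= since
        except (TypeError, ValueError):
            # Unparseable or timezone-less date: fall back to a full response.
            return False
    return False
```

A handler would call this before rendering the page and return an empty `304` response when it yields `True`, which saves bandwidth on repeat crawls.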
📋 robots.txt Compliance
Zagrebin explicitly honors all robots.txt directives. Outbrain states in its crawler policy (https://www.outbrain.com/legal/crawlers/) that the bot checks the `Disallow` rules before each request and will not crawl paths blocked by site operators. Additionally, Outbrain provides a mechanism for publishers to opt out via a dedicated email ([email protected]) if robots.txt is insufficient. No known CVE entries or security advisories describe Zagrebin violating robots.txt, and it is widely considered a compliant, well-behaved crawler in the content recommendation ecosystem.
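To see how a `Disallow` rule aimed at this crawler would be evaluated, the sketch below parses a robots.txt with Python's standard `urllib.robotparser`. The robots.txt content, the `/private/` path, and the `example.com` URLs are assumptions for illustration. The result shows how a standards-following parser reads the rules, not how Outbrain's crawler is implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block Zagrebin from /private/, allow everything else.
ROBOTS_TXT = """\
User-agent: Zagrebin
Disallow: /private/

User-agent: *
Disallow:
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check whether user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    ua = "Zagrebin/1.0 (http://www.outbrain.com/)"
    for path in ("/private/page.html", "/articles/story.html"):
        url = "https://example.com" + path
        print(path, is_allowed(ROBOTS_TXT, ua, url))
```

Running a check like this before deploying a robots.txt change helps confirm that the rule matches the intended paths and does not block other crawlers by accident.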
🔍 Detection Indicators
The primary User-Agent string is `Zagrebin/1.0 (http://www.outbrain.com/)`, though older variants may also appear as `Zemanta` or `Outbrain/1.0`. Behavioral fingerprints include a low request rate (typically under 5 req/s), consistent `Accept: text/html,application/xhtml+xml` headers, and the absence of a `Referer` field. Web application firewalls such as ModSecurity (`mod_security` for Apache) or Cloudflare’s bot management can identify it by the exact UA string. Outbrain also publishes a list of additional User-Agent strings on its legal page, which can be used for whitelisting or monitoring.
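The indicators above can be checked against access logs. The following is a rough sketch, assuming logs in the common "combined" format where the referer and User-Agent are the last two quoted fields. The helper names and the sample log line are hypothetical. Because User-Agent strings are easy to spoof, a match is a hint and not proof of origin.

```python
import re
from typing import Optional

# Tokens taken from the User-Agent variants listed above.
UA_PATTERN = re.compile(r"\b(zagrebin|zemanta|outbrain)\b", re.IGNORECASE)
QUOTED_FIELD = re.compile(r'"([^"]*)"')


def match_crawler_token(user_agent: str) -> Optional[str]:
    """Return the first matching token in lowercase, or None if nothing matches."""
    found = UA_PATTERN.search(user_agent)
    return found.group(1).lower() if found else None


def inspect_log_line(line: str) -> Optional[dict]:
    """Extract referer and User-Agent from a combined-format line (assumed layout)."""
    fields = QUOTED_FIELD.findall(line)
    if len(fields) < 3:
        return None  # not a combined-format line
    referer, user_agent = fields[-2], fields[-1]
    token = match_crawler_token(user_agent)
    if token is None:
        return None
    return {"token": token, "has_referer": referer not in ("", "-")}


if __name__ == "__main__":
    sample = ('203.0.113.7 - - [01/Jan/2025:00:00:00 +0000] "GET /a HTTP/1.1" '
              '200 512 "-" "Zagrebin/1.0 (http://www.outbrain.com/)"')
    print(inspect_log_line(sample))
```

Combining the UA match with the missing `Referer` check narrows false positives slightly, but neither signal verifies that a request really came from Outbrain.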
📊 Data Usage
Collected content metadata (headlines, images, publication dates, and article snippets) is ingested into Outbrain’s recommendation algorithm to match user interests with relevant content across the Outbrain network. This data is not used for AI model training outside of the recommendation system; it is solely for indexing and ranking content for real-time widget delivery. Outbrain’s privacy policy (https://www.outbrain.com/legal/privacy) clarifies that personal or identifiable information is not intentionally collected, and the crawler is designed to avoid password-protected or login-gated pages.
⚙️ Rate Limiting Policy
Although Zagrebin is a legitimate, non-malicious bot, rate limiting is still recommended: persistent crawling can strain small or high-traffic websites, especially when it coincides with peak user load. A threshold-based block (e.g., more than 10 requests per second from the same IP) is a reasonable policy that enforces fair usage without denying access completely. Outbrain itself advises site operators to implement rate limiting when needed to protect server stability.
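The threshold policy above can be sketched as a per-IP sliding-window counter. This is an illustrative in-memory sketch, not a production limiter. The class name `SlidingWindowLimiter` and its parameters are hypothetical, and the 10 requests per second default simply mirrors the example threshold in the text.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per IP within any window_seconds span."""

    def __init__(self, max_requests: int = 10, window_seconds: float = 1.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of accepted requests

    def allow(self, ip: str, now: float) -> bool:
        hits = self._hits[ip]
        # Discard timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the threshold: reject (e.g. respond with HTTP 429)
        hits.append(now)
        return True


if __name__ == "__main__":
    import time

    limiter = SlidingWindowLimiter()
    results = [limiter.allow("203.0.113.7", time.monotonic()) for _ in range(12)]
    print(results.count(True), "allowed,", results.count(False), "rejected")
```

Passing the timestamp in explicitly keeps the limiter easy to test. In a real deployment the same policy is usually enforced at the reverse proxy or CDN layer, not in application code.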
ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.