webcollage
Bot User-Agent: webcollage
🤖 Overview
WebCollage (now part of Syndigo) is a legitimate content syndication crawler operated by Syndigo LLC, primarily used to aggregate product data—including descriptions, images, pricing, and specifications—from retailer and manufacturer websites for e‑commerce feeds and comparison shopping engines. Originally developed by WebCollage Inc. (acquired by Syndigo in 2017), the bot serves the Syndigo Content Experience Platform, which powers product content across thousands of online retailers such as Amazon, Walmart, and Best Buy. Its purpose is strictly commercial: to collect publicly available product information to help brands maintain consistent, enriched product listings across the web.
🌐 Technical Behavior
The WebCollage crawler targets product detail pages (PDPs), category pages, and sitemaps, using a systematic crawl strategy that respects crawl delays specified in robots.txt. According to Syndigo's official documentation, the bot's source IP addresses are dynamically assigned but typically belong to datacenters in the United States (e.g., AWS EC2 blocks). It uses HTTP/1.1 with persistent connections and sends a valid User-Agent header. The crawler is rate-limited at the application level to avoid overwhelming origin servers; official guidelines recommend a default crawl delay of 1–2 seconds between requests. It does not attempt to bypass CAPTCHAs or authentication, and it fetches only publicly accessible URLs. The bot follows redirects (301/302) and respects X-Robots-Tag directives in HTTP response headers.
📋 robots.txt Compliance
Syndigo publicly states that the WebCollage crawler fully honors the robots.txt standard, including Disallow directives and Crawl-delay rules. Webmasters who have tested it independently report that the bot does not ignore explicit blocks and also obeys noindex meta tags. The company advises site owners to use robots.txt to exclude their content from collection.
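As a rough illustration of the kind of rules described above, the sketch below checks a sample robots.txt with Python's standard urllib.robotparser. The robots.txt contents, the shop.example URLs, and the 5-second delay are assumptions made up for this example, not rules published by Syndigo. It shows how a compliant crawler would be expected to read the file, not how WebCollage itself is implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example shop; paths and delay are illustrative.
ROBOTS_TXT = """\
User-agent: webcollage
Disallow: /checkout/
Disallow: /account/
Crawl-delay: 5

User-agent: *
Disallow: /checkout/
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Pass the short product token, not the full User-Agent header:
# robotparser only looks at the text before the first "/" in the agent name.
AGENT = "webcollage"

blocked = parser.can_fetch(AGENT, "https://shop.example/checkout/cart")
allowed = parser.can_fetch(AGENT, "https://shop.example/products/123")
delay = parser.crawl_delay(AGENT)
```

Running the same checks against your own robots.txt is a quick way to confirm that the rules say what you intend before relying on the bot to follow them.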
🔍 Detection Indicators
The primary User-Agent string is Mozilla/5.0 (compatible; WebCollage/2.1; +http://www.webcollage.com/crawler.html). Variations include different version numbers and an optional Syndigo suffix. The bot also frequently sends an identifying From header containing an admin email address. Reverse DNS lookups on its source IPs often resolve to hostnames containing webcollage or syndigo.
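The indicators above can be combined into a simple heuristic check. The sketch below is a minimal example under stated assumptions: the function names, the three labels, and the substring match on the hostname are all hypothetical choices for illustration. Because the source says reverse DNS only "often" resolves to matching hostnames, a failed lookup is treated as "unverified" rather than as proof of spoofing.

```python
import re
import socket
from typing import Callable, Optional

# Matches the product token in any case or version, e.g. "WebCollage/2.1".
UA_PATTERN = re.compile(r"\bwebcollage\b", re.IGNORECASE)
# Hostname fragments mentioned in the text above.
HOST_HINTS = ("webcollage", "syndigo")


def claims_webcollage(user_agent: str) -> bool:
    """True if the User-Agent header names WebCollage."""
    return bool(UA_PATTERN.search(user_agent))


def reverse_dns(ip: str) -> Optional[str]:
    """Return the PTR hostname for an IP, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def classify(
    user_agent: str,
    ip: str,
    resolver: Callable[[str], Optional[str]] = reverse_dns,
) -> str:
    """Label a request as 'other', 'unverified', or 'likely-webcollage'."""
    if not claims_webcollage(user_agent):
        return "other"
    host = resolver(ip)
    if host and any(hint in host.lower() for hint in HOST_HINTS):
        return "likely-webcollage"
    return "unverified"
```

The resolver is passed in as a parameter so the check can be exercised without network access. A User-Agent header is trivial to forge, so the "unverified" label is better treated as a prompt for closer inspection than as a verdict.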
📊 Data Usage
Collected data is ingested into the Syndigo Content Experience Platform, where it is normalized, enriched, and distributed to retailer feeds, comparison shopping sites (e.g., Google Shopping), and product content networks. The data is used exclusively for commercial product syndication—not for AI training, search indexing, or analytics unrelated to e‑commerce. Syndigo also employs the data to power automated content audits and gap analysis for brands.
⚙️ Rate Limiting Policy
WebCollage is commonly rate-limited because its high-volume crawling can saturate e-commerce servers, especially during product catalog updates. Web applications that serve large product catalogs can pair a moderate server-side rate limit (e.g., 10 requests per second per IP) with a 5-second crawl delay in robots.txt. The server-side limit is the enforced ceiling, while the crawl delay is an advisory request to the bot; together they help keep resource usage fair without blocking legitimate syndication needs.
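One common way to enforce a per-IP ceiling like the one suggested above is a token bucket. The sketch below is a minimal in-memory version, assuming a single process; the class names, the burst size of 10, and the explicit `now` parameter are illustrative choices, not part of any Syndigo or Boteraser API.

```python
import time
from typing import Callable, Dict, Optional


class TokenBucket:
    """Refills at `rate` tokens per second, up to `capacity` tokens."""

    def __init__(self, rate: float, capacity: float, now: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so a short burst is allowed
        self.updated = now

    def allow(self, now: float) -> bool:
        # Add tokens for the time elapsed since the last call, capped at capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class PerIpLimiter:
    """Keeps one token bucket per client IP address."""

    def __init__(
        self,
        rate: float = 10.0,
        burst: float = 10.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate
        self.burst = burst
        self._clock = clock
        self._buckets: Dict[str, TokenBucket] = {}

    def allow(self, ip: str, now: Optional[float] = None) -> bool:
        """Return True if the request from `ip` is within its rate limit."""
        current = self._clock() if now is None else now
        bucket = self._buckets.get(ip)
        if bucket is None:
            bucket = self._buckets[ip] = TokenBucket(self.rate, self.burst, current)
        return bucket.allow(current)
```

A request handler would call `limiter.allow(client_ip)` and answer with HTTP 429 when it returns False. A production setup would also need to evict idle buckets and share state across workers, both of which this sketch leaves out.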
ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.