shopwiki Bot — Detection, Blocking & Technical Analysis

Bot User-Agent: shopwiki

🤖 Overview

ShopWiki is a web crawler operated by ShopWiki.com, a shopping comparison search engine founded in 2006. Its primary purpose is to index product listings, prices, availability, and merchant information from e‑commerce websites to populate the ShopWiki search database. The bot was originally documented on the ShopWiki help pages (http://www.shopwiki.com/wiki/Help:Bot) and is considered a legitimate, automated agent for aggregating retail data.

🌐 Technical Behavior

The ShopWiki bot identifies itself with the User-Agent string "ShopWiki/1.0 (compatible; Mozilla/5.0; +http://www.shopwiki.com/wiki/Help:Bot)" or, in some requests, simply "ShopWikiBot". It issues HTTP GET requests over IPv4 and IPv6 and discovers pages by following links outward from merchant product pages. According to the archived official documentation, the bot requests roughly one page every few seconds and respects the Robots Exclusion Protocol. That crawl rate is moderate: lower than that of the major search engines, but still noticeable on smaller sites. ShopWiki does not publish a fixed IP block; requests typically originate from its hosting infrastructure (often AWS or dedicated servers). The bot does not execute JavaScript or submit forms; it indexes only static HTML and, where available, product feeds.

📋 robots.txt Compliance

ShopWiki's official documentation explicitly states that the bot respects Disallow directives in robots.txt. Webmaster forum reports and archived help pages indicate that if a site publishes a rule group with "User-agent: ShopWikiBot" followed by "Disallow: /", the bot stops crawling that domain entirely. Some reports describe a delay before a changed robots.txt takes effect, presumably because the bot caches the file, but overall it is considered compliant with standard robots.txt rules.
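As a rough way to check a robots.txt rule before deploying it, the sketch below uses Python's standard urllib.robotparser to evaluate a sample file. The sample rules, the example.com URLs, and the helper name is_allowed are illustrative assumptions, not taken from ShopWiki's documentation. Listing both the "ShopWikiBot" and "ShopWiki" tokens is a precaution assumed here because both User-Agent forms are reported; how the real crawler matches tokens is not confirmed.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block both reported ShopWiki tokens everywhere,
# and keep every other crawler out of /admin/ only.
ROBOTS_TXT = """\
User-agent: ShopWikiBot
User-agent: ShopWiki
Disallow: /

User-agent: *
Disallow: /admin/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network fetch is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ShopWikiBot", "ShopWiki/1.0", "SomeOtherBot"):
        verdict = is_allowed(ROBOTS_TXT, agent, "https://example.com/products/1")
        print(f"{agent}: {'allowed' if verdict else 'blocked'}")
```

This only shows how Python's parser interprets the file. It does not prove how the crawler itself behaves, so server logs remain the real check.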

🔍 Detection Indicators

The primary indicator is a User-Agent string containing "ShopWiki/1.0" or "ShopWikiBot". Secondary fingerprints include a reverse DNS lookup of the client IP that resolves to shopwiki.com and request paths that target product pages, category pages, or sitemaps. The HTTP From header may sometimes carry a contact email address. The bot does not rotate its User-Agent, which makes log-based detection straightforward.
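The indicators above can be sketched as a small log-side check. This is a minimal illustration rather than a vetted detector: the function names are hypothetical, the User-Agent test is a plain case-insensitive substring match, and the reverse DNS helper assumes the crawler's hosts resolve under shopwiki.com as described above. Because a User-Agent is trivially spoofed, the DNS check is the stronger of the two signals.

```python
import re
import socket

# Case-insensitive match covers both "ShopWiki/1.0" and "ShopWikiBot".
SHOPWIKI_UA = re.compile(r"shopwiki", re.IGNORECASE)


def is_shopwiki_user_agent(user_agent: str) -> bool:
    """True if the User-Agent header claims to be the ShopWiki crawler."""
    return bool(SHOPWIKI_UA.search(user_agent))


def hostname_is_shopwiki(hostname: str) -> bool:
    """True if hostname is shopwiki.com or one of its subdomains."""
    host = hostname.rstrip(".").lower()
    return host == "shopwiki.com" or host.endswith(".shopwiki.com")


def reverse_dns_is_shopwiki(ip: str) -> bool:
    """Resolve ip to a hostname (PTR lookup) and test it against shopwiki.com.

    Requires network access; returns False when the lookup fails.
    """
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
    except OSError:
        return False
    return hostname_is_shopwiki(hostname)
```

A stricter variant would also resolve the returned hostname forward and confirm it maps back to the same IP, since PTR records are controlled by whoever owns the address block.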

📊 Data Usage

The collected data (product names, prices, descriptions, merchant URLs, and stock availability) feeds the ShopWiki comparison shopping engine. It is used to provide price comparisons to consumers and, historically, helped fuel ShopWiki's own product recommendation algorithms. No public evidence indicates that ShopWiki uses the data for AI training; the documented uses are indexing and retail analytics.

⚙️ Rate Limiting Policy

Although ShopWiki is a legitimate bot, its crawl pattern can overload smaller e-commerce sites if left unchecked. Rate limiting is therefore recommended (for example, a cap of 5 requests per second) to preserve server resources. The rationale is that a comparison service does not need real-time freshness for every product, so throttling protects site stability while still allowing the service to function.
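One common way to enforce such a cap is a token bucket, sketched below under stated assumptions: the 5 requests per second figure comes from the example above, while the burst capacity, the class name TokenBucket, and the injectable clock are illustrative choices. In production this logic usually lives in the web server or a WAF rather than in application code.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow up to `rate` requests per second, with bursts up to `capacity`."""

    def __init__(
        self,
        rate: float,
        capacity: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate
        self.capacity = capacity
        self._clock = clock  # injectable so the limiter can be tested
        self._tokens = capacity  # start with a full bucket
        self._last = clock()

    def allow(self) -> bool:
        """Consume one token if available; return whether the request may pass."""
        now = self._clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False


if __name__ == "__main__":
    # One bucket per crawler identity; here a single bucket for ShopWiki traffic.
    shopwiki_bucket = TokenBucket(rate=5.0, capacity=5.0)
    decisions = [shopwiki_bucket.allow() for _ in range(8)]
    print(decisions)
```

A request that is refused would typically receive an HTTP 429 response, which lets a well-behaved crawler back off instead of being blocked outright.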

ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.
