wish-la

Bot User-Agent: wish-la

🤖 Overview

wish-la is a legitimate web crawler operated by ContextLogic Inc., the parent company of the e‑commerce platform Wish. First identified in public logs around 2016, its primary purpose is to systematically index product listings, pricing, availability, and merchant details from external e‑commerce sites to feed into the Wish marketplace’s search and recommendation engine. The bot is a core component of Wish’s automated supply‑side data collection, enabling the platform to offer competitive pricing and up‑to‑date inventory across millions of products.

🌐 Technical Behavior

The crawler uses standard HTTP/1.1 GET requests and often includes an Accept-Encoding: gzip header to minimise bandwidth. According to observations reported in public forums such as GitHub issue threads, wish-la typically sends between 1 and 3 requests per second per host but can burst to higher frequencies when discovering new product pages. It does not render JavaScript, submit forms, or interact with JavaScript‑driven content; it relies solely on static HTML parsing. The IP addresses originate predominantly from Amazon Web Services (AWS) EC2 ranges (e.g., 54.xxx.xxx.xxx) and occasionally from Google Cloud Platform datacenters, primarily in the United States. The crawler maintains persistent connections, honours Cache-Control headers to avoid re‑fetching unchanged resources, and follows 301/302 redirects.
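
To compare observed traffic against the 1 to 3 requests per second reported above, one option is to count requests per one-second bucket. The following is a minimal sketch, assuming log entries have already been parsed into (Unix timestamp, User-Agent) pairs; the function name and the input shape are illustrative and not part of any tool mentioned here.

```python
from collections import Counter


def peak_requests_per_second(entries, user_agent="wish-la"):
    """Return the highest number of requests seen in any one-second bucket.

    entries: iterable of (unix_timestamp, user_agent) pairs (assumed shape).
    """
    # Truncate each timestamp to whole seconds and count hits per bucket,
    # keeping only requests whose User-Agent matches exactly.
    per_second = Counter(int(ts) for ts, ua in entries if ua == user_agent)
    return max(per_second.values(), default=0)


sample = [
    (100.1, "wish-la"),
    (100.5, "wish-la"),
    (100.9, "other-bot"),
    (101.2, "wish-la"),
]
print(peak_requests_per_second(sample))
```

A peak well above the typical range may correspond to the burst behaviour described above, though fixed one-second buckets can understate a burst that straddles a bucket boundary.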

📋 robots.txt Compliance

Multiple independent webmaster reports and the official robotstxt.org community archives confirm that wish-la fully respects robots.txt Disallow directives. ContextLogic has published a support page (archived at help.wish.com) stating that the crawler is configured to obey the standard exclusion protocol. No known violations or deliberate bypassing of robots.txt have been documented in public security advisories.
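
Because the crawler is reported to obey Disallow directives, a site owner can check how a given rule set would be read for this user agent. The sketch below uses Python's standard urllib.robotparser; the robots.txt content, the /private/ path, and the example.com URLs are assumptions made up for illustration, and the parser's interpretation may differ in detail from the crawler's own.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep wish-la out of /private/, allow everything else.
ROBOTS_TXT = """\
User-agent: wish-la
Disallow: /private/

User-agent: *
Disallow:
"""


def is_allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed("wish-la", "https://example.com/private/page"))
print(is_allowed("wish-la", "https://example.com/products/1"))
```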

🔍 Detection Indicators

The primary identification string is the User-Agent header value "wish-la" (case‑sensitive, with no version number). Some requests may include an additional From header carrying a contact email address. The bot does not send a custom X‑Robots‑Tag header, and its request pattern shows a consistent, low‑variation crawl interval that helps distinguish it from malicious scrapers. Log entries show a short Accept-Language field (e.g., en-US,en;q=0.9) and a standard Connection: keep-alive header.
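
The User-Agent check described above can be sketched as an exact, case-sensitive comparison. This is a minimal illustration, assuming request headers are available as a plain dictionary; the function name is hypothetical. A User-Agent value can be set by any client, so a match on its own does not prove the request came from the real crawler.

```python
def is_wish_la(headers):
    """Return True if the request's User-Agent is exactly "wish-la".

    headers: mapping of header name to value (assumed input shape).
    """
    # HTTP header names are case-insensitive, so normalise the names only.
    normalised = {name.lower(): value for name, value in headers.items()}
    # The value is compared case-sensitively, as stated in the text above.
    return normalised.get("user-agent", "").strip() == "wish-la"


print(is_wish_la({"User-Agent": "wish-la", "Connection": "keep-alive"}))
print(is_wish_la({"User-Agent": "Wish-LA"}))
```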

📊 Data Usage

The data collected by wish-la (product titles, descriptions, price points, image URLs, stock status, and seller identifiers) is ingested directly into Wish’s backend systems. This information powers the platform’s product catalog, dynamic pricing algorithms, and internal search indexing pipeline. ContextLogic uses the crawled data to train recommendation models and to detect price discrepancies across merchants, all within the bounds of their merchant agreements.

⚙️ Rate Limiting Policy

Although legitimate and compliant, wish-la is frequently rate‑limited by hosting providers because its sustained crawling can consume significant server resources if left unthrottled. The rationale is threshold‑based blocking (e.g., above 10 requests per second), which protects smaller sites from unintended load while still allowing the bot to operate normally under typical crawl patterns.
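
Threshold-based blocking of this kind can be sketched with a sliding-window counter. The class below is an illustrative assumption, not any hosting provider's actual policy: the class name, the per-client key, and the defaults (10 requests per 1-second window, taken from the example threshold above) are all hypothetical.

```python
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_requests per client within window_seconds."""

    def __init__(self, max_requests=10, window_seconds=1.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = {}  # client key -> deque of accepted request timestamps

    def allow(self, client_key, now):
        """Return True and record the request if the client is under the limit."""
        hits = self._hits.setdefault(client_key, deque())
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the threshold: reject, do not record
        hits.append(now)
        return True


limiter = SlidingWindowLimiter()
results = [limiter.allow("wish-la", i * 0.01) for i in range(11)]
print(results.count(True), results.count(False))
```

Passing the timestamp in explicitly keeps the sketch easy to test; a real deployment would more likely read a monotonic clock and key on IP address as well as User-Agent.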

ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.