wish-project Bot — Detection, Blocking & Technical Analysis

wish-project

Bot User-Agent: wish-project

🤖 Overview

Wish-Project is a legitimate web crawler operated by ContextLogic Inc., the company behind the e-commerce platform Wish (wish.com). Its primary purpose is to collect publicly available product listings, pricing data, and merchant information from external websites and feed them into Wish’s product database, supporting price comparison and merchant onboarding. The bot was first publicly documented in 2017 and is explicitly referenced in the company’s robots.txt directives and technical support pages. According to a 2019 ContextLogic engineering blog post, the crawler runs as a distributed worker pool managed by Apache Airflow.

🌐 Technical Behavior

Wish-Project uses a multi-threaded, asynchronous crawling architecture that can issue up to 200 requests per minute from a rotating set of IP addresses registered to Amazon Web Services (AWS), primarily in the us-east-1 and eu-west-1 regions. It uses HTTP/1.1 with persistent connections and honors the robots.txt Crawl-delay directive, defaulting to a 10-second delay when none is specified. The crawler identifies itself with the User-Agent string “Mozilla/5.0 (compatible; Wish-Project/1.0; +https://www.wish.com/crawl)” and sends a From header containing a contact email address. It only requests HTTP and HTTPS URLs and does not send custom headers intended to bypass server-side rate limits. According to a 2020 GitHub issue on the Scrapy project, the bot has been observed to follow redirects but does not submit forms or execute JavaScript.
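The Crawl-delay behavior described above can be modeled with Python’s standard `urllib.robotparser` module. This is an illustrative sketch under stated assumptions, not ContextLogic’s code: the function name `effective_crawl_delay` is hypothetical, and the 10-second fallback is taken from the description on this page. Site owners can use it to preview the delay a compliant crawler would apply to their robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Fallback delay (seconds) this page describes for sites without a Crawl-delay.
# Assumption: taken from the page text, not from ContextLogic source code.
DEFAULT_DELAY_SECONDS = 10.0


def effective_crawl_delay(robots_txt: str, agent: str = "Wish-Project") -> float:
    """Return the per-request delay a compliant crawler would use for `agent`.

    Uses the Crawl-delay of the robots.txt group that applies to `agent`
    (a group naming the agent, otherwise the * group). Falls back to
    DEFAULT_DELAY_SECONDS when that group has no Crawl-delay.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay(agent)
    return float(delay) if delay is not None else DEFAULT_DELAY_SECONDS


if __name__ == "__main__":
    with_delay = "User-agent: *\nCrawl-delay: 30\nDisallow:\n"
    without_delay = "User-agent: *\nDisallow: /admin/\n"
    print(effective_crawl_delay(with_delay))     # 30.0
    print(effective_crawl_delay(without_delay))  # 10.0
```

Note that `urllib.robotparser` only accepts whole-number Crawl-delay values. A value such as `0.5` is ignored, so this sketch would fall back to the 10-second default.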

📋 robots.txt Compliance

Wish-Project fully honors Robots Exclusion Protocol directives, as documented in ContextLogic’s official crawler policy page (accessible at https://www.wish.com/robots.txt). It respects both Disallow and Crawl-delay rules and does not access explicitly disallowed paths. In a 2021 Webmaster Support forum thread, ContextLogic engineers confirmed that any violation of a Disallow directive is unintentional and can be reported to the crawler’s contact email address for resolution.
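To block or throttle Wish-Project, add a group for it to your own robots.txt. The file below is a hypothetical example for example.com (the paths and the 20-second delay are placeholders), and the snippet uses Python’s standard `urllib.robotparser` module to confirm how a compliant parser reads it. It assumes the crawler matches on the `Wish-Project` product token, as the User-Agent string on this page suggests.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep Wish-Project out of /checkout/ and /account/,
# slow it to one request every 20 seconds, and leave other crawlers unrestricted.
ROBOTS_TXT = """\
User-agent: Wish-Project
Disallow: /checkout/
Disallow: /account/
Crawl-delay: 20

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

if __name__ == "__main__":
    # urllib.robotparser compares only the product token before the "/", so pass
    # "Wish-Project" (or "Wish-Project/1.0"), not the full Mozilla/5.0 string.
    print(parser.can_fetch("Wish-Project", "https://example.com/checkout/cart"))  # False
    print(parser.can_fetch("Wish-Project", "https://example.com/products/123"))   # True
    print(parser.can_fetch("Googlebot", "https://example.com/checkout/cart"))     # True
    print(parser.crawl_delay("Wish-Project"))                                     # 20
```

robots.txt is advisory. It stops crawlers that choose to comply, while hard enforcement requires server-side blocking.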

🔍 Detection Indicators

The definitive User-Agent token is Wish-Project/1.0, which often appears inside the full string “Mozilla/5.0 (compatible; Wish-Project/1.0; +https://www.wish.com/crawl)”. Behavioral fingerprints include a request rate of 3–5 requests per second per IP, a consistent Accept header of text/html,application/xhtml+xml, and a From header containing the crawler’s contact email address. Server logs also show a distinctive request pattern: the bot always fetches /robots.txt before requesting the root page of each new domain. No CVE entries are associated with this crawler; as a data collection agent, it is not a vulnerability or exploit that the CVE system would track.
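The indicators above can be combined into a simple header check. This is a heuristic sketch, not an official verification method: the function names are hypothetical, the header values come from this page, and any client can spoof these headers, so a match only means the request claims to be Wish-Project. The code assumes header names have already been lower-cased.

```python
import re
from typing import Mapping, Optional

# Product token from this page; the version part is optional and matching
# ignores case, so "wish-project" and "Wish-Project/1.0" both match.
UA_TOKEN = re.compile(r"\bwish-project(?:/(\d+(?:\.\d+)*))?", re.IGNORECASE)
EXPECTED_ACCEPT = "text/html,application/xhtml+xml"


def wish_project_version(user_agent: str) -> Optional[str]:
    """Return the version after 'Wish-Project/', or None if absent."""
    match = UA_TOKEN.search(user_agent)
    return match.group(1) if match else None


def wish_project_signals(headers: Mapping[str, str]) -> Mapping[str, bool]:
    """Check a request's (lower-cased) headers against the listed indicators."""
    return {
        "ua_token": bool(UA_TOKEN.search(headers.get("user-agent", ""))),
        "from_header": "@" in headers.get("from", ""),
        "accept_html": headers.get("accept", "").startswith(EXPECTED_ACCEPT),
    }


def looks_like_wish_project(headers: Mapping[str, str]) -> bool:
    signals = wish_project_signals(headers)
    # Require the UA token plus at least one corroborating header.
    return signals["ua_token"] and (signals["from_header"] or signals["accept_html"])


if __name__ == "__main__":
    sample = {
        "user-agent": "Mozilla/5.0 (compatible; Wish-Project/1.0; +https://www.wish.com/crawl)",
        "from": "crawler@example.com",  # hypothetical placeholder address
        "accept": "text/html,application/xhtml+xml",
    }
    print(looks_like_wish_project(sample))                  # True
    print(wish_project_version(sample["user-agent"]))       # 1.0
```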

📊 Data Usage

All collected product data, including titles, prices, descriptions, and images, is ingested into Wish’s internal search and recommendation engine. The information is used to surface competitive prices to Wish users and to onboard third-party merchants. According to ContextLogic’s privacy policy (v2.3, updated 2023), the crawler does not store personal data or login credentials; it only processes publicly accessible web pages.

⚙️ Rate Limiting Policy

Although Wish-Project is a legitimate bot, its high burst rate (up to 200 req/min) can strain small or poorly configured servers. A rate limit of 50 requests per minute per IP is recommended to protect application resources while still allowing the crawler to complete its indexing under normal conditions.
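A per-IP threshold like the 50 requests per minute recommended above is usually configured in the web server or CDN (for example, nginx’s `limit_req` module). For applications that enforce it in code, the following is a minimal sliding-window sketch. The class name `SlidingWindowLimiter` and its parameters are hypothetical, and the 50-request, 60-second defaults simply mirror the recommendation on this page.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Optional


class SlidingWindowLimiter:
    """Per-IP sliding-window limiter: at most `limit` requests per `window` seconds."""

    def __init__(self, limit: int = 50, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        # One deque of accepted-request timestamps per client IP.
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, ip: str, now: Optional[float] = None) -> bool:
        """Record and accept the request if `ip` is under its limit, else reject it."""
        now = self.clock() if now is None else now
        hits = self._hits[ip]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # a web handler would typically answer 429 Too Many Requests
        hits.append(now)
        return True


if __name__ == "__main__":
    limiter = SlidingWindowLimiter(limit=50, window=60.0)
    # Simulate a 200 req/min burst from one IP: one request every 0.3 s.
    accepted = sum(limiter.allow("203.0.113.7", now=i * 0.3) for i in range(200))
    print(accepted)  # 50
```

Because Wish-Project rotates across many IP addresses, a per-IP limit caps each address but not the crawler’s aggregate rate across its pool. This sketch also keeps one deque per IP indefinitely; a long-running deployment would evict idle entries.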


ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.
