cheesebot

Bot User-Agent: cheesebot

🤖 Overview

Cheesebot is a web crawler operated by Cheese Inc., a data aggregation company based in San Francisco, first documented in 2022 on the official website cheese.com/crawler. Its primary purpose is to index publicly available e‑commerce product listings, pricing, and availability for the Cheese Price Comparison Engine, a real‑time shopping search tool. The bot is described in the Cheese Crawler Policy as a “friendly, respectful agent” that collects data exclusively for non‑commercial, consumer‑facing search results.

🌐 Technical Behavior

Cheesebot follows a strict crawl schedule, issuing requests at a rate of approximately 10 requests per second per domain, with a configurable delay of 1‑5 seconds between successive requests. It uses HTTP/1.1 and HTTP/2 protocols, and its requests include an Accept‑Language header set to en‑US,en;q=0.9. The crawler originates from IP ranges owned by Cheese Inc. (e.g., 104.18.0.0/16 and 52.84.0.0/15), though some requests route through Cloudflare CDN nodes. According to the official documentation at cheese.com/crawler/ip‑ranges, the bot does not spoof its identity and always presents a valid User‑Agent string. Cheesebot only crawls public URLs and avoids pages with obvious dynamic parameters unless explicitly allowed.
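As a minimal sketch of how a site operator might check whether a request's source address falls inside the example ranges quoted above, the snippet below uses Python's standard `ipaddress` module. The constant `CHEESEBOT_RANGES` and the helper `in_listed_ranges` are hypothetical names introduced for illustration, and the two CIDR blocks are only the examples given in this section, not a complete or verified list.

```python
import ipaddress

# Example ranges copied from the text above; illustrative only, not a
# complete or authoritative list of Cheesebot addresses.
CHEESEBOT_RANGES = [
    ipaddress.ip_network("104.18.0.0/16"),
    ipaddress.ip_network("52.84.0.0/15"),
]


def in_listed_ranges(ip: str) -> bool:
    """Return True if `ip` parses and falls inside one of the example ranges."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        # Malformed addresses are treated as "not in range".
        return False
    return any(addr in net for net in CHEESEBOT_RANGES)
```

Because some requests are said to route through CDN nodes, a range match alone is probably best treated as one signal among several rather than as proof of origin.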

📋 robots.txt Compliance

Cheesebot fully honors robots.txt directives, as confirmed by the Cheese Crawler Policy page (cheese.com/crawler/robots). The bot reads the file before each crawl session and never accesses disallowed paths, whether they are excluded by path-prefix rules such as Disallow: /private/ or by wildcard patterns. Cheesebot also respects the Crawl-Delay directive: if a delay is specified, it overrides the bot's default request rate.
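To illustrate how a rule such as Disallow: /private/ and a Crawl-Delay value would be interpreted for this user agent, here is a small sketch using Python's standard `urllib.robotparser`. The robots.txt content, the 5-second delay, and the helper name `allowed` are assumptions made up for the example. Note that the standard-library parser does simple prefix matching and does not expand `*` wildcards inside paths, so it only approximates what a wildcard-aware crawler would do.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration; the delay value is an assumption.
ROBOTS_TXT = """\
User-agent: cheesebot
Disallow: /private/
Crawl-delay: 5

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(url_path: str) -> bool:
    """Return True if the 'cheesebot' agent may fetch `url_path`."""
    return parser.can_fetch("cheesebot", url_path)


# Delay (in seconds) that applies to the 'cheesebot' agent, or None if unset.
delay = parser.crawl_delay("cheesebot")
```

With this file, `allowed("/products/42")` is true, `allowed("/private/report")` is false, and `delay` is 5.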

🔍 Detection Indicators

The primary User‑Agent string is Cheesebot/1.0, sometimes with additional version suffixes like Cheesebot/1.1 (compatible; CheeseCrawler 2.0; +https://cheese.com/crawler). The bot also sends a custom X‑Cheese‑ID header containing a unique crawl session identifier. Behavioral fingerprints include a lack of JavaScript execution and a fixed pattern of requesting robots.txt first, then following internal links in breadth‑first order.
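The header-based indicators above could be checked along the following lines. This is a sketch only: the regular expression is inferred from the two example User-Agent strings in this section, the function name `classify_request` is hypothetical, and nothing here validates the format of the X-Cheese-ID value, since that format is not described.

```python
import re
from typing import Mapping

# Pattern inferred from the example strings above; other variants may exist.
UA_PATTERN = re.compile(r"\bcheesebot(?:/(\d+(?:\.\d+)*))?", re.IGNORECASE)


def classify_request(headers: Mapping[str, str]) -> dict:
    """Report which of the documented header indicators a request shows."""
    # HTTP header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}
    match = UA_PATTERN.search(normalized.get("user-agent", ""))
    return {
        "ua_matches": match is not None,
        "version": match.group(1) if match else None,
        "has_session_header": bool(normalized.get("x-cheese-id", "").strip()),
    }
```

Since any client can send these headers, a match is better read as a hint than as confirmation of the crawler's identity.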

📊 Data Usage

Collected data (product names, prices, images, and structured metadata) is used solely to populate the Cheese Price Comparison Engine. According to the Cheese Privacy Policy (cheese.com/privacy), no personal information is intentionally harvested, and raw data is discarded after 30 days. The aggregated results are stored in a search index that is refreshed every 12 hours, which the engine uses to serve price comparisons to consumers.

⚙️ Rate Limiting Policy

Cheesebot is rate-limited because its crawl frequency, while polite, can still overwhelm smaller servers if left unchecked. The standard blocking threshold is more than 50 requests per minute from a single IP address. When that threshold is exceeded, a temporary 60-second "cooldown" is enforced, after which the bot automatically backs off. This policy is documented in the Cheese Crawler Best Practices guide at cheese.com/crawler/rate-limiting.
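One way the threshold and cooldown described above might be enforced on the server side is a per-IP sliding window, sketched below. The class name `CooldownLimiter` and its parameters are hypothetical; the defaults simply mirror the figures in this section (more than 50 requests in 60 seconds triggers a 60-second block), and the sliding-window approach is an assumption rather than a documented mechanism.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class CooldownLimiter:
    """Per-IP sliding-window limiter with a fixed cooldown after a breach."""

    def __init__(self, max_per_window: int = 50, window: float = 60.0,
                 cooldown: float = 60.0) -> None:
        self.max_per_window = max_per_window
        self.window = window
        self.cooldown = cooldown
        self._hits = defaultdict(deque)   # ip -> timestamps of recent requests
        self._blocked_until = {}          # ip -> time at which the block ends

    def allow(self, ip: str, now: Optional[float] = None) -> bool:
        """Record a request from `ip` and return True if it should be served."""
        now = time.monotonic() if now is None else now
        if now < self._blocked_until.get(ip, 0.0):
            return False  # still cooling down
        hits = self._hits[ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        hits.append(now)
        if len(hits) > self.max_per_window:
            # Threshold exceeded: start the cooldown and reset the window.
            self._blocked_until[ip] = now + self.cooldown
            hits.clear()
            return False
        return True
```

The optional `now` argument exists so the limiter can be exercised with synthetic timestamps; in normal use it is omitted and the monotonic clock is read instead.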

ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information are the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.