answerchase prove Bot — Detection, Blocking & Technical Analysis

Bot User-Agent: answerchase-prove

🤖 Overview

AnswerChase Prove is a legitimate web crawler operated by AnswerChase Inc., a company specializing in enterprise AI‑powered knowledge management and search. According to their official documentation (answerchase.com/prove‑crawler), the bot is designed exclusively for indexing publicly accessible web content to feed the AnswerChase Prove product, which provides contextual, conversation‑based answers using large language models. The crawler was first publicly documented in early 2023 and is used by organizations that subscribe to the AnswerChase platform for internal or customer‑facing knowledge bases.

🌐 Technical Behavior

The bot employs a headless Chromium browser (version 112+) to render JavaScript-heavy pages, mimicking real user browsing patterns with randomized delays of 2 to 10 seconds between requests. It crawls primarily over HTTPS using HTTP/1.1 and HTTP/2, and respects the Cache-Control header to avoid re-indexing unchanged resources. Observed crawl rates average 5–20 requests per domain per minute, with bursts of up to 50 during initial discovery. IP addresses are drawn from two ranges, 104.28.0.0/14 (Cloudflare-owned) and 52.84.0.0/15 (AWS-owned), as listed in the official IP whitelist published at docs.answerchase.com/prove-ip-ranges. The bot does not follow redirect chains longer than 5 hops and ignores anchor links that are empty or contain only a hash fragment. Each request carries a Referer header set to https://answerchase.com/crawler and a custom X-Prove-Crawl-ID header containing a UUID that is unique per crawl session.
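The network-level signals above can be checked with the Python standard library. The following is a minimal sketch, assuming the two CIDR ranges quoted in this section are current and that request headers arrive as a plain dictionary; the function names `ip_in_prove_ranges` and `has_valid_crawl_id` are hypothetical helpers, not part of any AnswerChase tooling.

```python
import ipaddress
import uuid

# Ranges as quoted in this article. Treat them as an assumption and
# confirm against the operator's published list before relying on them.
PROVE_RANGES = [
    ipaddress.ip_network("104.28.0.0/14"),
    ipaddress.ip_network("52.84.0.0/15"),
]


def ip_in_prove_ranges(ip: str) -> bool:
    """Return True if `ip` falls inside one of the quoted ranges."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        return False  # not a parseable IP address
    return any(addr in net for net in PROVE_RANGES)


def has_valid_crawl_id(headers: dict) -> bool:
    """Return True if X-Prove-Crawl-ID is present and parses as a UUID."""
    # Header names are case-insensitive, so normalize before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    value = lowered.get("x-prove-crawl-id", "")
    try:
        uuid.UUID(value)
    except ValueError:
        return False
    return True
```

Both ranges belong to large shared providers, so an IP match alone probably should not be treated as proof of identity; combining it with the header check is a more conservative signal.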

📋 robots.txt Compliance

According to its published crawler policy (answerchase.com/robots-policy), AnswerChase Prove fully honors robots.txt disallow directives. Beyond robots.txt, the bot checks individual pages for Cache-Control: no-store or X-Robots-Tag: noindex, and it obeys the Crawl-Delay directive with a minimum interval of 5 seconds. In their GitHub repository (github.com/answerchase/crawler-policy), they state that non-compliance results in the offending IP being temporarily added to a quarantine list for 24 hours.
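To see how such rules would be interpreted, a robots.txt file can be tested locally with Python's `urllib.robotparser`. This is a sketch under stated assumptions: the `/private/` path and the 10-second delay are illustrative values, the user-agent token is the one listed at the top of this page, and the standard-library parser may not match the bot's own parser in every edge case.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: the path and delay are example values only.
ROBOTS_TXT = """\
User-agent: answerchase-prove
Disallow: /private/
Crawl-delay: 10

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def prove_may_fetch(path: str) -> bool:
    """Return True if the rules above allow answerchase-prove to fetch `path`."""
    return parser.can_fetch("answerchase-prove", path)


if __name__ == "__main__":
    for path in ("/private/report.html", "/blog/post"):
        print(path, prove_may_fetch(path))
    print("crawl delay:", parser.crawl_delay("answerchase-prove"))
```

Other crawlers fall through to the `User-agent: *` group, which here allows everything.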

🔍 Detection Indicators

The primary user-agent string is Mozilla/5.0 (compatible; AnswerChase-Prove/1.0; +https://answerchase.com/crawler), with an alternative mobile variant: AnswerChase-Prove-Mobile/1.0. Behavioral fingerprints include a high proportion of requests carrying `Accept: text/html,application/xhtml+xml`, the absence of common browser extension tokens in the User-Agent, and a consistent Accept-Language: en-US,en;q=0.9. The bot also sends a From header with the email [email protected], which can be used when whitelisting the bot.
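The two user-agent strings above can be matched with a single case-insensitive pattern. The sketch below is one possible approach; `classify_user_agent` is a hypothetical helper name, and because any client can send an arbitrary User-Agent, a match is a hint rather than verification.

```python
import re
from typing import Optional

# Matches the "AnswerChase-Prove" token and the optional "-Mobile" suffix
# quoted above, regardless of case or version number.
PROVE_UA = re.compile(r"answerchase-prove(?P<mobile>-mobile)?", re.IGNORECASE)


def classify_user_agent(user_agent: Optional[str]) -> Optional[str]:
    """Return "desktop", "mobile", or None if the token is absent."""
    match = PROVE_UA.search(user_agent or "")
    if match is None:
        return None
    return "mobile" if match.group("mobile") else "desktop"


if __name__ == "__main__":
    samples = [
        "Mozilla/5.0 (compatible; AnswerChase-Prove/1.0; +https://answerchase.com/crawler)",
        "AnswerChase-Prove-Mobile/1.0",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    ]
    for ua in samples:
        print(classify_user_agent(ua), "<-", ua)
```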

📊 Data Usage

Collected text is parsed into structured knowledge chunks and embeddings, stored in AnswerChase’s vector database (based on Pinecone) for semantic search and answer generation. According to their privacy policy (answerchase.com/privacy), no personal information is retained beyond 30 days, and all raw crawled content is deleted after 90 days. The data is used solely for the Prove product, which provides precise, cited answers to natural language queries.

⚙️ Rate Limiting Policy

Because AnswerChase Prove performs deep, dynamic rendering of JavaScript pages, it may consume significant server resources during peak crawling windows. The official guidance recommends a rate limit of 50 requests per minute per IP; when the limit is exceeded, the server should respond with HTTP 503 and a Retry-After header of 60 seconds. This threshold matches the bot's observed burst rate, so it keeps disruption to web application performance minimal while still allowing the bot to index content effectively.
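A per-IP limit of this kind can be sketched as a sliding-window counter. The example below is a minimal in-memory illustration, not a production implementation: the class name `SlidingWindowLimiter` and its `check` method are hypothetical, the limit and Retry-After values are the ones quoted above, and state is never evicted for idle IPs.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each IP."""

    def __init__(self, limit=50, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the limiter can be tested
        self.hits = defaultdict(deque)  # ip -> timestamps of accepted requests

    def check(self, ip):
        """Return (status_code, extra_headers) for a request from `ip`."""
        now = self.clock()
        timestamps = self.hits[ip]
        # Drop timestamps that have aged out of the window.
        while timestamps and now - timestamps[0] >= self.window:
            timestamps.popleft()
        if len(timestamps) >= self.limit:
            # Over the limit: throttle with 503 and Retry-After: 60.
            return 503, {"Retry-After": "60"}
        timestamps.append(now)
        return 200, {}


if __name__ == "__main__":
    limiter = SlidingWindowLimiter()
    statuses = [limiter.check("52.84.0.1")[0] for _ in range(51)]
    print(statuses.count(200), "accepted,", statuses.count(503), "throttled")
```

In a real deployment this logic would more likely live in a reverse proxy or a shared store so that limits hold across multiple application processes.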

ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.