answerbot

Bot User-Agent: answerbot

🤖 Overview

AnswerBot is a legitimate web crawler operated by Answers Corporation (a subsidiary of Apax Partners), which powers the Q&A platform Answers.com (formerly known as Answerbag and Funtrivia). The bot was first observed in the early 2010s. Its primary purpose is to index publicly accessible web pages containing factual, question-answer, and reference content, which feed Answers.com’s knowledge base and answer-matching algorithms. The bot is also used to detect duplicate or plagiarized content across the web, so that the platform’s curated answers remain original and authoritative.

🌐 Technical Behavior

AnswerBot issues HTTP GET requests to both textual and structured data sources, parsing HTML, XML, and JSON-LD content. It typically crawls at a moderate rate of one request every 3–5 seconds per host, as documented in the official robot guidelines on Answers.com. The crawler respects Crawl-Delay directives and operates from addresses within AS14618 (Amazon AWS), where Answers Corporation hosts its crawling infrastructure, though the specific public IPs are not widely published. It does not follow JavaScript-rendered links and focuses on static HTML and structured metadata (Schema.org, Open Graph) to extract questions, answers, and supporting citations. Its requests carry an Accept-Encoding: gzip header, which is common to most HTTP clients and does not by itself identify the bot, and it fetches at most 2 MB per page.

📋 robots.txt Compliance

AnswerBot fully honors the Disallow and Allow directives defined in robots.txt, as verified by the official documentation on Answers.com’s about page and community forums. It also respects Crawl-Delay set by webmasters. However, if no robots.txt exists, it will crawl all publicly reachable pages without restriction, making explicit configuration advisable for sites that wish to limit its access.
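As a rough illustration of the explicit configuration suggested above, the sketch below parses a sample robots.txt with Python's standard urllib.robotparser and checks what a crawler identifying as AnswerBot/1.0 would be permitted to fetch. The robots.txt contents, the /private/ path, the 5-second delay, and the example.com URLs are all assumptions made for the example, not rules published by Answers.com.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the paths and the delay value are illustrative only.
ROBOTS_TXT = """\
User-agent: AnswerBot
Crawl-delay: 5
Disallow: /private/
Allow: /

User-agent: *
Disallow:
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
UA = "AnswerBot/1.0"

# Public pages stay crawlable, /private/ is blocked, and the delay is exposed.
print(parser.can_fetch(UA, "https://example.com/faq/what-is-dns"))
print(parser.can_fetch(UA, "https://example.com/private/report"))
print(parser.crawl_delay(UA))
```

This only shows how a standards-following parser reads the file. Whether the live crawler behaves the same way is best confirmed from server logs.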

🔍 Detection Indicators

The primary User-Agent string is AnswerBot/1.0 (sometimes seen as AnswerBot/2.0 in newer deployments). A secondary string AnswerBot/1.0 (compatible; Bot) has been observed. The bot does not include any X-Forwarded-For or custom headers beyond standard HTTP/1.1 norms. Its IPs originate from Amazon AWS’s us-east-1 region, so reverse DNS lookups often show ec2-*-*-*-*.compute-1.amazonaws.com. Behavioral fingerprinting may reveal a consistent crawl interval and a lack of common browser-like rhythms.
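The indicators above can be combined into a simple log-side check. The following is a minimal sketch, assuming the User-Agent always begins with AnswerBot/ followed by a version number and that reverse DNS names follow the ec2-*-*-*-*.compute-1.amazonaws.com shape mentioned above. The function name looks_like_answerbot and both regular expressions are hypothetical. Because User-Agent strings can be spoofed and AWS hostnames are shared by many tenants, a match is a hint rather than proof.

```python
import re
from typing import Optional

# Heuristic patterns built from the indicators in this section (assumptions).
UA_PATTERN = re.compile(r"^AnswerBot/\d+\.\d+\b", re.IGNORECASE)
RDNS_PATTERN = re.compile(
    r"^ec2-\d{1,3}-\d{1,3}-\d{1,3}-\d{1,3}\.compute-1\.amazonaws\.com\.?$",
    re.IGNORECASE,
)


def looks_like_answerbot(user_agent: str, reverse_dns: Optional[str] = None) -> bool:
    """Return True when the request matches the described indicators.

    If a reverse DNS name is supplied it must also match the us-east-1 EC2
    hostname shape; if it is omitted, only the User-Agent is checked.
    """
    if not UA_PATTERN.search(user_agent.strip()):
        return False
    if reverse_dns is None:
        return True
    return bool(RDNS_PATTERN.match(reverse_dns.strip()))
```

For example, looks_like_answerbot("AnswerBot/1.0 (compatible; Bot)") passes the User-Agent check, while the same string paired with a hostname outside compute-1.amazonaws.com does not pass the combined check.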

📊 Data Usage

Collected content is used to augment Answers.com’s internal database of question-answer pairs, improving the platform’s ability to return direct, cited answers to user queries. The data is also employed for quality assurance (detecting copied or low-quality responses across the web) and for training rule-based NLP models that suggest relevant answers in real time. According to a 2022 research note from Answers Corporation, the bot’s indexed content is not used for generative AI model training but solely for retrieval-augmented answer matching.

⚙️ Rate Limiting Policy

Webmasters should rate-limit AnswerBot at the application layer (e.g., 5 requests per second per IP). Although the bot is polite by default, it can become aggressive when scaling across multiple AWS instances, a scenario documented in webmaster forums. Threshold-based blocking protects site performance without fully denying access to this legitimate, non-malicious crawler.
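One way to apply such a threshold at the application layer is a per-IP sliding-window counter. The sketch below is a minimal in-memory version that uses the 5 requests per second figure from the example above. The class name SlidingWindowLimiter, its parameters, and the injectable clock are assumptions made for illustration, and a production deployment would more likely rely on the web server's or CDN's built-in limiter.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most max_requests per key within the last window_seconds."""

    def __init__(
        self,
        max_requests: int = 5,
        window_seconds: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        """Record a request for key and report whether it is within the limit."""
        now = self.clock()
        hits = self._hits[key]
        # Discard timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the threshold: caller might respond with HTTP 429
        hits.append(now)
        return True
```

A request handler would call allow(client_ip) and reject or delay the request when it returns False. Since the section notes that the bot may spread across several AWS instances, keying the limiter on the User-Agent string instead of the IP address is one possible way to cap its aggregate rate.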

ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.