amazon-kendra
Bot User-Agent: amazon-kendra
🤖 Overview
Amazon Kendra is a fully managed intelligent search service from Amazon Web Services (AWS), first announced at AWS re:Invent in December 2019. It surfaces relevant information from unstructured data held in enterprise repositories, documents, and websites. Ingested content is stored in Kendra search indexes, which support natural language queries over corporate knowledge bases and intranets. As part of data ingestion, Kendra operates a dedicated web crawler that indexes publicly available web content when a customer configures it to do so.
🌐 Technical Behavior
The Amazon Kendra web crawler sends HTTP and HTTPS requests with a custom User-Agent. According to the AWS documentation (https://docs.aws.amazon.com/kendra/latest/dg/crawler.html), the crawler respects standard crawling protocols and identifies itself with the AmazonKendra User-Agent string. Requests originate from AWS-owned IP address ranges, which are published in the AWS IP Address Ranges JSON (https://docs.aws.amazon.com/general/latest/gr/aws-ip-ranges.html). The crawl rate is governed by per-source quotas: for web sources, Kendra can ingest up to 200 pages per minute per data source (per AWS service limits). Crawl depth and frequency are configurable through the Kendra console. By default, the crawler follows robots.txt directives and respects nofollow and noindex tags. It supports both IPv4 and IPv6.
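As a rough illustration of the IP-range check described above, the sketch below tests whether a client address falls inside any prefix of an already-parsed AWS ranges file. It assumes the file's documented layout (a "prefixes" list with "ip_prefix" entries and an "ipv6_prefixes" list with "ipv6_prefix" entries). The helper name is_aws_address and the sample data are hypothetical, and the sample prefixes are reserved documentation ranges rather than real AWS ranges. A match only indicates an AWS-owned address, not that the request came from Kendra specifically.

```python
import ipaddress

# Illustrative stand-in for the parsed AWS ranges file (assumption: a real
# deployment would load the published JSON instead). These prefixes are
# reserved documentation ranges, not actual AWS ranges.
SAMPLE_RANGES = {
    "prefixes": [
        {"ip_prefix": "192.0.2.0/24", "region": "us-east-1", "service": "AMAZON"},
    ],
    "ipv6_prefixes": [
        {"ipv6_prefix": "2001:db8::/32", "region": "us-east-1", "service": "AMAZON"},
    ],
}


def is_aws_address(client_ip: str, ranges: dict) -> bool:
    """Return True if client_ip lies inside any listed prefix (hypothetical helper)."""
    addr = ipaddress.ip_address(client_ip)
    # Pick the list and field name that match the address family.
    if addr.version == 4:
        key, field = "prefixes", "ip_prefix"
    else:
        key, field = "ipv6_prefixes", "ipv6_prefix"
    return any(
        addr in ipaddress.ip_network(entry[field]) for entry in ranges.get(key, [])
    )
```

Because the check covers every AWS service sharing those ranges, it is best combined with the header indicators listed under Detection Indicators rather than used on its own.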
📋 robots.txt Compliance
Amazon Kendra officially honors robots.txt Disallow directives, as documented in the Amazon Kendra Developer Guide. The crawler checks the file before each crawl session and does not index URLs or paths that are explicitly disallowed. This behavior is verified by AWS support documentation and community reports (e.g., AWS Knowledge Center articles). There is no known evidence of the bot ignoring robots.txt rules.
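To see how a Disallow rule aimed at this crawler would be evaluated, the following minimal sketch runs a sample robots.txt through Python's standard urllib.robotparser. The user-agent token amazon-kendra is taken from the Bot User-Agent listed at the top of this page. The example.com URLs and the /private/ path are placeholders, and the matching shown is Python's parser behavior, which is only an approximation of how Kendra itself interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt: keep the Kendra crawler out of /private/ while leaving
# the rest of the site open to every crawler. Paths are placeholders.
ROBOTS_TXT = """\
User-agent: amazon-kendra
Disallow: /private/

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Evaluate a blocked and an allowed URL for the Kendra token.
blocked = parser.can_fetch("amazon-kendra", "https://example.com/private/report.html")
allowed = parser.can_fetch("amazon-kendra", "https://example.com/public/index.html")
print(blocked, allowed)
```

Running a rule set through a parser like this before deploying it is a cheap way to catch a typo in the token or path.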
🔍 Detection Indicators
The primary User-Agent string is AmazonKendra (case-sensitive). Other identifying headers include X-Amz-User-Agent: aws-kendra and the User-Agent variant AmazonKendra/1.0 (aws; aws-kendra). The bot may also send an optional From header containing the customer's configured email address. Behavioral fingerprints include sequential, non-bursty request patterns and a consistent crawl delay that respects the site's Crawl-delay directive in robots.txt.
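The header indicators above can be combined into a simple request check. This is a minimal sketch under the assumption that the header values listed on this page are accurate. The function name looks_like_kendra is hypothetical, and because request headers are trivially spoofed, a match should be treated as a hint rather than proof.

```python
from typing import Mapping


def looks_like_kendra(headers: Mapping[str, str]) -> bool:
    """Hypothetical helper: match the header indicators listed on this page."""
    # Header names are case-insensitive in HTTP, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}
    user_agent = normalized.get("user-agent", "")
    amz_agent = normalized.get("x-amz-user-agent", "")
    # The User-Agent value is compared case-sensitively, as this page describes.
    return "AmazonKendra" in user_agent or amz_agent == "aws-kendra"
```

For example, looks_like_kendra({"User-Agent": "AmazonKendra/1.0 (aws; aws-kendra)"}) matches, while a lowercase "amazonkendra" value does not.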
📊 Data Usage
Collected data is used exclusively for building and updating Amazon Kendra search indexes for the customer who configured the crawler. The service indexes the textual content of crawled pages to enable natural language search queries. No data is used for external AI training, advertising, or any purpose outside the customer’s Kendra instance. AWS’s Data Processing Addendum governs data handling; content is encrypted at rest and in transit.
⚙️ Rate Limiting Policy
Amazon Kendra's crawler is rate-limited to prevent excessive load on source servers; the default maximum is 200 pages per minute per data source. This threshold keeps the crawler from overwhelming smaller sites while still allowing efficient indexing for enterprise-scale deployments. Administrators can reduce the rate further in the Kendra console.
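A site operator who wants to confirm that observed traffic stays under the stated threshold could count requests in a sliding window. The sketch below is one such counter, using the 200-per-minute figure quoted above as its default (an average of one request every 0.3 seconds). The class name SlidingWindowCounter is hypothetical, and this is a server-side monitoring sketch, not a description of how Kendra enforces its own limit.

```python
from collections import deque


class SlidingWindowCounter:
    """Hypothetical helper: track request times for one client in a time window."""

    def __init__(self, limit: int = 200, window_seconds: float = 60.0):
        self.limit = limit
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def record(self, now: float) -> bool:
        """Record a request at time `now`; return True while within the limit."""
        cutoff = now - self.window_seconds
        # Drop requests that have aged out of the window.
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        self._timestamps.append(now)
        return len(self._timestamps) <= self.limit
```

In practice one counter would be kept per client address or per User-Agent, with `now` supplied from a monotonic clock such as time.monotonic().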
ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.