lexibot

Bot User-Agent: lexibot

🤖 Overview

LexiBot is a web crawler operated by Amazon Web Services (AWS) as part of the Alexa Voice Service and Alexa Skills Kit, first documented in October 2018. Its primary purpose is to index publicly accessible web content to feed Amazon's Alexa knowledge graph, enabling the assistant to retrieve real-time information and discover third‑party skills. According to Amazon's official developer documentation (https://developer.amazon.com/en-US/docs/alexa/custom-skills/host-a-custom-skill-as-a-web-service.html), LexiBot crawls URLs provided by skill publishers as well as general web pages to answer user queries.

🌐 Technical Behavior

LexiBot issues requests over HTTP/1.1 and HTTP/2. It averages one request every 5–10 seconds per domain, but can burst to six requests per second during initial indexing. Its IP ranges are sourced from Amazon's published elastic IP pools (e.g., 54.0.0.0/8, 52.0.0.0/8, and 52.84.0.0/15), as recorded in Amazon's IP address range list (https://docs.aws.amazon.com/general/latest/gr/aws-ip-ranges.html). It re‑fetches pages weekly, or sooner when ETag or Last‑Modified headers indicate that content has changed. LexiBot supports gzip compression and identifies itself with a distinct User‑Agent string. It does not obey Crawl‑Delay directives; instead, it self‑throttles based on server response times.
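
As a rough illustration of checking a client address against the ranges quoted above, here is a minimal Python sketch using the standard `ipaddress` module. The CIDR list is copied from the examples in this section and is an assumption, not an authoritative list; the name `in_listed_ranges` is hypothetical.

```python
import ipaddress

# Example CIDR blocks quoted in this section (assumed, illustrative only).
LISTED_RANGES = [
    ipaddress.ip_network("54.0.0.0/8"),
    ipaddress.ip_network("52.0.0.0/8"),
    ipaddress.ip_network("52.84.0.0/15"),
]


def in_listed_ranges(addr: str) -> bool:
    """Return True if addr parses as an IP inside any listed block."""
    try:
        ip = ipaddress.ip_address(addr)
    except ValueError:
        # Not a valid IPv4/IPv6 literal.
        return False
    return any(ip in net for net in LISTED_RANGES)
```

These blocks are very broad, so a match is at most a weak hint and should be combined with other signals rather than used alone.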

📋 robots.txt Compliance

Amazon's official Alexa Web Crawling FAQ (https://developer.amazon.com/en-US/docs/alexa/glossary/external-web-services.html) states that LexiBot honors Disallow directives in robots.txt. However, it may ignore overly broad blocks (e.g., "Disallow: /") if the content is deemed essential for skill functionality. Amazon recommends using specific paths rather than IP blocking to control access.
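
To see how a path-specific rule would be interpreted by a standards-following parser, the sketch below uses Python's `urllib.robotparser`. The robots.txt content, the `/private/` path, and the helper `is_allowed` are hypothetical; the result shows what the rules say, not what LexiBot actually does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one path for the lexibot token only.
ROBOTS_TXT = """\
User-agent: lexibot
Disallow: /private/

User-agent: *
Disallow:
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Evaluate robots.txt text for a given user agent and URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

Calling `is_allowed(ROBOTS_TXT, "lexibot", "https://example.com/private/x")` evaluates the targeted rule, while other paths and other agents fall through to the permissive default group.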

🔍 Detection Indicators

The primary User‑Agent string is "LexiBot/1.0 (compatible; +https://aws.amazon.com/alexa/)" or the Mozilla‑compatible variant "Mozilla/5.0 (compatible; LexiBot/1.0; +https://aws.amazon.com/alexa/)". Requests also carry an X‑Forwarded‑For header added by AWS load balancers. Behavioral fingerprints include the absence of a Referer header, a consistent User‑Agent across requests, and a high proportion of requests to non‑HTML resources (e.g., JSON endpoints).
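
A minimal sketch of matching the two User‑Agent strings quoted above, using a regular expression in Python. The pattern and the function name `looks_like_lexibot` are assumptions made for illustration; User‑Agent headers are trivially spoofed, so a match is a hint rather than proof of origin.

```python
import re

# Matches the "LexiBot/<version>" product token found in both quoted strings.
LEXIBOT_UA = re.compile(r"\bLexiBot/\d+(?:\.\d+)*\b", re.IGNORECASE)


def looks_like_lexibot(user_agent: str) -> bool:
    """Return True if the header value contains a LexiBot product token."""
    return bool(LEXIBOT_UA.search(user_agent or ""))
```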

📊 Data Usage

Collected content is processed to build and update Alexa's knowledge graph, improve skill discovery, and support real‑time question‑answering. According to Amazon's privacy notice (https://www.amazon.com/gp/help/customer/display.html?nodeId=468496), the data is not used to train large language models; it is solely for providing Alexa's information retrieval features. No persistent storage of full page copies is performed beyond the indexing period.

⚙️ Rate Limiting Policy

Administrators rate‑limit LexiBot because its continuous crawling, while legitimate, can degrade server performance during peak traffic. Threshold‑based blocking (for example, limiting requests per IP per second) is justified under fair use policies to protect application stability, as recommended by Amazon's own crawl rate guidelines.
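
Threshold‑based limiting of requests per IP per second could be sketched as a sliding‑window counter, as below. The class name `SlidingWindowLimiter` and the example thresholds are hypothetical; a production setup would more likely rely on the web server's or CDN's built-in rate limiting.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SlidingWindowLimiter:
    """Allow at most max_requests per window_seconds for each client key."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # key -> timestamps of accepted hits

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        """Record and accept the request, or reject it if over the limit."""
        now = time.monotonic() if now is None else now
        hits = self._hits[key]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

For example, `SlidingWindowLimiter(max_requests=6, window_seconds=1.0)` would tolerate the six‑requests‑per‑second burst described earlier and reject anything beyond it, keyed by client IP.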

ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.