entitycubebot
Bot User-Agent: entitycubebot
🤖 Overview
EntityCubeBot is a web crawler operated by EntityCube Inc., a company specializing in entity-oriented search and knowledge graph construction. The bot collects publicly accessible web content to build structured entity profiles. These feed the EntityCube search engine, which presents fact cards and relationship maps for people, places, organizations, and concepts. The bot was first documented in 2021 and primarily powers entity disambiguation and semantic search features.
🌐 Technical Behavior
EntityCubeBot performs both recursive and focused crawling, following hyperlinks while prioritizing pages rich in biographical, organizational, and geographical data. It typically issues requests at a rate of one to two per second per domain, spaced with random delays to reduce server load. The crawler operates from IP ranges registered to EntityCube’s infrastructure, which are publicly listed in the whois records for the netblock 198.51.100.0/24 (example range; actual ranges vary). It uses HTTP/1.1 and supports gzip compression, sending a standard Accept header requesting text/html,application/xhtml+xml. The bot does not execute JavaScript, focusing solely on static HTML content and structured data markup (e.g., JSON-LD, Microdata).
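As a rough illustration of the IP-range point above, the following sketch checks whether a request's source address falls inside a configured crawler netblock. The list `CRAWLER_NETWORKS` and the function name `ip_in_crawler_range` are hypothetical, and 198.51.100.0/24 is only the example range quoted in the text, so the real ranges would need to be substituted.

```python
import ipaddress

# Hypothetical allowlist. 198.51.100.0/24 is the example range from the text,
# not a confirmed EntityCube netblock; replace it with verified ranges.
CRAWLER_NETWORKS = [ipaddress.ip_network("198.51.100.0/24")]


def ip_in_crawler_range(remote_addr: str) -> bool:
    """Return True if remote_addr parses as an IP inside a configured network."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Malformed addresses never match.
        return False
    return any(addr in net for net in CRAWLER_NETWORKS)
```

For example, `ip_in_crawler_range("198.51.100.42")` matches the example range, while an address outside it or an unparseable string does not.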
📋 robots.txt Compliance
According to the official documentation on EntityCube’s developer portal (entitycube.com/crawler), EntityCubeBot fully honors Disallow directives in robots.txt and respects a crawl-delay directive if specified. The crawler checks for a robots.txt file before each new domain visit and caches it for up to 24 hours. There are no known violations or complaints from webmasters regarding non‑compliance.
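To see how such directives would be interpreted, the sketch below parses a sample robots.txt with Python's standard `urllib.robotparser`. The rules, paths, and the `example.com` URLs are invented for illustration. The result shows how a compliant parser reads them, not how EntityCubeBot itself is implemented.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules only: block one directory for EntityCubeBot and ask for
# a 10-second crawl delay, while leaving other crawlers unrestricted.
ROBOTS_TXT = """\
User-agent: EntityCubeBot
Disallow: /private/
Crawl-delay: 10

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Ask the parser what a compliant crawler with this user agent may fetch.
allowed_public = parser.can_fetch("EntityCubeBot", "https://example.com/about")
allowed_private = parser.can_fetch("EntityCubeBot", "https://example.com/private/report")
delay = parser.crawl_delay("EntityCubeBot")
```

Here `allowed_public` is `True`, `allowed_private` is `False`, and `delay` is `10`.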
🔍 Detection Indicators
The primary User-Agent string is EntityCubeBot/1.0 (compatible; +https://entitycube.com/bot). The bot also sends a From header with a contact email address for abuse reports, and an X-EntityCube-Id header containing a unique crawl session identifier. Behavioral fingerprints include sequential URL requests with monotonically increasing timestamps and a consistent lack of referrer headers from external sites.
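The header indicators above could be combined into a simple check like the one sketched here. The function name `looks_like_entitycubebot` is hypothetical, and accepting version numbers other than 1.0 is an assumption. Any client can set these headers, so a match is a hint rather than proof of origin.

```python
import re
from typing import Mapping

# Built from the User-Agent string quoted in the text. Allowing versions other
# than 1.0 is an assumption; only 1.0 is documented.
UA_PATTERN = re.compile(
    r"^EntityCubeBot/\d+(\.\d+)*\s+\(compatible; \+https://entitycube\.com/bot\)$"
)


def looks_like_entitycubebot(headers: Mapping[str, str]) -> bool:
    """Return True if the headers carry all three documented indicators."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    user_agent = normalized.get("user-agent", "").strip()
    if not UA_PATTERN.match(user_agent):
        return False
    # Require a non-empty From header and a non-empty session identifier.
    return bool(normalized.get("from")) and bool(normalized.get("x-entitycube-id"))
```

Because these values are easy to spoof, a check like this is best paired with source-address verification before a request is trusted.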
📊 Data Usage
Collected data is used solely for populating EntityCube’s entity index and knowledge graph. Text, metadata, and structured data are extracted to generate entity summaries, attribute tables, and relationship edges. The data is not used for AI model training, advertising, or third‑party resale, as stated in the company’s privacy policy (entitycube.com/privacy).
⚙️ Rate Limiting Policy
Despite its legitimacy, EntityCubeBot is rate-limited under standard threat detection rules because it can send sustained bursts during initial site discovery. A threshold of 50 requests per minute from its IP range is a reasonable limit: it prevents resource exhaustion while still allowing the bot to complete its crawl cycles within its polite crawl-delay.
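A threshold like the one described could be enforced with a sliding-window counter, sketched below. The class name `SlidingWindowLimiter` and its parameters are hypothetical, and this is one common technique rather than the specific mechanism behind the rule above. One instance would be kept per IP range being limited.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_requests within any window_seconds-long window."""

    def __init__(self, max_requests=50, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested
        self._hits = deque()  # timestamps of accepted requests, oldest first

    def allow(self) -> bool:
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self._hits and now - self._hits[0] >= self.window_seconds:
            self._hits.popleft()
        if len(self._hits) >= self.max_requests:
            return False
        self._hits.append(now)
        return True
```

With the defaults, the 51st request inside a minute is rejected until older requests age out. Sustained crawling at one to two requests per second works out to 60 to 120 requests per minute, so a 50-per-minute cap would throttle the bot at that pace.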
ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.