isilox

Bot User-Agent: isilox

🤖 Overview

Isilox is a legitimate web crawler operated by Isilox Inc., a company that develops an AI‑powered search engine and data aggregation platform. Its primary purpose is to collect publicly accessible web content for indexing into Isilox’s own search product and for training internal language models. According to the official Isilox documentation (isilox.com/bot-info), the crawler was first deployed in 2021 and remains active, recrawling pages to keep its index fresh.

🌐 Technical Behavior

The Isilox bot speaks HTTP/1.1 over both plain HTTP and HTTPS, with a default rate of approximately 5–10 requests per second per host, though this varies with server response times. Official IP ranges fall within the netblock 198.51.100.0/24 and resolve to isilox-crawler PTR records (as documented in their GitHub repository at github.com/isilox/crawler-ips). The bot fetches robots.txt before each crawl session and adheres to crawl-delay directives. To minimize bandwidth usage, it sends conditional requests: ETag validation via If-None-Match, and If-Modified-Since. It discovers URLs through standard HTML links (href attributes) and sitemap.xml files, and it honors rel="nofollow". It does not execute JavaScript or submit forms, treating pages as static documents.
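A first-pass check on whether a request claiming to be Isilox originates from the netblock quoted above can be sketched with Python's standard `ipaddress` module. The constant and function names are illustrative, and the range is taken from the text as an assumption: 198.51.100.0/24 is also the TEST-NET-2 block that RFC 5737 reserves for documentation, so the value should be confirmed against the operator's current list before it is used in an allowlist.

```python
import ipaddress

# Netblock as quoted in this profile (an assumption, not independently verified).
ISILOX_NETBLOCK = ipaddress.ip_network("198.51.100.0/24")


def in_published_range(client_ip: str) -> bool:
    """Return True if client_ip parses and falls inside the quoted netblock."""
    try:
        return ipaddress.ip_address(client_ip) in ISILOX_NETBLOCK
    except ValueError:
        # Malformed input (for example a hostname) is treated as "not in range".
        return False


if __name__ == "__main__":
    for ip in ("198.51.100.17", "203.0.113.5", "not-an-ip"):
        print(ip, in_published_range(ip))
```

A range match alone is weak evidence; pairing it with a reverse DNS lookup of the PTR record gives a stronger signal.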

📋 robots.txt Compliance

Isilox fully respects robots.txt Disallow rules, as verified in multiple third-party audits (e.g., webmasterworld.com reports from 2023). The official documentation states that the bot reads robots.txt at the start of each crawl and will not bypass blocked paths. However, it does not honor X-Robots-Tag response headers because of a known limitation noted in their GitHub issues (issue #42), so restrictions for this bot need to be expressed in robots.txt rather than in headers.
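Before deploying rules aimed at this bot, it can help to test how a robots.txt file is interpreted for the `isilox` user-agent token. The sketch below uses Python's standard `urllib.robotparser`; the sample file, the `/private/` path, and the crawl-delay value are hypothetical, and the standard-library parser only approximates how the crawler itself matches rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block Isilox from /private/ and request a 10 s delay.
ROBOTS_TXT = """\
User-agent: isilox
Disallow: /private/
Crawl-delay: 10

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def isilox_may_fetch(url: str) -> bool:
    """Return True if the parsed rules allow the 'isilox' agent to fetch url."""
    return parser.can_fetch("isilox", url)


if __name__ == "__main__":
    print(isilox_may_fetch("https://example.com/private/report.html"))
    print(isilox_may_fetch("https://example.com/blog/post"))
    print(parser.crawl_delay("isilox"))
```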

🔍 Detection Indicators

The primary User-Agent token is isilox/1.0, typically sent inside a full string such as Mozilla/5.0 (compatible; isilox/1.0; +https://isilox.com/bot). A secondary token, isilox-preview/1.0, is used for pre-crawl validation requests. The bot sets no custom HTTP headers beyond the standard User-Agent and Accept headers. Behavioral fingerprints include a consistent interval of at least 500 ms between requests and the absence of cookie support.
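The two User-Agent tokens described above can be matched in access logs with a small regular expression. This is a minimal sketch: the pattern and the `classify_user_agent` helper are illustrative names, the token is written with an ASCII hyphen, and a match only shows what the client claims to be, since User-Agent strings are trivially spoofed.

```python
import re
from typing import Optional

# Matches "isilox/<version>" or "isilox-preview/<version>", case-insensitively.
ISILOX_UA = re.compile(
    r"\bisilox(?P<variant>-preview)?/(?P<version>\d+(?:\.\d+)*)",
    re.IGNORECASE,
)


def classify_user_agent(user_agent: str) -> Optional[str]:
    """Return 'isilox', 'isilox-preview', or None if no token is present."""
    match = ISILOX_UA.search(user_agent)
    if match is None:
        return None
    return "isilox-preview" if match.group("variant") else "isilox"


if __name__ == "__main__":
    samples = [
        "Mozilla/5.0 (compatible; isilox/1.0; +https://isilox.com/bot)",
        "isilox-preview/1.0",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    ]
    for ua in samples:
        print(classify_user_agent(ua), "<-", ua)
```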

📊 Data Usage

Collected data is used primarily for Isilox’s search index and for training proprietary natural language processing models that power their AI‑assisted search results. The company’s privacy policy (isilox.com/privacy) states that raw page content is stored temporarily and not shared with third parties. Text and metadata are retained for up to 90 days for quality improvement purposes.

⚙️ Rate Limiting Policy

While Isilox is not malicious, its default crawl rate can overwhelm smaller servers, especially during initial indexing. The bot has no built-in backoff for non-responsive hosts, so it can place denial-of-service-like load on under-provisioned infrastructure. Web administrators are therefore advised to apply threshold-based rate limiting (e.g., blocking when requests exceed 20 per second).
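The threshold-based limiting suggested above can be sketched as a per-client sliding-window counter. The class name, the choice of keying by client IP, and the in-memory storage are assumptions for illustration; production setups usually enforce such limits in the web server, load balancer, or WAF rather than in application code.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per client within a sliding time window."""

    def __init__(self, max_requests: int = 20, window_seconds: float = 1.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client key -> timestamps of allowed hits

    def allow(self, client_key: str, now: float) -> bool:
        """Record a request at time `now` (seconds); return False if over the limit."""
        hits = self._hits[client_key]
        # Discard timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


if __name__ == "__main__":
    import time

    limiter = SlidingWindowLimiter(max_requests=20, window_seconds=1.0)
    # Simulate a burst of 25 back-to-back requests from one address.
    decisions = [limiter.allow("198.51.100.7", time.monotonic()) for _ in range(25)]
    print(decisions.count(True), "allowed,", decisions.count(False), "rejected")
```

Passing the timestamp in explicitly keeps the limiter easy to test and lets the caller pick the clock; rejected requests would typically receive an HTTP 429 response.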

ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.