Xaldon WebSpider Bot — Detection, Blocking & Technical Analysis

Xaldon WebSpider

Crawler User-Agent: xaldon-webspider

🤖 Overview

Xaldon_WebSpider is a legitimate web crawler operated by Xaldon Technologies, a private company that provides a specialized search engine focused on indexing niche and long‑tail web content. First publicly documented in 2018 via the company’s official crawler page at https://xaldon.com/crawler, the bot’s primary purpose is to collect publicly accessible web pages for inclusion in the Xaldon Search index, which serves as an alternative to mainstream search engines. The crawler is designed to support both organic web discovery and structured data extraction for the company’s knowledge graph.

🌐 Technical Behavior

Xaldon_WebSpider follows a gentle crawl pattern, sending approximately 5–10 requests per second per domain, as documented in the official Xaldon crawler documentation. It uses HTTP/1.1 with Keep-Alive and respects the Cache-Control header to reduce server load. The bot originates from IP ranges registered to ASN 208722 (Xaldon Technologies Ltd); the company publishes a current list of IPv4 blocks at https://xaldon.com/crawler/ips. Crawling is performed primarily over HTTPS, with fallback to HTTP only when the site does not support encryption. The bot downloads robots.txt before each crawl session and, when the file has not changed, re‑checks it after 24 hours.
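Because a User-Agent string can be spoofed, a request claiming to be this crawler can be cross-checked against the published IP blocks. The following is a minimal sketch of that check using Python's standard ipaddress module. The CIDR blocks shown are placeholder documentation ranges, not real Xaldon ranges, and the names PUBLISHED_BLOCKS and is_from_published_range are hypothetical.

```python
import ipaddress

# ASSUMPTION: placeholder blocks from the RFC 5737 documentation ranges.
# They are NOT real Xaldon ranges; replace them with the blocks listed
# on the crawler's published IP page.
PUBLISHED_BLOCKS = ["192.0.2.0/24", "198.51.100.0/24"]

NETWORKS = [ipaddress.ip_network(block) for block in PUBLISHED_BLOCKS]


def is_from_published_range(client_ip: str) -> bool:
    """Return True if client_ip falls inside any of the listed blocks."""
    try:
        address = ipaddress.ip_address(client_ip)
    except ValueError:
        return False  # malformed address: treat as unverified
    return any(address in network for network in NETWORKS)


if __name__ == "__main__":
    for ip in ("192.0.2.15", "203.0.113.9"):
        print(ip, is_from_published_range(ip))
```

A request whose User-Agent names the crawler but whose source address fails this check is a candidate for closer inspection rather than automatic trust.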

📋 robots.txt Compliance

According to the official Xaldon WebSpider documentation and independent tests by webmasters, the bot fully honors Disallow directives in robots.txt. It also supports the Crawl-Delay directive and pauses between requests accordingly. The company states that any violations of robots.txt are unintentional bugs and encourages reporting them via [email protected].
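To see how a compliant crawler would interpret a given set of rules, the directives can be evaluated with Python's standard urllib.robotparser module. This is only a sketch: the sample robots.txt content and the example.com URLs are made up for illustration, and the user-agent token xaldon-webspider is the one listed at the top of this page.

```python
from urllib.robotparser import RobotFileParser

# ASSUMPTION: illustrative rules, not taken from any real site.
SAMPLE_ROBOTS_TXT = """\
User-agent: xaldon-webspider
Crawl-delay: 10
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(SAMPLE_ROBOTS_TXT.splitlines())

AGENT = "xaldon-webspider"

# Paths under /private/ are disallowed for this agent; others are allowed.
blocked = parser.can_fetch(AGENT, "https://example.com/private/report.html")
allowed = parser.can_fetch(AGENT, "https://example.com/blog/post.html")

# Crawl-delay value (in seconds) that applies to this agent.
delay = parser.crawl_delay(AGENT)

if __name__ == "__main__":
    print("private allowed:", blocked)
    print("blog allowed:", allowed)
    print("crawl delay:", delay)
```

Comparing these results with what appears in the access log is one way to confirm whether the bot is actually honoring the rules on a particular site.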

🔍 Detection Indicators

The primary User-Agent string is Mozilla/5.0 (compatible; Xaldon_WebSpider/1.0; +https://xaldon.com/crawler). A secondary token, Xaldon WebSpider, may appear in logs if the client‑side identification fails. In both cases the User-Agent contains the string Xaldon. The bot also sets the From header to [email protected]. No additional fingerprinting headers are present, though every request includes Accept: text/html,application/xhtml+xml.
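The indicators above can be turned into a simple log filter. The sketch below matches the name variants that appear on this page (Xaldon_WebSpider, xaldon-webspider, and Xaldon WebSpider) with one case-insensitive regular expression. The function name looks_like_xaldon is hypothetical, and a match only shows what the client claims to be, not that the request is genuine.

```python
import re

# Matches "Xaldon_WebSpider", "xaldon-webspider" and "Xaldon WebSpider",
# ignoring case. The separator between the two words is optional.
XALDON_PATTERN = re.compile(r"xaldon[\s_-]?webspider", re.IGNORECASE)


def looks_like_xaldon(user_agent: str) -> bool:
    """Return True if the User-Agent header names the Xaldon crawler."""
    return bool(XALDON_PATTERN.search(user_agent or ""))


if __name__ == "__main__":
    primary = ("Mozilla/5.0 (compatible; Xaldon_WebSpider/1.0; "
               "+https://xaldon.com/crawler)")
    print(looks_like_xaldon(primary))
    print(looks_like_xaldon("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))
```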

📊 Data Usage

The collected data is exclusively used for building and updating the Xaldon Search index, which powers the company’s public search engine at https://xaldon.com. Xaldon Technologies states it does not use crawled content for AI training, large‑language model development, or any commercial resale. The indexed pages are stored temporarily and refreshed approximately every two weeks.

⚙️ Rate Limiting Policy

Xaldon_WebSpider is rate‑limited by default to prevent overloading origin servers, with a maximum of 50 requests per minute per domain enforced at the application layer. Although the bot behaves well, its concurrent crawl threads can still consume resources on high‑traffic sites, so protective thresholds in .htaccess or server‑side firewalls are recommended.
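For sites that enforce such a threshold in application code rather than in .htaccess or a firewall, one possible approach is a sliding-window counter. The sketch below is an illustration under stated assumptions, not the mechanism any particular product uses: the class name SlidingWindowLimiter and its parameters are hypothetical, the default of 50 requests per 60 seconds mirrors the figure quoted above, and state is kept in memory for a single process only.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per client within window_seconds."""

    def __init__(self, max_requests: int = 50, window_seconds: float = 60.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Maps a client key (e.g. a user-agent token or IP) to the
        # timestamps of its recently accepted requests.
        self._hits = defaultdict(deque)

    def allow(self, client_key: str, now: float = None) -> bool:
        """Record a request and return True if it is within the limit."""
        now = time.monotonic() if now is None else now
        hits = self._hits[client_key]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the limit: caller might respond with 429
        hits.append(now)
        return True


if __name__ == "__main__":
    limiter = SlidingWindowLimiter()
    results = [limiter.allow("xaldon-webspider", now=0.0) for _ in range(51)]
    print(results.count(True), results.count(False))
```

The optional now argument exists only to make the behavior easy to test with fixed timestamps; in normal use it can be omitted.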

ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.