searchblox

Search Engine User-Agent: searchblox

🤖 Overview

Searchblox is a legitimate enterprise web crawler operated by Searchblox, Inc., a company founded in 2004 that specializes in AI‑powered enterprise search and content indexing solutions. The crawler’s primary purpose is to collect publicly accessible web content from customer‑specified domains and feed that data into the Searchblox Search Engine, a self‑hosted or cloud‑based platform used by organizations for internal knowledge discovery, compliance scanning, and custom search applications. According to the official Searchblox documentation at https://www.searchblox.com/documentation/crawler.html, the bot is designed to obey standard web crawling protocols and is not associated with any malicious activity.

🌐 Technical Behavior

Searchblox operates as a configurable web crawler that follows hyperlinks within allowed domains while respecting robots.txt directives and crawl‑delay settings. It speaks HTTP/1.1 over both plain (HTTP) and TLS‑encrypted (HTTPS) connections. By default it fetches static HTML; an optional headless browser mode handles JavaScript‑rendered content. Request frequency is set by the customer’s crawl configuration, with typical default intervals of 1–5 seconds between requests to the same host, as documented in the Searchblox admin guide at https://www.searchblox.com/documentation/crawl-settings.html. The crawler does not use a fixed public IP range. Requests originate either from the customer’s own server IP or from Searchblox’s cloud infrastructure. In cloud deployments, the IP addresses are drawn from major cloud providers (e.g., AWS, Google Cloud, Azure) and are not published as a static list, so IP allowlisting alone is not a reliable way to identify the crawler.

📋 robots.txt Compliance

The Searchblox crawler fully honors the Robots Exclusion Protocol (robots.txt), as stated in the official crawling policies at https://www.searchblox.com/documentation/robots.txt. It reads the Disallow directives before every crawl job and will not access URLs or directories that are explicitly forbidden. The crawler also respects the Crawl‑delay directive, throttling its request rate to the value specified in the robots.txt file and falling back to a default delay of one second when no value is provided.
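
To make the compliance rules above concrete, here is a minimal Python sketch that evaluates a robots.txt file the way a compliant crawler would, using the standard library's `urllib.robotparser`. The robots.txt content, the `/private/` path, the 5-second delay, and the `example.com` URLs are illustrative assumptions, and the product token `searchblox` is taken from the User-Agent listed at the top of this page. This is not Searchblox's own implementation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; paths and delay value are illustrative only.
ROBOTS_TXT = """\
User-agent: searchblox
Disallow: /private/
Crawl-delay: 5

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A compliant crawler checks each URL against the rules before fetching it.
allowed_public = parser.can_fetch("searchblox", "https://example.com/docs/index.html")
allowed_private = parser.can_fetch("searchblox", "https://example.com/private/report.html")

# Crawl-delay as declared for this user agent (None if the file declares none).
declared_delay = parser.crawl_delay("searchblox")

# Fall back to the one-second default described above when no delay is given.
effective_delay = declared_delay if declared_delay is not None else 1

print(allowed_public, allowed_private, effective_delay)
```

Site owners can use the same check to confirm that a robots.txt change has the intended effect before deploying it.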

🔍 Detection Indicators

The primary identifier for the Searchblox crawler is the User‑Agent token Searchblox/1.0 (or Searchblox/2.0 in newer versions). A typical full user‑agent appears as Mozilla/5.0 (compatible; Searchblox/1.0; +http://www.searchblox.com/robot) or Searchblox/2.0 (+http://www.searchblox.com/robot). When the corresponding option is enabled, the crawler also sends a custom X‑Searchblox‑Crawler header set to 1, as noted in the Searchblox support forum at https://support.searchblox.com. Reverse DNS lookups on its IP addresses often resolve to hostnames containing “searchblox” or “crawler” subdomains.
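
The header-based indicators above can be checked with a short Python sketch like the following. The function name `looks_like_searchblox` and the regular expression are hypothetical, and the User-Agent and header values are taken from this page rather than independently verified. Because request headers are trivially spoofed, a match should be treated as a hint, not proof of origin.

```python
import re

# Pattern derived from the User-Agent examples quoted above (an assumption).
SEARCHBLOX_UA = re.compile(r"\bsearchblox(?:/\d+(?:\.\d+)*)?", re.IGNORECASE)


def looks_like_searchblox(headers):
    """Return True if the request headers match the indicators described above."""
    # HTTP header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}

    if SEARCHBLOX_UA.search(normalized.get("user-agent", "")):
        return True

    # Optional custom header, reported as being set to "1" when enabled.
    return normalized.get("x-searchblox-crawler", "").strip() == "1"


print(looks_like_searchblox(
    {"User-Agent": "Mozilla/5.0 (compatible; Searchblox/1.0; +http://www.searchblox.com/robot)"}
))
```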

📊 Data Usage

Data collected by the Searchblox crawler is used exclusively for internal enterprise search indexing and AI‑powered content classification within the customer’s Searchblox instance. The platform extracts text, metadata, and document structure to build a searchable index, and can also feed indexed content into machine learning models for entity extraction, sentiment analysis, or recommendation features. No collected data is shared with third parties or used to train external models, per the Searchblox privacy policy at https://www.searchblox.com/privacy.

⚙️ Rate Limiting Policy

Although the Searchblox crawler is legitimate, it is rate‑limited because customers can misconfigure its crawl settings and generate traffic volumes high enough to affect server performance. The recommended policy is threshold‑based blocking (e.g., blocking an IP that sends more than 10 requests per second) while leaving the crawler free to operate under normal crawl‑delay settings. This protects access for other visitors without disrupting the crawler’s indexing tasks.
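
One way to implement the threshold-based policy above is a per-IP sliding-window counter. The Python sketch below is a minimal illustration under stated assumptions: the class name `SlidingWindowLimiter` is hypothetical, the 10-requests-per-second threshold comes from the example in the text, and state is kept in memory for a single process. A production deployment would more likely rely on the web server's or CDN's built-in rate limiting.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per window_seconds for each client IP."""

    def __init__(self, max_requests=10, window_seconds=1.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of accepted requests

    def allow(self, ip, now=None):
        """Return True if the request is within the threshold, else False."""
        if now is None:
            now = time.monotonic()
        hits = self._hits[ip]
        # Discard timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the threshold: block or throttle this request
        hits.append(now)
        return True


limiter = SlidingWindowLimiter()
# Eleven requests within 0.1 s from one (documentation-range) IP address.
burst = [limiter.allow("203.0.113.7", now=i * 0.01) for i in range(11)]
print(burst)
```

A crawler that honors a one-second crawl delay never approaches this threshold, so only misconfigured or abusive clients are affected.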

ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.