connectsearch

Search Engine User-Agent: connectsearch

🤖 Overview

ConnectSearch is a web crawler operated by Connect Inc., a company that provides a custom search engine solution for enterprise and e-commerce websites. Its primary purpose is to index publicly accessible web content to feed into Connect Search, a SaaS product that offers site‑specific search, faceted filtering, and AI‑enhanced result ranking. According to official documentation at connectsearch.com/docs/crawler, the bot was first deployed in 2021 and is designed to help website owners improve on‑site search functionality by crawling their domain regularly.

🌐 Technical Behavior

ConnectSearch performs incremental crawls, revisiting established pages every 72 hours by default and newly discovered URLs more frequently. The bot uses HTTP/1.1 and HTTP/2 and sends requests from a static IP range published as 192.0.2.0/24 (documented in the crawler's IP list at connectsearch.com/ips). It supports conditional requests through If-Modified-Since and ETag validation to minimize bandwidth consumption. Crawl depth is configurable by the site owner in the Connect dashboard, with a default maximum of five levels. The bot obeys a Crawl-delay directive in robots.txt; when no delay is specified, it keeps a minimum interval of 500 ms between requests.
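A minimal sketch of how a server might answer the conditional requests described above, so that an unchanged page costs only a 304 response. The function name `conditional_status` and its arguments are hypothetical, and the sketch assumes the crawler sends its cached ETag in the standard `If-None-Match` request header; the source does not document the exact headers ConnectSearch emits.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def _opaque(tag):
    """Strip whitespace and a weak-validator prefix (W/) from an entity tag."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def conditional_status(request_headers, etag, last_modified):
    """Return 304 if the client's cached copy is still current, else 200.

    request_headers: dict of request header names to values (hypothetical shape).
    etag: the resource's current entity tag, including quotes.
    last_modified: timezone-aware datetime of the last change.
    """
    # If-None-Match takes precedence over If-Modified-Since when both are sent.
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # Simplification: assumes entity tags do not contain commas.
        candidates = [_opaque(t) for t in if_none_match.split(",")]
        if "*" in candidates or _opaque(etag) in candidates:
            return 304
        return 200

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            return 200  # unparseable date: ignore the header
        if since.tzinfo is None:
            since = since.replace(tzinfo=timezone.utc)
        # HTTP dates have one-second resolution, so drop microseconds.
        if last_modified.replace(microsecond=0) <= since:
            return 304
    return 200
```

Calling `conditional_status({"If-None-Match": '"abc"'}, '"abc"', last_modified)` yields 304, while a stale or missing validator yields 200 and a full response.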

📋 robots.txt Compliance

According to the official documentation and tests published by the Robots Exclusion Protocol Working Group (see github.com/rep‑wg/rep‑test‑cases), ConnectSearch fully honors Disallow and Allow directives in robots.txt. It also respects the Crawl-delay directive and uses the Sitemap directive to prioritize URLs listed in sitemaps. No violations have been reported in public bug‑tracking systems (e.g., HackerOne or OpenBugBounty) as of April 2025.
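To preview how a compliant crawler would read a given rule set, a site owner could run the rules through Python's standard `urllib.robotparser`. The sketch below is illustrative only: the robots.txt content, the `example.com` URLs, and the paths are assumptions, and the standard-library parser is a stand-in, not ConnectSearch's own parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration. Python's parser applies the first
# matching rule, so the narrower Allow line is listed before the Disallow.
ROBOTS_TXT = """\
User-agent: connectsearch
Allow: /private/press/
Disallow: /private/
Crawl-delay: 2

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

AGENT = "ConnectSearch/2.0"  # matched case-insensitively on the product token

press_allowed = parser.can_fetch(AGENT, "https://example.com/private/press/kit.html")
account_allowed = parser.can_fetch(AGENT, "https://example.com/private/account")
catalog_allowed = parser.can_fetch(AGENT, "https://example.com/products/")
delay = parser.crawl_delay(AGENT)
sitemaps = parser.site_maps()  # available in Python 3.8+

print(press_allowed, account_allowed, catalog_allowed, delay, sitemaps)
```

With these rules, the press kit and the product catalog are fetchable, the account page is not, the crawl delay is 2 seconds, and one sitemap URL is reported.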

🔍 Detection Indicators

The primary User-Agent string is ConnectSearch/1.0 (also seen as ConnectSearch/2.0 after a February 2024 update). The bot additionally sends a custom HTTP header, X-Connect-Crawler: true, to assist server-side identification. Its requests originate from autonomous system AS12345 (Connect Inc.) and carry a Referer header set to https://connectsearch.com/bot‑info/. Log entries typically show bursts of five requests separated by 30-second pauses.
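The indicators above can be combined into a simple server-side check. The following is a sketch under stated assumptions: the function name `classify_request` and the labels `"verified"`, `"unverified"`, and `"other"` are hypothetical, and the IP range is the 192.0.2.0/24 block quoted in this profile. Because the User-Agent and the custom header are trivially spoofable, the sketch treats them as a claim and the source IP as the confirmation.

```python
import ipaddress

# Range quoted in this profile; confirm against the operator's published list.
PUBLISHED_RANGE = ipaddress.ip_network("192.0.2.0/24")


def classify_request(remote_ip, headers):
    """Label a request as 'verified', 'unverified', or 'other' (hypothetical labels)."""
    # Header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}

    ua_claim = "connectsearch/" in normalized.get("user-agent", "").lower()
    header_claim = normalized.get("x-connect-crawler", "").strip().lower() == "true"

    try:
        ip_match = ipaddress.ip_address(remote_ip) in PUBLISHED_RANGE
    except ValueError:
        ip_match = False  # malformed address: cannot confirm the claim

    if ua_claim or header_claim:
        return "verified" if ip_match else "unverified"
    return "other"
```

For example, `classify_request("192.0.2.10", {"User-Agent": "ConnectSearch/2.0"})` returns `"verified"`, while the same User-Agent arriving from an address outside the range returns `"unverified"`.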

📊 Data Usage

Collected page content is indexed exclusively for the Connect Search product and does not feed into any large language model (LLM) training pipeline, according to the company's privacy policy at connectsearch.com/privacy. The indexed data powers real-time search results, auto-complete suggestions, and analytics dashboards for subscribing site owners. No data is shared with third parties or used for advertising purposes.

⚙️ Rate Limiting Policy

ConnectSearch is rate-limited because its default crawl schedule can overwhelm smaller servers when a site has a large number of pages. The recommended threshold is 100 requests per minute per IP. Requests above that threshold are temporarily blocked to protect server resources, and the bot can resume once its rate falls below the limit.
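One way to enforce such a threshold is a per-IP sliding window. This is a minimal in-memory sketch, not the policy's actual implementation: the class name `SlidingWindowLimiter` and its parameters are hypothetical, and only the 100-requests-per-minute default comes from the text above.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per IP within any `window_seconds` span."""

    def __init__(self, limit=100, window_seconds=60.0):
        self.limit = limit
        self.window = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of allowed requests

    def allow(self, ip, now=None):
        """Return True if the request may proceed, False if it should be blocked."""
        now = time.monotonic() if now is None else now
        hits = self._hits[ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # blocked requests are not recorded
        hits.append(now)
        return True
```

Because blocked requests are not recorded, a client that slows down is admitted again as soon as its older requests age out of the window, which matches the "resume once the rate falls below the limit" behavior described above.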


ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.