searchit-bot

Search Engine User-Agent: searchit-bot

🤖 Overview

searchit-bot is a web crawler operated by SearchIt Inc., a private search-engine and data-services company. According to the company’s official documentation (available at searchit.com/bot), the bot is designed exclusively to index publicly accessible web pages and to collect structured content for SearchIt’s vertical search engine and analytics platform. It was first observed around 2020 and has since appeared in web-server logs as a persistent, well-behaved crawler.

🌐 Technical Behavior

searchit-bot uses a distributed crawl architecture backed by a fleet of IP addresses in the 198.51.100.0/24 and 203.0.113.0/24 ranges (as listed in its published IP whitelist at searchit.com/ips). According to its official crawl policy, it makes HTTP/1.1 GET requests at no more than 10 requests per second per source IP. It supports content negotiation through the Accept-Language and Accept-Encoding headers and always includes a From header containing a contact email address. When a server returns a 503 or 429 response, the bot waits at least 2 seconds before retrying. It follows standard HTTP redirects (301, 302) but does not follow meta-refresh or JavaScript-based navigation.
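As a rough illustration, a site operator could use Python's standard `ipaddress` module to check whether a request's source address falls inside the ranges quoted above. The hard-coded list is an assumption copied from this page. The authoritative list is whatever searchit.com/ips currently publishes, and `in_published_ranges` is a hypothetical helper name.

```python
import ipaddress

# ASSUMPTION: ranges copied from this profile; re-check searchit.com/ips
# before relying on them, since published lists change over time.
SEARCHIT_RANGES = [
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("203.0.113.0/24"),
]


def in_published_ranges(ip: str) -> bool:
    """Return True if `ip` falls inside one of the listed searchit-bot ranges."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        # Malformed address strings are treated as "not searchit-bot".
        return False
    # An IPv6 address is never "in" an IPv4 network, so this returns False for it.
    return any(addr in net for net in SEARCHIT_RANGES)


if __name__ == "__main__":
    print(in_published_ranges("198.51.100.42"))  # True
    print(in_published_ranges("192.0.2.7"))      # False
```

An IP match is a reasonable first filter. Pairing it with the reverse-DNS lookup mentioned under Detection Indicators makes misattribution less likely.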

📋 robots.txt Compliance

searchit-bot fully conforms to the Robots Exclusion Standard (formalised as RFC 9309). Its official robots.txt policy, published on its homepage, states that it honours all Disallow directives and respects the Crawl-delay directive when present. Third-party audits (e.g., the Internet Archive’s robot-compliance reports) consistently report no observed violations of a site’s robots.txt rules by searchit-bot since its launch.
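The following is a minimal sketch of checking what a given robots.txt permits for searchit-bot, using `urllib.robotparser` from the Python standard library. The robots.txt content and example.com URLs are hypothetical, as are the helper names `build_parser` and `check`. Crawl-delay is a widely supported extension rather than part of RFC 9309. `RobotFileParser.crawl_delay()` returns it only when it is written as a whole number of seconds, and returns None otherwise.

```python
from __future__ import annotations

from urllib.robotparser import RobotFileParser

# HYPOTHETICAL robots.txt for illustration; a live check would instead call
# RobotFileParser.set_url("https://<your-site>/robots.txt") and read().
ROBOTS_TXT = """\
User-agent: searchit-bot
Disallow: /private/
Crawl-delay: 5

User-agent: *
Disallow: /admin/
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content supplied as a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def check(parser: RobotFileParser, url: str,
          token: str = "searchit-bot") -> tuple[bool, int | None]:
    """Return (may_fetch, crawl_delay_seconds) for the given product token."""
    return parser.can_fetch(token, url), parser.crawl_delay(token)


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    print(check(rp, "https://example.com/private/report.html"))  # (False, 5)
    print(check(rp, "https://example.com/admin/"))               # (True, 5)
```

The second call returns True because a crawler obeys the group that names its product token and falls back to the `*` group only when no such group exists. To keep searchit-bot out of /admin/, that path has to be repeated in the searchit-bot group.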

🔍 Detection Indicators

The primary User-Agent token is searchit-bot (case-sensitive). A common full User-Agent string is Mozilla/5.0 (compatible; searchit-bot; +http://searchit.com/bot). The bot also sends an X-Bot-Keyword: searchit header with every request. Reverse-DNS lookups of its source IPs typically resolve to hostnames under the crawl.searchit.com domain (*.crawl.searchit.com). The bot does not spoof common browser fingerprints and always sends a Request-Id header that can be used for whitelisting.
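The indicators above can be combined into a two-stage check, sketched here in Python under a few assumptions. The header values are taken verbatim from this page, the helper names are hypothetical, and the DNS step uses the common forward-confirmed reverse DNS technique: a PTR lookup, then a forward lookup that must return the original IP. Headers alone are trivially spoofed, so only the DNS round-trip ties a request to hosts under crawl.searchit.com. `socket.gethostbyname_ex` handles IPv4 only, which matches the ranges listed above.

```python
import socket
from typing import Callable, Mapping

# ASSUMPTION: indicator values copied from this profile.
UA_TOKEN = "searchit-bot"          # matched case-sensitively, as documented
RDNS_SUFFIX = ".crawl.searchit.com"


def claims_searchit(headers: Mapping[str, str]) -> bool:
    """Cheap first pass: UA token plus the X-Bot-Keyword header (spoofable)."""
    # HTTP header names are case-insensitive, so normalise the keys only.
    h = {k.lower(): v for k, v in headers.items()}
    return UA_TOKEN in h.get("user-agent", "") and h.get("x-bot-keyword") == "searchit"


def verify_fcrdns(
    ip: str,
    reverse: Callable[[str], tuple] = socket.gethostbyaddr,
    forward: Callable[[str], tuple] = socket.gethostbyname_ex,
) -> bool:
    """PTR name must end in RDNS_SUFFIX and must resolve back to the same IP."""
    try:
        hostname = reverse(ip)[0]
        if not hostname.rstrip(".").endswith(RDNS_SUFFIX):
            return False
        # Index 2 of both lookups is the list of IPv4 addresses.
        return ip in forward(hostname)[2]
    except OSError:
        # socket.herror and socket.gaierror are both OSError subclasses.
        return False
```

The resolvers are passed in as parameters so the logic can be exercised without network access. In production, caching the DNS results per IP avoids a lookup on every request.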

📊 Data Usage

Collected data is used exclusively for SearchIt’s vertical search index and for generating aggregated analytics reports for enterprise clients. No content is used to train generative AI models or for any non-public re-publishing. SearchIt’s privacy policy (searchit.com/privacy) states that raw page data is discarded after indexing and is never retained for more than 90 days.

⚙️ Rate Limiting Policy

Although searchit-bot is legitimate and well-behaved, many Web Application Firewalls (WAFs) still rate-limit it, because its crawl volume can spike during re-indexing campaigns. The recommended threshold is 50 requests per minute from a single IP. Requests beyond that receive a temporary 429 (Too Many Requests) response, which protects application resources without permanently blocking the bot.
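The 50-requests-per-minute policy can be sketched as a per-IP sliding-window log. This is a generic technique rather than any particular WAF's implementation. The class name and the choice to return a bare status code are illustrative assumptions.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Hypothetical per-IP limiter: at most `limit` requests per `window` seconds."""

    def __init__(self, limit: int = 50, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the logic can be tested deterministically
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def status_for(self, ip: str) -> int:
        """Return the HTTP status to send: 200 to serve, 429 to throttle."""
        now = self.clock()
        hits = self._hits[ip]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            # Throttled requests are not recorded, so backing off restores access.
            return 429
        hits.append(now)
        return 200
```

At the bot's stated maximum of 10 requests per second, a single source IP would reach the 50-request limit in about 5 seconds. Some 429 responses during re-indexing are therefore expected. Because rejected requests are not logged, a client that backs off regains access as soon as its oldest counted requests leave the window. Sending a Retry-After header with the 429 is a common courtesy. This sketch never evicts idle IPs from memory, so a long-running deployment would need to prune them.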


ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information are the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.