nearsite

Bot User-Agent: nearsite

🤖 Overview

nearsite is a web crawler operated by NearSite Inc., a company specializing in local business data aggregation and location-based analytics. Its primary purpose is to collect publicly available information from business websites, directories, and review platforms to feed into NearSite’s local search engine and business intelligence products. The bot was first documented in 2019 and is commonly observed crawling pages that mention physical addresses, phone numbers, and hours of operation.

🌐 Technical Behavior

nearsite uses a distributed crawling architecture, with IP ranges drawn primarily from AWS EC2 (us-east-1, eu-west-1) and Google Cloud (us-central1). It averages 2–5 requests per second per IP but can burst to 20 requests per second during initial site discovery. It respects standard HTTP caching headers and gives priority to URLs listed in sitemap.xml. The bot uses both HTTP/1.1 and HTTP/2 and sends a Referer header set to https://www.nearsite.com/crawler. According to NearSite's official documentation, the bot retrieves only HTML content and does not download images, CSS, or JavaScript files.

📋 robots.txt Compliance

According to NearSite’s published crawling policy at https://www.nearsite.com/robots-policy, the nearsite bot fully honors Disallow directives in robots.txt. It also supports the Crawl-delay directive and waits the specified number of seconds between requests. Testing conducted by third-party SEO tools shows the bot respects Disallow: /admin and Disallow: /private entries without exception.
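A site owner can check how rules like these would apply to the bot before deploying them. The following is a minimal sketch using Python's standard urllib.robotparser module; the robots.txt contents, the example.com URLs, and the 5-second delay are illustrative assumptions, not NearSite's published configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration; the paths and delay are assumptions.
ROBOTS_TXT = """\
User-agent: nearsite
Disallow: /admin
Disallow: /private
Crawl-delay: 5

User-agent: *
Disallow:
"""


def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt text without making a network request."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# The parser matches on the product token, i.e. the part of the UA before "/".
admin_allowed = rules.can_fetch("nearsite", "https://example.com/admin/login")
about_allowed = rules.can_fetch("nearsite", "https://example.com/about")
delay = rules.crawl_delay("nearsite")

print(admin_allowed, about_allowed, delay)
```

This only shows how a standards-following parser reads the file; whether the crawler itself behaves the same way has to be confirmed from server logs.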

🔍 Detection Indicators

The primary User-Agent string is nearsite/1.0 (+https://www.nearsite.com/crawler). A secondary UA string, nearsite-mobile/1.0, is used when crawling mobile-optimized pages. Requests also carry a distinct HTTP header, X-NearSite-Crawler: true. The request path always begins with /, although this is true of ordinary HTTP requests in general and is not a useful fingerprint on its own. The bot also sends a From header with the email [email protected], as RFC 7231 recommends for robotic user agents.
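The indicators above can be combined into a simple header check. This is a rough sketch, assuming request headers are available as a plain mapping; the function name and the regular expression are hypothetical, and because every one of these headers can be spoofed, a match should be read as a hint rather than proof of origin.

```python
import re
from typing import Mapping

# Matches "nearsite/<version>" and "nearsite-mobile/<version>" at the start of
# the User-Agent. The pattern is an assumption derived from the strings above.
NEARSITE_UA = re.compile(r"^nearsite(-mobile)?/\d+(\.\d+)*", re.IGNORECASE)


def looks_like_nearsite(headers: Mapping[str, str]) -> bool:
    """Return True if the headers carry a nearsite UA or the crawler marker."""
    # HTTP header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}
    user_agent = normalized.get("user-agent", "").strip()
    marker = normalized.get("x-nearsite-crawler", "").strip().lower()
    return bool(NEARSITE_UA.match(user_agent)) or marker == "true"


sample = {"User-Agent": "nearsite/1.0 (+https://www.nearsite.com/crawler)"}
print(looks_like_nearsite(sample))
```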

📊 Data Usage

Collected data is used exclusively for NearSite’s local business index, which powers location-aware search results and analytics dashboards for small businesses. The data is not used for AI training, nor is it shared with third parties. NearSite’s privacy policy states that all scraped content is stored for a maximum of 30 days and stripped of personally identifiable information before inclusion in the public index.

⚙️ Rate Limiting Policy

While nearsite is a legitimate, non-malicious crawler, its aggressive crawl patterns, especially on new sites, can cause server load. A rate limit of 10 requests per second per IP is recommended: it sits above the bot's typical 2–5 requests per second, so routine indexing can proceed, but below its 20-requests-per-second discovery bursts, which limits the impact on site performance.
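One way to apply such a threshold is a per-IP sliding-window counter. The sketch below is illustrative only: the class name and its parameters are hypothetical, the state is held in memory for a single process, and a production setup would more likely enforce the limit at the web server, load balancer, or CDN.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class PerIpRateLimiter:
    """Sliding-window limiter: at most max_requests per window per IP."""

    def __init__(self, max_requests: int = 10, window_seconds: float = 1.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, ip: str, now: Optional[float] = None) -> bool:
        """Record the request and return True, or return False if over limit."""
        now = time.monotonic() if now is None else now
        hits = self._hits[ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # the caller would typically answer with HTTP 429
        hits.append(now)
        return True


limiter = PerIpRateLimiter(max_requests=10, window_seconds=1.0)

# Simulate a 20-request burst from one documentation-range IP within 0.2 s.
burst = [limiter.allow("203.0.113.7", now=i * 0.01) for i in range(20)]
print(burst.count(True), burst.count(False))
```

Under this sketch, a discovery burst has its excess requests rejected while traffic at the bot's average rate passes untouched.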

ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information are the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.