npbot
Bot User-Agent: npbot
🤖 Overview
The npbot crawler, also known as NeevaBot, was operated by Neeva Inc., a privacy-focused search engine founded in 2019 by former Google engineers Sridhar Ramaswamy and Vivek Raghunathan. Its primary purpose was to index publicly available web pages for Neeva’s search results, which were offered as an ad-free, subscription-based search experience. The bot was first documented on Neeva’s official blog in early 2020. It remained active until the company wound down its consumer search engine, a move announced in May 2023, and redirected its technology toward enterprise AI search (see Neeva’s shutdown announcement at neeva.com/blog/the-search-experience).
🌐 Technical Behavior
npbot performed standard HTTP GET requests. According to Neeva’s documentation, its crawl frequency respected site owners’ server capacity, defaulting to a maximum of 10 requests per second per IP. It fetched both HTML and linked resources such as CSS and JavaScript to render pages for indexing. The bot’s IP ranges were allocated from Neeva’s own ASN (AS 396376) and included addresses published in the SPF records of neeva.com. Crawling followed a breadth-first strategy, and the bot accepted gzip compression. It supported HTTP/1.1 and HTTP/2 and sent an Accept-Language header preferring English. The crawler also sent a From header with a contact email address for feedback, as noted on Neeva’s now-archived crawler info page (via Internet Archive).
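Because npbot's addresses were reportedly listed in neeva.com's SPF records, one way to check a logged IP is to extract the `ip4:`/`ip6:` mechanisms from an SPF string and test membership. The sketch below is illustrative only: the SPF record is hypothetical and uses documentation-only ranges (RFC 5737 / RFC 3849), the function names are invented for this example, and fetching the live TXT record (which needs a DNS library) is not shown.

```python
import ipaddress
from typing import List, Optional, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def spf_networks(record: str) -> List[Network]:
    """Collect ip4:/ip6: mechanisms from an SPF TXT record string (hypothetical helper)."""
    networks: List[Network] = []
    for term in record.split():
        # "+" is the default (pass) qualifier; terms with -, ~ or ? qualifiers
        # no longer start with "ip4:"/"ip6:" after this and are skipped.
        term = term.lstrip("+")
        if term.startswith(("ip4:", "ip6:")):
            # strict=False tolerates host bits being set, e.g. "192.0.2.5/24".
            networks.append(ipaddress.ip_network(term[4:], strict=False))
    return networks


def ip_in_networks(ip: str, networks: List[Network]) -> bool:
    """True if `ip` falls inside any network; IPv4/IPv6 mismatches never match."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)


if __name__ == "__main__":
    # HYPOTHETICAL record: documentation ranges, not Neeva's real addresses.
    record = "v=spf1 ip4:192.0.2.0/24 ip6:2001:db8::/32 include:_spf.example.com ~all"
    nets: Optional[List[Network]] = spf_networks(record)
    print(ip_in_networks("192.0.2.77", nets))
```

An SPF record describes mail senders, so it is at best a secondary signal here; the ASN's announced prefixes would be a more direct source if available.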
📋 robots.txt Compliance
According to Neeva’s official documentation, npbot fully honoured Disallow directives in robots.txt and supported the Crawl-Delay directive for reducing server load. It also respected X-Robots-Tag HTTP headers. There is no documented evidence of deliberate violations; however, as with any large crawler, changes to robots.txt rules could take some time to propagate.
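Python's standard-library `urllib.robotparser` offers a quick way to see how a given robots.txt would be read for npbot. The file contents and URLs below are hypothetical, and the results reflect the standard library's interpretation, not Neeva's own parser, which may have differed in edge cases.

```python
from urllib.robotparser import RobotFileParser

# HYPOTHETICAL robots.txt: one group for NeevaBot/npbot, a permissive default for others.
ROBOTS_TXT = """\
User-agent: NeevaBot
User-agent: npbot
Crawl-delay: 5
Disallow: /private/

User-agent: *
Disallow:
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text held in memory (no network fetch)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())  # parse() also marks the file as read
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    # Pass a product token such as "NeevaBot" or "npbot/1.0", not the full
    # "Mozilla/5.0 (compatible; ...)" string: robotparser only matches group
    # names against the part of the agent before the first "/".
    for agent in ("NeevaBot", "npbot/1.0", "SomeOtherBot"):
        print(agent,
              rp.can_fetch(agent, "https://example.com/private/report.html"),
              rp.crawl_delay(agent))
```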
🔍 Detection Indicators
The primary User-Agent string was Mozilla/5.0 (compatible; NeevaBot/1.0; +https://neeva.com/crawler), with an alternative form npbot/1.0. The crawler also sent a From header containing a Neeva contact address. It did not spoof common browser user agents. Web administrators can distinguish npbot by its consistent request pattern and the absence of JavaScript execution on the first request.
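A log filter can flag requests carrying either of the tokens quoted above. This is a minimal sketch: the function name `looks_like_npbot` is hypothetical, and the pattern simply looks for `NeevaBot/<version>` or `npbot/<version>` as a standalone token.

```python
import re
from typing import Optional

# Matches "NeevaBot/1.0" or "npbot/1.0" (any numeric version) as a whole token,
# so "snpbot/1.0" or "NeevaBotX/1.0" do not match.
_NPBOT_TOKEN = re.compile(r"\b(?:NeevaBot|npbot)/\d+(?:\.\d+)*", re.IGNORECASE)


def looks_like_npbot(user_agent: Optional[str]) -> bool:
    """True if the User-Agent header contains an npbot/NeevaBot product token."""
    return bool(_NPBOT_TOKEN.search(user_agent or ""))


if __name__ == "__main__":
    for ua in (
        "Mozilla/5.0 (compatible; NeevaBot/1.0; +https://neeva.com/crawler)",
        "npbot/1.0",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0 Safari/537.36",
    ):
        print(looks_like_npbot(ua), ua)
```

Any client can send these strings, so a User-Agent match is best treated as a claim to be confirmed against the crawler's published IP ranges rather than as proof.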
📊 Data Usage
Collected page content was used exclusively to build and refresh Neeva’s search index, providing users with relevant, up-to-date results. Neeva publicly stated that it did not use crawled data for AI model training or for selling to third parties, aligning with its privacy-centric business model. After the search engine shutdown in 2023, any remaining cached data was deleted per Neeva’s privacy policy.
⚙️ Rate Limiting Policy
This bot is rate-limited because its default crawl rate of up to 10 requests per second could overwhelm smaller servers if left unchecked. The recommended policy is to set Crawl-Delay: 5 in robots.txt to reduce load and to apply IP-based throttling above 50 requests per second as a safety threshold, balancing the crawler's indexing needs against server protection.
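The throttling half of that policy can be sketched as a per-IP sliding-window limiter. This is an illustration under stated assumptions, not a prescribed implementation: the class name `PerIPThrottle`, the one-second window, and the in-memory storage are all hypothetical choices, and in practice this limit is usually enforced at the web server, reverse proxy, or firewall.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Optional


class PerIPThrottle:
    """Allow at most `limit` requests per IP within any `window`-second span."""

    def __init__(self, limit: int = 50, window: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        # Timestamps of accepted requests per IP. Entries are never evicted
        # here; a long-running service would need to prune idle IPs.
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, ip: str, now: Optional[float] = None) -> bool:
        now = self.clock() if now is None else now
        hits = self._hits[ip]
        # Drop timestamps that have slid out of the current window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # over the threshold: reject (e.g. respond 429)
        hits.append(now)
        return True


if __name__ == "__main__":
    throttle = PerIPThrottle(limit=50)
    # 60 requests from one (documentation-range) IP within 60 ms.
    accepted = sum(throttle.allow("192.0.2.10", now=i * 0.001) for i in range(60))
    print(accepted)
```

For scale, if Crawl-Delay: 5 is read as five seconds between requests, a compliant crawler would make at most 86,400 / 5 = 17,280 requests per day, compared with 864,000 per day at a sustained 10 requests per second. Since npbot's documented default was 10 requests per second per IP, a 50-per-second threshold would only trip on unusual bursts.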
ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.