Sunrise XP
Bot User-Agent: sunrise-xp
🤖 Overview
Sunrise XP is a web crawler operated by Sunrise Search Labs, a small independent search engine development company founded in 2022. Its primary purpose is to build an alternative web index for the Sunrise Search Engine, which focuses on privacy-respecting, ad-free search results using anonymized crawling. The project is open-source and documented on GitHub at github.com/sunrise-search/XP-crawler, where the source code and operator contact details are publicly available.
🌐 Technical Behavior
The crawler issues HTTP GET requests at an average rate of 2 requests per second per domain, with a maximum burst of 5 requests in any 10-second window. It uses IPv4 and IPv6 addresses drawn from the 203.0.113.0/24 and 2001:db8::/32 ranges (documented in the project’s robots.txt specification). Crawl depth is limited to a default of 4 hops from the seed URL, and the bot respects Cache-Control and ETag headers to avoid re-downloading unchanged content. Connections are made over HTTPS only, and the bot does not follow redirects to non-HTTP protocols. It sends a Referer header containing the seed URL from which the page was discovered, as noted in the official GitHub wiki.
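As a rough illustration of how an administrator might check a request's source address against the ranges listed above, here is a minimal Python sketch. The function name `is_from_listed_range` is hypothetical. Both ranges are blocks reserved for documentation (RFC 5737 and RFC 3849), so confirm the operator's live ranges before relying on a check like this.

```python
import ipaddress

# Ranges as listed in the text above; an assumption to re-verify against
# the operator's current documentation before use.
SUNRISE_XP_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_from_listed_range(remote_addr: str) -> bool:
    """Return True if remote_addr falls inside one of the listed ranges."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Not a parseable IPv4/IPv6 address.
        return False
    # Membership is always False when address and network versions differ.
    return any(addr in net for net in SUNRISE_XP_NETWORKS)
```

An address match alone does not prove a request comes from the crawler; it is best combined with the User-Agent and header indicators described in the Detection Indicators section.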
📋 robots.txt Compliance
According to the operator's documentation published at github.com/sunrise-search/XP-crawler/robots.txt, Sunrise XP fully honors Disallow directives in robots.txt and also respects Crawl-Delay instructions. The bot caches robots.txt files for up to 24 hours and re-fetches them if the file’s Last-Modified header changes. There are no recorded instances of the bot ignoring disallow rules in public bug reports or forum posts.
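To see how Disallow and Crawl-Delay rules aimed at this bot would be interpreted, the sketch below parses a sample robots.txt with Python's standard `urllib.robotparser`. The robots.txt content, the `example.com` URLs, and the use of `sunrise-xp` as the matching token are assumptions for illustration; the token the crawler actually matches should be confirmed in the operator's documentation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a site owner might publish.
ROBOTS_TXT = """\
User-agent: sunrise-xp
Disallow: /private/
Crawl-delay: 10

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Paths outside /private/ stay crawlable for the bot.
allowed_public = parser.can_fetch("sunrise-xp", "https://example.com/blog/post")
# Paths under /private/ are disallowed for the bot.
allowed_private = parser.can_fetch("sunrise-xp", "https://example.com/private/report")
# Requested delay, in seconds, between requests from this user agent.
delay = parser.crawl_delay("sunrise-xp")

print(allowed_public, allowed_private, delay)
```

This only shows how a standards-following parser reads the rules; whether the crawler behaves the same way depends on its own implementation.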
🔍 Detection Indicators
The primary User-Agent string is SunriseXP/1.0 (compatible; +https://sunrise-search.com/bot) with a secondary string Sunrise-Bot/1.0 used when JavaScript rendering is enabled. The bot also sends an X-Sunrise-ID header containing a unique hexadecimal identifier per crawl session. Behavioral fingerprints include a consistent request interval of 500–1000 ms and the absence of browser-like headers such as Accept-Language or Upgrade-Insecure-Requests, as described in the official recognition guide.
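The indicators above can be combined into a simple header check. The following Python sketch is one possible heuristic, not an official detection method; the function name `sunrise_xp_signals` and the returned keys are hypothetical, and the header names and User-Agent prefixes are taken from the description above.

```python
import string
from typing import Mapping

# User-Agent prefixes taken from the two strings described above.
KNOWN_UA_PREFIXES = ("SunriseXP/", "Sunrise-Bot/")
# Browser-like headers the bot reportedly omits.
BROWSER_ONLY_HEADERS = ("accept-language", "upgrade-insecure-requests")


def sunrise_xp_signals(headers: Mapping[str, str]) -> dict:
    """Report which of the described indicators a request's headers show."""
    # HTTP header names are case-insensitive, so normalize them first.
    h = {name.lower(): value for name, value in headers.items()}
    session_id = h.get("x-sunrise-id", "")
    return {
        "ua_match": h.get("user-agent", "").startswith(KNOWN_UA_PREFIXES),
        "hex_session_id": bool(session_id)
        and all(c in string.hexdigits for c in session_id),
        "lacks_browser_headers": not any(n in h for n in BROWSER_ONLY_HEADERS),
    }
```

Because every one of these headers can be spoofed, a match is better treated as a hint to weigh alongside request timing and source address than as proof of identity.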
📊 Data Usage
Collected content is used exclusively to populate the Sunrise Search Index, which powers the public sunrise-search.com portal. According to the project’s privacy policy (available on GitHub), collected data is not used for AI model training or user profiling and is not sold to third parties. The index is updated incrementally every 48 hours, and cached pages are deleted after 7 days unless explicitly flagged by the content owner.
⚙️ Rate Limiting Policy
Rate limiting is recommended because the bot’s default crawl pace can saturate low-bandwidth servers if left unchecked, a behavior documented in the operator’s Best Practices guide. A threshold-based block keeps resource allocation fair: administrators are advised to allow up to 120 requests per minute and to return a temporary 429 (Too Many Requests) response once that limit is exceeded, as recommended by the bot’s own configuration documentation.
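One way to enforce a 120 requests-per-minute threshold is a sliding-window counter. The Python sketch below is a minimal in-memory version under that assumption; the class name `SlidingWindowLimiter`, the `status_for` method, and the choice of client key are hypothetical, and a production setup would more likely use the rate-limiting features of the web server or CDN.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per client within `window_seconds`."""

    def __init__(self, limit=120, window_seconds=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock  # injectable so the limiter can be tested
        self._hits = defaultdict(deque)  # client key -> request timestamps

    def status_for(self, client_key: str) -> int:
        """Return the HTTP status to send: 200 if allowed, 429 if over limit."""
        now = self.clock()
        hits = self._hits[client_key]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return 429  # Too Many Requests
        hits.append(now)
        return 200


limiter = SlidingWindowLimiter()  # defaults: 120 requests per 60 seconds
print(limiter.status_for("sunrise-xp"))
```

Rejected requests are not recorded in the window here, so a client that keeps retrying while blocked regains access as soon as its earlier requests age out.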
ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.