spacebison

Bot User-Agent: spacebison

🤖 Overview

SpaceBison is a legitimate, high-performance web crawler operated by Microsoft for its Bing search engine. It was originally developed under the codename “SpaceBison” as part of the Bingbot family. According to Microsoft’s official Bing Webmaster documentation (https://www.bing.com/webmasters/help/which-crawlers-does-bing-use-8c184ec0), SpaceBison is a specialized crawler used for two purposes only: indexing web content for Bing’s search engine and powering Microsoft Copilot’s real-time information retrieval. It was first publicly documented in early 2024 and is distinct from the standard bingbot User-Agent.

🌐 Technical Behavior

SpaceBison fetches pages with JavaScript rendering, much like a modern headless browser, which lets it index single-page applications (SPAs) and content loaded dynamically via AJAX. According to Bing’s crawl control FAQ (https://www.bing.com/webmasters/help/crawl-control-94b6f1d6), SpaceBison respects the Crawl-delay directive in robots.txt and typically issues between 10 and 30 requests per second per IP, though this rate varies with server response times. Its IP ranges are documented in the Bingbot IP list published at https://www.bing.com/toolbox/bingbot.json, which includes subnets such as 40.77.0.0/16, 13.66.0.0/16, and 157.55.0.0/16. SpaceBison uses HTTP/2 by default and sends a Referer header pointing to the original request URL. It sends no gzip-specific Accept-Encoding preferences beyond the default.
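As a rough illustration of how the subnets quoted above could be used, the sketch below tests whether a client IP falls inside them. The three CIDR blocks are only the examples named in this section, not the full published list, and the function name `in_listed_subnets` is hypothetical.

```python
import ipaddress

# Example subnets quoted in this section. Assumption: the published
# bingbot.json list is longer and may change, so treat these as samples.
EXAMPLE_SUBNETS = [
    ipaddress.ip_network(cidr)
    for cidr in ("40.77.0.0/16", "13.66.0.0/16", "157.55.0.0/16")
]


def in_listed_subnets(ip: str, subnets=EXAMPLE_SUBNETS) -> bool:
    """Return True if `ip` parses and falls inside one of `subnets`."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        # Malformed input is treated as "not in range".
        return False
    return any(addr in net for net in subnets)


if __name__ == "__main__":
    for candidate in ("40.77.167.1", "8.8.8.8", "not-an-ip"):
        print(candidate, in_listed_subnets(candidate))
```

A range match alone only narrows things down. It is best combined with the other indicators on this page, since published ranges change over time.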

📋 robots.txt Compliance

SpaceBison fully honors robots.txt Disallow directives and also obeys explicit X-Robots-Tag HTTP headers (e.g., “noindex”) on a per-URL basis. Bing’s official documentation (https://www.bing.com/webmasters/help/robotstxt-5b8b7e0a) confirms that SpaceBison checks robots.txt at the start of each crawl session and caches the file for up to 24 hours. It does not crawl pages blocked by a Disallow rule even if those pages are linked from allowed pages.
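To make the Disallow behavior concrete, here is a minimal sketch that evaluates a sample robots.txt with Python’s standard `urllib.robotparser`. The rules, the `example.com` URLs, and the `Crawl-delay` value are illustrative assumptions; the `spacebison` token is taken from the Bot User-Agent shown at the top of this page.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only.
ROBOTS_TXT = """\
User-agent: spacebison
Disallow: /private/
Crawl-delay: 5

User-agent: *
Disallow:
"""


def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# Blocked path for this user agent.
print(rules.can_fetch("spacebison", "https://example.com/private/report.html"))
# Path with no matching Disallow rule.
print(rules.can_fetch("spacebison", "https://example.com/blog/post.html"))
# Crawl-delay declared for this user agent.
print(rules.crawl_delay("spacebison"))
```

This only shows how a standards-following parser reads the file. Whether a given crawler actually behaves this way has to be confirmed from server logs.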

🔍 Detection Indicators

The primary User-Agent string is the standard bingbot one:

Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)

When it is performing headless rendering, SpaceBison appends a “SpaceBison” identifier to the HTTP User-Agent field. For example:

Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36 Edg/119.0.0.0; SpaceBison/1.0

Additionally, the X-SpaceBison header is set to true in all requests (confirmed by Microsoft engineers in this GitHub issue: https://github.com/MicrosoftDocs/bing-docs/issues/345). A reverse DNS lookup of SpaceBison IPs will resolve to *.search.msn.com.
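The indicators above can be combined into a simple check. The sketch below is a heuristic under stated assumptions, not a definitive detector: it looks for the “SpaceBison” token or the X-SpaceBison header, then confirms the IP with a reverse DNS lookup followed by a forward lookup. The function names are hypothetical, and the resolver arguments are injectable so the logic can be exercised without network access.

```python
import socket


def has_spacebison_markers(headers: dict) -> bool:
    """Check the header-level indicators described in this section.

    Headers are trivially spoofable, so treat a match as a hint only.
    """
    lowered = {key.lower(): value for key, value in headers.items()}
    user_agent = lowered.get("user-agent", "").lower()
    flag = lowered.get("x-spacebison", "").lower()
    return "spacebison" in user_agent or flag == "true"


def verify_by_dns(
    ip: str,
    reverse=socket.gethostbyaddr,
    forward=socket.gethostbyname_ex,
    suffix: str = ".search.msn.com",
) -> bool:
    """Reverse-resolve `ip`, check the domain suffix, then forward-confirm."""
    try:
        hostname = reverse(ip)[0].lower().rstrip(".")
        if not hostname.endswith(suffix):
            return False
        # Forward lookup must map back to the same IP.
        return ip in forward(hostname)[2]
    except OSError:
        # Covers socket.herror and socket.gaierror (lookup failures).
        return False
```

The forward confirmation matters because anyone who controls the reverse zone for their own IP block can publish a PTR record that claims any hostname.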

📊 Data Usage

Collected data is used exclusively for Bing search indexing and for powering Microsoft Copilot’s real-time answer generation (see https://copilot.microsoft.com). Crawled content is stored and processed to build Bing’s web index, which users then query. SpaceBison data is not used for AI model training; that task is handled by Bing’s separate “Bing AI” crawler (e.g., BingPreview). Microsoft explicitly states that SpaceBison’s output is never fed into GPT or any other large language model for training purposes (source: Bing Webmaster guidelines).

⚙️ Rate Limiting Policy

SpaceBison is rate-limited because it can generate high request volumes (up to 30 requests per second, or 1,800 per minute) that may degrade server performance on smaller sites. Bing recommends standard threshold-based blocking (e.g., limiting each IP to 50 requests per minute) to protect origin servers from excessive load while still allowing the crawler to complete indexing within a reasonable time.


ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.