SBIntuitionsBot
Bot User-Agent: sbintuitionsbot
🤖 Overview
SBIntuitionsBot is a web crawler operated by SB Intuitions Corp., a Japanese artificial intelligence company that is a wholly owned subsidiary of SoftBank Group Corp. The bot was first documented in June 2023. It collects publicly available web content to train and improve SB Intuitions’ large language models and other AI systems, including the company’s proprietary SBGPT series. Its primary goal is to gather high-quality, diverse text from the open web, which is then processed and curated into machine learning datasets.
🌐 Technical Behavior
SBIntuitionsBot issues HTTP GET requests and identifies itself with the product token “SBIntuitionsBot/1.0” in its User-Agent string; in some configurations it also places a “SBIntuitionsBot” token in the From header. The bot follows standard HTTP semantics and supports gzip compression to minimise bandwidth usage. Crawl frequency is governed by a self-imposed crawl delay, typically 10 seconds between requests per host, though official documentation indicates that the bot may temporarily increase concurrency during large-scale indexing tasks. Its IP addresses are allocated from AS17897, which belongs to SoftBank Corp., with a documented block of 203.104.209.0/24 and additional addresses originating from Japan. The bot uses a headless browser for JavaScript-rendered content only when explicitly required, and it follows at most three redirect hops. Official behaviour is described in the company’s crawl policy published at sbintuitions.co.jp/crawler-policy.
📋 robots.txt Compliance
According to the official SB Intuitions crawler policy page archived on GitHub and the company’s robots.txt specification, SBIntuitionsBot honours Disallow directives in robots.txt files. It also checks for a Crawl-Delay directive and reduces its request frequency accordingly. Several website administrators have reported on forums that the bot occasionally ignored Disallow rules for small, non-critical paths; SB Intuitions attributes this to a bug introduced in version 1.0.2 and states that it was patched in February 2024. The official SB Intuitions technical blog describes the current behaviour as compliant with the Robots Exclusion Protocol.
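As a rough illustration of the directives described above, the sketch below uses Python's standard urllib.robotparser to check how a robots.txt file would be interpreted for the SBIntuitionsBot token. The robots.txt content, the /private/ path, and the example.com URLs are hypothetical; the check only shows what a compliant crawler should do, not what the bot actually does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the path and delay are illustrative assumptions.
ROBOTS_TXT = """\
User-agent: SBIntuitionsBot
Disallow: /private/
Crawl-delay: 10

User-agent: *
Disallow:
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text in memory, without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Pass the bare product token: robotparser matches on the text before the
# first "/" of the agent string, so a full "Mozilla/5.0 (...)" value would
# fall through to the "*" group instead.
blocked = parser.can_fetch("SBIntuitionsBot", "https://example.com/private/page")
allowed = parser.can_fetch("SBIntuitionsBot", "https://example.com/public/page")
delay = parser.crawl_delay("SBIntuitionsBot")

print(blocked, allowed, delay)
```

Running a check like this against a site's real robots.txt before deployment can catch typos in the user-agent token, which would otherwise leave the rule silently unapplied.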
🔍 Detection Indicators
The primary User-Agent string is Mozilla/5.0 (compatible; SBIntuitionsBot/1.0; +https://sbintuitions.co.jp/crawler). The bot may also send a From header containing [email protected] and an Accept header of text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8. Behavioural fingerprints include a request interval of 10–15 seconds per host, no HTTP/2 support, and a User-Agent token that does not vary between sessions. Reverse DNS lookups on its IP addresses resolve to dynamic.sbintuitions.net within the SoftBank ASN.
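The indicators above could be combined into a simple classifier along the following lines. This is a minimal sketch: the function name and the three labels are hypothetical, and the token and the 203.104.209.0/24 block are taken from this page rather than independently verified. Because a User-Agent header is trivial to spoof, the sketch treats a token match without an IP match as a claim only.

```python
import ipaddress
import re

# Indicators as described on this page; treat them as unverified assumptions.
UA_TOKEN = re.compile(r"sbintuitionsbot", re.IGNORECASE)
DOCUMENTED_NETWORK = ipaddress.ip_network("203.104.209.0/24")


def classify_request(user_agent: str, remote_ip: str) -> str:
    """Return a coarse label for a request (hypothetical label names)."""
    ua_match = bool(UA_TOKEN.search(user_agent or ""))
    try:
        ip_match = ipaddress.ip_address(remote_ip) in DOCUMENTED_NETWORK
    except ValueError:
        # Malformed address: it cannot be confirmed as part of the block.
        ip_match = False

    if ua_match and ip_match:
        return "likely-sbintuitionsbot"
    if ua_match:
        return "claims-sbintuitionsbot"
    return "other"


ua = "Mozilla/5.0 (compatible; SBIntuitionsBot/1.0; +https://sbintuitions.co.jp/crawler)"
print(classify_request(ua, "203.104.209.17"))
print(classify_request(ua, "198.51.100.7"))
```

Since the page also mentions additional addresses originating from Japan, a request labelled as a claim only is not necessarily an impostor; a reverse DNS check would be a reasonable next step.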
📊 Data Usage
All collected data is used exclusively for internal AI training at SB Intuitions Corp. to develop and refine its large language models, known as SBGPT. The company explicitly states that crawled content is not sold, shared with third parties, or used for advertising. The data is filtered to remove personally identifiable information and is stored in a dedicated training corpus that is periodically updated. The policy also notes that copyrighted material will be removed upon request through a takedown process described on the company’s website.
⚙️ Rate Limiting Policy
SBIntuitionsBot can generate sustained bursts of up to 60 requests per minute during large scans, so threshold-based rate limiting is recommended to protect server resources. The bot is legitimate, but it does not negotiate per-server load and may inadvertently degrade performance for human users if left unthrottled. Community security guidelines advise limiting each IP to 30 requests per minute and blocking any IP that exceeds 100 requests in 5 minutes.
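The thresholds above can be sketched as a small in-memory sliding-window limiter. The class name, method name, and the "allow" / "throttle" / "block" labels are hypothetical, and the sketch makes two assumptions the guidelines do not spell out: throttled requests still count towards the block threshold, and a block is permanent for the lifetime of the object.

```python
from collections import defaultdict, deque


class BotRateLimiter:
    """Per-IP sliding-window limiter (illustrative sketch, not production code)."""

    def __init__(self, per_minute=30, block_threshold=100, block_window=300.0):
        self.per_minute = per_minute            # soft limit per 60 seconds
        self.block_threshold = block_threshold  # hard limit per block_window
        self.block_window = block_window        # seconds (5 minutes)
        self._hits = defaultdict(deque)         # ip -> recent request times
        self._blocked = set()

    def check(self, ip: str, now: float) -> str:
        """Record a request at time `now` (seconds) and return a decision."""
        if ip in self._blocked:
            return "block"

        hits = self._hits[ip]
        hits.append(now)
        # Discard timestamps that have fallen out of the 5-minute window.
        while hits and now - hits[0] >= self.block_window:
            hits.popleft()

        if len(hits) > self.block_threshold:
            self._blocked.add(ip)
            return "block"

        # Count only the requests seen during the last 60 seconds.
        recent = sum(1 for t in hits if now - t < 60.0)
        return "throttle" if recent > self.per_minute else "allow"


limiter = BotRateLimiter()
# One request per second from a single (documentation-range) address.
results = [limiter.check("203.0.113.5", float(t)) for t in range(102)]
print(results[29], results[30], results[100])
```

In a real deployment the `now` argument would typically come from `time.monotonic()`, and the state would live in a shared store rather than process memory so that limits hold across workers.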
ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.