bosug
Bot User-Agent: bosug
🤖 Overview
Bosug is a web crawler operated by Bosug AI Inc., a San Francisco-based startup founded in early 2023. Its primary purpose is to collect publicly accessible web content to train the company’s proprietary large language model, Bosug-1, and to power its AI-powered search assistant, Bosug Search. The crawler was first announced on the Bosug blog in March 2023 and has since been observed crawling a wide range of sites across multiple industries.
🌐 Technical Behavior
Bosug crawls at an average rate of 5–15 requests per second per IP, with bursts of up to 30 requests per second during initial site discovery. It supports HTTP/1.1 and HTTP/2, as well as gzip and Brotli compression. The crawler honors Last-Modified and ETag headers and sends conditional requests so that it does not re-download unchanged resources. Its IP ranges are published at https://bosug.ai/ips and belong to ASN 209242 (Bosug AI). Requests originate from U.S. West Coast and European data centers (AWS us-west-2 and eu-west-1). The crawler also fetches images and PDFs in parallel batches and pauses 0.5 seconds between batches.
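The page does not list the exact request headers Bosug sends. Suppose it revalidates cached copies with standard HTTP conditional requests: If-None-Match carrying a stored ETag, or If-Modified-Since carrying a stored Last-Modified date. The Python sketch below shows the server-side check that lets such a request end in a bodiless 304 Not Modified instead of a full download. The function name `is_not_modified` is hypothetical, and the If-None-Match parsing is deliberately simplified.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Mapping


def is_not_modified(headers: Mapping[str, str], etag: str, last_modified: datetime) -> bool:
    """Decide whether a GET can be answered with 304 Not Modified.

    `last_modified` must be timezone-aware (UTC). If-None-Match takes
    precedence over If-Modified-Since, following RFC 9110.
    """
    lowered = {name.lower(): value for name, value in headers.items()}

    if_none_match = lowered.get("if-none-match")
    if if_none_match is not None:
        if if_none_match.strip() == "*":
            return True
        # Weak comparison: "W/" prefixes are ignored on both sides.
        # Simplified parsing: assumes no commas inside entity-tags.
        offered = [tag.strip().removeprefix("W/") for tag in if_none_match.split(",")]
        return etag.removeprefix("W/") in offered

    if_modified_since = lowered.get("if-modified-since")
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            return False  # an unparseable date means the header is ignored
        if since.tzinfo is None:
            # "-0000" style dates parse as naive; treat them as UTC.
            since = since.replace(tzinfo=timezone.utc)
        # HTTP dates have one-second resolution, so compare without microseconds.
        return last_modified.replace(microsecond=0) <= since

    return False


if __name__ == "__main__":
    resource_mtime = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)
    revalidation = {"If-Modified-Since": format_datetime(resource_mtime, usegmt=True)}
    print(is_not_modified(revalidation, '"v1"', resource_mtime))
```

This check lets a crawler that revisits often avoid re-downloading full response bodies for unchanged pages.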
📋 robots.txt Compliance
According to official Bosug documentation, the crawler fully honors Disallow directives in robots.txt and supports the Crawl-Delay directive. However, it does not recognize non-standard extensions, such as custom rules in the X-Robots-Tag HTTP response header. The company provides a verification page at https://bosug.ai/robots-verification where site owners can test whether their robots.txt is being respected.
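Before relying on Bosug's verification page, you can preview how a compliant parser reads your rules. The sketch below uses Python's standard `urllib.robotparser`. It assumes Bosug matches a robots.txt group named `BosugBot`, taken from its User-Agent string; the page does not state which token Bosug uses. The example rules and the helper name `check_policy` are hypothetical. Bosug's own parser may behave differently, and `urllib.robotparser` accepts only whole-number Crawl-delay values.

```python
from typing import Optional, Tuple
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the "BosugBot" group name is an assumption.
ROBOTS_TXT = """\
User-agent: BosugBot
Disallow: /private/
Crawl-delay: 10

User-agent: *
Disallow: /admin/
"""


def check_policy(robots_txt: str, user_agent: str, url: str) -> Tuple[bool, Optional[int]]:
    """Return (may_fetch, crawl_delay_seconds) for one user agent and URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # can_fetch picks the first group whose name appears in the product token
    # (the User-Agent text before the first "/"), falling back to "*".
    return parser.can_fetch(user_agent, url), parser.crawl_delay(user_agent)


if __name__ == "__main__":
    ua = "BosugBot/1.0 (compatible; BosugBot; +https://bosug.ai/bot)"
    for path in ("/private/report.html", "/blog/", "/admin/"):
        print(path, check_policy(ROBOTS_TXT, ua, "https://example.com" + path))
```

BosugBot has its own group in this file, so the `*` group's `/admin/` rule does not apply to it. Under standard robots.txt group semantics, a crawler follows only the most specific group that matches it.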
🔍 Detection Indicators
The primary User-Agent string is BosugBot/1.0 (compatible; BosugBot; +https://bosug.ai/bot). Additional variants include Bosug-Image/1.0 for image fetching and Bosug-Preview/1.0 for link previews. A custom HTTP header, X-Bosug-Agent, is set to 1 on authenticated crawls. Reverse DNS lookups of Bosug’s IPs resolve to hostnames under *.crawl.bosug.ai.
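The indicators above can be checked together. The sketch below is a set of hypothetical helpers, not a Bosug-provided tool. It does four things:

- classifies the User-Agent variants listed here;
- reads the X-Bosug-Agent header;
- tests an IP against CIDR blocks, for example ones copied from https://bosug.ai/ips (this page does not specify that list's format);
- verifies reverse DNS with a forward lookup.

The forward lookup matters because whoever owns an IP block controls its PTR record and can set it to any name. The addresses and hostnames in the tests are documentation placeholders.

```python
import ipaddress
import re
import socket
from typing import Iterable, Mapping, Optional

# User-Agent tokens listed on this page; anything else is treated as "not Bosug".
UA_PATTERN = re.compile(r"\b(BosugBot|Bosug-Image|Bosug-Preview)/\d+(?:\.\d+)*")

# Suffix that verified crawler hostnames are expected to end with.
CRAWL_DOMAIN_SUFFIX = ".crawl.bosug.ai"


def classify_user_agent(user_agent: str) -> Optional[str]:
    """Return the Bosug variant named in the User-Agent, or None."""
    match = UA_PATTERN.search(user_agent)
    return match.group(1) if match else None


def is_authenticated_crawl(headers: Mapping[str, str]) -> bool:
    """True if the X-Bosug-Agent header is present and set to "1"."""
    lowered = {name.lower(): value.strip() for name, value in headers.items()}
    return lowered.get("x-bosug-agent") == "1"


def hostname_is_bosug(hostname: str) -> bool:
    """Check a reverse-DNS name against *.crawl.bosug.ai (case-insensitive)."""
    return hostname.rstrip(".").lower().endswith(CRAWL_DOMAIN_SUFFIX)


def ip_in_ranges(ip: str, cidrs: Iterable[str]) -> bool:
    """True if `ip` falls inside any of the given CIDR blocks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in cidrs)


def verify_by_dns(ip: str) -> bool:
    """Reverse lookup, suffix check, then forward-confirm the hostname."""
    try:
        hostname, _aliases, _addrs = socket.gethostbyaddr(ip)
        if not hostname_is_bosug(hostname):
            return False
        infos = socket.getaddrinfo(hostname, None)
    except OSError:  # covers socket.herror and socket.gaierror
        return False
    target = ipaddress.ip_address(ip)
    # The hostname must resolve back to the address we started from.
    return any(ipaddress.ip_address(info[4][0]) == target for info in infos)
```

`verify_by_dns` performs live DNS queries, so cache its results per IP rather than calling it on every request.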
📊 Data Usage
Collected data is used exclusively to train the Bosug-1 generative language model, to improve the relevance of Bosug Search results, and to refine internal analytics dashboards. Bosug AI states that it does not sell raw crawl data to third parties and that personally identifiable information (PII) is automatically redacted before training.
⚙️ Rate Limiting Policy
Rate limiting is recommended because Bosug’s crawl bursts can temporarily degrade server performance on small websites. A rate-limit threshold of 50 requests per minute per IP is suggested. Blocking should occur only if traffic exceeds 300 requests per minute sustained over a 5-minute window, as documented in Bosug’s own rate-limit guidelines at https://bosug.ai/rate-limits.
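A sliding-window log is one straightforward way to apply both thresholds per IP. The sketch below interprets "exceeds 300 requests per minute over a 5-minute window" as more than 1,500 requests in any trailing 5 minutes, which is an interpretation rather than something the guidelines are quoted as saying. It also assumes callers pass non-decreasing timestamps, such as values from `time.monotonic()`. The class name and the "allow"/"throttle"/"block" labels are hypothetical.

```python
from collections import defaultdict, deque
from typing import Deque, Dict

# Thresholds taken from the policy text above.
THROTTLE_PER_MINUTE = 50        # suggested per-IP rate limit
BLOCK_PER_MINUTE = 300          # sustained rate that justifies blocking
BLOCK_WINDOW_SECONDS = 5 * 60   # window over which the 300/min is averaged
BLOCK_THRESHOLD = BLOCK_PER_MINUTE * BLOCK_WINDOW_SECONDS // 60  # 1500 requests


class SlidingWindowLimiter:
    """Per-IP sliding-window log that returns "allow", "throttle" or "block"."""

    def __init__(self) -> None:
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def check(self, ip: str, now: float) -> str:
        hits = self._hits[ip]
        hits.append(now)  # every attempt counts, including throttled ones
        # Forget requests that fell outside the 5-minute window.
        while hits and hits[0] <= now - BLOCK_WINDOW_SECONDS:
            hits.popleft()
        if len(hits) > BLOCK_THRESHOLD:
            return "block"
        # Count only the requests from the last 60 seconds for throttling.
        last_minute = sum(1 for t in hits if t > now - 60)
        if last_minute > THROTTLE_PER_MINUTE:
            return "throttle"
        return "allow"
```

Storing every timestamp uses memory proportional to the number of requests each IP sent in the trailing 5 minutes. A fixed-window counter or token bucket is cheaper but less precise at window boundaries.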
ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information are the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.