Discobot
Bot User-Agent: discobot
🤖 Overview
Discobot is a legitimate web crawler operated by Civilized Discourse Construction Kit, Inc., the organization behind the open-source forum platform Discourse. Its primary purpose is to index content from public Discourse forums to power the platform’s search functionality, enabling users to find relevant discussions across multiple communities. The bot is documented in Discourse’s developer resources and is classified as a well-behaved crawler that respects website policies.
🌐 Technical Behavior
Discobot performs HTTP GET requests at a fixed rate of approximately 1 request per 10 seconds per domain, as noted in the official Discourse source code on GitHub (github.com/discourse/discourse). It identifies itself with the Discobot/1.1 User-Agent string and includes a From header containing a contact email address. The crawler fetches only publicly visible pages, avoiding login‑protected or non‑public sections. Its IP addresses originate from Discourse infrastructure, typically within the 23.92.31.0/24 and 173.255.206.0/24 ranges, though the set can expand when the crawler is hosted on cloud providers such as AWS. The bot supports HTTP/1.1 and HTTP/2, and it sends If-Modified-Since headers so that unchanged pages need not be downloaded again, which reduces bandwidth usage.
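As a rough illustration of how the address ranges quoted above could be used, the sketch below checks whether a client IP falls inside them. The two CIDR blocks are copied from this page and are not independently verified; the name `in_listed_ranges` is hypothetical. Because the text notes that the ranges can expand, a negative result does not prove a request is not from the bot.

```python
import ipaddress

# CIDR blocks exactly as quoted in the text above (unverified assumption).
LISTED_RANGES = [
    ipaddress.ip_network("23.92.31.0/24"),
    ipaddress.ip_network("173.255.206.0/24"),
]


def in_listed_ranges(ip: str) -> bool:
    """Return True if `ip` is a valid address inside one of the listed ranges."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        # Malformed input is treated as "not in range" rather than raising.
        return False
    return any(addr in net for net in LISTED_RANGES)
```

A range check like this is best combined with the User-Agent string, since either signal alone is easy to spoof or may be out of date.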
📋 robots.txt Compliance
According to the official Discourse documentation and source code at github.com/discourse/discourse/blob/main/lib/crawler.rb, Discobot strictly honors robots.txt directives. It reads the file before each crawl and abides by Disallow rules, including wildcard patterns. The bot also pauses and retries after encountering 429 Too Many Requests responses.
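To see how a robots.txt rule aimed at this crawler would be evaluated, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `forum.example` host, and the helper name `allowed` are illustrative assumptions, and this shows how a generic parser interprets the rules, not how Discobot itself is implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block the crawler from /private/, allow everything else.
ROBOTS_TXT = """\
User-agent: discobot
Disallow: /private/

User-agent: *
Disallow:
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if `user_agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

For example, `allowed("Discobot/1.1", "https://forum.example/private/topic")` is false under these rules, while the same URL is permitted for other agents. The standard-library parser matches paths by prefix and does not implement wildcard patterns, so rules that rely on `*` inside a path need a different parser to test.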
🔍 Detection Indicators
The standard User-Agent string is Discobot/1.1. Additional identifying headers include From: [email protected] and Accept-Encoding: gzip. Behavioral fingerprints include a consistent 10‑second interval between requests and use of Keep-Alive connections. No other variants of the User-Agent have been officially documented.
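The indicators above can be combined into a simple log check. The sketch below is one possible approach, not an official detection method: the regular expression, the function names, and the 2-second tolerance are all assumptions chosen for illustration, and a User-Agent string can be spoofed by any client.

```python
import re
from statistics import median

# Matches "Discobot/<version>" at the start of the User-Agent (assumed format).
UA_PATTERN = re.compile(r"^Discobot/\d+(\.\d+)*", re.IGNORECASE)


def ua_matches(headers: dict[str, str]) -> bool:
    """Return True if the request headers carry a Discobot-style User-Agent."""
    ua = next((v for k, v in headers.items() if k.lower() == "user-agent"), "")
    return bool(UA_PATTERN.match(ua))


def interval_looks_regular(
    timestamps: list[float], expected: float = 10.0, tolerance: float = 2.0
) -> bool:
    """Return True if the median gap between requests is close to `expected` seconds."""
    if len(timestamps) < 3:
        # Too few requests to say anything about pacing.
        return False
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return abs(median(gaps) - expected) <= tolerance
```

Using the median gap rather than the mean keeps a single long pause (for example, a retry after a 429 response) from hiding an otherwise steady 10-second rhythm.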
📊 Data Usage
Collected data—such as topic titles, post bodies, and author names (when public)—is used exclusively to build and update the Discourse search index. This index powers on‑site search features for Discourse installations that choose to participate in the global search network. No personal information is stored, and content is only cached temporarily for indexing purposes.
⚙️ Rate Limiting Policy
Discobot is rate-limited to prevent excessive load on community servers; Discourse’s own deployment guides recommend throttling it to 1 request every 5–10 seconds. Websites experiencing unusual load from Discobot should first apply a rate limit, such as a threshold of 300 requests per minute, before blocking the bot outright, since it respects rate-limit responses and slows down accordingly.
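A per-client threshold of this kind can be sketched as a sliding-window counter. The class below is a minimal illustration under assumed names (`SlidingWindowLimiter`, `allow`); in practice this is usually configured in the web server or CDN instead of in application code. At the stated pace of 1 request per 10 seconds, the bot would make about 6 requests per minute, so a 300 requests per minute threshold would only trip on traffic far above that.

```python
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` within any trailing `window_seconds` period."""

    def __init__(self, max_requests: int = 300, window_seconds: float = 60.0):
        self.max_requests = max_requests
        self.window = window_seconds
        self.hits: deque[float] = deque()

    def allow(self, now: float) -> bool:
        """Record a request at time `now` (seconds) and report whether it is allowed."""
        cutoff = now - self.window
        # Drop timestamps that have aged out of the window.
        while self.hits and self.hits[0] <= cutoff:
            self.hits.popleft()
        if len(self.hits) >= self.max_requests:
            # Over the threshold: the caller would typically answer 429 here.
            return False
        self.hits.append(now)
        return True
```

One limiter instance would be kept per client key (for example, per IP address or per User-Agent), with `now` supplied from a monotonic clock.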
ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.