orisbot
Bot User-Agent: orisbot
🤖 Overview
Orisbot is a web crawler operated by the Open Research Institute (ORI), a non‑profit organization focused on advancing open‑access scientific communication. Its primary purpose is to index publicly available academic papers, preprints, datasets, and institutional repositories to populate ORI’s search engine, which serves the global research community. The bot was first documented in early 2023 and has since become a regular presence on academic web servers.
🌐 Technical Behavior
Orisbot performs both breadth‑first and focused crawling, targeting URLs that contain patterns such as /article, /paper, or /doi. Requests use HTTP/1.1 with a default crawl delay of 10 seconds between consecutive requests to the same host (at most six requests per minute per host), as documented in ORI’s official crawler policy at ori.edu/crawler. The bot’s IP addresses originate from the ORI‑managed ASN AS397234 (announced via RIPE NCC) and include ranges such as 185.199.108.0/24 and 2606:4700:3::/48. Each request carries a From header pointing to [email protected] and an Accept header that favors text/html and application/pdf. Orisbot does not follow links that lead to login pages or CAPTCHA‑protected areas.
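To check whether a logged client address falls inside the ranges listed above, a minimal sketch using Python's standard ipaddress module might look like this. The two networks are copied from this profile and should be treated as an assumption, not an authoritative list; verify them against ORI's own published ranges before allowlisting or blocking. The name in_orisbot_ranges is hypothetical.

```python
import ipaddress

# Ranges as listed in this profile (an assumption; re-check against
# ORI's published crawler policy before relying on them).
ORISBOT_RANGES = [
    ipaddress.ip_network("185.199.108.0/24"),
    ipaddress.ip_network("2606:4700:3::/48"),
]


def in_orisbot_ranges(addr: str) -> bool:
    """Return True if addr (IPv4 or IPv6) lies inside one of the listed ranges."""
    ip = ipaddress.ip_address(addr)
    # Testing an IPv4 address against an IPv6 network (or vice versa)
    # returns False rather than raising, so one mixed list is safe.
    return any(ip in net for net in ORISBOT_RANGES)


if __name__ == "__main__":
    for candidate in ("185.199.108.42", "2606:4700:3:1::5", "203.0.113.7"):
        print(candidate, in_orisbot_ranges(candidate))
```

An IP range match only says where a request came from; it is best combined with the User-Agent and reverse DNS checks described under Detection Indicators.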
📋 robots.txt Compliance
According to ORI’s official documentation (last updated March 2024), Orisbot fully respects robots.txt directives, including Disallow, Crawl-delay, and Allow overrides. The bot checks the file before every crawl session and re‑validates it every 24 hours. No cases of non‑compliance have been documented in server logs or webmaster forums.
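A webmaster can preview how a robots.txt group aimed at Orisbot will be read using Python's standard urllib.robotparser module. This is a sketch, not ORI's own parser: the robots.txt content and repository paths below are hypothetical, and ORI's crawler may resolve rules differently from Python's implementation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an academic repository; paths are illustrative.
# Allow is listed before Disallow because urllib.robotparser applies the
# first matching rule, while RFC 9309 uses the longest match. With this
# ordering both interpretations give the same answer.
ROBOTS_TXT = """\
User-agent: orisbot
Allow: /private/open-data/
Disallow: /private/
Crawl-delay: 10

User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for url in (
    "https://repo.example.org/article/123",
    "https://repo.example.org/private/report.pdf",
    "https://repo.example.org/private/open-data/set1.csv",
):
    # Product token matching is case-insensitive in urllib.robotparser.
    print(url, parser.can_fetch("orisbot/1.0", url))

# Crawl-delay declared in the orisbot group, in seconds.
print("crawl delay:", parser.crawl_delay("orisbot/1.0"))
```

Note that a crawler matching a named group ignores the `*` group entirely, so /admin/ is not disallowed for orisbot in this example; shared rules need to be repeated inside the orisbot group.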
🔍 Detection Indicators
The canonical User-Agent string is orisbot/1.0; it is sometimes logged as Orisbot/1.0, so matching should be case-insensitive. Additional fingerprints include the fixed ordering of HTTP headers (User-Agent, From, Accept) and a typical request rate of 6–10 pages per minute. Reverse DNS lookups resolve to hostnames such as crawl‑orisbot.ori.edu.
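The indicators above can be combined into a single log-side check. The sketch below is a hypothetical helper, not Boteraser or ORI tooling: it matches the User-Agent token case-insensitively, checks the relative order of the three fingerprint headers, and checks that a reverse-DNS hostname sits under ori.edu. It takes the hostname as input rather than performing lookups; in practice reverse DNS can be spoofed, so the hostname should also be forward-confirmed (resolved back to the same IP) before it is trusted.

```python
import re
from typing import Sequence

# Case-insensitive product token, so "orisbot/1.0" and "Orisbot/1.0" both match.
UA_PATTERN = re.compile(r"\borisbot/\d+(?:\.\d+)*", re.IGNORECASE)

# Header order described in this profile (an assumption worth re-checking
# against your own logs).
EXPECTED_ORDER = ("user-agent", "from", "accept")


def header_order_matches(header_names: Sequence[str]) -> bool:
    """True if the fingerprint headers appear in the expected relative order."""
    lowered = [name.lower() for name in header_names]
    try:
        positions = [lowered.index(name) for name in EXPECTED_ORDER]
    except ValueError:
        return False  # at least one fingerprint header is missing
    return positions == sorted(positions)


def hostname_matches(rdns_hostname: str) -> bool:
    """True if the reverse-DNS hostname is ori.edu or a subdomain of it."""
    host = rdns_hostname.rstrip(".").lower()
    return host == "ori.edu" or host.endswith(".ori.edu")


def looks_like_orisbot(user_agent: str, header_names: Sequence[str],
                       rdns_hostname: str) -> bool:
    """Combine the three indicators; all must agree."""
    return (bool(UA_PATTERN.search(user_agent))
            and header_order_matches(header_names)
            and hostname_matches(rdns_hostname))
```

Requiring all three signals keeps false positives low: a scraper that merely copies the User-Agent string will usually fail the header-order or hostname check.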
📊 Data Usage
Collected content is stored in ORI’s open‑access index, used for free scholarly search and to train lightweight AI models (ORI‑BERT and ORI‑Sci) that assist with literature review and metadata extraction. No data is sold or shared with third parties; the institute’s privacy policy explicitly forbids commercial re‑use.
⚙️ Rate Limiting Policy
Webmasters often rate‑limit Orisbot to prevent excessive load during peak hours, because its broad crawl patterns can saturate small repositories. A threshold of 30 requests per minute is commonly applied; since this sits well above the bot’s typical rate of 6–10 pages per minute, it caps bursts without blocking normal indexing.
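A 30-requests-per-minute cap like the one described above can be enforced with a sliding-window log: keep recent request timestamps per client and reject new requests once the window is full. The sketch below shows one way to do this in Python; the class name SlidingWindowLimiter, the per-key design, and the injectable clock are illustrative choices, not part of any ORI or Boteraser tooling.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each client key."""

    def __init__(self, limit: int = 30, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so tests can use a fake clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self.clock()
        hits = self._hits[key]
        # Drop timestamps that have fallen out of the current window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # over the threshold; caller should reject
        hits.append(now)
        return True
```

In a web server, `key` would typically be the client IP or a verified bot identity, and a False result would be answered with HTTP 429 Too Many Requests, ideally with a Retry-After header. A crawler honoring the documented 10-second crawl delay makes about six requests per minute and should never reach this cap.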
ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information are the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.