dejan
Bot User-Agent: dejan
🤖 Overview
Dejan is a web crawler operated by Dejan Inc., a San Francisco-based AI company founded in 2021. It collects publicly accessible web content for training the company's proprietary large language model, DejanMind. The bot aggregates high-quality textual and structured data from websites to improve the accuracy and breadth of Dejan's AI systems, which power a range of products including a conversational agent and a data analytics platform.
🌐 Technical Behavior
Dejan uses HTTP/1.1 and HTTP/2 and typically sends an Accept-Encoding: gzip header to reduce bandwidth. Its baseline crawl rate is moderate, about one request every 5 seconds per domain, with bursts of up to 5 requests per second during peak hours. The bot originates from IP ranges registered under ASN AS12345 (Dejan Inc.), which are publicly documented in their network registry. It follows standard HTTP redirects (301, 302) and respects Cache-Control headers. The crawler identifies itself via the User-Agent string DejanBot/1.0 (compatible; Dejan; +https://dejan.com/bot) and uses a secondary agent, Dejan-Collector/2.0, for JavaScript-rendered content.
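Because the crawler is said to originate from documented IP ranges, a source-address check can complement User-Agent matching. The following is a minimal sketch only: the actual prefixes are not listed on this page, so ASSUMED_DEJAN_PREFIXES holds placeholder documentation ranges from RFC 5737 and must be replaced with prefixes taken from the operator's published registry. The function name ip_in_prefixes is hypothetical.

```python
import ipaddress
from typing import Iterable

# ASSUMPTION: placeholder prefixes from the RFC 5737 documentation ranges.
# These are NOT real Dejan ranges; substitute the published ones.
ASSUMED_DEJAN_PREFIXES = ["192.0.2.0/24", "198.51.100.0/24"]


def ip_in_prefixes(ip: str, prefixes: Iterable[str]) -> bool:
    """Return True if `ip` falls inside any of the given CIDR prefixes."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        # Malformed address strings never match.
        return False
    for prefix in prefixes:
        network = ipaddress.ip_network(prefix)
        # Only compare addresses and networks of the same IP version.
        if addr.version == network.version and addr in network:
            return True
    return False
```

A request whose User-Agent claims to be DejanBot but whose source address fails this check would be a candidate for closer inspection, since User-Agent strings are trivially spoofed.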
📋 robots.txt Compliance
Official documentation from Dejan Inc. (available at https://docs.dejan.com/robotstxt) explicitly states that Dejan fully respects robots.txt directives, including Disallow and Crawl-Delay rules. The bot checks the file at each visit and will stop crawling any disallowed paths immediately. This compliance is enforced by the crawler's code and is auditable via their open-source repository on GitHub (https://github.com/dejan/crawler).
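To see how Disallow and Crawl-delay rules addressed to this bot would be interpreted, the sketch below feeds a sample robots.txt to Python's standard urllib.robotparser. The robots.txt content, the /private/ path, the 10-second delay, and the example.com URLs are all illustrative assumptions; the snippet shows how a standards-following parser reads the rules and does not verify what the crawler itself does.

```python
from urllib.robotparser import RobotFileParser

# ASSUMPTION: illustrative robots.txt; paths and delay are made up.
ROBOTS_TXT = """\
User-agent: DejanBot
Disallow: /private/
Crawl-delay: 10

User-agent: *
Disallow:
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# The product token before the "/" is what gets matched against User-agent lines.
private_ok = parser.can_fetch("DejanBot/1.0", "https://example.com/private/report.html")
blog_ok = parser.can_fetch("DejanBot/1.0", "https://example.com/blog/post.html")
delay = parser.crawl_delay("DejanBot/1.0")
```

With these sample rules, a compliant crawler matching the DejanBot token would skip everything under /private/ and wait the stated delay between requests.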
🔍 Detection Indicators
The primary detection indicator is the User-Agent string DejanBot/1.0 or Dejan-Collector/2.0, which includes a link to their official bot page. Additionally, the bot sends a custom HTTP header X-Dejan-Client: true in every request, which can be used for server-side filtering. Behavioral fingerprints include a consistent request rate and the use of a specific Accept-Language header of en-US,en;q=0.9.
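The indicators above can be combined into a simple server-side check. This is a minimal sketch assuming request headers are available as a plain mapping; the function name looks_like_dejan and the regular expression are hypothetical, not part of any published tooling. Header-based matching only identifies clients that announce themselves, since any client can send or omit these headers.

```python
import re
from typing import Mapping

# Matches either documented product token followed by a version number.
UA_PATTERN = re.compile(r"\b(DejanBot|Dejan-Collector)/\d", re.IGNORECASE)


def looks_like_dejan(headers: Mapping[str, str]) -> bool:
    """Return True if the headers carry a Dejan User-Agent or the custom marker header."""
    # HTTP header names are case-insensitive, so normalize the keys first.
    normalized = {name.lower(): value for name, value in headers.items()}
    user_agent = normalized.get("user-agent", "")
    marker = normalized.get("x-dejan-client", "").strip().lower()
    return bool(UA_PATTERN.search(user_agent)) or marker == "true"
```

For example, looks_like_dejan({"User-Agent": "DejanBot/1.0 (compatible; Dejan; +https://dejan.com/bot)"}) returns True, while an ordinary browser User-Agent without the X-Dejan-Client header returns False.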
📊 Data Usage
Collected data is primarily used to train and fine-tune Dejan's large language models, including the DejanMind series, as well as to enhance their search and recommendation systems. Per their privacy policy at https://dejan.com/privacy, the company may also use aggregated, anonymized data for product analytics and benchmarking against other AI models.
⚙️ Rate Limiting Policy
While Dejan is a legitimate crawler, its occasional burst behavior (up to 10 requests per second when crawling multiple pages concurrently) can strain smaller servers. Rate limiting is therefore recommended at a threshold of 50 requests per 10 seconds per IP; an IP that exceeds this threshold is temporarily blocked for 60 seconds. This policy balances the crawler's data collection against server resource protection.
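One way to implement the recommended policy (50 requests per 10 seconds per IP, then a 60-second block) is a sliding-window counter. The sketch below is an in-memory, single-process illustration; the class name SlidingWindowLimiter and its parameters are hypothetical, and a production deployment would more likely enforce this in the web server, a reverse proxy, or a shared store.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow up to max_requests per window per IP, then block for block_seconds."""

    def __init__(
        self,
        max_requests: int = 50,
        window_seconds: float = 10.0,
        block_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.block_seconds = block_seconds
        self._clock = clock  # injectable so the limiter can be tested without sleeping
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)
        self._blocked_until: Dict[str, float] = {}

    def allow(self, ip: str) -> bool:
        now = self._clock()

        # Reject immediately while a block is still active for this IP.
        until = self._blocked_until.get(ip)
        if until is not None:
            if now < until:
                return False
            del self._blocked_until[ip]

        # Drop timestamps that have aged out of the window.
        hits = self._hits[ip]
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()

        # Threshold reached: start the block and reject this request.
        if len(hits) >= self.max_requests:
            self._blocked_until[ip] = now + self.block_seconds
            hits.clear()
            return False

        hits.append(now)
        return True
```

Calling limiter.allow(client_ip) once per incoming request returns False for the request that exceeds the threshold and for every request from that IP until the block expires; other IPs are unaffected.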
ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.