digext

Bot User-Agent: digext

🤖 Overview

digext is a web crawler operated by DigiExt Technologies Inc., a data analytics firm based in San Francisco. The crawler was first announced in a 2021 blog post on digi-ext.com. It collects publicly accessible web content to feed the DigiExt Insights platform, which provides subscribers with competitive intelligence, SEO analytics, and AI-driven content categorization.

🌐 Technical Behavior

digext uses a distributed crawling infrastructure hosted on AWS EC2 and Google Cloud Platform, with IP addresses primarily from the ranges 3.0.0.0/16 and 35.0.0.0/16. It sends requests at a default rate of 5 per second per IP and respects a 60-second Crawl-Delay directive in robots.txt. The bot supports HTTP/1.1 and HTTP/2, sends a custom header X-DigiExt-Request: v2.1, and skips binary file types such as .exe and .zip unless explicitly allowed. It follows canonical links and sitemap.xml entries, and re-crawls pages every 7–30 days depending on how often their content changes.
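As a rough illustration of how the IP ranges above could be used when reviewing logs, the sketch below checks whether an address falls inside the two listed ranges. The function name in_listed_ranges is hypothetical. The ranges are taken from the text as-is and belong to large cloud providers, so a match only suggests that a request might come from digext and should not be treated as proof.

```python
import ipaddress

# Ranges as listed in the text above; assumed approximate, not authoritative.
LISTED_RANGES = [
    ipaddress.ip_network("3.0.0.0/16"),
    ipaddress.ip_network("35.0.0.0/16"),
]


def in_listed_ranges(ip: str) -> bool:
    """Return True if the address falls inside one of the listed ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in LISTED_RANGES)


if __name__ == "__main__":
    for candidate in ("3.0.12.7", "35.0.255.255", "3.1.0.1", "8.8.8.8"):
        print(candidate, in_listed_ranges(candidate))
```

Because other tenants share these cloud ranges, it is probably safer to combine this check with the User-Agent and header indicators than to rely on it alone.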

📋 robots.txt Compliance

Per DigiExt’s official documentation at https://docs.digi-ext.com/bot-policy, digext honors Disallow directives and Crawl-Delay values. It also respects nofollow meta tags and rel="nofollow" attributes. Independent audits (Botguard Research, 2022) confirm no known violations, though a redirect to a different host can lead the bot past restrictions set in that host’s robots.txt.
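To see how a compliant crawler would interpret rules aimed at digext, the sketch below parses a sample robots.txt with Python's standard urllib.robotparser. The robots.txt content, the /private/ path, and the example.com URLs are illustrative assumptions, not rules published by DigiExt. The parser shows what the rules mean, not what the bot actually does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block digext from /private/ and ask for a 60-second delay.
ROBOTS_TXT = """\
User-agent: digext
Crawl-delay: 60
Disallow: /private/

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The digext group applies to any agent token containing "digext".
can_private = parser.can_fetch("digext", "https://example.com/private/report.html")
can_public = parser.can_fetch("digext", "https://example.com/blog/")
delay = parser.crawl_delay("digext")

print(can_private, can_public, delay)
```

Running a check like this before deploying a robots.txt change is a cheap way to confirm the rules say what was intended.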

🔍 Detection Indicators

The primary User-Agent string is digext/1.0 (compatible; DigiExt Bot; +https://digi-ext.com/bot). Secondary strings include digext/2.0 (DXT Crawler). Behavioral fingerprints include a 2-second inter-request gap, acceptance of gzip/deflate, and low variation in Accept headers. Log entries typically show 2–4 IPs per session, no referrer, and, in some deployments, an X-Bot-Type: DigiExt header.
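The header-based indicators above can be combined into a simple check. The following is a minimal sketch that assumes request headers are available as a plain mapping; the function name looks_like_digext and the regular expression are illustrative. Headers are trivial to spoof, so a match is only a hint.

```python
import re
from collections.abc import Mapping

# Matches the listed User-Agent forms, e.g. "digext/1.0 (...)" and "digext/2.0 (...)".
DIGEXT_UA = re.compile(r"\bdigext/\d+(?:\.\d+)*", re.IGNORECASE)


def looks_like_digext(headers: Mapping[str, str]) -> bool:
    """Return True if the headers carry one of the indicators listed above."""
    # Header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    if DIGEXT_UA.search(normalized.get("user-agent", "")):
        return True
    # Some deployments reportedly send X-Bot-Type: DigiExt.
    return normalized.get("x-bot-type", "").strip().lower() == "digiext"


if __name__ == "__main__":
    sample = {"User-Agent": "digext/1.0 (compatible; DigiExt Bot; +https://digi-ext.com/bot)"}
    print(looks_like_digext(sample))
```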

📊 Data Usage

Collected data is used to train DigiExt’s proprietary content categorization models and generate market share reports, backlink dashboards, and competitive monitoring tools for clients. The company states in its privacy policy that it does not store PII and anonymizes crawled content by stripping names and emails before processing.

⚙️ Rate Limiting Policy

Rate limiting digext is recommended because its systematic, high-volume crawling can consume significant server resources without any coordination with site operators. A threshold-based block (e.g., more than 20 requests in 10 seconds per IP) helps maintain site performance while still allowing legitimate indexing for analytical purposes.
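One way to implement the threshold above (more than 20 requests in 10 seconds per IP) is a sliding-window counter. The sketch below is an in-memory illustration only; the class name SlidingWindowLimiter and its parameters are hypothetical, and a production setup would more likely enforce this at the web server, CDN, or firewall.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per IP within any window_seconds span."""

    def __init__(self, max_requests: int = 20, window_seconds: float = 10.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of accepted requests

    def allow(self, ip: str, now=None) -> bool:
        """Record a request and return False once the IP exceeds the threshold."""
        now = time.monotonic() if now is None else now
        hits = self._hits[ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


if __name__ == "__main__":
    limiter = SlidingWindowLimiter()
    # 21 requests spaced 0.1 s apart from one IP: the 21st is rejected.
    results = [limiter.allow("3.0.0.1", now=i * 0.1) for i in range(21)]
    print(results.count(True), results[-1])
```

Rejected requests are not added to the window in this sketch, so a blocked client regains access as soon as its earlier requests age out. Counting rejected requests as well would be stricter toward clients that keep retrying.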

ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.