TwinAgent
Bot User-Agent: twinagent
🤖 Overview
TwinAgent is a legitimate web crawler operated by Twin Technologies Inc. (twin.com) and first publicly documented in February 2024. It collects publicly accessible web content to train and improve TwinMind, the company's proprietary conversational AI model. According to the official user-agent disclosure at twin.com/robots, the bot is used exclusively for semantic understanding and knowledge graph building, not for advertising or analytics.
🌐 Technical Behavior
The crawler uses a single-threaded, sequential crawl pattern, issuing HTTP GET requests with a configurable delay of 2–5 seconds between pages to reduce server load. TwinAgent supports both HTTP/1.1 and HTTP/2 and honors Cache-Control headers when judging whether content is still fresh. Its IP ranges come predominantly from Amazon Web Services (ASN 16509) and Google Cloud (ASN 15169), with reported netblocks including 35.185.0.0/16 and 34.64.0.0/16. Separately, traffic logs published by several site operators suggest an observed ceiling of about 10 requests per second per origin IP, a much higher rate than the stated per-page delay implies (a 2-second delay allows at most 0.5 requests per second).
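As a rough illustration of how the reported netblocks could be used, the sketch below checks whether a client IP falls inside them. The function name `in_reported_netblocks` is hypothetical, and the two ranges are only the ones quoted above. They are shared cloud address space, not an authoritative or exclusive list, so a match is weak evidence at best.

```python
import ipaddress

# Netblocks quoted in this profile. Assumption: illustrative only, since
# these are shared cloud ranges and the list is not known to be complete.
REPORTED_NETBLOCKS = [
    ipaddress.ip_network("35.185.0.0/16"),
    ipaddress.ip_network("34.64.0.0/16"),
]


def in_reported_netblocks(ip: str) -> bool:
    """Return True if `ip` parses and lies inside a reported netblock."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        # Malformed input is treated as "not in range".
        return False
    return any(addr in net for net in REPORTED_NETBLOCKS)


if __name__ == "__main__":
    for candidate in ("35.185.12.34", "8.8.8.8", "not-an-ip"):
        print(candidate, in_reported_netblocks(candidate))
```

Because other cloud customers share these ranges, a check like this is better suited to narrowing down candidates than to allowlisting traffic.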
📋 robots.txt Compliance
TwinAgent explicitly respects robots.txt directives, including Disallow and Crawl-delay. Twin Technologies' official documentation at twin.com/crawler-policy states that the bot will not revisit a disallowed path for at least 30 days. A third-party analysis from a 2024 security blog (infosec.exchange) reported compliance on over 98% of tested sites.
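To see how such directives would be interpreted, the following sketch feeds a sample robots.txt to Python's standard `urllib.robotparser`. The rules, the `/private/` path, the 10-second delay, and the example.com URLs are all assumptions chosen for illustration. The matching logic is the standard library's, not necessarily the crawler's own parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keeps TwinAgent out of /private/ and asks for
# a 10-second crawl delay. All other agents are unrestricted.
ROBOTS_TXT = """\
User-agent: TwinAgent
Disallow: /private/
Crawl-delay: 10

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Evaluate the rules as a compliant crawler identifying as TwinAgent would.
allowed_public = parser.can_fetch("TwinAgent", "https://example.com/blog/post")
allowed_private = parser.can_fetch("TwinAgent", "https://example.com/private/report")
delay = parser.crawl_delay("TwinAgent")

if __name__ == "__main__":
    print("public page allowed:", allowed_public)
    print("private page allowed:", allowed_private)
    print("crawl delay (seconds):", delay)
```

Running a check like this before deploying a robots.txt change is a cheap way to confirm the rules say what was intended.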
🔍 Detection Indicators
The primary user-agent string is Mozilla/5.0 (compatible; TwinAgent/1.0; +https://twin.com/agent). The bot also sends a custom X-Twin-ID header carrying a base64-encoded session identifier. No identifying cookies or other fingerprints are used. The bot announces itself in the User-Agent field and does not impersonate other crawlers.
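A minimal sketch of how these indicators might be checked in request-handling code follows. The function name `describe_request` and the decision to accept any version number after `TwinAgent/` are assumptions, not something the operator documents. A header match only shows what a client claims, since both headers are trivial to spoof.

```python
import re
from typing import Mapping

# Matches the "TwinAgent/<version>" product token from the user-agent
# string quoted above. Accepting any version is an assumption.
TWINAGENT_UA = re.compile(r"\bTwinAgent/\d+(?:\.\d+)*\b", re.IGNORECASE)


def describe_request(headers: Mapping[str, str]) -> dict:
    """Report which TwinAgent indicators appear in a request's headers."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    user_agent = normalized.get("user-agent", "")
    return {
        "claims_twinagent": bool(TWINAGENT_UA.search(user_agent)),
        "has_twin_id": bool(normalized.get("x-twin-id", "").strip()),
    }


if __name__ == "__main__":
    sample = {
        "User-Agent": "Mozilla/5.0 (compatible; TwinAgent/1.0; +https://twin.com/agent)",
        "X-Twin-ID": "c2Vzc2lvbg==",  # placeholder value for illustration
    }
    print(describe_request(sample))
```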
📊 Data Usage
Collected data is used exclusively to train the TwinMind AI model for natural language understanding, entity extraction, and dialogue generation. According to Twin Technologies' privacy policy, no personal or copyrighted material is stored beyond the training corpus, and raw logs are deleted after 30 days. The data is not sold or shared with third parties.
⚙️ Rate Limiting Policy
Although TwinAgent is legitimate and rate-limited by design, its moderate burstiness (up to 20 requests in 5 seconds) can strain small sites. A threshold-based block (for example, more than 30 requests in 60 seconds) is recommended to protect web server resources without blocking the bot entirely.
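The threshold rule above can be sketched as a sliding-window counter keyed by client. This is one possible in-memory approach, not a description of any particular product. The class name `SlidingWindowLimiter` and its parameters are hypothetical, and the defaults simply mirror the example threshold of 30 requests per 60 seconds.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SlidingWindowLimiter:
    """Allow at most `max_requests` per client in any `window_seconds` span."""

    def __init__(self, max_requests: int = 30, window_seconds: float = 60.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client key -> accepted timestamps

    def allow(self, client: str, now: Optional[float] = None) -> bool:
        """Return True if this request is within the threshold."""
        now = time.monotonic() if now is None else now
        hits = self._hits[client]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the threshold: caller should reject or delay
        hits.append(now)
        return True


if __name__ == "__main__":
    limiter = SlidingWindowLimiter()
    # Simulate 31 requests from one IP, spaced 0.25 seconds apart.
    decisions = [limiter.allow("203.0.113.7", now=i * 0.25) for i in range(31)]
    print(decisions.count(True), "allowed,", decisions.count(False), "blocked")
```

With these defaults, a burst of 20 requests in 5 seconds passes untouched and only sustained traffic beyond 30 requests per minute is rejected. Rejected requests are not recorded, so a client regains access as soon as its oldest accepted requests leave the window.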
ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.