x-clawler
Bot User-Agent: x-clawler
🤖 Overview
x-clawler is a web crawler operated by X Corp. (formerly Twitter, Inc.) as part of its link-preview and content-sharing infrastructure. First documented in 2016, its primary purpose is to fetch metadata (Open Graph, Twitter Cards, schema.org) from URLs posted in tweets, direct messages, and other X-owned services so that the platform can generate rich previews for users. The crawler also indexes page content for the platform's search and recommendation systems, though it does not feed data into any external AI training pipeline.
🌐 Technical Behavior
x-clawler operates as a single-threaded, low-frequency fetcher that speaks standard HTTP/1.1 and HTTP/2. According to official X developer documentation, the bot issues requests from IP ranges belonging to X Corp. (ASN 13414) and identifies itself with one of two User-Agent strings, X-Crawler/1.0 or X-Crawler/2.0. Crawl frequency is deliberately kept below one request per second per domain to reduce load, but when many URLs from the same domain are shared in a short period, the bot can issue bursts of up to 15 requests before entering a cooldown period. The bot fetches only the referenced URL, not entire sites, and does not follow internal links beyond the initial resource. It caches responses for 24 hours and re-crawls only when a URL is re-shared.
📋 robots.txt Compliance
X Corp.'s official guidelines state that x-clawler fully respects robots.txt directives. Documentation published at developer.twitter.com confirms that the bot reads the Disallow rules before every request and does not crawl disallowed paths. However, it does not support the Crawl-delay directive; rate limiting is instead managed entirely by X's own global throttling logic, independent of any site-level delay.
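As a rough illustration of the Disallow behavior described above, the sketch below uses Python's standard urllib.robotparser to evaluate a hypothetical robots.txt. The X-Crawler token in the User-agent line, the example.com URLs, and the /private/ path are assumptions made for illustration; this is a local check a site owner could run, not X Corp.'s actual parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the "X-Crawler" token and the paths are assumptions.
ROBOTS_TXT = """\
User-agent: X-Crawler
Disallow: /private/
Crawl-delay: 10
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A path under /private/ is disallowed for this user agent.
blocked = parser.can_fetch("X-Crawler/2.0", "https://example.com/private/report")

# Any other path remains fetchable.
allowed = parser.can_fetch("X-Crawler/2.0", "https://example.com/blog/post")

# The library parses Crawl-delay, but per the text above the bot ignores it.
delay = parser.crawl_delay("X-Crawler/2.0")
```

Because the bot reportedly ignores Crawl-delay, a site that needs to slow it down would have to rely on Disallow rules or on server-side rate limiting rather than on that directive.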
🔍 Detection Indicators
The primary User-Agent strings are X-Crawler/1.0 and X-Crawler/2.0, following the pattern X-Crawler/ plus a version number. The bot also includes an X-Forwarded-For header when operating behind X's proxy. Behavioral fingerprints include a short request timeout (5 seconds) and an Accept-Encoding header that does not advertise gzip, as documented in X's GitHub repository for link-preview services (github.com/twitter/crawler).
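The User-Agent pattern described above can be matched with a short regular expression. The following is a minimal sketch; the function name x_crawler_version and the sample strings are hypothetical, and the pattern assumes the version is a dotted sequence of digits.

```python
import re
from typing import Optional

# Pattern from the text: "X-Crawler/" followed by a version number.
# Assumption: the version is dotted digits such as "1.0" or "2.0".
X_CRAWLER_RE = re.compile(r"\bX-Crawler/(\d+(?:\.\d+)*)")


def x_crawler_version(user_agent: str) -> Optional[str]:
    """Return the advertised version if the header matches, else None."""
    match = X_CRAWLER_RE.search(user_agent)
    return match.group(1) if match else None
```

A User-Agent header is trivially spoofed, so a match like this is best treated as one signal alongside the source IP range and the behavioral fingerprints listed above.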
📊 Data Usage
Collected metadata (title, description, image URLs, site name, and Twitter Card tags) is used exclusively to generate inline previews within X's web and mobile clients. No full page content is stored or used for training language models. X Corp. states that the data is discarded after 48 hours unless the URL is re-shared, at which point the preview is regenerated.
⚙️ Rate Limiting Policy
x-clawler is rate-limited because its bursty traffic can overwhelm origin servers when many users share the same URL at once. A threshold-based block (e.g., 20 requests per 60 seconds per IP) is justified to prevent resource exhaustion while still allowing legitimate preview generation for high-volume platforms.
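One way to express the example threshold above (20 requests per 60 seconds per IP) is a sliding-window counter. The sketch below is an illustration only: the class name SlidingWindowLimiter and its parameters are hypothetical, the state lives in process memory, and timestamps are passed in explicitly so the logic is easy to test.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class SlidingWindowLimiter:
    """Allow at most max_requests per window_seconds for each client IP."""

    def __init__(self, max_requests: int = 20, window_seconds: float = 60.0) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Timestamps of accepted requests, kept per IP in arrival order.
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, ip: str, now: float) -> bool:
        """Return True if a request from ip at time now is within the limit."""
        hits = self._hits[ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

In a real deployment the caller would pass time.monotonic() as now and would likely keep the counters in a shared store, since a single in-memory dictionary does not cover multiple server processes.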
ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.