proximic
Bot User-Agent: proximic
🤖 Overview
Proximic is a web crawler operated by Proximic, a Comscore company, used to collect publicly available web content for contextual advertising and audience targeting. The bot systematically indexes page text, categories, and keywords to feed into Proximic’s Contextual Intelligence Platform, which enables ad exchanges and publishers to serve relevant ads without relying on third-party cookies. According to Proximic’s official documentation (proximic.com/crawler), the crawler has been active since at least 2012 and is deployed globally.
🌐 Technical Behavior
The Proximic crawler primarily fetches HTML pages and plain-text content via HTTP GET requests, avoiding binary files such as images, videos, or PDFs. By default it waits about 10 seconds between requests to the same host (roughly 6 requests per minute), as documented in its robots.txt guidelines. The bot uses IP addresses from static ranges owned by Proximic (e.g., 198.2.192.0/18 and 208.70.0.0/16). Throttling is adaptive: the crawler may increase its request frequency if no rate limiting is detected, but it never exceeds 10 requests per minute per host. The bot sends requests with gzip and deflate compression support. It does not execute JavaScript or parse dynamic content, so it sees only the HTML that the server returns.
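As an illustration of the IP ranges mentioned above, the following minimal sketch checks whether a client address falls inside the listed CIDR blocks. The two ranges are copied from this page as examples and have not been verified here; the function name in_listed_ranges is hypothetical.

```python
import ipaddress

# Example ranges quoted on this page; treat them as unverified assumptions.
LISTED_RANGES = [
    ipaddress.ip_network("198.2.192.0/18"),
    ipaddress.ip_network("208.70.0.0/16"),
]


def in_listed_ranges(ip: str) -> bool:
    """Return True if `ip` parses as an address inside one of the listed blocks."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        # Not a valid IPv4/IPv6 address string.
        return False
    return any(addr in net for net in LISTED_RANGES)
```

A range match alone is weak evidence, since published ranges can change, so it is probably best combined with the User-Agent check rather than used on its own.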
📋 robots.txt Compliance
Proximic honors Disallow directives in robots.txt, as reported by independent crawler audits (e.g., the GitHub repository monperrus/crawler-user-agents). The bot also respects Crawl-delay directives and does not override server-side rate limits. If a site blocks a path via robots.txt, the bot stops crawling that path entirely within 24 hours.
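To see how such directives would be interpreted, the sketch below parses a sample robots.txt with Python's standard urllib.robotparser. The rules, the /private/ path, and example.com are illustrative assumptions, and the lowercase token "proximic" is taken from the Bot User-Agent field at the top of this page.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block the crawler from /private/ and ask for a 10 s delay.
SAMPLE_ROBOTS = """\
User-agent: proximic
Disallow: /private/
Crawl-delay: 10

User-agent: *
Disallow:
"""


def build_parser(robots_text: str) -> RobotFileParser:
    """Parse robots.txt text held in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS)
blocked = parser.can_fetch("proximic", "https://example.com/private/report.html")
allowed = parser.can_fetch("proximic", "https://example.com/articles/1")
delay = parser.crawl_delay("proximic")
```

With these sample rules, blocked is False, allowed is True, and delay is 10. This only shows how a standards-following parser reads the file; whether the crawler behaves the same way depends on its own implementation.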
🔍 Detection Indicators
The primary User-Agent string is Proximic (compatible; +http://www.proximic.com/info/spider.php). A secondary string, Proximic/2.0, is used by older implementations. Behavioral fingerprints include the absence of a Referer header and a standard Accept: text/html,application/xhtml+xml header. The bot does not send cookies or store session data. In web server logs it can be identified by its consistent IP blocks and User-Agent pattern.
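The indicators above can be combined into a simple header check. This is a rough sketch rather than a definitive detector: the function name looks_like_proximic and the decision to require all three signals (User-Agent match, no Referer, no Cookie) are assumptions made for illustration.

```python
import re

# Matches both User-Agent strings quoted above, case-insensitively.
PROXIMIC_UA = re.compile(r"\bproximic\b", re.IGNORECASE)


def looks_like_proximic(headers: dict) -> bool:
    """Heuristic: proximic User-Agent, no Referer header, no Cookie header."""
    # HTTP header names are case-insensitive, so normalize the keys first.
    normalized = {name.lower(): value for name, value in headers.items()}
    if not PROXIMIC_UA.search(normalized.get("user-agent", "")):
        return False
    has_referer = bool(normalized.get("referer"))
    has_cookie = bool(normalized.get("cookie"))
    return not has_referer and not has_cookie
```

Because a User-Agent header is trivially spoofed, a match here indicates only that a request claims to be the crawler.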
📊 Data Usage
Collected content is processed by Proximic’s Natural Language Processing (NLP) engine to extract topical categories, sentiment, and brand safety signals. These metadata streams are sold to advertisers and publisher platforms for real-time bidding decisions. The data is not used for AI model training but for contextual ad targeting; Proximic’s privacy policy (proximic.com/privacy) states that no personally identifiable information (PII) is stored.
⚙️ Rate Limiting Policy
Rate limiting Proximic prevents excessive bandwidth consumption, since its default crawl speed can reach 10 requests per minute per IP. Threshold-based blocking is justified because the bot, while legitimate, may inadvertently overload under-provisioned servers. A standard 429 Too Many Requests response is appropriate when a host receives more than 20 requests from the bot within 2 minutes.
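The threshold described above (more than 20 requests within 2 minutes) can be expressed as a sliding-window counter. The sketch below is one possible reading of that policy: keying the counter by client IP, the class name SlidingWindowLimiter, and passing the current time in explicitly are all assumptions made to keep the example small and testable.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 120  # "2 minutes" from the policy text
MAX_REQUESTS = 20     # more than this within the window yields a 429


class SlidingWindowLimiter:
    """Tracks request timestamps per client and returns an HTTP status code."""

    def __init__(self, max_requests=MAX_REQUESTS, window=WINDOW_SECONDS):
        self.max_requests = max_requests
        self.window = window
        self._hits = defaultdict(deque)  # client IP -> recent request times

    def status_for(self, client_ip: str, now: float) -> int:
        hits = self._hits[client_ip]
        # Discard timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        # Every request is recorded, including ones that end up rejected.
        hits.append(now)
        return 429 if len(hits) > self.max_requests else 200


limiter = SlidingWindowLimiter()
statuses = [limiter.status_for("198.2.200.5", float(t)) for t in range(21)]
```

Here the first 20 requests return 200 and the 21st returns 429. In a real deployment the timestamp would typically come from time.monotonic(), and the state would need to be shared across worker processes.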
ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.