Claritybot

Bot User-Agent: claritybot

🤖 Overview

Claritybot is a legitimate web crawler operated by Clarity AI, a sustainability technology platform headquartered in New York. It collects environmental, social, and governance (ESG) data from publicly accessible websites. The bot was first documented in early 2020 and feeds structured ESG metrics and corporate disclosures into Clarity AI’s proprietary analytics engine, which is used by investors and asset managers. Clarity AI’s documentation at clarity.ai/technology/crawler describes Claritybot as an authorized crawler that respects the privacy of web publishers.

🌐 Technical Behavior

Claritybot performs targeted, low-frequency crawling focused on pages containing ESG-related content such as sustainability reports, carbon-emission disclosures, and board-diversity statistics. It communicates exclusively over HTTP/1.1 and HTTPS, with a default rate of one request every 10 seconds per domain to avoid overloading servers. Requests originate from IP ranges belonging to Clarity AI’s infrastructure, primarily the 52.84.0.0/14 (AWS East) block, as listed in the IP allowlist published at clarity.ai/ip-ranges. Claritybot does not execute JavaScript or render CSS; it parses only raw HTML and XML feeds, and it sends conditional GET requests with If-Modified-Since headers so that unchanged pages are not downloaded again, which minimises bandwidth consumption.
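A request that claims to be Claritybot can be checked against the address block quoted in this section. The following is a minimal sketch, assuming that 52.84.0.0/14 is the only relevant range; the operator's published list may contain other blocks or change over time, and the names CLARITYBOT_NETWORKS and is_from_listed_range are hypothetical.

```python
import ipaddress

# Assumption: only the single block quoted in the text. The operator's
# published allowlist may include further ranges.
CLARITYBOT_NETWORKS = [ipaddress.ip_network("52.84.0.0/14")]


def is_from_listed_range(remote_addr: str) -> bool:
    """Return True if remote_addr falls inside any listed network."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Malformed addresses never match.
        return False
    return any(addr in net for net in CLARITYBOT_NETWORKS)
```

A User-Agent header is trivial to forge, so a source-address check like this is usually combined with the User-Agent match rather than replaced by it.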

📋 robots.txt Compliance

Claritybot fully honours robots.txt directives, as confirmed by its operator’s policy page at clarity.ai/robots. Webmasters can use Disallow: / to block all crawling, and Claritybot will stop making requests to that domain within 24 hours. The bot also honours the Crawl-delay directive when one is specified, lengthening the interval between requests as instructed.
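Before deploying a rule set, it can help to check how a standards-following parser would read it. The sketch below uses Python's standard urllib.robotparser on a hypothetical robots.txt; the /private/ path, the 30-second delay, and the evaluate helper are illustrative assumptions, and the result shows how Python's parser interprets the file, not necessarily how Claritybot itself does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule set: restrict Claritybot to public pages and ask it
# to wait 30 seconds between requests; leave other crawlers unrestricted.
ROBOTS_TXT = """\
User-agent: claritybot
Disallow: /private/
Crawl-delay: 30

User-agent: *
Disallow:
"""


def evaluate(robots_txt, agent, url):
    """Return (allowed, crawl_delay) for the given agent and URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url), parser.crawl_delay(agent)


if __name__ == "__main__":
    print(evaluate(ROBOTS_TXT, "Claritybot/1.0", "https://example.com/private/x.html"))
    print(evaluate(ROBOTS_TXT, "Claritybot/1.0", "https://example.com/esg/report.html"))
```

Replacing the first group's rule with Disallow: / makes every path on the site report as disallowed for that agent.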

🔍 Detection Indicators

The only documented User-Agent string is Claritybot/1.0 (listed at user-agents.net/crawlers/claritybot). Behavioural fingerprints include a steady pattern of one GET request per domain per 10-second window, and no Accept-Encoding header requesting gzip unless the server specifically supports it. The bot sends no additional custom headers such as User-Agent-Token; it identifies itself solely through the User-Agent field.
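The two indicators described here, the User-Agent string and the 10-second cadence, can be combined into a simple log check. This is a rough sketch under stated assumptions: the function name and the idea of passing pre-parsed timestamps for one client and one domain are hypothetical, and real traffic may show jitter that a strict threshold would misjudge.

```python
from datetime import datetime, timedelta

EXPECTED_UA_PREFIX = "claritybot/"    # from the User-Agent string quoted above
MIN_INTERVAL = timedelta(seconds=10)  # documented default request interval


def matches_claritybot_profile(user_agent, timestamps):
    """True if the UA matches and no two consecutive requests are
    closer together than the documented 10-second interval."""
    if not user_agent.lower().startswith(EXPECTED_UA_PREFIX):
        return False
    ordered = sorted(timestamps)
    return all(later - earlier >= MIN_INTERVAL
               for earlier, later in zip(ordered, ordered[1:]))
```

A client that presents the Claritybot User-Agent but requests pages faster than the documented rate fails this check, which is a common sign of an impersonator.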

📊 Data Usage

Collected data is aggregated and processed to generate ESG scores, risk ratings, and compliance metrics for over 10,000 publicly traded companies. The information feeds Clarity AI’s investment-research platform, which institutional investors use for portfolio sustainability analysis. No raw page content is stored beyond the extraction of structured data fields, and findings are used exclusively for analytical purposes.

⚙️ Rate Limiting Policy

Claritybot is rate‑limited by default to one request per 10 seconds per domain, a policy designed to prevent unintentional overloading of small or shared hosting environments. This threshold allows legitimate data collection while giving webmasters ample control via robots.txt and IP blocking, ensuring the bot remains non‑intrusive.
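The stated interval puts a simple ceiling on daily load per domain. The helper below is an illustrative calculation only (the function name is hypothetical), and it assumes the bot never exceeds the interval it is given.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def max_requests_per_day(interval_seconds: float) -> int:
    """Upper bound on requests per domain per day at a fixed interval."""
    return int(SECONDS_PER_DAY // interval_seconds)


if __name__ == "__main__":
    print(max_requests_per_day(10))  # default interval
    print(max_requests_per_day(30))  # e.g. with a longer Crawl-delay
```

At the default 10-second interval the ceiling is 8,640 requests per domain per day; a 30-second delay lowers it to 2,880.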

ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.