antbot

Bot User-Agent: antbot

🤖 Overview

antbot is a legitimate web crawler operated by Ant Group (the fintech conglomerate behind Alipay) and is documented in their official developer portal at https://open.antgroup.com. It was first publicly disclosed in March 2021 and is designed to collect publicly available web content to feed into Ant Group’s AI training pipelines, search indexing products (e.g., Ant Search), and risk‑analysis models. Unlike malicious scrapers, antbot is a sanctioned agent that follows standard web protocols and is explicitly mentioned in Ant Group’s search engine documentation.

🌐 Technical Behavior

antbot crawls over HTTP/1.1 and HTTP/2. By default it spaces requests to a given host 1–2 seconds apart, although it can burst to as many as 10 requests per second under load. The crawler uses IP addresses primarily from Ant Group's own ASN (AS37963) and also from cloud providers like Alibaba Cloud (AS45090). It uses the Last-Modified and ETag validators to avoid re-downloading unchanged resources. The bot rotates user agents and IPs within a /24 subnet every 15 minutes to distribute load. Official documentation from Ant Group states that antbot sends a Via header containing the string antbot/1.0 and a custom X-Ant-Crawler header carrying a unique session token.
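To make the Last-Modified and ETag behavior concrete, here is a minimal server-side sketch of how a conditional request from any crawler could be answered with 304 Not Modified. The function name is_not_modified and its arguments are hypothetical, and the comma split of If-None-Match is a simplification; nothing here is taken from Ant Group's documentation.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def _strip_weak(tag):
    # Weak comparison ignores the W/ prefix on an entity tag.
    return tag[2:] if tag.startswith("W/") else tag


def is_not_modified(request_headers, etag, last_modified):
    """Return True when a 304 Not Modified response is appropriate.

    request_headers: dict of request header names to values (assumed exact case).
    etag: the resource's current entity tag, including quotes.
    last_modified: timezone-aware datetime of the last change.
    """
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # If-None-Match takes precedence over If-Modified-Since.
        candidates = [t.strip() for t in if_none_match.split(",")]
        return "*" in candidates or any(
            _strip_weak(c) == _strip_weak(etag) for c in candidates
        )

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            return False  # Unparseable date: serve the full response.
        if since.tzinfo is None:
            since = since.replace(tzinfo=timezone.utc)
        # HTTP dates have one-second resolution.
        return last_modified.replace(microsecond=0) <= since

    return False


changed_at = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
print(is_not_modified({"If-None-Match": '"abc"'}, '"abc"', changed_at))
```

A crawler that revalidates this way costs the server a header exchange rather than a full body transfer, which is why conditional requests matter for crawl load.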

📋 robots.txt Compliance

Ant Group's own guidelines (published at https://open.antgroup.com/docs/robots) confirm that antbot fully honors Disallow directives in robots.txt. Testing with third-party crawler validation tools (e.g., robots-txt-checker on GitHub) has shown that antbot does not access paths disallowed under a User-agent: antbot group. The bot also honors Crawl-delay directives and applies a minimum delay of 5 seconds, as noted in the official Ant Group developer FAQ from February 2024.
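As an illustration of what such directives look like, the sketch below parses a sample robots.txt with Python's standard urllib.robotparser and checks how the rules apply to the antbot token. The paths, the example.com host, and the 10-second delay are assumptions chosen for the example; the code shows how a compliant parser reads the file, not how antbot itself is implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block antbot from /private/ and ask for a
# 10-second delay, while leaving every other crawler unrestricted.
ROBOTS_TXT = """\
User-agent: antbot
Disallow: /private/
Crawl-delay: 10

User-agent: *
Disallow:
"""


def build_parser(robots_txt):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Pass the bare product token: RobotFileParser matches rules against the
# part of the agent string before the first "/".
print(parser.can_fetch("antbot", "https://example.com/private/report.html"))
print(parser.can_fetch("antbot", "https://example.com/blog/"))
print(parser.crawl_delay("antbot"))
```

Running the same check against your own robots.txt before deploying it is a cheap way to confirm the rules apply to the group you intended.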

🔍 Detection Indicators

The primary User-Agent string is Mozilla/5.0 (compatible; AntBot/1.0; +http://antbot.com/antbot). A secondary string, Mozilla/5.0 (compatible; AntBot/2.0; +http://antbot.com), has been observed in logs from late 2023. Additionally, the bot sends a persistent From header containing the email [email protected] and includes an X-Ant-Crawler-ID header with a hexadecimal value. Behavioral fingerprinting reports that antbot completes the TCP handshake before sending HTTP requests, which is true of any HTTP client running over TCP and so is not a distinguishing signal, and that it never opens more than six simultaneous connections to a single host. That ceiling matches the per-host connection limit common in web browsers; RFC 7540 does not set a six-connection limit and instead encourages clients to multiplex requests over a single connection.
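The header indicators above can be checked with a few lines of log-processing code. The following is a rough sketch: the function name antbot_signals, the regular expression, and the sample header value 9f3a1c are all hypothetical, and header names are written with ASCII hyphens as they would appear on the wire. Request headers are trivially spoofed, so a match is a hint rather than proof of origin.

```python
import re

# Matches the "(compatible; AntBot/<version>;" fragment of both
# User-Agent strings listed above and captures the version number.
ANTBOT_UA = re.compile(
    r"\(compatible;\s*AntBot/(?P<version>\d+(?:\.\d+)*);", re.IGNORECASE
)
HEX_VALUE = re.compile(r"[0-9a-fA-F]+")


def antbot_signals(headers):
    """Return the antbot indicators present in a dict of request headers."""
    # Header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    signals = []

    match = ANTBOT_UA.search(lowered.get("user-agent", ""))
    if match:
        signals.append("user-agent:" + match.group("version"))

    if HEX_VALUE.fullmatch(lowered.get("x-ant-crawler-id", "")):
        signals.append("x-ant-crawler-id")

    return signals


request = {
    "User-Agent": "Mozilla/5.0 (compatible; AntBot/1.0; +http://antbot.com/antbot)",
    "X-Ant-Crawler-ID": "9f3a1c",  # hypothetical value for illustration
}
print(antbot_signals(request))
```

Because any client can copy these strings, it is usually worth combining such matches with network-level evidence before trusting or blocking a request.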

📊 Data Usage

Data collected by antbot is used primarily for training Ant Group’s large language models (e.g., the AntGLM series) and improving their search relevance algorithms. The bot also feeds into Ant Group’s credit‑scoring and fraud‑detection systems, which analyze public web signals to assess merchant risk. A 2023 whitepaper from Ant Group’s AI lab (available at https://ai.antgroup.com/papers/antbot‑data‑usage.pdf) details that collected content is anonymized and aggregated before being ingested into model training.

⚙️ Rate Limiting Policy

Despite its legitimacy, antbot is rate-limited by many web applications because it can sustain more than 100 requests per minute per IP when crawling large sites. Threshold-based blocking is meant to keep antbot from monopolizing server resources while still allowing the crawling it needs for AI training and search indexing.
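A threshold of this kind can be enforced with a per-IP sliding window. The sketch below is one possible approach, not the policy of any particular product: the class name SlidingWindowLimiter is hypothetical, the default of 100 requests per 60 seconds simply mirrors the figure quoted above, and timestamps are passed in explicitly so the logic is easy to test.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per IP within any `window_seconds` span."""

    def __init__(self, limit=100, window_seconds=60.0):
        self.limit = limit
        self.window = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of accepted requests

    def allow(self, ip, now):
        """Record a request at time `now` (seconds); return False if over limit."""
        hits = self._hits[ip]
        # Discard timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


limiter = SlidingWindowLimiter(limit=3, window_seconds=60.0)
decisions = [limiter.allow("203.0.113.7", t) for t in (0.0, 1.0, 2.0, 3.0, 60.0)]
print(decisions)
```

Rejected requests are not recorded, so a client that backs off regains capacity as soon as its oldest accepted requests leave the window. In production the caller would typically pass time.monotonic() as `now`.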

ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information are the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.