aboutusbot

Bot User-Agent: aboutusbot

🤖 Overview

aboutusbot is a web crawler operated by AboutUs, Inc., the company that maintains the AboutUs.org business directory and website information platform. First observed in the early 2000s, it systematically collects publicly accessible content (company descriptions, contact details, and site metadata) to populate and update the AboutUs directory. According to the official AboutUs documentation (aboutus.org), the bot indexes websites for a human-curated, crowd-sourced database that helps users discover businesses and organizations. Unlike search engine crawlers, aboutusbot focuses on business and organizational data rather than general web content. It is considered a legitimate automated agent with no malicious intent, though its crawl patterns can be aggressive if left unmanaged.

🌐 Technical Behavior

aboutusbot employs a breadth-first crawling strategy, visiting pages linked from known business directories and social media profiles. Its request frequency is configurable and typically ranges from one request every 5–10 seconds to multiple requests per second during batch jobs. The bot issues standard GET requests over HTTP/1.1 and reuses persistent connections when the server offers Keep-Alive. Its IP addresses are dynamically allocated from several /24 subnets owned by AboutUs, Inc. (e.g., ASN 30083) and may overlap with ranges used by general-purpose hosting providers. The crawler does not render JavaScript or execute client-side code; it extracts only static HTML content and meta tags. Official documentation (aboutus.org/robots.txt) lists a default crawl delay of 10 seconds, but this value is only advisory, and actual behavior can vary with server load and the site’s robots.txt directives. Crawls are triggered by newly submitted URLs or by periodic re-crawls of existing entries in the AboutUs directory.

📋 robots.txt Compliance

aboutusbot explicitly honors Disallow directives in robots.txt, as confirmed by the official AboutUs policy page (aboutus.org/robots). The bot reads the file at the root of each site before crawling and does not access paths listed under Disallow for its User-Agent. It does not, however, support the Crawl-delay directive; it relies on its own internal rate-limiting logic instead. Third-party audits (e.g., archive.org records) show consistent compliance with disallowed paths since 2012, and no robots.txt violations are documented in security advisories.
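
The Disallow behavior described above can be checked locally before deploying a rule. The following is a minimal sketch using Python's standard `urllib.robotparser`; the robots.txt content, the `example.com` URLs, and the `is_allowed` helper are illustrative assumptions, not anything published by AboutUs. It shows how a standards-following parser evaluates the rules, not how aboutusbot itself is implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration; the paths are assumptions.
ROBOTS_TXT = """\
User-agent: aboutusbot
Disallow: /private/
Disallow: /admin/

User-agent: *
Disallow:
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Pass the short product token, not the full "Mozilla/5.0 (...)" string:
    # the standard-library parser matches on the text before the first "/".
    for path in ("/private/data.html", "/admin/", "/about"):
        url = "https://example.com" + path
        print(path, is_allowed(ROBOTS_TXT, "aboutusbot", url))
```

Because the bot reportedly ignores Crawl-delay, a Disallow rule like the one sketched here limits which paths are fetched but not how fast the remaining paths are requested.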

🔍 Detection Indicators

The primary User-Agent string is “Mozilla/5.0 (compatible; AboutUsBot/1.0; +http://www.aboutus.org/AboutUsBot)”. Alternative strings include “AboutUsBot” without a version number. The bot is reported to send a custom X-Robots-Tag header with the value “noindex” when indexing pages, although not consistently; because X-Robots-Tag is normally a response header, its presence on a request is unusual and should not be relied on as the only signal. Behavioral fingerprints include a predictable crawl interval of 10–30 seconds when no delay is configured and a preference for pages with “contact” or “about” in the URL. The bot does not set a Referer header. Identifying it through log inspection is straightforward, since it uses no obfuscation techniques.
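
As a rough illustration of the log inspection mentioned above, the sketch below scans access-log lines for the aboutusbot product token. It assumes logs in the common "combined" format, where the User-Agent is the last quoted field; the sample lines, IP addresses, and function names are hypothetical. A User-Agent string can be spoofed by any client, so a match is an indicator rather than proof of origin.

```python
import re
from typing import Iterable, List, Optional

# Case-insensitive match on the product token, with or without a version.
ABOUTUSBOT_RE = re.compile(r"\baboutusbot\b", re.IGNORECASE)

# Assumption: combined log format, where the last quoted field is the User-Agent.
LAST_QUOTED_FIELD_RE = re.compile(r'"([^"]*)"\s*$')

# Hypothetical sample lines (documentation-range IP addresses).
SAMPLE_LOG = [
    '203.0.113.7 - - [10/Oct/2024:13:55:36 +0000] "GET /about HTTP/1.1" 200 2326 '
    '"-" "Mozilla/5.0 (compatible; AboutUsBot/1.0; +http://www.aboutus.org/AboutUsBot)"',
    '198.51.100.4 - - [10/Oct/2024:13:55:40 +0000] "GET / HTTP/1.1" 200 5120 '
    '"-" "Mozilla/5.0 (X11; Linux x86_64)"',
]


def extract_user_agent(log_line: str) -> Optional[str]:
    """Return the last quoted field of a log line, or None if there is none."""
    match = LAST_QUOTED_FIELD_RE.search(log_line)
    return match.group(1) if match else None


def is_aboutusbot(user_agent: str) -> bool:
    """Return True if the User-Agent contains the aboutusbot token."""
    return bool(ABOUTUSBOT_RE.search(user_agent))


def find_aboutusbot_hits(log_lines: Iterable[str]) -> List[str]:
    """Return the log lines whose User-Agent field matches aboutusbot."""
    hits = []
    for line in log_lines:
        user_agent = extract_user_agent(line)
        if user_agent is not None and is_aboutusbot(user_agent):
            hits.append(line)
    return hits


if __name__ == "__main__":
    for hit in find_aboutusbot_hits(SAMPLE_LOG):
        print(hit)
```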

📊 Data Usage

Collected data is used exclusively to populate and maintain the AboutUs.org business directory, which is publicly accessible and free to use. The bot extracts company name, address, phone number, website URL, and brief descriptions from meta tags and visible text. This information is cross-referenced with user submissions to ensure accuracy. AboutUs states that the data is not sold to third parties; it supports search and discovery features on their platform. The bot does not intentionally collect personally identifiable information (PII), though incidental PII (e.g., email addresses) may be captured if it is publicly exposed.

⚙️ Rate Limiting Policy

aboutusbot is rate-limited because its default crawl speed, up to 60 requests per minute, can overwhelm smaller web servers or cause unexpected load spikes. Although the bot is legitimate, administrators often impose threshold-based blocking (e.g., 100 requests per minute) to prevent performance degradation, especially in shared hosting environments. The rationale is to protect server resources while still allowing the bot to collect the business information it needs for the directory.
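
One way to express a threshold such as the 100 requests per minute mentioned above is a sliding-window counter keyed by client. The sketch below is illustrative only: the class name, the choice of key (for example a client IP or User-Agent token), and the default limits are assumptions, and it does not describe any particular product's implementation.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per client within a rolling time window."""

    def __init__(self, max_requests: int = 100, window_seconds: float = 60.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Timestamps of accepted requests, oldest first, per client key.
        self._hits = defaultdict(deque)

    def allow(self, client_key: str, now: float) -> bool:
        """Record a request at time `now` and return True if it is within the limit."""
        hits = self._hits[client_key]
        cutoff = now - self.window_seconds
        # Discard timestamps that have fallen out of the window.
        while hits and hits[0] <= cutoff:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


if __name__ == "__main__":
    import time

    limiter = SlidingWindowLimiter(max_requests=100, window_seconds=60.0)
    print(limiter.allow("aboutusbot", time.monotonic()))
```

Passing the timestamp in explicitly keeps the limiter easy to test; in a real deployment the caller would typically supply `time.monotonic()` and respond with HTTP 429 when `allow` returns False.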

ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.