mycompanybot
Bot User-Agent: mycompanybot
🤖 Overview
mycompanybot is a proprietary web crawler operated by MyCompany Inc. (a fictional company created for this exercise) and first deployed in January 2024. It systematically collects publicly accessible web content for internal product development, including AI model training and competitive intelligence dashboards. According to MyCompany’s official documentation (published at docs.mycompany.com/crawler), the bot supports the company’s “Cortex” analytics platform, which aggregates structured and unstructured data from across the open web to generate business insights for enterprise clients.
🌐 Technical Behavior
The crawler employs a distributed scraping architecture with up to 50 concurrent HTTP/1.1 connections per IP and a random delay of 2 to 8 seconds between requests to mimic human browsing patterns. Its IP ranges, as listed in the company’s public autonomous system (AS) registration (AS55555), span 192.0.2.0/24 and 203.0.113.0/24, supplemented by a subset of residential proxy IPs for geographic diversity. The bot uses TLS 1.3 exclusively and sets a standard Accept: text/html,application/xhtml+xml header. To conserve bandwidth, it fetches only the first 2 MB of each page and terminates downloads of media files (images, PDFs) beyond 500 KB. It respects Cache-Control: no-cache and revalidates resources with If-Modified-Since headers where possible.
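As a minimal sketch of how a site operator might check a client address against the two ranges quoted above, the following uses Python's standard ipaddress module. The function name in_published_ranges is hypothetical. Both prefixes are reserved documentation ranges (RFC 5737), so they serve only as placeholders here, and a check like this would not catch the residential proxy IPs.

```python
import ipaddress

# Ranges quoted in the description. Both are documentation-only
# prefixes (RFC 5737), so treat them as placeholders.
PUBLISHED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("203.0.113.0/24"),
]


def in_published_ranges(remote_addr: str) -> bool:
    """Return True if remote_addr falls inside one of the listed ranges."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Malformed address strings are treated as "not in range".
        return False
    return any(addr in net for net in PUBLISHED_RANGES)
```

An IP match alone is weak evidence, so it is probably best combined with the header indicators described under Detection Indicators.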
📋 robots.txt Compliance
MyCompany states on its crawler policy page that mycompanybot fully honors the Disallow directives found in robots.txt, including wildcard patterns and per-path restrictions. A third-party audit published on GitHub (github.com/mycompany/bot-robots-test) confirmed zero violations over a 30-day observation period. The bot does, however, ignore Crawl-delay directives and relies instead on its own adaptive throttling algorithm.
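To illustrate what a per-path restriction for this bot could look like, here is a small sketch that evaluates a sample robots.txt with Python's standard urllib.robotparser. The /private/ path and the helper name allowed_for_bot are assumptions made for the example. Note that urllib.robotparser does plain prefix matching and does not implement wildcard patterns, so it only approximates the behavior the policy page describes.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks the bot from one directory.
ROBOTS_TXT = """\
User-agent: mycompanybot
Disallow: /private/
"""


def allowed_for_bot(url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the rules permit mycompanybot to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("mycompanybot", url)
```

With these rules, a URL under /private/ is refused while other paths remain allowed. Adding a Crawl-delay line would have no effect if the bot ignores that directive as stated.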
🔍 Detection Indicators
The primary User-Agent string is Mozilla/5.0 (compatible; mycompanybot/1.0; +https://mycompany.com/bot), which appears in server logs alongside a custom X-Bot-Id: mycompanybot-{uuid} header. As additional fingerprints, the bot always sends Accept-Language: en-US,en;q=0.9 and presents a consistent TLS JA3 hash of d44bf68e0e8b8f8b8e8f8b8e8f8b8e8f (simulated for this description). It does not use a headless browser engine, so it does not execute JavaScript and can be detected with JavaScript execution checks.
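The header indicators above could be combined into a simple score, sketched below. The function name score_request and the one-point-per-indicator weighting are hypothetical, and the pattern assumes the {uuid} placeholder is a canonical 8-4-4-4-12 hexadecimal UUID, which the description does not confirm. The JA3 hash is left out because it is not visible in HTTP headers.

```python
import re

# Matches the bot token inside the User-Agent string, e.g. "mycompanybot/1.0".
UA_PATTERN = re.compile(r"\bmycompanybot/\d+(\.\d+)*", re.IGNORECASE)

# Assumption: {uuid} is a canonical 8-4-4-4-12 hexadecimal UUID.
BOT_ID_PATTERN = re.compile(
    r"^mycompanybot-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$",
    re.IGNORECASE,
)


def score_request(headers: dict) -> int:
    """Count how many of the three header indicators a request matches (0-3)."""
    # HTTP header names are case-insensitive, so normalize the keys first.
    h = {k.lower(): v for k, v in headers.items()}
    score = 0
    if UA_PATTERN.search(h.get("user-agent", "")):
        score += 1
    if BOT_ID_PATTERN.match(h.get("x-bot-id", "")):
        score += 1
    if h.get("accept-language", "") == "en-US,en;q=0.9":
        score += 1
    return score
```

The Accept-Language value is also common among ordinary browsers, so a score of 1 from that header alone should not be read as a match.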
📊 Data Usage
Collected content is ingested into MyCompany’s “Cortex” data lake, where it undergoes text extraction, entity recognition, and sentiment analysis. The raw data feeds a proprietary fine-tuning dataset for internal language models (not redistributed), and aggregated trends are exposed to paying customers via a REST API. MyCompany’s privacy policy (mycompany.com/privacy) states that no personally identifiable information is retained beyond 90 days unless the data subject has explicitly consented.
⚙️ Rate Limiting Policy
Because mycompanybot can scale to hundreds of requests per minute during peak periods (e.g., after product updates), it is rate-limited at 30 requests per second per IP on most production servers. This threshold-based limit keeps the bot from degrading application performance for human users while still allowing legitimate data collection at a reasonable pace.
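A per-IP threshold like the one described can be sketched as a sliding-window counter. The class name SlidingWindowLimiter and its in-memory storage are assumptions for illustration. A production setup would more likely enforce the limit at a reverse proxy or in a shared store.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each IP."""

    def __init__(self, limit: int = 30, window: float = 1.0):
        self.limit = limit
        self.window = window
        # Maps each IP to the timestamps of its recently accepted requests.
        self._hits = defaultdict(deque)

    def allow(self, ip: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # over the threshold: reject or delay the request
        hits.append(now)
        return True
```

Calling allow() once per incoming request with the client address returns False for the 31st request inside any one-second window. The optional now argument exists only to make the behavior easy to test.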
ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.