Spammen

Bot User-Agent: spammen

🤖 Overview

Spammen is a web crawler operated by Spammen Inc., a company founded in 2016 that provides content aggregation, search indexing, and competitive intelligence analytics for enterprise clients. Its primary purpose is to collect publicly accessible web pages to populate the Spammen Search engine and feed its proprietary trend analysis platform. Official documentation at spammen.com/crawler describes the bot as a legitimate but aggressive agent designed to index large volumes of content quickly. The bot was first publicly documented in a 2018 blog post and has since been updated to version 2.1 as of March 2024.

🌐 Technical Behavior

Spammen crawls over HTTP/1.1 and HTTP/2 with a default delay of 1.5 seconds between consecutive requests, though it can burst up to 15 requests per second for short periods. It operates from IP ranges registered under AS12345 (Spammen Inc.) and also uses a pool of over 8,000 residential proxy IPs distributed across 40 countries to avoid rate limiting. The bot fetches full page resources, including HTML, CSS, JavaScript, and images, and executes JavaScript to parse dynamic content. It makes conditional GET requests using If-Modified-Since and ETag validators (the latter sent in If-None-Match), and supports gzip and deflate Content-Encoding. Spammen follows up to 5 redirect hops and crawls both HTTP and HTTPS URLs. It also sends a custom header, X-Spammen-Crawl: 1, on all requests. Its queue-based scheduler prioritizes frequently updated sites.

📋 robots.txt Compliance

According to the official Spammen documentation at spammen.com/robots, the bot fully honors Disallow directives in the root robots.txt file, including wildcard and path-specific rules. However, community reports on webmaster forums (e.g., webmasters.stackexchange.com) indicate that Spammen sometimes ignores Disallow rules for subdirectories containing JavaScript files. These reports cite a bug that Spammen Inc. acknowledged in a 2023 GitHub issue (github.com/spammen/crawler/issues/47). The company recommends combining robots.txt with server-level blocking (e.g., .htaccess) for sensitive content and says it is working on a compliance update. The bot also respects Crawl-Delay directives when they are set.
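
Before relying on robots.txt rules for this bot, it can help to confirm that the file says what you intend. The sketch below uses Python's standard urllib.robotparser to evaluate a hypothetical robots.txt for the "Spammen" user-agent token. The paths and the delay value are illustrative assumptions, not recommendations from Spammen Inc. The standard-library parser matches paths by prefix, so the example avoids wildcard rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the paths and delay are examples only.
ROBOTS_TXT = """\
User-agent: Spammen
Disallow: /private/
Crawl-delay: 10

User-agent: *
Disallow:
"""


def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# Pass the bare product token, not the full Mozilla-style User-Agent string.
print(rules.can_fetch("Spammen", "/private/report.html"))
print(rules.can_fetch("Spammen", "/blog/post.html"))
print(rules.crawl_delay("Spammen"))
```

This only checks what the rules request. Given the reported Disallow bug, it says nothing about whether the crawler actually obeys them, so server-level blocking remains the safer option for sensitive paths.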

🔍 Detection Indicators

The primary User-Agent string is Mozilla/5.0 (compatible; Spammen/2.1; +http://spammen.com/bot). Additional variants include SpammenImage/1.0 for image scraping and SpammenMobile/1.0 for mobile content. Behavioral fingerprints include rapid sequential requests to URLs with incremental patterns (e.g., page1, page2), a missing Accept-Language header, and an unusually high number of requests for JavaScript files. The custom header X-Spammen-Crawl: 1 is a strong identifier. For a given feed, the User-Agent string stays fixed, with no version-number changes between requests. Log analysis shows that the bot typically requests /robots.txt before any other resource and applies the Disallow rules it retrieved for the rest of that session.
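
The header-level indicators above can be combined into a simple check. The following is a minimal sketch, assuming request headers are available as a plain mapping. The function names and signal labels are hypothetical, and the missing Accept-Language header is treated as a weak hint only, since many other clients omit it too.

```python
import re
from typing import Mapping

# Matches the product tokens listed above:
# Spammen/2.1, SpammenImage/1.0, SpammenMobile/1.0.
UA_PATTERN = re.compile(r"\bSpammen(?:Image|Mobile)?/\d+(?:\.\d+)*", re.IGNORECASE)


def spammen_signals(headers: Mapping[str, str]) -> list[str]:
    """Return the names of the header-level indicators present in a request."""
    h = {name.lower(): value for name, value in headers.items()}
    signals = []
    if UA_PATTERN.search(h.get("user-agent", "")):
        signals.append("user-agent")
    if h.get("x-spammen-crawl", "").strip() == "1":
        signals.append("x-spammen-crawl")
    if "accept-language" not in h:
        signals.append("no-accept-language")  # weak hint on its own
    return signals


def looks_like_spammen(headers: Mapping[str, str]) -> bool:
    """Require at least one strong indicator: the User-Agent or the custom header."""
    signals = spammen_signals(headers)
    return "user-agent" in signals or "x-spammen-crawl" in signals


request_headers = {
    "User-Agent": "Mozilla/5.0 (compatible; Spammen/2.1; +http://spammen.com/bot)",
    "X-Spammen-Crawl": "1",
}
print(spammen_signals(request_headers))
print(looks_like_spammen(request_headers))
```

Headers are trivial to spoof or strip, so a check like this is best paired with the behavioral fingerprints (sequential URL patterns, heavy JavaScript fetching) seen in server logs.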

📊 Data Usage

Collected data is used to build the Spammen Search index, which serves public search results and a paid enterprise API for content monitoring and competitive analysis. According to the Spammen privacy policy (spammen.com/privacy), crawled content is stored for up to 90 days and is not used to train AI language models. Instead, it feeds into keyword frequency analysis, backlink detection, and content freshness scoring. The company also uses the data to generate anonymized trend reports sold to marketing firms. Spammen explicitly states that they do not sell or share raw page content with third parties, but they do provide aggregated metrics from their crawl.

⚙️ Rate Limiting Policy

Spammen is rate-limited due to its aggressive crawl frequency, which can cause excessive server load, especially on shared hosting environments. The recommended policy is threshold-based blocking at 100 requests per minute from a single IP, followed by a temporary 15-minute ban if exceeded. This approach balances the need for indexing with server resource protection, aligning with Spammen's own guidance that webmasters should set appropriate limits to prevent performance degradation while still allowing legitimate crawling.
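
The threshold policy described above (100 requests per minute per IP, then a 15-minute ban) could be sketched as an in-memory sliding-window limiter. This is an illustration under stated assumptions, not a production implementation: the class name ThresholdLimiter and its parameters are hypothetical, state is kept in a single process, and the clock is injectable so the logic can be exercised without waiting.

```python
import time
from collections import defaultdict, deque


class ThresholdLimiter:
    """Per-IP sliding-window limiter with a temporary ban on overflow."""

    def __init__(self, limit=100, window=60.0, ban=900.0, clock=time.monotonic):
        self.limit = limit      # allowed requests per window
        self.window = window    # window length in seconds
        self.ban = ban          # ban length in seconds (15 minutes)
        self.clock = clock      # injectable time source
        self._hits = defaultdict(deque)
        self._banned_until = {}

    def allow(self, ip: str) -> bool:
        now = self.clock()
        until = self._banned_until.get(ip)
        if until is not None:
            if now < until:
                return False            # still banned
            del self._banned_until[ip]  # ban expired
        hits = self._hits[ip]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            # Threshold exceeded: start the ban and reset the counter.
            self._banned_until[ip] = now + self.ban
            hits.clear()
            return False
        hits.append(now)
        return True


limiter = ThresholdLimiter()
print(limiter.allow("203.0.113.7"))
```

Taking the figures in this entry at face value, the default 1.5-second delay works out to about 40 requests per minute, which stays under the threshold, while a burst at 15 requests per second would reach it in roughly 7 seconds. Because the bot is also said to use a large residential proxy pool, a purely per-IP counter may miss traffic spread across many addresses.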

ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.