joc Bot — Detection, Blocking & Technical Analysis

Bot User-Agent: joc

🤖 Overview

joc is a web crawler operated by the Journal of Commerce (JOC), a provider of shipping and trade news and business analytics. It is part of JOC's digital content indexing and market intelligence aggregation system. The crawler collects publicly available web content on logistics, supply chain, maritime trade, and international commerce, which feeds JOC's proprietary search engine and data analytics platform used by subscribers for market insights.

🌐 Technical Behavior

The joc crawler uses a distributed crawling architecture, with IP addresses drawn primarily from Amazon Web Services (AWS) and Rackspace hosting ranges, as confirmed by reverse DNS lookups. It sends requests at a moderate rate of approximately 1-2 requests per second per IP, using HTTP/1.1 over TCP ports 80 and 443. It respects Last-Modified and ETag headers to minimize bandwidth consumption, and it routinely follows internal links within a domain to a depth of 3-4 levels. The crawler does not request binary files, images, or media content unless they are explicitly linked from an HTML page, and it neither executes JavaScript nor sends cookies. It adheres to the Crawl-delay directive in robots.txt if present.
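The bandwidth saving from Last-Modified and ETag support depends on the server answering revalidation requests with 304 Not Modified. The following is a minimal Python sketch of that server-side decision, assuming the crawler revalidates with If-None-Match or If-Modified-Since (the page does not say which). The names conditional_status and strip_weak are illustrative, not part of any JOC or Boteraser API.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def strip_weak(tag):
    """Drop the weak-validator prefix so W/"abc" compares equal to "abc"."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def conditional_status(request_headers, etag, last_modified):
    """Return 304 if the client's cached copy is still valid, else 200.

    request_headers: dict of request header names to values (hypothetical shape).
    etag: the resource's current ETag, including quotes.
    last_modified: timezone-aware datetime of the resource's last change.
    """
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # If-None-Match takes precedence over If-Modified-Since.
        candidates = [strip_weak(t) for t in if_none_match.split(",")]
        if "*" in candidates or strip_weak(etag) in candidates:
            return 304
        return 200

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            return 200  # unparseable date: ignore the header
        if since.tzinfo is None:
            since = since.replace(tzinfo=timezone.utc)
        # HTTP dates have one-second resolution, so drop microseconds.
        if last_modified.replace(microsecond=0) <= since:
            return 304
    return 200
```

A server that never emits ETag or Last-Modified gives the crawler nothing to revalidate against, so every revisit would transfer the full page.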

📋 robots.txt Compliance

Based on official documentation available at joc.com/robots.txt and observed behavior, joc fully honors robots.txt Disallow directives. The crawler fetches robots.txt before each crawl session and does not access any path that is explicitly blocked. Server logs show no attempts by joc to bypass disallowed sections, which makes it a considerate crawler.
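Because joc is reported to honor Disallow and Crawl-delay, a robots.txt group is the simplest way to restrict it. The sketch below uses Python's standard urllib.robotparser to check how a sample rule set would apply. The rule set itself, the paths /private/ and /reports/, the delay of 5 seconds, and the example.com URLs are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: restrict joc, leave other crawlers unrestricted.
ROBOTS_TXT = """\
User-agent: joc
Crawl-delay: 5
Disallow: /private/
Disallow: /reports/

User-agent: *
Disallow:
"""


def build_parser(robots_txt):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Paths under /private/ and /reports/ are off limits to joc only.
blocked = parser.can_fetch("joc", "https://example.com/private/data.html")
allowed = parser.can_fetch("joc", "https://example.com/news/article.html")
delay = parser.crawl_delay("joc")
```

Note that urllib.robotparser matches the User-agent token as a case-insensitive substring of the product name, so the same group also applies to a User-Agent such as JOC Web Spider/1.0. Whether the real crawler matches its group the same way is not stated on this page.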

🔍 Detection Indicators

The primary identifier is the User-Agent string JOC Web Spider/1.0 (the version number may vary), as listed on useragentstring.com. Additional indicators are a custom HTTP header, X-JOC-Crawler: 1, and a Referer header consistently set to https://www.joc.com. The crawler accepts only text/html content and does not parse JavaScript or stylesheets, which simplifies detection through log analysis.
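The indicators above can be checked together against a request's headers. This is a minimal Python sketch, assuming headers arrive as a plain dict; the function name joc_indicators and the returned labels are illustrative. All three headers are client-supplied and can be spoofed, so a match suggests rather than proves that a request comes from joc.

```python
import re

# Matches "JOC Web Spider/1.0" and other version numbers, case-insensitively.
UA_PATTERN = re.compile(r"\bJOC Web Spider/\d+(?:\.\d+)*", re.IGNORECASE)


def joc_indicators(headers):
    """Return the list of joc indicators present in a request's headers."""
    # HTTP header names are case-insensitive, so normalize them first.
    h = {name.lower(): value for name, value in headers.items()}
    found = []

    user_agent = h.get("user-agent", "")
    # The bare token "joc" is also accepted, since the page lists it as the bot User-Agent.
    if UA_PATTERN.search(user_agent) or user_agent.strip().lower() == "joc":
        found.append("user-agent")

    if h.get("x-joc-crawler", "").strip() == "1":
        found.append("x-joc-crawler")

    if h.get("referer", "").rstrip("/") == "https://www.joc.com":
        found.append("referer")

    return found


sample = {
    "User-Agent": "JOC Web Spider/1.0",
    "X-JOC-Crawler": "1",
    "Referer": "https://www.joc.com",
}
matches = joc_indicators(sample)
```

Requiring two or more indicators before acting reduces false positives from ordinary visitors who simply followed a link from joc.com.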

📊 Data Usage

Collected data is used exclusively for JOC's internal search engine indexing, generation of market intelligence reports, and real-time aggregation of trade-related information for paying subscribers. The bot scrapes news articles, press releases, and public industry data; it does not use the data for AI model training or any machine learning purposes. JOC respects copyright and only processes publicly available content, as stated in their privacy policy.

⚙️ Rate Limiting Policy

Although joc is a legitimate crawler with a moderate per-IP request rate, it is persistent and crawls from multiple IPs, so its aggregate request volume can be high. Many site operators therefore apply threshold-based rate limiting or blocking to keep their sites responsive for human visitors, especially during peak traffic hours.
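Since the concern is aggregate volume across many IPs, a threshold keyed on the bot's identity rather than on the client IP fits this case better than a per-IP limit. The following is a minimal sliding-window sketch in Python. The class name SlidingWindowLimiter, the key "joc", and the threshold of 3 requests per second are illustrative assumptions, not a policy recommended by the page.

```python
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_requests per key within any window_seconds span."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = {}  # key -> deque of timestamps of accepted requests

    def allow(self, key, now):
        """Return True and record the request if the key is under its limit."""
        hits = self._hits.setdefault(key, deque())
        # Discard timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over threshold: caller might answer 429
        hits.append(now)
        return True


# One shared budget for every request identified as joc, whatever its source IP.
limiter = SlidingWindowLimiter(max_requests=3, window_seconds=1.0)
decisions = [limiter.allow("joc", t) for t in (0.0, 0.1, 0.2, 0.3, 1.05)]
```

In production the timestamp would come from a monotonic clock such as time.monotonic(); it is passed in explicitly here so the behavior is easy to reason about.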

ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.
