goroam

Bot User-Agent: goroam

🤖 Overview

goroam is a web crawler operated by Roam Intelligence Inc., a data‑collection company that assembles large‑scale text corpora for training proprietary large language models (LLMs). According to the official documentation at roam.ai/crawler, the bot’s primary purpose is to gather publicly accessible web content, which is then filtered, deduplicated, and fed into Roam’s AI training pipeline. The product that consumes this data is an unreleased foundation model codenamed “Roam‑0.” The bot was first observed in late 2023 and has appeared regularly in server logs since then.

🌐 Technical Behavior

goroam crawls using a depth‑first traversal strategy and, according to its documented crawl policy, waits 5 seconds by default between consecutive requests to the same domain. It supports HTTP/1.1 and HTTP/2, and its requests typically carry an Accept‑Encoding: gzip header to reduce bandwidth consumption. According to BGP records and published IP lists, the crawler’s known IP ranges originate from ASN 394057 (Roam Intelligence) and fall within the 38.0.0.0/8 block (formerly a legacy assignment) and the 45.0.0.0/8 block. The crawler respects the Crawl‑Delay directive in robots.txt and does not deliberately bypass rate limits. It usually identifies itself with the User‑Agent string GoRoam/1.0, although the version number may change with updates (e.g., GoRoam/2.0). It does not send Referer headers, does not execute JavaScript, and fetches no resources other than HTML and linked stylesheets.
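As a rough illustration of how the IP ranges listed above might be used, the following Python sketch checks whether a client address falls inside them. The two networks are copied from the text as reported and have not been independently verified; the names GOROAM_NETWORKS and in_goroam_ranges are hypothetical.

```python
import ipaddress

# Ranges as reported in the text above (assumption: not independently verified).
GOROAM_NETWORKS = [
    ipaddress.ip_network("38.0.0.0/8"),
    ipaddress.ip_network("45.0.0.0/8"),
]


def in_goroam_ranges(address: str) -> bool:
    """Return True if the address parses and lies in one of the listed networks."""
    try:
        ip = ipaddress.ip_address(address)
    except ValueError:
        # Malformed addresses never match.
        return False
    return any(ip in network for network in GOROAM_NETWORKS)
```

Because each /8 block covers roughly 16.7 million addresses, a match here is weak evidence on its own and is probably best combined with the User‑Agent and reverse DNS checks.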

📋 robots.txt Compliance

Roam Intelligence’s official documentation explicitly states that goroam honors Disallow directives in robots.txt. The crawler also reads the Crawl‑Delay directive and respects Allow overrides. Independent tests by the community (e.g., a 2024 analysis on GitHub) confirm that goroam does not attempt to access paths disallowed by robots.txt, and that it adheres to the X‑Robots‑Tag HTTP response header when present. However, as with many AI crawlers, rules addressed only to other bots have no effect on it: a group that targets “Googlebot” or “GPTBot” does not apply to goroam unless goroam’s own User‑Agent token is explicitly listed.
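The group‑matching behavior described above can be sketched with Python’s standard urllib.robotparser module. The robots.txt content, the example.com URLs, and the helper name build_parser are made up for illustration; the output shows how Python’s parser resolves the rules, not observed goroam traffic.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one group for GPTBot, one for goroam.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: goroam
Crawl-delay: 10
Disallow: /private/
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# The GPTBot group blocks everything, but only for GPTBot.
gptbot_blog = parser.can_fetch("GPTBot", "https://example.com/blog/")
# goroam is governed by its own group: /private/ is off limits, /blog/ is not.
goroam_blog = parser.can_fetch("goroam", "https://example.com/blog/")
goroam_private = parser.can_fetch("goroam", "https://example.com/private/report")
goroam_delay = parser.crawl_delay("goroam")
```

In this sketch, a site owner who wants to keep goroam out entirely would replace the goroam group’s rule with Disallow: / rather than relying on the GPTBot group.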

🔍 Detection Indicators

The primary User‑Agent string is GoRoam/1.0 (compatible; +https://roam.ai/crawler). Additional variants include GoRoam/2.0 and RoamCrawler/1.0. The bot also sends a custom X‑Roam‑ID header containing a hexadecimal token (e.g., 0x3f7a) that can be verified through Roam’s public resolver. Reverse DNS lookups of goroam IPs resolve to hostnames ending in .crawl.roam.ai. Its HTTP/1.1 requests always include a Connection: close header, the User‑Agent string is short, and no Accept‑Language header is sent.
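A minimal sketch of how these indicators might be combined: a pattern match on the User‑Agent, followed by a forward‑confirmed reverse DNS check against the .crawl.roam.ai suffix. The function names and the regular expression are assumptions made for illustration. The X‑Roam‑ID check is omitted because the text does not describe the resolver’s interface.

```python
import re
import socket

# Matches the variants named in the text: GoRoam/x.y and RoamCrawler/x.y.
UA_PATTERN = re.compile(r"^(GoRoam|RoamCrawler)/\d+\.\d+\b")
RDNS_SUFFIX = ".crawl.roam.ai"


def ua_claims_goroam(user_agent: str) -> bool:
    """True if the User-Agent string claims to be goroam (easily spoofed)."""
    return bool(UA_PATTERN.match(user_agent))


def verify_ip(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS: PTR must end in the suffix and map back to ip.

    The resolver functions are injectable so the check can be tested offline.
    """
    try:
        hostname = reverse(ip)[0]
        if not hostname.lower().rstrip(".").endswith(RDNS_SUFFIX):
            return False
        # The forward lookup must include the original address.
        return ip in forward(hostname)[2]
    except OSError:
        # Covers socket.herror and socket.gaierror (lookup failures).
        return False
```

The User‑Agent check alone only shows what a client claims to be; the DNS round trip is what ties the request to infrastructure under the roam.ai domain.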

📊 Data Usage

Collected content is used exclusively for training Roam’s large language models, not for search indexing, analytics, or advertisement profiling. After crawling, raw HTML is parsed and stored in a proprietary cloud database, then later processed with NLP pipelines to extract high‑quality text passages. Roam Intelligence states that data is not sold or shared with third parties, and that the training dataset is anonymized to remove personal information prior to model training.

⚙️ Rate Limiting Policy

goroam is rate‑limited because its aggregate crawl volume can spike during model‑refresh cycles, potentially overwhelming smaller sites. A threshold‑based blocking policy is applied: requests above a per‑IP limit (e.g., 100 requests per minute) are blocked, which ensures fair resource allocation while still allowing the legitimate crawler to operate. This reflects the principle that high‑volume but otherwise well‑behaved bots should be throttled rather than permanently blocked.
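One way such a threshold could be enforced is a per‑IP sliding window, sketched below in Python. The text does not say how the policy is actually implemented, so the class name SlidingWindowLimiter, the in‑memory storage, and the default values (taken from the 100 requests per minute example) are assumptions.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each IP."""

    def __init__(self, limit=100, window=60.0):
        self.limit = limit
        self.window = window
        self._hits = defaultdict(deque)  # ip -> timestamps of allowed requests

    def allow(self, ip, now=None):
        """Record the request and return True, or return False if over the limit."""
        if now is None:
            now = time.monotonic()
        hits = self._hits[ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

A request handler would call limiter.allow(client_ip) and respond with HTTP 429 when it returns False, so that the crawler is slowed during spikes rather than blocked outright.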

ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.