internet radio crawler Bot — Detection, Blocking & Technical Analysis

Crawler User-Agent: internet-radio-crawler

🤖 Overview

The internet radio crawler is an automated agent operated by RadioTime (now part of TuneIn) as documented in their official crawler policy at tunein.com/crawler/. Its primary purpose is to index metadata from streaming radio stations—including stream URLs, station names, genres, and listener counts—to populate the TuneIn directory and enable search functionality across the TuneIn platform and partner applications.

🌐 Technical Behavior

This crawler performs HTTP GET requests to audio stream endpoints and playlist files (e.g., PLS, M3U, ASX) to extract stream information. According to TuneIn’s published IP ranges (as of early 2025), the crawler's requests originate from the 96.47.224.0/20 and 69.64.224.0/20 blocks, and each station is polled roughly once every 30 to 60 seconds. It does not follow HTML links; instead it targets a fixed set of URLs submitted by station operators or discovered via station directories. The crawler uses TCP port 80 for HTTP streams and port 443 for HTTPS, and retries with exponential backoff if a stream fails to respond.
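
As a rough illustration of how an operator might use the address ranges quoted above, the following sketch checks whether a client IP falls inside either block using Python's standard ipaddress module. The function name in_published_ranges and the constant PUBLISHED_RANGES are hypothetical, and the CIDR blocks are simply copied from the text; they should be re-checked against TuneIn's current list before being relied on.

```python
import ipaddress

# CIDR blocks as quoted in the text above (assumed current; verify before use).
PUBLISHED_RANGES = [
    ipaddress.ip_network("96.47.224.0/20"),
    ipaddress.ip_network("69.64.224.0/20"),
]


def in_published_ranges(remote_addr):
    """Return True if remote_addr lies inside one of the quoted CIDR blocks."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Malformed addresses never match.
        return False
    return any(addr in net for net in PUBLISHED_RANGES)


if __name__ == "__main__":
    for ip in ("96.47.230.10", "96.47.240.1", "203.0.113.7"):
        print(ip, in_published_ranges(ip))
```

A source-IP check like this is usually more trustworthy than header inspection, because request headers can be forged by any client.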

📋 robots.txt Compliance

TuneIn’s official documentation states that the crawler honors robots.txt Disallow directives for any path listed under User-agent: * or User-agent: internet radio crawler. Station operators who want to keep the crawler away from live stream endpoints must explicitly disallow the /stream/ path pattern. The crawler does not check robots.txt for streams served over non-HTTP protocols such as RTMP.
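
The rule described above can be drafted and sanity-checked locally before it is deployed. The sketch below holds a candidate robots.txt in a string and evaluates it with Python's standard urllib.robotparser. The host radio.example.com and the file paths are placeholders, and the User-Agent string is the one quoted elsewhere on this page. Note that this only shows how Python's parser interprets the rules; how TuneIn's crawler matches its own user-agent token is an assumption here.

```python
from urllib.robotparser import RobotFileParser

# Candidate robots.txt: block the crawler from the /stream/ path only.
ROBOTS_TXT = """\
User-agent: internet radio crawler
Disallow: /stream/
"""

# User-Agent string as quoted on this page (assumed accurate).
CRAWLER_UA = "internet radio crawler (TuneIn/1.0; +https://tunein.com/crawler/)"


def build_parser(robots_txt):
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

if __name__ == "__main__":
    for url in (
        "https://radio.example.com/stream/live.mp3",  # placeholder URL
        "https://radio.example.com/schedule.html",    # placeholder URL
    ):
        print(url, parser.can_fetch(CRAWLER_UA, url))
```

With these rules the parser should report the /stream/ URL as disallowed for the crawler and leave other paths, and other user agents, unaffected.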

🔍 Detection Indicators

The User-Agent string is typically: internet radio crawler (TuneIn/1.0; +https://tunein.com/crawler/). Behavioral fingerprints include repeated HEAD requests before GET requests, and a Referer header set to https://tunein.com/. It also sends a custom X-TuneIn-Crawler header with value 1.
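
The header-based indicators listed above could be combined into a simple check along the following lines. This is a minimal sketch: the function names crawler_signals and matches_fingerprint and the two-signal threshold are illustrative choices, and the header values are taken directly from the description above. Because all three headers are client-supplied and easy to spoof, a match is a hint rather than proof.

```python
def crawler_signals(headers):
    """Map each header-based indicator to whether the request shows it."""
    # HTTP header names are case-insensitive, so normalise them first.
    h = {name.lower(): value for name, value in headers.items()}
    return {
        "user_agent": "internet radio crawler" in h.get("user-agent", "").lower(),
        "referer": h.get("referer", "").startswith("https://tunein.com/"),
        "custom_header": h.get("x-tunein-crawler", "").strip() == "1",
    }


def matches_fingerprint(headers, min_signals=2):
    """Return True when at least min_signals indicators are present."""
    return sum(crawler_signals(headers).values()) >= min_signals


if __name__ == "__main__":
    sample = {
        "User-Agent": "internet radio crawler (TuneIn/1.0; +https://tunein.com/crawler/)",
        "Referer": "https://tunein.com/",
        "X-TuneIn-Crawler": "1",
    }
    print(crawler_signals(sample))
    print(matches_fingerprint(sample))
```

The HEAD-before-GET pattern is not covered here, since detecting it requires looking at a sequence of requests rather than a single set of headers.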

📊 Data Usage

Collected metadata—stream URLs, bitrate, codec, language, and geolocation—is used to populate the TuneIn directory and to feed algorithms that recommend stations to users. It is not used for AI training or general web indexing. Station operators may request removal via TuneIn’s support form.

⚙️ Rate Limiting Policy

Requests from this crawler are commonly rate-limited to protect small radio servers from overload; a typical threshold throttles any single IP that sends more than 50 requests per minute. The policy is documented in TuneIn’s crawler FAQ, which recommends throttling via 429 responses rather than permanent blocking.
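
One way to apply the threshold described above is a per-IP sliding-window counter that answers with 429 once the limit is exceeded. The sketch below is framework-independent and assumes a limit of 50 requests per 60 seconds, as quoted in the text. The class name SlidingWindowLimiter and its check method are hypothetical; in a real server the returned status and retry delay would be mapped onto a 429 response with a Retry-After header.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per client IP within window_seconds."""

    def __init__(self, max_requests=50, window_seconds=60.0):
        self.max_requests = max_requests
        self.window = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of accepted requests

    def check(self, ip, now=None):
        """Return (status_code, retry_after_seconds) for a request from ip."""
        now = time.monotonic() if now is None else now
        hits = self._hits[ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.max_requests:
            # Seconds until the oldest accepted request leaves the window.
            retry_after = int(self.window - (now - hits[0])) + 1
            return 429, retry_after
        hits.append(now)
        return 200, 0


if __name__ == "__main__":
    limiter = SlidingWindowLimiter()
    statuses = [limiter.check("96.47.230.10", now=i * 0.1)[0] for i in range(51)]
    print(statuses.count(200), statuses.count(429))
```

Throttled requests are not recorded, so a client that keeps retrying while blocked does not extend its own penalty; that is a design choice, and a stricter policy could count them as well.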

ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.