spatineo serval getmapbot

Bot User-Agent: spatineo-serval-getmapbot

🤖 Overview

Spatineo is a geospatial data crawler operated by Spatineo Ltd, a Finnish company specializing in spatial data infrastructure. Its primary purpose is to discover and index web-accessible map tiles, WMS/WFS endpoints, and other OGC-compliant services. The collected data feeds into Spatineo’s own spatial data catalog and quality monitoring platform, which helps organizations evaluate the availability and performance of geospatial web services. Official documentation at spatineo.com/crawler outlines the bot’s scope.

🌐 Technical Behavior

The crawler systematically probes for map servers using repeated HTTP GET requests, usually targeting URLs whose query strings contain request=GetMap or service=WMS. It respects standard crawl delays and typically issues 1–5 requests per second per domain. IP addresses are drawn from a block owned by Spatineo (e.g., 185.112.144.0/24) and occasionally from cloud providers like AWS and Hetzner. The crawler supports both HTTP and HTTPS, and it sends a User-Agent header that identifies itself as “Spatineo/1.0” or “Spatineo Serval/1.0”. It also includes an Accept header listing image/png, image/jpeg, and application/xml.
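As a rough illustration of the request pattern described above, the following sketch flags URLs whose query string carries request=GetMap or service=WMS. The function name looks_like_wms_probe is hypothetical, and lowercasing the parameter values is a deliberately lenient assumption made for log analysis rather than anything the crawler's documentation prescribes.

```python
from urllib.parse import urlsplit, parse_qsl


def looks_like_wms_probe(url):
    """Return True if the query string contains request=GetMap or service=WMS."""
    for key, value in parse_qsl(urlsplit(url).query):
        # Compare case-insensitively so SERVICE=WMS and service=wms both match.
        k, v = key.lower(), value.lower()
        if (k == "request" and v == "getmap") or (k == "service" and v == "wms"):
            return True
    return False


if __name__ == "__main__":
    print(looks_like_wms_probe("https://example.org/ows?SERVICE=WMS&REQUEST=GetCapabilities"))
    print(looks_like_wms_probe("https://example.org/index.html?page=2"))
```

A check like this only describes the shape of a request; it says nothing about who sent it, so it is best combined with the detection indicators listed further down.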

📋 robots.txt Compliance

According to Spatineo’s own robots.txt guidelines published at spatineo.com/robots, the crawler fully honours Disallow directives. It checks robots.txt before every session and respects Crawl-Delay instructions when present. There are no known incidents of Spatineo ignoring robots.txt, and the company actively encourages webmasters to use it to control access.
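For webmasters who want to check a Disallow rule before deploying it, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt content and the /wms/ path are made-up examples, and the user-agent token is the one listed at the top of this entry; whether the crawler matches on exactly that token is an assumption.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks the bot from a WMS path only.
ROBOTS_TXT = """\
User-agent: spatineo-serval-getmapbot
Disallow: /wms/

User-agent: *
Disallow:
"""

UA = "spatineo-serval-getmapbot"

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

if __name__ == "__main__":
    # The WMS endpoint is blocked for this bot, other pages remain allowed.
    print(parser.can_fetch(UA, "https://example.org/wms/ows?request=GetMap"))
    print(parser.can_fetch(UA, "https://example.org/about.html"))
```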

🔍 Detection Indicators

The primary indicator is the User-Agent string, which appears either as Mozilla/5.0 (compatible; Spatineo/1.0; +https://spatineo.com/crawler) or as Spatineo Serval/1.0. The bot may also send a custom X-Spatineo-Id header carrying a unique campaign identifier. Reverse DNS lookups of its IPs resolve to hostnames like crawler-*.spatineo.com.
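The indicators above could be combined along the following lines. This is a sketch under stated assumptions: the regular expression is inferred from the two User-Agent examples in this entry, the function name matches_spatineo is hypothetical, and the PTR hostname is expected to be looked up by the caller. Because PTR records are set by whoever controls the IP address, a production check would normally also confirm that the hostname resolves back to the same IP.

```python
import re
from fnmatch import fnmatchcase

# Pattern inferred from the examples "Spatineo/1.0" and "Spatineo Serval/1.0".
UA_PATTERN = re.compile(r"\bSpatineo(?: Serval)?/\d+(?:\.\d+)*", re.IGNORECASE)
PTR_GLOB = "crawler-*.spatineo.com"


def matches_spatineo(user_agent, ptr_hostname=None):
    """Check the User-Agent and, if supplied, the reverse DNS hostname."""
    ua_ok = bool(UA_PATTERN.search(user_agent or ""))
    if ptr_hostname is None:
        return ua_ok
    # Normalise the hostname: drop a trailing root dot and lowercase it.
    host = ptr_hostname.rstrip(".").lower()
    return ua_ok and fnmatchcase(host, PTR_GLOB)


if __name__ == "__main__":
    print(matches_spatineo("Spatineo Serval/1.0", "crawler-12.spatineo.com."))
    print(matches_spatineo("Spatineo Serval/1.0", "crawler-12.example.net"))
```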

📊 Data Usage

Collected map tiles and metadata are used to build Spatineo’s Spatial Data Quality Dashboard, which provides real-time availability and response-time metrics for thousands of OGC services. The data is also aggregated into public reports and is not used for AI training or commercial resale.

⚙️ Rate Limiting Policy

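Because it persistently scans map endpoints, Spatineo is rate-limited to prevent excessive load on web servers. A threshold of 50 requests per minute per IP is recommended; clients exceeding it receive a temporary 429 (Too Many Requests) response. The crawler’s stated rate of 1–5 requests per second works out to 60–300 requests per minute, so a single IP crawling at that pace would cross this threshold. This policy aligns with industry best practices for non‑malicious crawlers.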
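One way such a threshold could be applied is a sliding-window counter per IP address. The sketch below is illustrative only: the class name SlidingWindowLimiter is hypothetical, timestamps are passed in explicitly so the behaviour is easy to reason about, and nothing here is taken from an actual Boteraser implementation.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per IP within the last `window` seconds."""

    def __init__(self, limit=50, window=60.0):
        self.limit = limit
        self.window = window
        self._hits = defaultdict(deque)  # ip -> timestamps of accepted requests

    def status_for(self, ip, now):
        hits = self._hits[ip]
        # Forget requests that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return 429  # Too Many Requests
        hits.append(now)
        return 200


if __name__ == "__main__":
    limiter = SlidingWindowLimiter()
    # Two requests per second from one IP for 30 seconds.
    codes = [limiter.status_for("203.0.113.7", now=i * 0.5) for i in range(60)]
    print(codes.count(200), codes.count(429))
```

At two requests per second, the first 50 requests are accepted and the rest are rejected until older entries age out of the window.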

serval

🤖 Overview

Serval is a web crawler operated by the Serval Project, a non-profit research organisation focused on decentralised communication technologies. The bot is used to index public web content for Serval’s MeshMS message relay system, which enables off‑grid messaging via mobile ad‑hoc networks. Official project documentation at servalproject.org describes the crawler as a research tool that collects small text snippets for routing tests and network performance analysis.

🌐 Technical Behavior

The Serval crawler performs lightweight HTTP GET requests, favouring small (<10 KB) HTML pages to minimise bandwidth usage. It sends conditional GETs with If-Modified-Since headers and honours 304 Not Modified responses. Requests are emitted at a rate of 1–2 per second from IP addresses in the 2a01:4f8:200:7000::/64 prefix (Hetzner) and in 185.130.44.0/24. The crawler fetches pages over plain HTTP only and does not follow JavaScript redirects or load external resources. It operates during UTC daytime hours (06:00–18:00) to avoid peak server loads.
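To make the conditional GET exchange concrete, the sketch below shows how a server might decide between 200 and 304 for a request carrying If-Modified-Since. The function name conditional_status is hypothetical, and the logic is a simplified reading of HTTP conditional requests that ignores ETag and If-None-Match.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def conditional_status(if_modified_since, last_modified):
    """Return 304 if the resource is unchanged since the client's copy, else 200."""
    if not if_modified_since:
        return 200
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        # An unparseable date is ignored and the full response is sent.
        return 200
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds first.
    return 304 if last_modified.replace(microsecond=0) <= since else 200


if __name__ == "__main__":
    modified = datetime(2015, 10, 21, 7, 28, 0, tzinfo=timezone.utc)
    print(conditional_status("Wed, 21 Oct 2015 07:28:00 GMT", modified))
    print(conditional_status("Tue, 20 Oct 2015 07:28:00 GMT", modified))
```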

📋 robots.txt Compliance

The Serval Project explicitly states on its GitHub repository (github.com/servalproject/serval-crawler) that the crawler strictly obeys robots.txt directives. It processes Disallow rules before every request and abides by Crawl-Delay values. No reports of non‑compliance are known from security advisories or webmaster forums.

🔍 Detection Indicators

The bot is identified by the User-Agent string: Serval/1.0 (compatible; servalproject.org/crawler). It also sends a unique X-Serval-Node header containing a hex‑encoded node identifier. The combination of low request rate, plain HTTP, and no referrer distinguishes it from other crawlers. Reverse DNS entries for its IPs point to crawler-*.servalproject.org.
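Since the text describes a combination of signals rather than a single test, the following sketch reports each one separately so an operator can decide how many must agree. The function name serval_signals and the shape of its inputs (a header dictionary plus the request scheme) are assumptions made for illustration.

```python
import re

# The node identifier is described as hex-encoded; its length is not stated.
HEX_NODE = re.compile(r"[0-9a-fA-F]+")


def serval_signals(headers, scheme):
    """Return which of the described indicators a request exhibits."""
    h = {name.lower(): value for name, value in headers.items()}
    return {
        "user_agent": h.get("user-agent", "").startswith("Serval/"),
        "node_header": bool(HEX_NODE.fullmatch(h.get("x-serval-node", ""))),
        "plain_http": scheme == "http",
        # The HTTP header name is spelled "Referer".
        "no_referrer": "referer" not in h,
    }


if __name__ == "__main__":
    request_headers = {
        "User-Agent": "Serval/1.0 (compatible; servalproject.org/crawler)",
        "X-Serval-Node": "a1b2c3",
    }
    print(serval_signals(request_headers, "http"))
```

Request rate is left out here because it has to be measured over time rather than read from a single request.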

📊 Data Usage

Collected web content is used exclusively for internal research into opportunistic network routing. Small text pages are stored locally on mesh nodes to simulate realistic message payloads. No data is shared with third parties, sold, or used for AI training. The project publishes its data handling policy in the privacy section of the Serval manual.

⚙️ Rate Limiting Policy

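Serval is rate‑limited because its non‑commercial research goals do not require high‑frequency crawling; a maximum of 30 requests per minute per IP is enforced to preserve server resources. The crawler’s stated rate of 1–2 requests per second amounts to 60–120 requests per minute, so a single IP sustaining that rate would exceed the limit. The threshold is nonetheless in keeping with the crawler’s own design intent to be lightweight and respectful.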

getmapbot

🤖 Overview

GetMapBot is a legitimate web crawler operated by the OpenMapTiles project under the OpenStreetMap Foundation. Its purpose is to harvest map tile images and vector data from public tile servers to create offline map caches for humanitarian and educational use. The product it feeds into is the OpenMapTiles schema, a widely used vector tile specification. Official documentation is hosted at openmaptiles.org and on github.com/openmaptiles.

🌐 Technical Behavior

GetMapBot issues tile requests following the standard Z/X/Y pattern (e.g., /tiles/12/3456/789.png) with a fixed zoom range of 0–18. It uses a sequential crawl strategy, moving from low to high zoom levels, and responds to HTTP 429 (Too Many Requests) and 503 (Service Unavailable) responses by backing off exponentially. The bot sends requests from IPs in the 5.9.0.0/16 and 88.198.0.0/16 ranges (Hetzner) and also from 130.133.0.0/16 (DFN‑Verein). It supports HTTP/2 and sends Accept-Encoding: gzip. Typical request frequency is 3–8 per second per tile server.
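The Z/X/Y pattern can be recognised in server logs with a short parser. In the sketch below, the function name parse_tile_path and the accepted file extensions are assumptions; the zoom limit of 18 comes from the description above, and the bounds check relies on the usual convention that zoom level z has 2^z tiles along each axis.

```python
import re

# Extensions other than .png are assumed here for illustration.
TILE_PATH = re.compile(r"/(\d+)/(\d+)/(\d+)\.(?:png|jpe?g|pbf)$")


def parse_tile_path(path, max_zoom=18):
    """Return (z, x, y) for a valid tile path, or None otherwise."""
    match = TILE_PATH.search(path)
    if not match:
        return None
    z, x, y = (int(match.group(i)) for i in (1, 2, 3))
    # At zoom z, valid column and row indices run from 0 to 2**z - 1.
    if z > max_zoom or x >= 2 ** z or y >= 2 ** z:
        return None
    return z, x, y


if __name__ == "__main__":
    print(parse_tile_path("/tiles/12/3456/789.png"))
    print(parse_tile_path("/tiles/19/0/0.png"))
```

The first call returns (12, 3456, 789); the second returns None because zoom 19 lies outside the stated range.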

📋 robots.txt Compliance

The OpenMapTiles project states in its CONTRIBUTING.md (GitHub) that GetMapBot obeys robots.txt and also honours the X-Robots-Tag HTTP header. It will not crawl paths marked with Disallow. Tile servers that include a Crawl-Delay entry are given a minimum 5-second pause between requests. No violation reports are known in the CVE database or on security mailing lists.
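The Crawl-Delay rule can be read as "use the server's value, but never pause for less than 5 seconds". The sketch below implements that reading with Python's standard urllib.robotparser; the function name pause_between_requests, the "GetMapBot" token, and the interpretation itself are assumptions rather than documented behaviour.

```python
from urllib.robotparser import RobotFileParser

MIN_PAUSE_SECONDS = 5  # floor taken from the description above


def pause_between_requests(robots_txt, user_agent="GetMapBot"):
    """Return the pause in seconds implied by Crawl-delay, or None if absent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay(user_agent)
    if delay is None:
        return None
    # Honour larger values but never go below the stated minimum.
    return max(MIN_PAUSE_SECONDS, delay)


if __name__ == "__main__":
    print(pause_between_requests("User-agent: *\nCrawl-delay: 2\n"))
    print(pause_between_requests("User-agent: *\nCrawl-delay: 12\n"))
```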

🔍 Detection Indicators

The primary User-Agent string is GetMapBot/1.0 (+https://openmaptiles.org/bot). A variant form, GetMapBot/1.0 (compatible; OpenMapTiles), is also seen. The bot sets an X-Tile-Cache header with a timestamp to help servers identify cache‑friendly requests. PTR records for its IPs consistently resolve to hostnames containing openmaptiles.

📊 Data Usage

Harvested tile data is compiled into the OpenMapTiles free tile set, which is distributed under the ODbL license for offline use. The data is also used for quality assurance of tile rendering pipelines. No data is monetised or used for AI training; the project is entirely non‑profit and community driven.

⚙️ Rate Limiting Policy

Rate limiting is applied to prevent tile server overload during large‑scale caching campaigns. A threshold of 60 requests per minute per IP is recommended, with an automatic 10‑minute ban if it is exceeded. At the bot’s typical 3–8 requests per second (180–480 per minute), a single IP would exceed this limit unless it slows down. This policy balances data availability with server health and is standard for tile‑focused crawlers.
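A threshold with a temporary ban could be sketched with a fixed-window counter, as below. The class name BanningLimiter is hypothetical, the fixed window is a simplification chosen for brevity, and timestamps are passed in explicitly; none of this reflects a known production implementation.

```python
class BanningLimiter:
    """Count requests per IP in fixed windows and ban IPs that exceed the limit."""

    def __init__(self, limit=60, window=60, ban_seconds=600):
        self.limit = limit
        self.window = window
        self.ban_seconds = ban_seconds
        self._counts = {}        # ip -> (window index, request count)
        self._banned_until = {}  # ip -> timestamp when the ban ends

    def allow(self, ip, now):
        if now < self._banned_until.get(ip, 0.0):
            return False
        index = int(now // self.window)
        prev_index, count = self._counts.get(ip, (index, 0))
        if prev_index != index:
            count = 0  # a new window has started
        count += 1
        self._counts[ip] = (index, count)
        if count > self.limit:
            # Start the ban clock at the moment the limit is exceeded.
            self._banned_until[ip] = now + self.ban_seconds
            return False
        return True


if __name__ == "__main__":
    limiter = BanningLimiter()
    results = [limiter.allow("198.51.100.9", now=10.0 + i * 0.1) for i in range(61)]
    print(results.count(True), limiter.allow("198.51.100.9", now=300.0))
```

A fixed window permits short bursts across window boundaries, so a sliding window may be preferable where stricter enforcement matters.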

ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.