golem

Bot User-Agent: golem

🤖 Overview

Golem is a web crawler operated by Golem Search (golem.pl), a Polish search engine launched in 2000 that focuses on indexing Polish-language websites and regional European content. Its primary purpose is to build and maintain the search index used by the Golem search portal, which also provides news aggregation and direct advertising services. According to the official Golem crawler documentation (golem.pl/robot), the bot has been active for over two decades and is considered a legitimate, rate-limited agent for local search.

🌐 Technical Behavior

The Golem crawler sends HTTP/1.1 requests at a typical rate of 2–5 requests per second per IP address and adjusts its crawl rate according to server response times. It distributes requests across a pool of IPv4 addresses from Polish ASNs (e.g., AS12900, AS20883) and occasionally uses IPv6 ranges allocated to the host. The crawler parses robots.txt, sitemap.xml, and Link headers to discover URLs to crawl. It follows standard HTTP 301/302 redirect chains and respects X-Robots-Tag directives on non‑HTML resources such as PDFs and images. Golem does not execute JavaScript or render dynamic content. It sends a minimal set of headers, including Accept: text/html,application/xhtml+xml and Accept-Language: pl,en;q=0.9.
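This profile says Golem respects X-Robots-Tag on non‑HTML resources, but it does not say how agent-scoped values such as "googlebot: noindex" are matched. The Python sketch below shows one way a site operator might work out which X-Robots-Tag directives would apply to a crawler whose token is assumed to be golem. The function name directives_for and the VALUED_DIRECTIVES set are hypothetical. The parser is deliberately simplified; for example, it does not handle dates containing commas in unavailable_after.

```python
from typing import Iterable, Set

# Directives written as "name: value"; a leading token equal to one of these
# is treated as a directive, not as a user-agent prefix.
VALUED_DIRECTIVES = {"unavailable_after", "max-snippet", "max-image-preview", "max-video-preview"}


def directives_for(header_values: Iterable[str], bot: str = "golem") -> Set[str]:
    """Collect X-Robots-Tag directives that apply to `bot` (simplified parser)."""
    result: Set[str] = set()
    for value in header_values:
        head, sep, rest = value.partition(":")
        token = head.strip().lower()
        is_agent_prefix = (
            bool(sep) and token not in VALUED_DIRECTIVES and " " not in token and "," not in token
        )
        if is_agent_prefix:
            if token != bot.lower():
                continue  # scoped to a different crawler, e.g. "googlebot: noindex"
            value = rest
        # Unscoped values apply to every crawler.
        for part in value.split(","):
            directive = part.strip().lower()
            if directive:
                result.add(directive)
    return result


# One response may carry several X-Robots-Tag headers.
headers = ["googlebot: noindex", "golem: nofollow", "noarchive"]
print(sorted(directives_for(headers)))  # ['noarchive', 'nofollow']
```

Matching on the agent token is case-insensitive here. That is an assumption about how Golem behaves, not something its documentation is quoted as stating.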

📋 robots.txt Compliance

Golem fully honors Disallow directives in robots.txt and also respects Crawl-Delay directives when present. The official documentation (golem.pl/robot) states that the crawler waits at least the specified delay before issuing the next request to the same host. Independent testing (e.g., a 2022 study by BezpiecznaSieć.pl) found no robots.txt violations across a sample of 10,000 Polish domains.
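You can check what a given robots.txt tells Golem with Python's standard urllib.robotparser module, which evaluates Disallow and Crawl-delay rules for a user-agent token. The robots.txt content and the example.pl URLs below are hypothetical. The sketch also assumes that Golem obeys a group named Golem, matched case-insensitively. The standard-library parser may differ from Golem's own parser in edge cases, so treat its answers as an approximation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: Golem gets its own group with a Crawl-delay,
# while every other crawler is disallowed entirely.
ROBOTS_TXT = """\
User-agent: Golem
Disallow: /private/
Crawl-delay: 2

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# robotparser matches on the product token (the text before "/"), so pass
# "Golem" rather than the full "Mozilla/5.0 (compatible; ...)" string.
print(parser.can_fetch("Golem", "https://example.pl/index.html"))     # True
print(parser.can_fetch("Golem", "https://example.pl/private/a.pdf"))  # False
print(parser.crawl_delay("Golem"))                                    # 2
print(parser.can_fetch("OtherBot", "https://example.pl/index.html"))  # False
```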

🔍 Detection Indicators

The primary User‑Agent string is Mozilla/5.0 (compatible; Golem/1.0; +https://golem.pl/robot). Some older versions used Golem/1.0 (compatible; Golem; +http://golem.pl/robot). In addition to the User-Agent, the bot sends a From header containing the Golem Search administrative email address (robot at golem.pl). Behavioral fingerprints include a consistent request interval of exactly 500 ms when no Crawl-Delay is specified, and a habit of requesting robots.txt before any other page.
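The Python sketch below checks two of the detection indicators above. The header check looks for the documented Golem/1.0 product token together with a From address on the golem.pl domain, and the timing check looks for the roughly 500 ms request interval. The function names, the 50 ms tolerance, and the three-request minimum are hypothetical choices, not values from Golem's documentation. Both headers are set by the client, so a match only means the request claims to be Golem. Confirming the source network (for example, against the ASNs listed under Technical Behavior) is a separate step.

```python
import re
from statistics import median
from typing import Mapping, Sequence

# Matches the "Golem/<version>" product token present in both documented User-Agent strings.
GOLEM_UA = re.compile(r"\bGolem/\d+(?:\.\d+)*")
# Matches a From value ending in an address on golem.pl, optionally inside "<...>".
GOLEM_FROM = re.compile(r"@golem\.pl>?$", re.IGNORECASE)


def claims_to_be_golem(headers: Mapping[str, str]) -> bool:
    """Header check: Golem UA token plus a From address on golem.pl (a claim, not proof)."""
    ua = headers.get("User-Agent", "")
    sender = headers.get("From", "").strip()
    return bool(GOLEM_UA.search(ua)) and bool(GOLEM_FROM.search(sender))


def has_fixed_interval(timestamps: Sequence[float], target: float = 0.5,
                       tolerance: float = 0.05) -> bool:
    """True if the median gap between consecutive requests is within `tolerance` of `target` seconds."""
    if len(timestamps) < 3:
        return False  # too few requests to judge a pattern
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return abs(median(gaps) - target) <= tolerance


# Example: headers from one logged request, and arrival times (seconds) of several requests.
request = {"User-Agent": "Mozilla/5.0 (compatible; Golem/1.0; +https://golem.pl/robot)"}
print(claims_to_be_golem(request))               # False: no From header present
print(has_fixed_interval([0.0, 0.5, 1.0, 1.5]))  # True
```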

📊 Data Usage

Collected page content, including title, meta description, text body, and links, is stored in the Golem Search index for use in query result ranking, snippet generation, and content categorization. The data is also aggregated to build regional search statistics for the Polish web. Golem Search does not resell the data to third parties but may use it internally for improving relevance algorithms.

⚙️ Rate Limiting Policy

Because Golem’s crawl volume can spike during re‑index cycles (up to 50 requests per minute per IP), system administrators should enforce threshold‑based rate limiting (e.g., 60 requests per minute) to protect server resources without blocking the bot entirely. The rationale is that Golem is a legitimate search engine and reduces its crawl rate when it receives HTTP 429 (Too Many Requests) responses.
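One way to implement the suggested threshold is a per-IP sliding window that returns HTTP 429 and a Retry-After value once the limit is reached. The Python sketch below is a hypothetical in-process limiter; the class name SlidingWindowLimiter is invented here. In practice this limiting is usually configured in a reverse proxy or web server, and a token bucket or fixed window would serve equally well. This profile does not say whether Golem reads Retry-After, only that it slows down on 429 responses.

```python
import math
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional, Tuple


class SlidingWindowLimiter:
    """Allow at most `limit` requests per client IP within any `window`-second span."""

    def __init__(self, limit: int = 60, window: float = 60.0) -> None:
        self.limit = limit
        self.window = window
        # Per-IP timestamps of accepted requests, oldest first.
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def check(self, ip: str, now: Optional[float] = None) -> Tuple[int, Optional[int]]:
        """Return (200, None) if the request is allowed, else (429, retry_after_seconds)."""
        now = time.monotonic() if now is None else now
        hits = self._hits[ip]
        # Discard timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            # Seconds until the oldest accepted request leaves the window, rounded up.
            retry_after = max(1, math.ceil(self.window - (now - hits[0])))
            return 429, retry_after
        hits.append(now)
        return 200, None


limiter = SlidingWindowLimiter(limit=60, window=60.0)
status, retry_after = limiter.check("203.0.113.7")
print(status, retry_after)  # 200 None on the first request
```

Rejected requests are not recorded in the window. A client that backs off therefore regains capacity as soon as its oldest accepted request ages out.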

ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.