Geonabot
Bot User-Agent: Geonabot
🤖 Overview
Geonabot is an automated web crawler operated by Geona, an AI-powered search engine and knowledge platform first publicly documented in early 2024. According to Geona’s official documentation at geona.ai, the bot indexes public web content to train large language models (LLMs) and to feed Geona’s generative search product, which returns summarized answers with citations. Geona explicitly describes Geonabot as a non-malicious, rate-limited agent that respects website owners’ preferences.
🌐 Technical Behavior
Geonabot issues HTTP/1.1 and HTTP/2 requests with a configurable crawl delay that defaults to 5 seconds between successive requests to the same host, as stated in the Geona crawling guidelines. It runs on a distributed architecture whose IP addresses come from a published range allocated to Amazon Web Services (AWS) and Google Cloud Platform (GCP), primarily in the us-east-1 and europe-west1 regions. The crawler sends an Accept-Encoding: gzip header and prefers HTML pages over non-text resources, though it may follow links to PDFs and images to extract metadata. Geonabot traverses internal links breadth-first and does not follow hyperlinks marked with the nofollow attribute. Official reports indicate it opens at most 2 concurrent connections per domain to reduce server load.
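As a rough illustration of the traversal described above, the Python sketch below walks a site breadth-first, keeps to internal links on the starting host, and skips any <a> tag whose rel attribute contains nofollow. It is not Geona’s code: the fetch callable, the LinkExtractor and bfs_crawl names, and the in-memory SITE pages are hypothetical stand-ins, and the 5-second delay and two-connection cap are left out to keep it short.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects absolute href targets of <a> tags, skipping rel="nofollow" links."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        # rel can hold several space-separated tokens, e.g. "noopener nofollow"
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if href and "nofollow" not in rel_tokens:
            self.links.append(urljoin(self.base_url, href))


def bfs_crawl(start_url, fetch, max_pages=50):
    """Breadth-first traversal of internal links only.

    fetch is a hypothetical callable that returns a page's HTML, or None.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()  # FIFO queue gives breadth-first order
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkExtractor(url)
        parser.feed(html)
        for link in parser.links:
            # Stay on the starting host: internal links only
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


# Hypothetical three-page site standing in for real HTTP fetches
SITE = {
    "https://example.com/": '<a href="/a">A</a> <a href="/b" rel="nofollow">B</a>',
    "https://example.com/a": '<a href="/c">C</a> <a href="https://other.org/">ext</a>',
    "https://example.com/c": "",
}
print(bfs_crawl("https://example.com/", SITE.get))
# ['https://example.com/', 'https://example.com/a', 'https://example.com/c']
```

The page /b is never requested because its only inbound link carries rel="nofollow", and https://other.org/ is dropped because it is on a different host.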
📋 robots.txt Compliance
Geonabot fully honors Disallow and Crawl-delay directives in robots.txt, according to Geona’s published policy and community tests on GitHub (geona/crawler-policy). It also respects Allow directives that explicitly override a broader Disallow. Administrators can block Geonabot entirely by adding User-agent: Geonabot followed by Disallow: / to their robots.txt file.
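Python’s standard urllib.robotparser module can be used to check how a given robots.txt would apply to Geonabot before deploying it. The file below is a hypothetical example, not a recommended policy: it keeps Geonabot out of /private/ except for one page, asks for a 10-second Crawl-delay, and leaves all other agents unrestricted.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt used only for illustration
ROBOTS_TXT = """\
User-agent: Geonabot
Allow: /private/press-kit.html
Disallow: /private/
Crawl-delay: 10

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse from a string instead of fetching

for path in ["/", "/private/report.html", "/private/press-kit.html"]:
    url = "https://example.com" + path
    print(path, parser.can_fetch("Geonabot", url))

print("crawl delay:", parser.crawl_delay("Geonabot"))
```

urllib.robotparser applies the first rule that matches in file order, so the narrower Allow line is listed before the broader Disallow. RFC 9309 instead selects the most specific (longest) matching rule; listing Allow first gives the same result under both interpretations.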
🔍 Detection Indicators
The primary User-Agent token is Geonabot/1.0 (e.g., Mozilla/5.0 (compatible; Geonabot/1.0; +https://geona.ai/bot)); later updates report a Geonabot/2.0 version as well. Additional behavioral fingerprints include a missing Referer header and a request for /robots.txt before any page is crawled. For verification, the bot also sends a custom X-Geona-Crawl: true header, documented in Geona’s developer portal.
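The indicators above can be checked together in a small Python helper. It is a sketch under stated assumptions: the geonabot_indicators name, the dict of lower-cased header names, and the per-client path history are illustrative choices, not part of Geona’s documentation.

```python
import re

# Matches the documented "Geonabot/<major>.<minor>" token (e.g. 1.0 or 2.0)
# anywhere in the User-Agent header, case-insensitively.
GEONABOT_UA = re.compile(r"\bGeonabot/(\d+\.\d+)", re.IGNORECASE)


def geonabot_indicators(headers, paths_from_client):
    """Report which documented Geonabot fingerprints a client shows.

    headers: request headers with lower-cased names (an assumption).
    paths_from_client: paths this client has requested so far, oldest first.
    """
    match = GEONABOT_UA.search(headers.get("user-agent", ""))
    return {
        "ua_version": match.group(1) if match else None,
        "x_geona_crawl": headers.get("x-geona-crawl", "").strip().lower() == "true",
        "no_referer": "referer" not in headers,
        "robots_first": bool(paths_from_client) and paths_from_client[0] == "/robots.txt",
    }


request_headers = {
    "user-agent": "Mozilla/5.0 (compatible; Geonabot/1.0; +https://geona.ai/bot)",
    "x-geona-crawl": "true",
}
print(geonabot_indicators(request_headers, ["/robots.txt", "/"]))
# {'ua_version': '1.0', 'x_geona_crawl': True, 'no_referer': True, 'robots_first': True}
```

Every value checked here is supplied by the client, so a full match shows only that a request presents itself as Geonabot; it does not by itself prove the traffic originates from Geona.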
📊 Data Usage
Data collected by Geonabot is used to train Geona’s proprietary large language models, including its answer generation engine, and to build an indexed knowledge graph for real-time search. The company’s privacy policy at geona.ai/privacy states that personally identifiable information (PII) is stripped during processing, and raw HTML is retained for no more than 30 days after ingestion for quality assurance.
⚙️ Rate Limiting Policy
Geonabot is rate-limited because its distributed crawling can unintentionally overwhelm small servers if misconfigured. On the server side, thresholds such as 100 requests per minute per IP are recommended: they block excessive traffic while still allowing legitimate access. This is in line with the bot’s own terms of service, which advise administrators to apply fairness-based limits.
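One common way to enforce a per-IP threshold such as 100 requests per minute is a sliding-window limiter, sketched below in Python. The SlidingWindowLimiter name and the choice of a sliding window (rather than, for example, a token bucket) are assumptions for illustration, not something Geona’s terms specify. For scale, a crawler honoring the default 5-second delay makes at most 60 / 5 = 12 requests per minute to a host, well under this limit.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Hypothetical per-IP limiter: at most `limit` requests per `window` seconds."""

    def __init__(self, limit=100, window=60.0):
        self.limit = limit
        self.window = window
        self.hits = defaultdict(deque)  # ip -> timestamps of accepted requests

    def allow(self, ip, now):
        q = self.hits[ip]
        # Forget accepted requests that have slid out of the window
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False  # over the limit: reject this request
        q.append(now)
        return True


limiter = SlidingWindowLimiter(limit=100, window=60.0)

# 150 requests within 1.5 seconds from one address (RFC 5737 documentation IP)
burst = [limiter.allow("203.0.113.7", i * 0.01) for i in range(150)]
print("burst accepted:", sum(burst))  # 100

# A client spacing requests 5 seconds apart, like the documented default delay
polite = [limiter.allow("198.51.100.1", i * 5.0) for i in range(30)]
print("5 s client accepted:", sum(polite))  # 30
```

In a real server, allow() would be called with the client address and a clock such as time.monotonic(), and rejected requests would typically receive an HTTP 429 Too Many Requests response.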
ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.