Timpibot

Bot User-Agent: timpibot

🤖 Overview

Timpibot is a legitimate web crawler operated by Timpi, a decentralized search and indexing platform built on the Hedera Hashgraph network. It systematically scans publicly accessible web content to feed the Timpi decentralized search engine, which aims to provide transparent, censorship-resistant search results. The bot was first documented in the Timpi whitepaper (2021) and is a core component of Timpi's data collection pipeline. It differs from traditional centralized crawlers in its use of blockchain-based verification for index integrity.

🌐 Technical Behavior

Timpibot crawls breadth-first to a configurable depth and, per its official guidelines, typically waits 1–3 seconds between requests. It makes HTTP/1.1 requests, including over HTTPS, and sends the header Accept: text/html, application/xhtml+xml. The bot operates from IP ranges associated with major cloud providers such as AWS (EC2) and Google Cloud; these ranges are not static and may change over time. Timpibot crawls both static and dynamic pages, but it does not submit forms or render JavaScript-heavy single-page applications unless explicitly allowed via meta tags. The crawler also reports its version in an X-Crawler-Version header, currently 1.2.0 according to the Timpi GitHub repository (github.com/timpi/timpibot).
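
To make "breadth-first with a configurable crawl depth" concrete, here is a minimal sketch in Python. It is not Timpi's implementation: the function name crawl_order, the in-memory link graph, and the paths are all hypothetical, and the 1–3 second delay between fetches is left out so the example runs instantly.

```python
from collections import deque
from typing import Dict, List


def crawl_order(links: Dict[str, List[str]], seed: str, max_depth: int) -> List[str]:
    """Return pages in breadth-first order, up to max_depth link hops from seed."""
    visited = {seed}
    order: List[str] = []
    queue = deque([(seed, 0)])
    while queue:
        url, depth = queue.popleft()
        order.append(url)
        if depth == max_depth:
            continue  # do not follow links beyond the configured depth
        for target in links.get(url, []):
            if target not in visited:
                visited.add(target)
                queue.append((target, depth + 1))
    return order


# Hypothetical site structure used only for illustration.
SITE = {
    "/": ["/a", "/b"],
    "/a": ["/a/1"],
    "/b": ["/a", "/b/1"],
    "/a/1": ["/deep"],
}

shallow = crawl_order(SITE, "/", max_depth=1)
deeper = crawl_order(SITE, "/", max_depth=2)
```

With max_depth=1 only the seed and the pages it links to directly are visited. Raising the depth adds one more "ring" of pages each time, which is why deeply nested pages may be reached late or not at all.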

📋 robots.txt Compliance

According to Timpi’s official site (timpi.io) and its robots.txt policy page, Timpibot fully respects Disallow directives in robots.txt. The bot checks the file before each crawl session and will not access restricted paths. Timpibot also honors the Crawl-delay directive, which lets site administrators set a minimum interval between requests. There is no evidence of the bot ignoring robots.txt rules; any reports to the contrary are unverified.
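
Before deploying rules for Timpibot, it can help to check how a compliant parser would read them. The sketch below uses Python's standard urllib.robotparser. The robots.txt content, the /private/ path, the 5-second delay, and the example.com URLs are illustrative assumptions, not values published by Timpi.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the path and the delay are examples only.
ROBOTS_TXT = """\
User-agent: timpibot
Disallow: /private/
Crawl-delay: 5

User-agent: *
Disallow:
"""


def parse_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = parse_rules(ROBOTS_TXT)

# Pass the bare product token; the parser compares it against User-agent lines.
blocked = rules.can_fetch("timpibot", "https://example.com/private/report.html")
allowed = rules.can_fetch("timpibot", "https://example.com/blog/post.html")
delay = rules.crawl_delay("timpibot")
```

Here blocked is False, allowed is True, and delay is 5. This shows only how the standard-library parser reads the rules; how Timpibot itself parses edge cases is not covered by the sources above.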

🔍 Detection Indicators

The primary User-Agent string is: Mozilla/5.0 (compatible; Timpibot/1.0; +https://timpi.io/bot). A secondary UA string, Timpibot/1.2, has been observed on less frequent crawls. Behavioral fingerprints include request patterns that skip common analytics endpoints and a preference for pages with <meta name="robots" content="index,follow">. The bot also sends a Timpi-Node header containing a hex-encoded node identifier, which can be correlated with the Timpi blockchain explorer.
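
The indicators above can be combined into a simple header check. This is a minimal sketch: the function name detect_timpibot, the regular expressions, and the returned field names are assumptions made for illustration. Only the header names and the User-Agent forms come from the text above.

```python
import re
from typing import Mapping, Optional

# Matches "Timpibot" with an optional "/<version>" suffix, case-insensitively.
UA_PATTERN = re.compile(r"\bTimpibot(?:/(\d+(?:\.\d+)*))?", re.IGNORECASE)
HEX_PATTERN = re.compile(r"[0-9a-fA-F]+")


def detect_timpibot(headers: Mapping[str, str]) -> Optional[dict]:
    """Return details if the request claims to be Timpibot, otherwise None."""
    # HTTP header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}
    match = UA_PATTERN.search(normalized.get("user-agent", ""))
    if match is None:
        return None
    node_id = normalized.get("timpi-node", "")
    return {
        "ua_version": match.group(1),  # None when the UA carries no version
        "crawler_version": normalized.get("x-crawler-version"),
        # Keep the node identifier only if it looks like hex.
        "node_id": node_id if HEX_PATTERN.fullmatch(node_id) else None,
    }


primary = detect_timpibot({
    "User-Agent": "Mozilla/5.0 (compatible; Timpibot/1.0; +https://timpi.io/bot)",
    "X-Crawler-Version": "1.2.0",
    "Timpi-Node": "a3f9c2",  # made-up identifier
})
browser = detect_timpibot({"User-Agent": "Mozilla/5.0 (X11; Linux x86_64)"})
```

Because request headers are trivially spoofed, a match shows only that a client claims to be Timpibot. Treat it as one signal alongside the behavioral fingerprints, not as proof of origin.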

📊 Data Usage

Data collected by Timpibot is used exclusively to populate the Timpi decentralized search index. Unlike AI training crawlers, Timpibot does not feed into language models; its output is a structured index of URLs, metadata, and page summaries stored on the Hedera network for query resolution. The Timpi whitepaper emphasizes that raw web content is not permanently stored; only hash-based references are retained for verification. The index is public and auditable via the Timpi network.

⚙️ Rate Limiting Policy

Rate limiting is recommended for Timpibot. Although the bot typically waits 1–3 seconds between requests, its crawl cycles can peak at up to 50 requests per second from a single IP, which can overwhelm under-provisioned servers. Organizations should set a threshold (e.g., 10 requests/sec) that keeps resource allocation fair without blocking the bot entirely, since blocking it would exclude the site from the decentralized search ecosystem.
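
One common way to enforce such a threshold is a per-IP token bucket. The sketch below is a generic illustration, not a Boteraser or Timpi feature: the class name PerClientRateLimiter, its parameters, and the injectable clock are hypothetical, and the defaults mirror the 10 requests/sec example above.

```python
import time
from typing import Callable, Dict, Tuple


class PerClientRateLimiter:
    """Token bucket per client IP: `rate` tokens refill each second, up to `burst`."""

    def __init__(self, rate: float = 10.0, burst: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.burst = burst
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._buckets: Dict[str, Tuple[float, float]] = {}  # ip -> (tokens, last_seen)

    def allow(self, client_ip: str) -> bool:
        """Return True if this request fits within the client's budget."""
        now = self.clock()
        tokens, last_seen = self._buckets.get(client_ip, (self.burst, now))
        # Refill in proportion to elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last_seen) * self.rate)
        allowed = tokens >= 1.0
        if allowed:
            tokens -= 1.0
        self._buckets[client_ip] = (tokens, now)
        return allowed


# Simulate a 50-request burst arriving at the same instant from one address.
fake_now = [0.0]
limiter = PerClientRateLimiter(rate=10.0, burst=10.0, clock=lambda: fake_now[0])
burst_results = [limiter.allow("203.0.113.7") for _ in range(50)]
accepted = sum(burst_results)
```

In this simulation only the first 10 of the 50 simultaneous requests are accepted, and the budget refills as time passes. Rejected requests would normally receive an HTTP 429 response so that the crawler can back off instead of being blocked outright.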

ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.