werelatebot

Bot User-Agent: werelatebot

🤖 Overview

werelatebot is a legitimate web crawler operated by the WeRelate Foundation, a non-profit organization that runs the collaborative genealogy wiki at www.werelate.org. Its primary purpose is to index publicly available genealogical records, family trees, and historical documents from external websites to feed into the WeRelate database, enabling users to cross-reference and enrich their family history research. The bot was first documented in 2008 and has been consistently updated to respect web standards.

🌐 Technical Behavior

werelatebot issues HTTP GET requests at a moderate rate, typically one request every 10 to 30 seconds, to avoid overwhelming servers. It crawls primarily over ports 80 and 443 (HTTP and HTTPS) and follows robots.txt, including Crawl-delay directives when present. The bot identifies itself with the User-Agent string Mozilla/5.0 (compatible; werelatebot/1.0; +http://www.werelate.org/wiki/WikiNode:Werelatebot) and operates from IP addresses within the range 69.164.196.0/24, as verified via reverse DNS lookups and WHOIS records. It does not render JavaScript or use session cookies; it parses static HTML only. Traffic logs show a consistent daily crawl pattern originating from a single IP within that subnet, with peak activity during UTC daytime hours.
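As a rough check of the pacing described above, the sketch below computes the gaps between consecutive requests from one client and tests whether the median gap falls within the 10 to 30 second band. It assumes you have already extracted request timestamps (in seconds) for a single IP from your access logs; the function names and the use of the median are illustrative choices, not part of any WeRelate documentation.

```python
from statistics import median

def request_gaps(timestamps):
    """Return the seconds elapsed between consecutive requests (sorted first)."""
    ts = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ts, ts[1:])]

def matches_documented_pace(timestamps, low=10.0, high=30.0):
    """True if the median gap between requests lies within [low, high] seconds.

    The median is an assumption, chosen so that a single burst or a long
    pause does not dominate the result.
    """
    gaps = request_gaps(timestamps)
    if not gaps:
        return False  # fewer than two requests: nothing to measure
    return low <= median(gaps) <= high

# Hypothetical log extracts: request times in seconds since the first hit
print(matches_documented_pace([0, 12, 30, 55, 70]))  # gaps 12, 18, 25, 15 -> True
print(matches_documented_pace([0, 2, 4, 6]))         # gaps of 2 s -> False
```

A median-based test tolerates the occasional retry or pause; a stricter check could require every gap to fall inside the band.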

📋 robots.txt Compliance

According to official documentation published on the WeRelate wiki (http://www.werelate.org/wiki/WikiNode:Werelatebot), the bot fully honors Disallow directives and Crawl-delay settings in robots.txt. The WeRelate Foundation explicitly states that webmasters can block the bot entirely by adding a record containing User-agent: werelatebot followed by Disallow: / to their robots.txt; the bot then ceases crawling the site immediately, without further requests.
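Python's standard urllib.robotparser module can be used to sanity-check such a file before deploying it. The sketch below parses two robots.txt variants held in memory: one that blocks werelatebot outright as described above, and a hypothetical one that only slows it down with a Crawl-delay and a single disallowed path. The example URLs and the /private/ path are placeholders. Note that the parser keeps only the part of the agent name before the first "/", so the sketch passes the token werelatebot; passing the full Mozilla/5.0 string would not match the werelatebot record.

```python
from urllib.robotparser import RobotFileParser

# Variant 1: block werelatebot entirely (the record described above)
BLOCK_ALL = """\
User-agent: werelatebot
Disallow: /
"""

# Variant 2 (hypothetical): allow crawling but ask for 30 s between requests
SLOW_DOWN = """\
User-agent: werelatebot
Crawl-delay: 30
Disallow: /private/
"""

def parse_robots(text):
    """Build a RobotFileParser from robots.txt content held in memory."""
    rp = RobotFileParser()
    rp.parse(text.splitlines())
    return rp

blocked = parse_robots(BLOCK_ALL)
slowed = parse_robots(SLOW_DOWN)

page = "https://example.org/records/smith-family.html"
print(blocked.can_fetch("werelatebot", page))   # False: Disallow: / covers every path
print(blocked.can_fetch("SomeOtherBot", page))  # True: no record applies to other agents
print(slowed.can_fetch("werelatebot", page))    # True: only /private/ is disallowed
print(slowed.crawl_delay("werelatebot"))        # 30
```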

🔍 Detection Indicators

The primary detection signature is the User-Agent string Mozilla/5.0 (compatible; werelatebot/1.0; +http://www.werelate.org/wiki/WikiNode:Werelatebot). In addition, the bot's HTTP requests carry a standard Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8 header and omit the Accept-Encoding and Referer fields. Reverse DNS for the bot's IP address consistently returns host-69-164-196-XXX.static.rcable.com, matching the 69.164.196.0/24 block.
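A minimal sketch of combining these indicators in a request filter follows. It assumes headers arrive as a simple mapping and that the client IP is already known. The returned labels, and the decision to treat a werelatebot User-Agent arriving from outside 69.164.196.0/24 as a likely spoof, are illustrative choices, not documented WeRelate behavior. The sketch does not perform the reverse DNS lookup, which requires network access.

```python
import ipaddress
from collections.abc import Mapping

# Values taken from the indicators above; treat them as a starting point only
UA_TOKEN = "werelatebot"
DOCUMENTED_NET = ipaddress.ip_network("69.164.196.0/24")

def classify_request(headers: Mapping[str, str], client_ip: str) -> str:
    """Return 'werelatebot', 'suspected-spoof', or 'other' for one request."""
    # HTTP header names are case-insensitive, so normalize them first
    h = {name.lower(): value for name, value in headers.items()}
    if UA_TOKEN not in h.get("user-agent", "").lower():
        return "other"
    try:
        in_range = ipaddress.ip_address(client_ip) in DOCUMENTED_NET
    except ValueError:
        in_range = False  # unparsable address: cannot vouch for it
    # Header profile from the indicators: no Accept-Encoding and no Referer
    profile_ok = "accept-encoding" not in h and "referer" not in h
    return "werelatebot" if in_range and profile_ok else "suspected-spoof"

ua = "Mozilla/5.0 (compatible; werelatebot/1.0; +http://www.werelate.org/wiki/WikiNode:Werelatebot)"
print(classify_request({"User-Agent": ua, "Accept": "text/html"}, "69.164.196.42"))  # werelatebot
print(classify_request({"User-Agent": ua}, "203.0.113.7"))                         # suspected-spoof
print(classify_request({"User-Agent": "curl/8.0"}, "69.164.196.42"))               # other
```

The header-profile check is the most brittle part: a future version of the bot could start sending Accept-Encoding, so a mismatch there is better logged than blocked outright.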

📊 Data Usage

The collected data—including textual records, dates, names, and links—is ingested into the WeRelate collaborative genealogy database to help users build and verify family trees. WeRelate does not use the data for AI training or commercial purposes; it is purely for open historical research and community contribution. The foundation publishes a public list of crawled sites on its wiki for transparency.

⚙️ Rate Limiting Policy

werelatebot is subject to rate limiting because its crawl pattern, although slow, can still place significant load on small genealogy sites with limited bandwidth. The policy recommends blocking it only if it sustains a rate faster than one request every 5 seconds or fails to respect robots.txt directives; historically, the bot has complied with robots.txt.
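The rate part of this threshold can be expressed as a simple check over one client's request timestamps. In this sketch, "sustained" is interpreted, as an assumption, as the average rate over an observation window containing at least a minimum number of requests (20 here, a hypothetical value); a mean gap shorter than 5 seconds then counts as exceeding the policy rate. robots.txt compliance still has to be judged separately.

```python
def exceeds_policy_rate(timestamps, min_interval=5.0, min_requests=20):
    """True if requests arrive faster than one per `min_interval` seconds on average.

    `min_requests` is a hypothetical guard against treating a handful of
    hits as a sustained rate.
    """
    if len(timestamps) < min_requests:
        return False  # too little data to call the rate sustained
    ts = sorted(timestamps)
    span = ts[-1] - ts[0]
    if span <= 0:
        return True  # many requests at the same instant: clearly too fast
    mean_gap = span / (len(ts) - 1)
    return mean_gap < min_interval

# Hypothetical traces: 30 requests every 2 s vs. 30 requests every 15 s
print(exceeds_policy_rate([i * 2 for i in range(30)]))   # True
print(exceeds_policy_rate([i * 15 for i in range(30)]))  # False
```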

ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.