memo Bot — Detection, Blocking & Technical Analysis

Bot User-Agent: memo

🤖 Overview

memo is a web crawler operated by Memo AI Inc., a startup focused on AI‑enhanced personal knowledge management. The bot indexes web pages that users manually add or link within their Memo workspaces, feeding data into the platform’s semantic search and AI summarization features. Its primary purpose is to enable retrieval‑augmented generation (RAG) for user‑curated content, not large‑scale public indexing.

🌐 Technical Behavior

memo issues HTTP and HTTPS GET requests at a moderate rate, typically one request every 2–5 seconds per IP address, and respects Cache‑Control and ETag headers to avoid redundant downloads. Requests originate from IP ranges assigned to Amazon Web Services (AWS) and Google Cloud Platform (GCP), as documented on the official Memo bot page (https://memo.ai/bot). The bot follows up to three redirect hops and, by default, does not crawl non‑HTML resources such as images or PDFs. Requests carry a standard Accept‑Encoding: gzip header and are made over TLS 1.2 or later.
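Because the bot is described as honoring ETag headers, a site that emits them can answer repeat fetches with a 304 instead of the full page. The following is a minimal sketch of that server-side decision, not Memo AI's or any framework's implementation; the function names `make_etag` and `respond`, the hash-based ETag, and the `max-age=3600` value are all assumptions chosen for illustration.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Derive a quoted ETag from the response body (hypothetical scheme)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, payload) for a GET, honoring If-None-Match."""
    etag = make_etag(body)
    # The max-age value is an illustrative assumption, not a recommendation.
    headers = {"ETag": etag, "Cache-Control": "max-age=3600"}
    if if_none_match:
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        # If-None-Match uses weak comparison, so a leading W/ is ignored.
        normalized = {c[2:] if c.startswith("W/") else c for c in candidates}
        if "*" in normalized or etag in normalized:
            return 304, headers, b""  # client copy is current; send no body
    return 200, headers, body


# First fetch returns the page; a revalidation with the same ETag returns 304.
status, headers, payload = respond(b"<html>hello</html>", None)
status_again, _, payload_again = respond(b"<html>hello</html>", headers["ETag"])
```

A crawler that revalidates this way costs the server only a header exchange on unchanged pages, which is the saving the behavior described above is meant to deliver.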

📋 robots.txt Compliance

According to Memo AI’s official documentation (https://memo.ai/bot), the memo crawler fully honors Disallow directives in robots.txt. It also checks for X‑Robots‑Tag and meta robots directives on a per‑page basis. No known incidents of non‑compliance have been reported in public security advisories or community forums.
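If the crawler honors Disallow rules as documented, a robots.txt group addressed to it should be enough to keep it out of selected paths. The sketch below uses Python's standard `urllib.robotparser` to check how such a file would be interpreted. It assumes the robots.txt product token is `memo` (taken from the User-Agent listed above, not confirmed by Memo AI), and the `/private/` path and the helper name `memo_allowed` are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt: block the assumed "memo" token from /private/,
# leave every other crawler unrestricted.
ROBOTS_TXT = """\
User-agent: memo
Disallow: /private/

User-agent: *
Disallow:
"""


def memo_allowed(path: str, user_agent: str = "memo", robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given user agent may fetch `path` under robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


blocked = memo_allowed("/private/notes.html")
open_page = memo_allowed("/blog/post")
```

This only shows how a standards-following parser reads the file; whether the live crawler behaves the same way still has to be confirmed from server logs.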

🔍 Detection Indicators

The primary User‑Agent string is memo/1.0 (bot; +https://memo.ai/bot); a secondary variant Mozilla/5.0 (compatible; memo/1.0; +https://memo.ai) may appear. Behavioral fingerprints include a low request rate, no JavaScript execution, and a consistent request pattern of fetching HTML then pausing for several seconds. The bot does not set any custom HTTP headers beyond standard ones.
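The two User‑Agent variants above share a `memo/<version>` product token, so a single pattern can flag both in access logs. This is a rough sketch under that assumption; the pattern and the function name `is_memo_user_agent` are illustrative. User‑Agent strings are trivially spoofed, so a match is a hint rather than proof of origin.

```python
import re

# Match "memo/<version>" as a standalone product token: at the start of the
# string or after whitespace, ";" or "(" so that e.g. "mymemo/1.0" is ignored.
MEMO_UA = re.compile(r"(?:^|[\s;(])memo/\d+(?:\.\d+)*\b", re.IGNORECASE)


def is_memo_user_agent(user_agent: str) -> bool:
    """Return True if the User-Agent header carries a memo product token."""
    return bool(MEMO_UA.search(user_agent or ""))


primary = is_memo_user_agent("memo/1.0 (bot; +https://memo.ai/bot)")
secondary = is_memo_user_agent("Mozilla/5.0 (compatible; memo/1.0; +https://memo.ai)")
browser = is_memo_user_agent("Mozilla/5.0 (X11; Linux x86_64) Chrome/120.0 Safari/537.36")
```

Combining this check with the behavioral signals listed above (slow cadence, no JavaScript execution) gives a stronger indication than the header alone.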

📊 Data Usage

Collected web pages are processed into vector embeddings using OpenAI’s text‑embedding‑3‑small model (per Memo AI’s engineering blog, 2024) and stored in a private vector database per user account. The raw HTML is discarded after embedding; only the embeddings and metadata (URL, title, timestamp) are retained. This data powers the Memo platform’s natural‑language question‑answering and semantic retrieval features.

⚙️ Rate Limiting Policy

Rate limiting is still recommended: although a single crawl is slow, memo can generate significant load when many users add links to the same site at once and the resulting fetches arrive from the same IP range. A threshold of 10 requests per second per IP is suggested as a reasonable blocking point. That is roughly 20 to 50 times the bot's typical rate of one request every 2–5 seconds, so it leaves the bot's legitimate function intact while protecting server resources (Source: Memo AI rate‑limit guidance, https://memo.ai/rate‑limits).
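One way to enforce a per‑IP threshold like the one suggested above is a sliding‑window counter. The sketch below is a minimal in‑memory version, assuming a single process and the 10 requests per second figure from the text; the class name `SlidingWindowLimiter` is hypothetical, and a production setup would more likely rely on the rate‑limiting features of a reverse proxy or WAF.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SlidingWindowLimiter:
    """Allow at most `max_requests` per `window_seconds` for each client IP."""

    def __init__(self, max_requests: int = 10, window_seconds: float = 1.0) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of accepted requests

    def allow(self, ip: str, now: Optional[float] = None) -> bool:
        """Record a request from `ip`; return False if it exceeds the limit."""
        if now is None:
            now = time.monotonic()
        hits = self._hits[ip]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


limiter = SlidingWindowLimiter()
# Twelve requests within 120 ms from one address (documentation-range IP).
burst = [limiter.allow("203.0.113.7", now=i * 0.01) for i in range(12)]
```

In this burst the first ten requests pass and the last two are rejected; the address is accepted again once the earlier timestamps age out of the one‑second window.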

ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.
