metaspider

Crawler User-Agent: metaspider

🤖 Overview

MetaSpider is a web crawler operated by Meta Platforms, Inc. (formerly Facebook). It indexes public web content to power search features within Meta’s ecosystem, including Facebook Search and Instagram Search. First documented in Meta’s official crawler documentation in 2023, it is distinct from the older Facebot crawler and is designed to collect structured data, such as article metadata and Open Graph tags, to improve content discovery and ranking across Meta’s social graph.

🌐 Technical Behavior

MetaSpider sends requests over HTTP/1.1 and HTTP/2, primarily from IP ranges registered to Meta’s ASN (AS32934). Its request rate is moderate, typically 1–5 requests per second per host. The crawl delay is configurable: it defaults to 1 second and is overridden by a Crawl-Delay directive in robots.txt. The crawler sends conditional requests with the If-Modified-Since header so that unchanged content is not downloaded again, and it indexes only pages that contain Open Graph meta tags or other relevant structured data. It sends the header Accept-Language: en-US,en;q=0.9 and does not send cookies by default. The crawler’s IP ranges are published in Meta’s public IP lookup tool at https://developers.facebook.com/docs/sharing/webmasters/crawler. The crawler frequently rotates its user-agent suffix, which makes exact-string matching unreliable.
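On the server side, the conditional-request behavior described above means an unchanged page can be answered with 304 Not Modified instead of a full body. The following is a minimal Python sketch of that decision, not a description of how any particular server does it. The function name conditional_status and the way the page's last-modified time is supplied are assumptions made for illustration.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def conditional_status(if_modified_since: Optional[str], last_modified: datetime) -> int:
    """Return 304 if the page is unchanged since the client's copy, else 200.

    `last_modified` is assumed to be a timezone-aware datetime.
    """
    if not if_modified_since:
        return 200  # unconditional request: send the full page
    try:
        client_time = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable date: ignore the header
    if client_time.tzinfo is None:
        client_time = client_time.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds first.
    unchanged = last_modified.replace(microsecond=0) <= client_time
    return 304 if unchanged else 200


if __name__ == "__main__":
    page_time = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
    header = format_datetime(page_time, usegmt=True)
    print(header, "->", conditional_status(header, page_time))
```

Answering with 304 where possible reduces the bandwidth cost of repeat visits from any crawler that sends If-Modified-Since.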

📋 robots.txt Compliance

MetaSpider fully obeys robots.txt directives, including Disallow, Allow, and Crawl-Delay rules, as documented in Meta’s official webmaster guidelines at https://developers.facebook.com/docs/sharing/webmasters. It does not access pages blocked by a Disallow directive. Separately from robots.txt, it also respects noindex and nofollow meta tags in page markup. There are no verified incidents of MetaSpider violating robots.txt rules, and it is considered a well-behaved crawler.
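To see how a set of rules would apply to this crawler before deploying them, the rules can be evaluated locally. The sketch below uses Python's standard urllib.robotparser with the metaspider token from the Crawler User-Agent line above. The /private/ path, the 5-second delay, and the example.com URLs are hypothetical values chosen for illustration. Python's parser applies the first matching rule, which may differ from how a given crawler resolves overlapping Allow and Disallow rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: slow the crawler down and keep it out of /private/.
ROBOTS_TXT = """\
User-agent: metaspider
Crawl-delay: 5
Disallow: /private/
Allow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

blocked = parser.can_fetch("metaspider", "https://example.com/private/report.html")
allowed = parser.can_fetch("metaspider", "https://example.com/blog/post.html")
delay = parser.crawl_delay("metaspider")

print("private page fetchable:", blocked)   # False
print("blog page fetchable:", allowed)      # True
print("crawl delay (seconds):", delay)      # 5
```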

🔍 Detection Indicators

The primary User-Agent string is MetaSpider/1.0. Known variations include MetaSpider/2.0 and MetaSpider (+https://developers.facebook.com/docs/sharing/webmasters/crawler). Behavioral fingerprints include requests for /.well-known/social-media/ endpoints and a preference for the text/html and application/rss+xml content types. When following links from Facebook pages, the crawler always sends a Referer header set to https://www.facebook.com/. A reverse DNS lookup on the source IP resolves to a .fbsv.net or .facebook.com subdomain.
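Because a User-Agent header is trivial to spoof, the indicators above are usually combined: match the MetaSpider token, then confirm the source IP with a reverse DNS lookup followed by a forward lookup of the returned hostname. The Python sketch below illustrates that check under the assumptions stated on this page (the .fbsv.net and .facebook.com suffixes). The function names and the injectable resolver parameters are hypothetical and exist so the logic can be exercised without network access.

```python
import re
import socket
from typing import Callable, Iterable

# Match the token rather than a full string, since the suffix varies.
UA_PATTERN = re.compile(r"\bmetaspider\b", re.IGNORECASE)
# Hostname suffixes taken from the text above.
TRUSTED_SUFFIXES = (".fbsv.net", ".facebook.com")


def claims_metaspider(user_agent: str) -> bool:
    """True if the User-Agent header contains the MetaSpider token."""
    return bool(UA_PATTERN.search(user_agent or ""))


def _reverse(ip: str) -> str:
    return socket.gethostbyaddr(ip)[0]


def _forward(hostname: str) -> Iterable[str]:
    return {info[4][0] for info in socket.getaddrinfo(hostname, None)}


def verify_source(
    ip: str,
    reverse: Callable[[str], str] = _reverse,
    forward: Callable[[str], Iterable[str]] = _forward,
) -> bool:
    """Forward-confirmed reverse DNS: PTR must be trusted and map back to ip."""
    try:
        hostname = reverse(ip).rstrip(".").lower()
        if not hostname.endswith(TRUSTED_SUFFIXES):
            return False
        return ip in forward(hostname)
    except OSError:  # covers socket.herror and socket.gaierror
        return False


if __name__ == "__main__":
    print(claims_metaspider("MetaSpider/1.0"))
```

A request that claims the MetaSpider token but fails verify_source is a candidate for treatment as an impersonator rather than as the crawler itself.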

📊 Data Usage

Collected data, including page titles, descriptions, images, and Open Graph metadata, is used to build and update the link index that Facebook uses to render previews in posts, shares, and stories. The data also feeds Meta’s Graph Search and content recommendation algorithms, which help users discover articles, videos, and products shared on the platform. No raw page content is stored; only structured metadata is extracted and indexed.

⚙️ Rate Limiting Policy

Although it is a legitimate crawler, MetaSpider can be aggressive on large sites with thousands of pages and can cause server load spikes. Rate limiting is applied through thresholds such as per-IP request caps (e.g., 10 requests per second), with X-RateLimit-Remaining response headers reporting the remaining allowance, so that resource usage stays fair without blocking the crawler entirely. This policy keeps the crawler's indexing functional while protecting backend stability.
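One way to implement a per-IP cap like the one described above is a sliding-window counter. The Python sketch below is an illustration under assumptions, not the policy's actual implementation: the class name SlidingWindowLimiter, the in-memory storage, and the Retry-After header are choices made for this example, while the default of 10 requests per second and the X-RateLimit-Remaining header come from the text.

```python
import math
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Tuple


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each IP."""

    def __init__(self, limit: int = 10, window: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def check(self, ip: str) -> Tuple[bool, Dict[str, str]]:
        now = self.clock()
        hits = self._hits[ip]
        # Discard timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        allowed = len(hits) < self.limit
        if allowed:
            hits.append(now)
        headers = {
            "X-RateLimit-Limit": str(self.limit),
            "X-RateLimit-Remaining": str(self.limit - len(hits)),
        }
        if not allowed:
            # Whole seconds until the oldest counted request expires.
            oldest = hits[0] if hits else now
            wait = math.ceil(self.window - (now - oldest))
            headers["Retry-After"] = str(max(1, wait))
        return allowed, headers


if __name__ == "__main__":
    limiter = SlidingWindowLimiter()
    for _ in range(12):
        ok, hdrs = limiter.check("203.0.113.7")
        print(200 if ok else 429, hdrs["X-RateLimit-Remaining"])
```

A rejected request would typically be answered with 429 Too Many Requests plus these headers, which throttles the crawler without blocking it outright.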

ⓘ Data Notice: The information presented above has been compiled from publicly available internet sources. Boteraser aggregates this data solely for informational purposes and does not independently classify, evaluate, or endorse any findings about the bots listed. The accuracy and completeness of this information is the sole responsibility of the original publishers. Boteraser and its operators accept no liability for any decisions made based on this data.