Facebook page post scraper for JSON post data

Use the facebook page post scraper to collect Page posts, reactions, shares, comments, attachments, and profile data as JSON.

Scrapers | Facebook | May 12, 2026

Pull Facebook Page posts as JSON without maintaining selectors, browser sessions, or proxy routing.

This scraper returns one JSON row per Facebook Page post. Each row includes the post URL, post ID, author Page, publish date, comments, shares, reactions, attachments, and optional Page profile fields.

Use the Extract Facebook page posts scraper when you already have Page URLs and need post-level records for monitoring, reporting, enrichment, or historical backfills. The product page has the live schema, pricing, and API entry point.

If your Facebook input is an event, Marketplace listing, or keyword search, browse all 86+ scrapers to find the right extractor.

How to use this scraper

[Figure: The Facebook Page Post Scraper job pipeline, from input to stored output.]

Send a Facebook Page URL, an optional date range, the maximum number of posts, and the profile-data flag.

The scraper returns one row per post. In ScrapeNow pricing, 1 returned row costs 1 credit. Credits start at $0.04 for 1 to 250 rows and drop to $0.012 at 100K+ rows.
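
The two rates quoted above are enough to bracket a budget before a run. The sketch below is only an estimate: the intermediate volume tiers are not listed in this article, so the function (a hypothetical helper, not part of the ScrapeNow API) returns a lower and an upper bound instead of an exact price.

```python
from decimal import Decimal

# Rates quoted in this article. Tiers between them are not listed here,
# so this sketch brackets the cost instead of computing an exact price.
HIGHEST_RATE = Decimal("0.04")   # USD per credit, 1 to 250 rows
LOWEST_RATE = Decimal("0.012")   # USD per credit, 100K+ rows


def cost_bounds(rows: int) -> tuple[Decimal, Decimal]:
    """Return (lower_bound, upper_bound) in USD for a number of returned rows."""
    if rows < 0:
        raise ValueError("rows must be non-negative")
    # One returned row costs one credit.
    return rows * LOWEST_RATE, rows * HIGHEST_RATE


low, high = cost_bounds(1000)
print(f"1000 rows cost between ${low} and ${high}")
```

Check the pricing page for the tier that applies to your volume before relying on either bound.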

Use a small num_of_posts value for test runs. Increase it after validation, deduplication, and storage work end to end.

For a first run, use one Page URL and request one post. That gives you enough data to test the full path without loading duplicate records into your warehouse.

Keep the first run small even when the Page has years of history. A single row tests authentication, job polling, result download, JSON parsing, and destination writes.

Inputs

| Input | Required | Format | Example |
| --- | --- | --- | --- |
| url | Yes | Must start with https://www.facebook.com | https://www.facebook.com/Beyonce/ |
| num_of_posts | No | String or number | "1" |
| start_date | No | YYYY-MM-DD | "2026-05-05" |
| end_date | No | YYYY-MM-DD | "2026-05-13" |
| include_profile_data | Yes | String boolean for API usage | "true" |

Step 1, open Facebook

Open facebook.com.

Use a desktop browser for URL collection. Mobile URLs add redirects and path variants that create avoidable cleanup work.

[Figure: Type the Page name in the Facebook search bar to find the target Page.]

Step 2, search for the Page

Type the Page keyword in the search bar. For example, search for Coldplay.

Use the official Page when Facebook shows multiple results. Fan pages, groups, and duplicate Pages return different post histories.

Check the verification badge, follower count, and Page category before copying the URL. Those checks prevent obvious input mistakes before you spend credits.

[Figure: Select the verified Page from search results and confirm the follower count.]

Step 3, open the Page and copy the URL

Click the target profile or Page on the results page.

Copy the URL from the browser address bar. The value must start with https://www.facebook.com.

Remove tracking parameters before sending the URL. A short Page URL is easier to log, compare, and debug.

Keep the canonical Page URL in your input table. Avoid mixing mobile URLs, tracking links, and redirected URLs in the same batch.

Use one canonical URL format across your whole input list. That makes deduplication easier when multiple teams collect Page URLs.
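
One way to enforce a single format is to normalize every collected URL before it enters the input table. The sketch below is a minimal example, not part of the scraper: the helper name and the list of mobile hosts are assumptions, and Pages addressed as profile.php?id=... are rejected because their ID lives in the query string that this function strips.

```python
from urllib.parse import urlparse, urlunparse

# Assumption: these hosts all serve the same Page; adjust the set for your data.
FACEBOOK_HOSTS = {"facebook.com", "www.facebook.com", "m.facebook.com", "mobile.facebook.com"}


def canonical_page_url(raw_url: str) -> str:
    """Return https://www.facebook.com/<path>/ with no query string or fragment."""
    parsed = urlparse(raw_url.strip())
    if parsed.netloc.lower() not in FACEBOOK_HOSTS:
        raise ValueError(f"not a Facebook URL: {raw_url!r}")
    if parsed.path.rstrip("/") == "/profile.php":
        # The Page ID is in the query string, so stripping it would lose the Page.
        raise ValueError("profile.php URLs need separate handling")
    path = parsed.path.rstrip("/") + "/"
    # Dropping params, query, and fragment removes tracking parameters.
    return urlunparse(("https", "www.facebook.com", path, "", "", ""))


print(canonical_page_url("https://m.facebook.com/Beyonce?ref=page_internal"))
```

The path keeps its original casing, so compare URLs case-insensitively if your collectors type Page names by hand.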

[Figure: Copy the Page URL from the address bar to use as scraper input.]

Step 4, set the date range and post count

Use start_date and end_date to limit the crawl window. For API usage, both dates must use YYYY-MM-DD.

Set num_of_posts to cap the returned post count. Set include_profile_data to "true" when you need page-level fields.

Page-level fields include followers, category, logo, external website, and verification status. Turn the flag off when your pipeline only needs post rows.

For recurring jobs, keep your date windows small. Daily or weekly windows reduce duplicate rows and make retries easier to isolate.

For historical backfills, split large ranges by week or month. Smaller windows give you cleaner retry boundaries when one Page fails.
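
Splitting a range by week can be sketched as below. The helper name is hypothetical, and the code assumes both start_date and end_date are inclusive, which this article does not confirm. Verify that on a small run before a large backfill.

```python
from datetime import date, timedelta


def date_windows(start: date, end: date, days: int = 7) -> list[tuple[str, str]]:
    """Split [start, end] into consecutive, non-overlapping windows of at most `days` days."""
    if days < 1:
        raise ValueError("days must be at least 1")
    if start > end:
        raise ValueError("start must not be after end")
    windows = []
    cursor = start
    while cursor <= end:
        # Clamp the last window so it never runs past the requested end date.
        window_end = min(cursor + timedelta(days=days - 1), end)
        windows.append((cursor.isoformat(), window_end.isoformat()))
        cursor = window_end + timedelta(days=1)
    return windows


for start_date, end_date in date_windows(date(2026, 5, 1), date(2026, 5, 20)):
    print(start_date, end_date)
```

Each tuple is already in YYYY-MM-DD form, so it can go straight into the start_date and end_date inputs.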

Step 5, run the API request

Use this Python script.

"""
Configuration:
    - Set SCRAPER_SLUG to the scraper you want to run.
    - Set SCRAPER_INPUTS to the list of input dicts matching that scraper's schema.
    - Set API_KEY to your scraper API key.
"""

import sys
import time
import json
import requests
import os

API_KEY = "YOUR_API_KEY"

SCRAPER_SLUG = "facebook-page-posts-extract-by-url"

SCRAPER_INPUTS = [
    {
        "url": "https://www.facebook.com/Beyonce/",
        "end_date": "2026-05-13",
        "start_date": "2026-05-05",
        "num_of_posts": "1",
        "include_profile_data": "true"
    }
]



BASE_URL = "https://api.scrapenow.io/api/v1/scraping"
TIMEOUT_SECONDS = 3600
POLL_INTERVAL = 5
SPINNER = "|/-\\"


def build_headers(api_key: str, content_type: str | None = None) -> dict:
    headers = {"Authorization": f"Bearer {api_key}"}
    if content_type:
        headers["Content-Type"] = content_type
    return headers


def trigger_scrape(slug: str, inputs: list[dict]) -> str:
    url = f"{BASE_URL}/scrape?scraper={slug}"
    response = requests.post(
        url,
        headers=build_headers(API_KEY, "application/json"),
        json={"inputs": inputs},
        timeout=30,  # fail fast instead of hanging on a stalled connection
    )
    response.raise_for_status()
    return response.json()["data"]["job_id"]


def poll_until_done(job_id: str) -> str:
    start = time.time()
    i = 0
    while True:
        elapsed = time.time() - start
        if elapsed > TIMEOUT_SECONDS:
            print(f"\nTimeout after {TIMEOUT_SECONDS}s")
            sys.exit(1)
        response = requests.get(
            f"{BASE_URL}/jobs/{job_id}",
            headers=build_headers(API_KEY),
            timeout=30,  # keeps a stalled request from bypassing TIMEOUT_SECONDS
        )
        response.raise_for_status()
        data = response.json()
        status = data["data"]["status"]
        mins, secs = divmod(int(elapsed), 60)
        sys.stdout.write(
            f"\r[{SPINNER[i % 4]}] Waiting... {status} ({mins}m {secs:02d}s)  "
        )
        sys.stdout.flush()
        if status in ("completed", "failed"):
            print()
            return status
        time.sleep(POLL_INTERVAL)
        i += 1


def fetch_results(job_id: str) -> list[dict]:
    # The sample response in this article is a JSON array with one object per post.
    response = requests.get(
        f"{BASE_URL}/jobs/{job_id}/results?format=json",
        headers=build_headers(API_KEY),
        timeout=60,
    )
    response.raise_for_status()
    return response.json()


def save_results(data: list[dict], slug: str) -> str:
    os.makedirs("output", exist_ok=True)
    filename = os.path.join("output", f"{slug}.json")
    with open(filename, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2, ensure_ascii=False)
    return filename


def main() -> None:
    print(f"Triggering scraper: {SCRAPER_SLUG}")
    job_id = trigger_scrape(SCRAPER_SLUG, SCRAPER_INPUTS)
    print(f"Job started: {job_id}")
    final_status = poll_until_done(job_id)
    if final_status != "completed":
        print(f"Job failed with status: {final_status}")
        sys.exit(1)
    print("Fetching results...")
    results = fetch_results(job_id)
    output_file = save_results(results, SCRAPER_SLUG)
    print(f"Results saved to: {output_file}")


if __name__ == "__main__":
    main()

The script does 4 things:

  1. Starts a scrape job with the Facebook Page Posts Extract by URL scraper
  2. Polls the job every 5 seconds
  3. Downloads the result as JSON
  4. Saves the file under output/facebook-page-posts-extract-by-url.json

For batch jobs, send multiple input objects in SCRAPER_INPUTS. Keep limit_per_input aligned with your test plan.

That setting prevents one request from returning more rows than expected. It also keeps test costs predictable while you validate storage and parsing.

Set limit_per_input low while you build the pipeline. Raise it only after the destination schema and duplicate handling pass a real test.
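
For a batch, the SCRAPER_INPUTS list can be generated from a list of Page URLs instead of written by hand. This is a minimal sketch with a hypothetical helper name. It uses only the input fields shown in the Inputs table and sends the post count and flag as strings, matching the examples there.

```python
def build_inputs(page_urls: list[str], start_date: str, end_date: str,
                 num_of_posts: int = 1, include_profile_data: bool = False) -> list[dict]:
    """Build one input dict per unique Page URL."""
    # dict.fromkeys drops duplicate URLs while keeping the original order.
    unique_urls = list(dict.fromkeys(page_urls))
    return [
        {
            "url": url,
            "start_date": start_date,
            "end_date": end_date,
            "num_of_posts": str(num_of_posts),
            "include_profile_data": "true" if include_profile_data else "false",
        }
        for url in unique_urls
    ]


SCRAPER_INPUTS = build_inputs(
    ["https://www.facebook.com/Beyonce/", "https://www.facebook.com/Beyonce/"],
    start_date="2026-05-05",
    end_date="2026-05-13",
)
print(len(SCRAPER_INPUTS))
```

Defaulting num_of_posts to 1 keeps an accidental large batch cheap while the pipeline is still under test.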

Example JSON output

This is a trimmed API response from the Page posts scraper.

[
  {
    "inputs": {
      "url": "https://www.facebook.com/Beyonce/",
      "end_date": "2026-05-13",
      "start_date": "2026-05-05",
      "num_of_posts": "1",
      "include_profile_data": "true"
    },
    "scrape_status": "success",
    "url": "https://www.facebook.com/beyonce/posts/pfbid0JNMeKpFe4jtmzJFsbpsZUfuVAK7KmszmsnGbGCTm21FVX3pNJN23m94X54z27rXMl",
    "post_id": "1567622508057376",
    "user_url": "https://www.facebook.com/beyonce",
    "user_username_raw": "Beyoncé",
    "date_posted": "2026-05-05T01:10:44.000Z",
    "num_comments": 4646,
    "num_shares": 11665,
    "num_likes_type": {
      "type": "Like",
      "num": 105598
    },
    "page_name": "Beyoncé",
    "profile_id": "100044289256798",
    "page_intro": "⠀",
    "page_category": "Musician/band",
    "page_logo": "https://scontent.fvga2-6.fna.fbcdn.net/v/t39.30808-1/476181735_1202758777877086_4311767465452032132_n.jpg?stp=cp6_dst-jpg_s200x200_tt6&_nc_cat=104&ccb=1-7&_nc_sid=2d3e12&_nc_ohc=m3u00rOGzMIQ7kNvwHX9ERe&_nc_oc=Adrv2NxB40zO35h0UB-hRxSluqGH8BibaqK6Tzt2zs2Ur0Fh6-q4vXh8khZNayoqApo&_nc_zt=24&_nc_ht=scontent.fvga2-6.fna&_nc_gid=ApwmWYSGR_DTRi3SVepsUQ&_nc_ss=79289&oh=00_Af7TgNPepzoTzWBKqgzO3msRdbH3Gyezo7S3wXxOD4cVKA&oe=6A0A037C",
    "page_external_website": "beyonce.com",
    "page_followers": 55000000,
    "page_is_verified": true,
    "attachments": [
      {
        "id": "1567622471390713",
        "type": "Photo",
        "url": "https://scontent.fvga2-5.fna.fbcdn.net/v/t51.82787-15/686510170_18636374017008035_4701759044736133075_n.jpg?_nc_cat=1&ccb=1-7&_nc_sid=127cfc&_nc_ohc=2DQDpSuQOxgQ7kNvwGyq43E&_nc_oc=AdrINbSI9K7H0mUUVbABUOBOB3IdK0n4A1v5H2RW1fksX6oyZlcci3WfjEieQ5V3gzs&_nc_zt=23&_nc_ht=scontent.fvga2-5.fna&_nc_gid=iB51MnvaouT_OJ0rCsASHg&_nc_ss=79289&oh=00_Af63LeDRCPLvgWqSQjq4JE4es7GbCJP4M-LCXU9DZ-e-ig&oe=6A0A1E3B",
        "attachment_url": "https://www.facebook.com/photo.php?fbid=1567622471390713&set=a.287183736101266&type=3",
        "video_url": null,
        "thumbnail_url": null
      },
      {
        "id": "1567622481390712",
        "type": "Photo",
        "url": "https://scontent.fvga2-3.fna.fbcdn.net/v/t51.82787-15/684961468_18636374026008035_4971504822502366252_n.jpg?_nc_cat=108&ccb=1-7&_nc_sid=127cfc&_nc_ohc=6sQsDqAZheEQ7kNvwG7GQZw&_nc_oc=Adohw1I1QD4eECu0VuiQ478huUn5qiOpBpz0JITWh30jcLmDgDT5jwdwvT0VfeRtRvE&_nc_zt=23&_nc_ht=scontent.fvga2-3.fna&_nc_gid=iB51MnvaouT_OJ0rCsASHg&_nc_ss=79289&oh=00_Af5aTmiZB1utQNGQh3YHQ2ap62o8SWMZtMlC-sQNizF_SA&oe=6A0A0B94",
        "attachment_url": "https://www.facebook.com/photo.php?fbid=1567622481390712&set=a.287183736101266&type=3",
        "video_url": null,
        "thumbnail_url": null
      },
      {
        "id": "1567622478057379",
        "type": "Photo",
        "url": "https://scontent.fvga2-2.fna.fbcdn.net/v/t51.82787-15/689221710_18636374038008035_9039654601268904565_n.jpg?...truncated",
        "attachment_url": "https://www.facebook.com/photo.php?fbid=1567622478057379&set=a.287183736101266&type=3",
        "video_url": null,
        "thumbnail_url": null
      }
    ]
  }
]

Pick the right Facebook scraper

[Figure: How the Facebook Page Post Scraper routes each input type to the right scraper.]

Use the Page posts scraper when your input is a Page URL and your output target is post-level data.

Use Extract Facebook event data for event URLs. Use Get Marketplace listing data for Marketplace listing URLs.

Use Search Marketplace listings for keyword search across Facebook Marketplace. Each scraper expects a different input shape.

| Scraper | Input | Output target |
| --- | --- | --- |
| Extract Facebook page posts | Page URL | Posts, reactions, shares, attachments, page profile fields |
| Extract Facebook event data | Event URL | Event title, date, venue, description |
| Search Marketplace listings | Keyword | Listings, prices, locations, sellers |

Browse all 86+ scrapers across Facebook, Instagram, TikTok, LinkedIn, Amazon, and Google.

Choose the narrowest scraper that matches your input type. Narrow extractors return cleaner records because each schema maps to one Facebook object type.

This choice reduces downstream branching. Your parser can expect one object model instead of handling events, listings, and posts in one table.

A narrow scraper also gives support teams clearer reproduction steps. One URL, one schema, and one output target shorten incident review.

What data you get back

[Figure: Facebook Page Post Scraper output fields grouped by category.]

The response gives you the original input, scrape status, post fields, engagement counts, page profile fields, and attachments.

Treat the response as source data. Save the raw JSON before parsing it into typed tables.

Job and input fields

inputs echoes the URL, dates, post count, and profile-data flag you sent. Store this object with the result row.

The echoed input gives you an audit trail when you run 100 or 10,000 Page URLs. It also lets you replay one failed record without rebuilding the batch.

scrape_status tells you whether the row succeeded. Treat any status other than success as a failed row in your pipeline.

Log inputs and scrape_status together. That pairing gives support, QA, and data engineering teams enough context to replay failed records.

Store the job ID beside these fields. A job ID gives ScrapeNow support a direct lookup when you need help with a failed run.
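
The split between good rows and replayable inputs might look like the sketch below. The function name is hypothetical, and the "failed" value in the usage example is only a placeholder: the article confirms "success" and says every other status should be treated as a failure.

```python
def split_by_status(rows: list[dict]) -> tuple[list[dict], list[dict]]:
    """Return (succeeded_rows, inputs_to_replay)."""
    succeeded, replay = [], []
    for row in rows:
        if row.get("scrape_status") == "success":
            succeeded.append(row)
        else:
            # The echoed inputs object is enough to resubmit this one record.
            replay.append(row.get("inputs", {}))
    return succeeded, replay


rows = [
    {"inputs": {"url": "https://www.facebook.com/Beyonce/"}, "scrape_status": "success", "post_id": "1"},
    {"inputs": {"url": "https://www.facebook.com/example/"}, "scrape_status": "failed"},
]
ok_rows, replay_inputs = split_by_status(rows)
print(len(ok_rows), replay_inputs)
```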

Post identity fields

url is the canonical post URL returned by the scraper. Use it as the human-readable source link in exports and dashboards.

post_id is the stable post identifier. Use post_id plus user_url as your dedupe key.

user_url and user_username_raw identify the Page that published the post. In the sample, the Page URL is https://www.facebook.com/beyonce.

The display name in the sample is Beyoncé. Keep both identifiers because display names change.

URLs and IDs give your joins a stable base. They also make cross-run comparisons safer when a Page changes its public name.

Keep the raw post URL even when you store a normalized ID. Analysts still need a clickable source during QA and reporting reviews.

Timing fields

date_posted uses an ISO timestamp. The sample value is 2026-05-05T01:10:44.000Z.

Store it as a timestamp in your database. Then derive date, hour, week, or month fields downstream.

Store more than the date when you compare post timing. Hour-level timing matters for campaign reports, news monitoring, and posting cadence analysis.

Normalize timestamps to UTC in storage. Convert them to a local timezone only in reporting layers.

Add a separate scraped_at timestamp during ingestion. date_posted describes the post, and scraped_at describes your collection time.
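
A minimal ingestion sketch for these two timestamps follows. Both function names are hypothetical. The parser assumes date_posted always carries the trailing Z seen in the sample.

```python
from datetime import datetime, timezone


def parse_date_posted(value: str) -> datetime:
    """Parse a value like 2026-05-05T01:10:44.000Z into an aware UTC datetime."""
    # Older Python versions reject a trailing "Z" in fromisoformat, so swap it first.
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    return parsed.astimezone(timezone.utc)


def with_scraped_at(row: dict) -> dict:
    """Return a copy of the row with a parsed date_posted and a scraped_at stamp."""
    enriched = dict(row)
    enriched["date_posted"] = parse_date_posted(row["date_posted"])
    enriched["scraped_at"] = datetime.now(timezone.utc)
    return enriched


row = with_scraped_at({"post_id": "1567622508057376", "date_posted": "2026-05-05T01:10:44.000Z"})
print(row["date_posted"].isoformat(), row["scraped_at"].isoformat())
```

Both values are timezone-aware, so downstream code can derive date, hour, or week fields without guessing the offset.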

Engagement fields

num_comments returns the comment count. In the sample, that value is 4646.

num_shares returns the share count. In the sample, that value is 11665.

num_likes_type returns the reaction type and count. The sample returns 105598 likes.

Engagement counts change after the scrape runs. Store the scrape time with each row if you compare counts across collection windows.

A row scraped on Monday and a row scraped on Friday describe different points in time. Keep that difference visible in your schema.

Use the latest row for current dashboards. Use historical rows when you need growth curves or post-performance snapshots.
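
Before loading these counts into a typed table, it helps to flatten the nested num_likes_type object. The sketch below is one way to do that. The output column names reaction_type and reaction_count are assumptions chosen for illustration.

```python
def engagement_from_row(row: dict) -> dict:
    """Pull the engagement counts out of one post row into flat columns."""
    # num_likes_type is an object in the sample; guard against it being absent or null.
    likes = row.get("num_likes_type") or {}
    return {
        "post_id": row.get("post_id"),
        "num_comments": row.get("num_comments"),
        "num_shares": row.get("num_shares"),
        "reaction_type": likes.get("type"),
        "reaction_count": likes.get("num"),
    }


sample = {
    "post_id": "1567622508057376",
    "num_comments": 4646,
    "num_shares": 11665,
    "num_likes_type": {"type": "Like", "num": 105598},
}
print(engagement_from_row(sample))
```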

Page profile fields

When include_profile_data is "true", the scraper returns page-level fields with the post when those fields are available.

Useful fields include:

  • page_name
  • profile_id
  • page_intro
  • page_category
  • page_logo
  • page_external_website
  • page_followers
  • page_is_verified

The sample page has 55000000 followers and page_is_verified set to true.

Profile fields describe the Page at scrape time. Store them separately if you track Page metadata changes over weeks or months.

For example, a Page can change its category, website, or logo while old posts stay the same. A separate Page table keeps that history readable.

Use profile_id as the Page table key when available. Keep user_url as a secondary key for joins and manual lookup.

Attachment fields

attachments is an array. A single post can have 0, 1, or many attachments.

Each attachment can include:

  • id
  • type
  • url
  • attachment_url
  • video_url
  • thumbnail_url

For photo posts, type is usually Photo. The url field points to the media asset.

The attachment_url field points to the Facebook photo page. Store both when analysts need source links and media URLs.

For video posts, check video_url and thumbnail_url. Keep null handling explicit because Facebook posts mix text, photo, video, link, and album formats.

Do not assume a media field exists because a post has attachments. Some link previews and shared content return partial media metadata.

Store attachment records in the same ingestion run as the parent post. That keeps post counts and media counts aligned during backfills.

Ready to get this data? Extract Facebook page posts.

Production tips

Run one Page through the full path before scaling a batch.

Validate inputs before you spend credits

Check the URL prefix, date format, and post count before calling the API.

from datetime import datetime
from urllib.parse import urlparse

def validate_facebook_page_input(item: dict) -> list[str]:
    errors = []

    url = item.get("url", "")
    parsed = urlparse(url)

    if parsed.scheme != "https" or parsed.netloc != "www.facebook.com":
        errors.append("url must start with https://www.facebook.com")

    for field in ("start_date", "end_date"):
        value = item.get(field)
        if value:
            try:
                datetime.strptime(value, "%Y-%m-%d")
            except ValueError:
                errors.append(f"{field} must use YYYY-MM-DD")

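    num_of_posts = item.get("num_of_posts")
    # Skip only a missing or empty value, so a numeric 0 is rejected instead of ignored.
    if num_of_posts not in (None, ""):
        try:
            if int(num_of_posts) < 1:
                errors.append("num_of_posts must be at least 1")
        except (TypeError, ValueError):
            errors.append("num_of_posts must be numeric")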

    include_profile_data = item.get("include_profile_data")
    if include_profile_data not in ("true", "false", True, False):
        errors.append("include_profile_data must be true or false")

    return errors


payload = {
    "url": "https://www.facebook.com/Beyonce/",
    "start_date": "2026-05-05",
    "end_date": "2026-05-13",
    "num_of_posts": "1",
    "include_profile_data": "true"
}

errors = validate_facebook_page_input(payload)
if errors:
    raise ValueError(errors)

Add this validation before your job runner submits inputs.
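
The validator above checks the date format but not the order of the two dates. A companion check could look like the sketch below. It assumes the API expects start_date to fall on or before end_date, which this article implies but does not state.

```python
from datetime import datetime


def validate_date_order(item: dict) -> list[str]:
    """Return an error if both dates parse and start_date is after end_date."""
    start, end = item.get("start_date"), item.get("end_date")
    if not (start and end):
        return []  # nothing to compare when either date is omitted
    try:
        start_dt = datetime.strptime(start, "%Y-%m-%d")
        end_dt = datetime.strptime(end, "%Y-%m-%d")
    except ValueError:
        return []  # malformed dates are reported by the format check
    if start_dt > end_dt:
        return ["start_date must not be after end_date"]
    return []


print(validate_date_order({"start_date": "2026-05-13", "end_date": "2026-05-05"}))
```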

Deduplicate on post_id and user_url

Use post_id as the main key. Add user_url if you want a compound key that is easier to inspect.

import json

with open("output/facebook-page-posts-extract-by-url.json", "r", encoding="utf-8") as f:
    rows = json.load(f)

seen = set()
deduped = []

for row in rows:
    post_id = row.get("post_id")
    user_url = row.get("user_url")
    key = (user_url, post_id)

    if not post_id:
        continue

    if key in seen:
        continue

    seen.add(key)
    deduped.append(row)

print(f"Input rows: {len(rows)}")
print(f"Unique rows: {len(deduped)}")

Deduplication matters when date ranges overlap or when you rerun failed batches.

Store attachments in a child table

Avoid flattening attachment arrays into attachment_1_url, attachment_2_url, and attachment_3_url. That structure fails as soon as a post has 4 images.

Use one table for posts and one table for attachments.

| Table | Primary key | Fields |
| --- | --- | --- |
| facebook_posts | user_url, post_id | URL, date, comments, shares, likes, page fields |
| facebook_post_attachments | post_id, attachment_id | Type, media URL, attachment URL, video URL, thumbnail URL |

This schema keeps your post table stable while media counts change per post.

It also lets analysts count photos, videos, and thumbnails without parsing JSON arrays in every query.

Add an attachment_index column if ordering matters. Albums and carousels often need position data for downstream review.

Keep attachment_id nullable only if your ingestion process creates a fallback key. A safe fallback is (post_id, attachment_index).
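
Flattening the attachments array into child rows can be sketched as follows. The function name and the media_url column name are assumptions. The remaining keys mirror the attachment fields in the sample response.

```python
def attachment_rows(post: dict) -> list[dict]:
    """Flatten one post's attachments array into child-table rows."""
    rows = []
    # `or []` covers both a missing key and an explicit null.
    for index, attachment in enumerate(post.get("attachments") or []):
        rows.append({
            "post_id": post.get("post_id"),
            "attachment_index": index,  # position within the post, starting at 0
            "attachment_id": attachment.get("id"),  # may be None; fall back to (post_id, attachment_index)
            "type": attachment.get("type"),
            "media_url": attachment.get("url"),
            "attachment_url": attachment.get("attachment_url"),
            "video_url": attachment.get("video_url"),
            "thumbnail_url": attachment.get("thumbnail_url"),
        })
    return rows


post = {
    "post_id": "1567622508057376",
    "attachments": [
        {"id": "1567622471390713", "type": "Photo", "video_url": None},
        {"id": "1567622481390712", "type": "Photo", "video_url": None},
    ],
}
for child in attachment_rows(post):
    print(child["post_id"], child["attachment_index"], child["attachment_id"])
```

A post with no attachments yields an empty list, so the parent row still loads without a special case.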

Treat failed jobs as retryable

Cap retries at 2 or 3 attempts so one invalid Page URL does not block the batch.

def should_retry(status: str, attempts: int) -> bool:
    if status == "completed":
        return False

    if attempts >= 3:
        return False

    return True
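
A capped retry loop built on the same rule might look like the sketch below. Here run_job is a hypothetical callable that submits one job and returns its final status. In the script from Step 5, that would wrap trigger_scrape and poll_until_done.

```python
import time
from typing import Callable


def run_with_retries(run_job: Callable[[], str], max_attempts: int = 3,
                     delay_seconds: float = 0.0) -> tuple[str, int]:
    """Call run_job until it returns "completed" or the attempt cap is reached."""
    status = "not_started"
    attempts = 0
    while attempts < max_attempts:
        attempts += 1
        status = run_job()
        if status == "completed":
            break
        if attempts < max_attempts:
            # Pause between attempts; no pause is needed after the last one.
            time.sleep(delay_seconds)
    return status, attempts


# Simulated job that fails once and then completes.
outcomes = iter(["failed", "completed"])
print(run_with_retries(lambda: next(outcomes)))
```

Returning the attempt count alongside the status makes it easy to log which inputs needed more than one try.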

Keep raw JSON

Save raw results before transforming them. Raw JSON gives you a replay path when you add a new column later. Add a scraped_at timestamp to every raw file or row.
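
One way to keep raw files replayable is to write each download to its own timestamped file. The sketch below assumes a local directory and a hypothetical envelope with scraped_at and rows keys. Swap in object storage or a raw table as your pipeline requires.

```python
import json
import os
from datetime import datetime, timezone


def save_raw(rows: list, slug: str, output_dir: str = "raw") -> str:
    """Write untouched results to a timestamped file and return its path."""
    scraped_at = datetime.now(timezone.utc)
    os.makedirs(output_dir, exist_ok=True)
    # The UTC timestamp in the name keeps reruns from overwriting earlier downloads.
    filename = os.path.join(output_dir, f"{slug}_{scraped_at:%Y%m%dT%H%M%SZ}.json")
    with open(filename, "w", encoding="utf-8") as f:
        json.dump({"scraped_at": scraped_at.isoformat(), "rows": rows}, f, ensure_ascii=False)
    return filename
```

Call save_raw(results, SCRAPER_SLUG) right after the download step and before any parsing, so a later schema change can be replayed from the stored files.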

Pricing

ScrapeNow charges per returned row. One row costs one credit, starting at $0.04 per credit for small runs and dropping with volume. No monthly contracts, no proxy fees, no charges for failed rows. See the pricing page for current rates.
