How to scrape X (Twitter) posts

The X posts scraper extracts X posts by URL, one post per input URL. It returns text, author data, media, engagement counts, quoted post metadata, and profile fields.

Data teams use it for social listening, creator tracking, post audits, and post-level analytics. ScrapeNow handles the X scraping stack, request retries, browser behavior, and parsing layer.

How to use this scraper

[Figure: The X Posts Scraper job pipeline, from input to stored output.]

ScrapeNow’s Extract X post data scraper takes one input field, the post URL. You can run it from the dashboard or call it through the API with the scraper slug `x.com-posts-extract-by-url`.

Use this scraper when you already have status URLs from monitoring tools, exports, alerts, saved lists, or another X collection workflow. The scraper expects direct post URLs, not search terms or profile handles.

Step 1. Find the X post URL

Open x.com.

[Figure: X search dropdown suggesting music, Apple Music and BET Music. Type your keyword in the X search bar to find posts to extract.]

On the Explore page, search for an account or keyword, such as `music`.

[Figure: X search results highlighting the Emotion and Music guitar solo post. Review search results and select the post you want to extract.]

On the results page, click the post you want to scrape.

[Figure: X Emotion and Music post detail page with the status URL highlighted. Copy the post URL from the address bar for use as scraper input.]

Copy the URL from the address bar. It should look like this:

https://x.com/taylorswift13/status/2019758757723422893

Use the full status URL. The scraper reads the post ID from the URL and returns the post plus related author fields.

Remove tracking parameters before you store the URL. A copied X link often includes ref_src, s, or other query parameters that create duplicate inputs.
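
As a rough sketch of how a post ID can be read from a status URL, the helper below pulls the numeric ID with a regular expression and ignores any query string. The name `extract_post_id` is hypothetical and not part of the ScrapeNow API. It assumes the `/status/<digits>` path shape shown above.

```python
import re
from typing import Optional

# Matches the numeric ID that follows "/status/" in a post URL.
STATUS_ID_RE = re.compile(r"/status/(\d+)")


def extract_post_id(url: str) -> Optional[str]:
    """Return the post ID from a status URL, or None if the URL has none."""
    # Drop the query string and fragment so tracking parameters cannot interfere.
    path = url.strip().split("?")[0].split("#")[0]
    match = STATUS_ID_RE.search(path)
    return match.group(1) if match else None
```

Two copies of the same link with different tracking parameters yield the same ID, which is why the ID works well as a deduplication key.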

Step 2. Run the scraper with the API

Run this script with Python 3.10 or newer and the requests package installed. Replace YOUR_API_KEY with your ScrapeNow API key.

"""
Configuration:
    - Set SCRAPER_SLUG to the scraper you want to run.
    - Set SCRAPER_INPUTS to the input dicts for that scraper.
    - Set API_KEY to your ScrapeNow API key.
"""

import sys
import time
import json
import requests
import os

API_KEY = "YOUR_API_KEY"

SCRAPER_SLUG = "x.com-posts-extract-by-url"

SCRAPER_INPUTS = [
    {
        "url": "https://x.com/taylorswift13/status/2019758757723422893"
    }
]



BASE_URL = "https://api.scrapenow.io/api/v1/scraping"
TIMEOUT_SECONDS = 3600
POLL_INTERVAL = 5
SPINNER = "|/-\\"


def build_headers(api_key: str, content_type: str | None = None) -> dict:
    headers = {"Authorization": f"Bearer {api_key}"}
    if content_type:
        headers["Content-Type"] = content_type
    return headers


def trigger_scrape(slug: str, inputs: list[dict]) -> str:
    url = f"{BASE_URL}/scrape?scraper={slug}"
    response = requests.post(
        url,
        headers=build_headers(API_KEY, "application/json"),
        json={"inputs": inputs},
        timeout=30,  # Raise an error if the API stops responding for 30 seconds.
    )
    response.raise_for_status()
    return response.json()["data"]["job_id"]


def poll_until_done(job_id: str) -> str:
    start = time.time()
    i = 0
    while True:
        elapsed = time.time() - start
        if elapsed > TIMEOUT_SECONDS:
            print(f"\nTimeout after {TIMEOUT_SECONDS}s")
            sys.exit(1)
        response = requests.get(
            f"{BASE_URL}/jobs/{job_id}",
            headers=build_headers(API_KEY),
            timeout=30,  # Raise an error if the API stops responding for 30 seconds.
        )
        response.raise_for_status()
        data = response.json()
        status = data["data"]["status"]
        mins, secs = divmod(int(elapsed), 60)
        sys.stdout.write(
            f"\r[{SPINNER[i % 4]}] Waiting... {status} ({mins}m {secs:02d}s)  "
        )
        sys.stdout.flush()
        if status in ("completed", "failed"):
            print()
            return status
        time.sleep(POLL_INTERVAL)
        i += 1


def fetch_results(job_id: str) -> dict:
    response = requests.get(
        f"{BASE_URL}/jobs/{job_id}/results?format=json",
        headers=build_headers(API_KEY),
        timeout=30,  # Raise an error if the API stops responding for 30 seconds.
    )
    response.raise_for_status()
    return response.json()


def save_results(data: dict, slug: str) -> str:
    os.makedirs("output", exist_ok=True)
    filename = os.path.join("output", f"{slug}.json")
    with open(filename, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2, ensure_ascii=False)
    return filename


def main() -> None:
    print(f"Triggering scraper: {SCRAPER_SLUG}")
    job_id = trigger_scrape(SCRAPER_SLUG, SCRAPER_INPUTS)
    print(f"Job started: {job_id}")
    final_status = poll_until_done(job_id)
    if final_status != "completed":
        print(f"Job failed with status: {final_status}")
        sys.exit(1)
    print("Fetching results...")
    results = fetch_results(job_id)
    output_file = save_results(results, SCRAPER_SLUG)
    print(f"Results saved to: {output_file}")


if __name__ == "__main__":
    main()

The script starts a scraping job, polls every 5 seconds, waits up to 3600 seconds, and saves the result as JSON. It also creates an output directory before writing the file.

For batches, add more objects to SCRAPER_INPUTS:

SCRAPER_INPUTS = [
    {"url": "https://x.com/taylorswift13/status/2019758757723422893"},
    {"url": "https://x.com/username/status/1234567890123456789"}
]

Use the same input shape for 1 URL or 10,000 URLs. Each returned post maps to one result row.

For large batches, keep the input list stable during retries. Store the original input URL, normalized URL, scrape status, and job ID together.
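
One way to keep those four values together is a small tracking record per input. This is only a sketch: `InputRecord`, `build_input_records`, and the field names are illustrative choices, not a ScrapeNow schema, and the normalization here only strips the query string.

```python
from dataclasses import dataclass


@dataclass
class InputRecord:
    """One tracked input URL. Field names are illustrative, not an API schema."""

    original_url: str
    normalized_url: str
    job_id: str
    scrape_status: str = "pending"


def build_input_records(urls: list[str], job_id: str) -> list[InputRecord]:
    """Create one tracking record per submitted URL, preserving input order."""
    records = []
    for url in urls:
        # Strip the query string only; the original URL is kept for auditing.
        normalized = url.strip().split("?")[0]
        records.append(InputRecord(url, normalized, job_id))
    return records
```

After the job finishes, update `scrape_status` on each record from the matching result row so a retry can reuse the same list.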

Step 3. Read the output

The API returns an array of result objects. This trimmed response shows the main fields returned by the X posts scraper:

[
  {
    "inputs": {
      "url": "https://x.com/taylorswift13/status/2019758757723422893"
    },
    "scrape_status": "success",
    "id": "2019758757723422893",
    "user_posted": "taylorswift13",
    "name": "Taylor Swift",
    "description": "My favorite part about writing is that first spark of an idea. It can happen at any time, for any reason. The idea for the Opalite music video crash landed into my imagination when I was doing promo for The Life of a Showgirl. I was a guest on one of my favorite shows, @TheGNShow. For those of you who aren’t familiar, it’s a UK late night show where Graham Norton (the insanely charismatic and lovable host) invites a random group of actors, entertainers, musicians, etc to be on his show and we all sit there and chat like it’s a dinner party. They even serve wine. Anyway. I remember thinking I got ridiculously lucky with the group I was paired with. Cillian Murphy, Domhnall Gleeson, Greta Lee, Jodie Turner-Smith, and @LewisCapaldi. All people whose work I’ve admired from afar. When we were all talking during the broadcast, Domhnall made a light hearted joke about wanting to be in one of my music videos. He’s Irish! He was joking! Except that in that moment during the interview, I was instantly struck with an *idea*. And so a week later he received an email script I’d written for the Opalite video, where he was playing the starring role. I had this thought that it would be wild if all of our fellow guests on the Graham Norton show that night, including Graham himself, could be a part of it too. Like a school group project but for adults and it isn’t mandatory. To my delight, everyone from the show made the effort to time travel back to the 90’s with us and help with this video. You might even recognize some friendly faces from The Eras Tour. I got to work with one of my favorite people in the world, Rodrigo Prieto, again! I had more fun than I ever imagined - Made new friends, metaphors, and fashion choices. It was an absolute thrill to create this story and these characters. Shot on film. The Opalite video is out now on Spotify & Apple Music.\n",
    "date_posted": "2026-02-06T13:03:07.000Z",
    "photos": [
      "https://pbs.twimg.com/media/HAef4F9XsAAuwrv.jpg",
      "https://pbs.twimg.com/media/HAef4F6WcAAYnXN.jpg",
      "https://pbs.twimg.com/media/HAef4F7WoAAFt21.jpg",
      "https://pbs.twimg.com/media/HAef4F8WAAAGLbA.jpg"
    ],
    "url": "https://x.com/taylorswift13/status/2019758757723422893",
    "quoted_post": {
      "photos": null,
      "videos": null
    },
    "tagged_users": null,
    "replies": 6323,
    "reposts": 43954,
    "likes": 198414,
    "views": 7984339,
    "external_url": null,
    "hashtags": null,
    "followers": 79831025,
    "biography": "And, baby, that’s show business for you. New album The Life of a Showgirl. Available Now ❤️‍🔥",
    "posts_count": 884,
    "profile_image_link": "https://pbs.twimg.com/profile_images/2019199672955387904/KoSJY5W-_normal.jpg",
    "fo": "... truncated"
  }
]

Check scrape_status before writing records to production tables. A completed job only means the job ended, so each row still needs its own status check.

Engagement counts represent the values seen during the scrape. If you scrape the same post later, likes, replies, reposts, and views can change.

What data you get back

[Figure: X Posts Scraper output fields grouped by category.]

The response is built for post-level storage. You get the input URL, scrape status, post identity, author identity, content, media, engagement counts, and profile fields in one record.

Field	Type	What it gives you
`inputs.url`	string	The URL you submitted
`scrape_status`	string	Per-row result state, such as `success`
`id`	string	X post ID from the status URL
`user_posted`	string	Author username
`name`	string	Author display name
`description`	string	Full post text
`date_posted`	string	ISO timestamp
`photos`	array	Media image URLs attached to the post
`url`	string	Canonical post URL
`quoted_post`	object	Quoted post media fields when present
`tagged_users`	array or null	Users tagged in the post
`replies`	number	Reply count
`reposts`	number	Repost count
`likes`	number	Like count
`views`	number	View count
`external_url`	string or null	Outbound link in the post
`hashtags`	array or null	Hashtags in the post
`followers`	number	Author follower count
`biography`	string	Author bio
`posts_count`	number	Author post count
`profile_image_link`	string	Author profile image URL

Use id as the primary key for the post table. X status IDs stay stable across copied URLs, tracking parameters, and repeated scrapes.

Use user_posted as the foreign key into an author table if you also run ScrapeNow’s Get X profile data scraper. If you start with handles, the Look up X profiles by username scraper returns profile records from usernames.

Store engagement counts separately if you track post growth over time. A single post row works for audits, while a time-series table works better for trend tracking.

Production tips

[Figure: How the X Posts Scraper normalizes inputs before deduplication.]

Validate the fields you write, allow nullable fields where the schema allows them, and route failed rows outside your main insert path.

Validate input URLs before sending jobs

Reject invalid URLs before they hit the API. This saves credits and makes failed jobs faster to debug.

import re

# Anchor the end of the pattern so trailing junk such as "/status/123abc" is rejected.
POST_URL_RE = re.compile(r"^https://x\.com/[^/]+/status/\d+/?$")


def validate_post_urls(urls: list[str]) -> list[dict]:
    valid_inputs = []

    for url in urls:
        clean_url = url.strip().split("?")[0]

        if not POST_URL_RE.match(clean_url):
            raise ValueError(f"Invalid X post URL: {url}")

        valid_inputs.append({"url": clean_url})

    return valid_inputs


SCRAPER_INPUTS = validate_post_urls([
    "https://x.com/taylorswift13/status/2019758757723422893?ref_src=twsrc"
])

Strip query strings before storage. Normalize twitter.com links to x.com before validation, because the pattern above only accepts x.com URLs.

def normalize_x_post_url(url: str) -> str:
    clean_url = url.strip().split("?")[0]
    return clean_url.replace("https://twitter.com/", "https://x.com/")
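
The string replacement above only covers `https://twitter.com/` links. A more defensive variant, sketched below with the standard library's `urllib.parse`, also handles `http`, `www.` and `mobile.` hosts, fragments, and trailing slashes. The function name `normalize_x_post_url_strict` and the host list are assumptions for illustration; extend the set to match the links in your own data.

```python
from urllib.parse import urlsplit, urlunsplit

# Hosts assumed to point at the same post; extend this set for your own data.
X_HOSTS = {"x.com", "www.x.com", "twitter.com", "www.twitter.com", "mobile.twitter.com"}


def normalize_x_post_url_strict(url: str) -> str:
    """Return an https://x.com URL without query string, fragment, or trailing slash."""
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    if host not in X_HOSTS:
        raise ValueError(f"Not an X or Twitter URL: {url}")
    # Keep only the path; the scheme and host are rewritten to the canonical form.
    path = parts.path.rstrip("/")
    return urlunsplit(("https", "x.com", path, "", ""))
```

Running normalization before validation means older twitter.com exports pass the same URL check as fresh x.com links.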

Deduplicate by post ID

Store one row per id.

def dedupe_results(results: list[dict]) -> list[dict]:
    seen = set()
    rows = []

    for row in results:
        post_id = row.get("id")
        if not post_id:
            continue

        if post_id in seen:
            continue

        seen.add(post_id)
        rows.append(row)

    return rows

Use id for idempotent upserts. If a post gets more likes or views later, update the engagement fields on the existing row.

For historical analytics, write each scrape to a separate metrics table. Use (post_id, scraped_at) as the key for engagement snapshots.
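
An idempotent upsert keyed on id could look like the sketch below. It uses Python's built-in `sqlite3` module with an in-process database purely for illustration, and it assumes a SQLite build that supports `ON CONFLICT ... DO UPDATE` (version 3.24 or newer). The table is a reduced version of the x_posts table defined later in this guide, and `upsert_post` is a hypothetical helper name.

```python
import sqlite3

POST_COLUMNS = ("id", "url", "user_posted", "description", "replies", "reposts", "likes", "views")

CREATE_SQL = """
CREATE TABLE IF NOT EXISTS x_posts (
    id TEXT PRIMARY KEY,
    url TEXT NOT NULL,
    user_posted TEXT NOT NULL,
    description TEXT,
    replies INTEGER,
    reposts INTEGER,
    likes INTEGER,
    views INTEGER
)
"""

UPSERT_SQL = """
INSERT INTO x_posts (id, url, user_posted, description, replies, reposts, likes, views)
VALUES (:id, :url, :user_posted, :description, :replies, :reposts, :likes, :views)
ON CONFLICT(id) DO UPDATE SET
    replies = excluded.replies,
    reposts = excluded.reposts,
    likes = excluded.likes,
    views = excluded.views
"""


def upsert_post(conn: sqlite3.Connection, row: dict) -> None:
    """Insert a post, or refresh its engagement counts if the id already exists."""
    # Missing keys become NULL instead of raising, matching the nullable schema.
    params = {column: row.get(column) for column in POST_COLUMNS}
    conn.execute(UPSERT_SQL, params)
```

Calling `upsert_post` twice with the same id leaves one row with the latest counts. Other databases use different upsert syntax, so treat the SQL as a starting point.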

Split post fields from author fields

Store post data and author data in separate tables.

CREATE TABLE x_posts (
    id TEXT PRIMARY KEY,
    url TEXT NOT NULL,
    user_posted TEXT NOT NULL,
    description TEXT,
    date_posted TIMESTAMP,
    replies INTEGER,
    reposts INTEGER,
    likes INTEGER,
    views INTEGER,
    external_url TEXT,
    scraped_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE x_authors (
    username TEXT PRIMARY KEY,
    name TEXT,
    biography TEXT,
    followers INTEGER,
    posts_count INTEGER,
    profile_image_link TEXT,
    scraped_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

This prevents repeated writes of author profile fields across every post row.
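
To feed the two tables from one result row, a small splitter is enough. The sketch below assumes the field names from the sample response and the column names from the tables above; `split_post_and_author` is a hypothetical helper name.

```python
POST_FIELDS = (
    "id", "url", "user_posted", "description", "date_posted",
    "replies", "reposts", "likes", "views", "external_url",
)

# Maps result-row keys to x_authors column names.
AUTHOR_FIELD_MAP = {
    "user_posted": "username",
    "name": "name",
    "biography": "biography",
    "followers": "followers",
    "posts_count": "posts_count",
    "profile_image_link": "profile_image_link",
}


def split_post_and_author(row: dict) -> tuple[dict, dict]:
    """Split one result row into a post record and an author record."""
    # Missing keys become None so nullable columns stay NULL on insert.
    post = {field: row.get(field) for field in POST_FIELDS}
    author = {column: row.get(key) for key, column in AUTHOR_FIELD_MAP.items()}
    return post, author
```

The post record keeps `user_posted` so it can still join to the author table by username.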

Track engagement history when counts matter

Post engagement changes after publication. Store snapshots if your reporting needs growth curves, hourly movement, or campaign summaries.

CREATE TABLE x_post_metrics (
    post_id TEXT NOT NULL,
    replies INTEGER,
    reposts INTEGER,
    likes INTEGER,
    views INTEGER,
    scraped_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (post_id, scraped_at)
);

Write one row to x_posts for the post body. Write one row to x_post_metrics for each scrape run.
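
A snapshot writer for the metrics table might look like the following sketch. It uses Python's built-in `sqlite3` module for illustration and sets an explicit UTC timestamp per scrape run instead of relying on the column default; `record_metrics_snapshot` is a hypothetical helper name.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Optional

CREATE_METRICS_SQL = """
CREATE TABLE IF NOT EXISTS x_post_metrics (
    post_id TEXT NOT NULL,
    replies INTEGER,
    reposts INTEGER,
    likes INTEGER,
    views INTEGER,
    scraped_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (post_id, scraped_at)
)
"""


def record_metrics_snapshot(
    conn: sqlite3.Connection, row: dict, scraped_at: Optional[str] = None
) -> None:
    """Append one engagement snapshot for a post without touching older snapshots."""
    # Use one timestamp per scrape run so all rows from that run line up.
    scraped_at = scraped_at or datetime.now(timezone.utc).isoformat()
    conn.execute(
        "INSERT INTO x_post_metrics (post_id, replies, reposts, likes, views, scraped_at) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (row["id"], row.get("replies"), row.get("reposts"),
         row.get("likes"), row.get("views"), scraped_at),
    )
```

Passing the same `scraped_at` value for every row in a run makes it simple to compare posts at the same point in time.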

Route failed rows into a retry queue

Check scrape_status for every result.

def split_success_and_retry(results: list[dict]) -> tuple[list[dict], list[dict]]:
    success = []
    retry = []

    for row in results:
        if row.get("scrape_status") == "success" and row.get("id"):
            success.append(row)
        else:
            retry.append(row.get("inputs", {}))

    return success, retry

Retry failed inputs with backoff. Move the URL to a dead-letter table after the final retry.
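
The retry policy can be sketched in two small functions: one computes the wait before the next attempt, the other decides whether an input is retried or dead-lettered. The names, the 30-second base delay, the 15-minute cap, and the limit of three attempts are all assumptions to tune for your workload.

```python
def backoff_delay(attempt: int, base_seconds: float = 30.0, cap_seconds: float = 900.0) -> float:
    """Exponential backoff: base * 2**attempt, capped at cap_seconds."""
    return min(cap_seconds, base_seconds * (2 ** attempt))


def plan_retries(
    failed_inputs: list[dict], attempts: dict, max_attempts: int = 3
) -> tuple[list[dict], list[dict]]:
    """Split failed inputs into a retry list and a dead-letter list.

    `attempts` maps each URL to its number of failed attempts so far and is
    updated in place, so the caller can persist it between runs.
    """
    retry, dead_letter = [], []
    for item in failed_inputs:
        url = item.get("url", "")
        attempts[url] = attempts.get(url, 0) + 1
        if attempts[url] >= max_attempts:
            dead_letter.append(item)
        else:
            retry.append(item)
    return retry, dead_letter
```

Feed the `retry` list from `split_success_and_retry` into `plan_retries`, sleep for `backoff_delay(attempt)` seconds, then submit the retry list as a new job.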

Keep nullable fields nullable

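Fields like hashtags, tagged_users, external_url, and quoted_post.photos can be null. Store them as null values instead of empty strings. Media arrays such as photos fit a child table with one row per URL, so a post without media has no rows instead of a placeholder value.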

CREATE TABLE x_post_media (
    post_id TEXT NOT NULL,
    media_type TEXT NOT NULL,
    media_url TEXT NOT NULL,
    position INTEGER NOT NULL,
    PRIMARY KEY (post_id, media_url)
);
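
Rows for this table can be built from the photos array, treating a null value as "no media". This is a minimal sketch: `build_media_rows` is a hypothetical helper, and it only handles the photos field shown in the sample response.

```python
def build_media_rows(row: dict) -> list[tuple]:
    """Return (post_id, media_type, media_url, position) tuples for x_post_media."""
    post_id = row["id"]
    media_rows = []
    # "photos" may be null in the response, so fall back to an empty list.
    for position, media_url in enumerate(row.get("photos") or []):
        media_rows.append((post_id, "photo", media_url, position))
    return media_rows
```

A post with null photos produces an empty list, so nothing is written and no empty-string placeholders end up in the table.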

Keep schema checks close to ingestion

Validate required fields before database writes.

REQUIRED_SUCCESS_FIELDS = ("id", "url", "user_posted", "scrape_status")


def validate_success_row(row: dict) -> None:
    missing = [field for field in REQUIRED_SUCCESS_FIELDS if not row.get(field)]

    if missing:
        raise ValueError(f"Missing fields on result row: {missing}")

Run this check after you split successful rows from retry rows.

Where this fits with other X scrapers

Use the post scraper when you already have post URLs. Use profile scrapers when your input is a username, profile URL, or account list.

Job	Input	Scraper
Extract one or more posts	X status URL	Extract X post data
Extract a known profile	X profile URL	Get X profile data
Find a profile from a handle	Username	Look up X profiles by username

A common pipeline starts with profile search, expands to profile extraction, then collects post URLs for post extraction. Keep each step separate so retries stay small and observable.

The full ScrapeNow catalog lists 86+ pre-built scrapers across 14 platforms. For a broader X scraping setup, the X scraper guide covers account discovery, post collection, and run planning.

Pricing

ScrapeNow charges per returned row. One row costs one credit, starting at $0.04 per credit for small runs and dropping with volume. There are no monthly contracts, proxy fees, or charges for failed rows. See the pricing page for current rates.

Run the Extract X post data scraper with one post URL first. Confirm the schema against your storage table, then batch 100 URLs.

After the first batch, dedupe by id, split post and author fields, and route failed rows to a retry queue. That gives you a production-ready path before you move to larger runs.