How to scrape X (Twitter) posts

Use the X posts scraper to extract post data by URL, including text, media, engagement counts, and author fields.

Scrapers | X (Twitter) | June 4, 2026

The X posts scraper extracts a single X post by URL. It returns text, author data, media, engagement counts, quoted post metadata, and profile fields.

Data teams use it for social listening, creator tracking, post audits, and post-level analytics. ScrapeNow handles the X scraping stack, request retries, browser behavior, and parsing layer.

How to use this scraper

Figure: The X Posts Scraper job pipeline, from input to stored output.

ScrapeNow’s Extract X post data scraper takes one input field, the post URL. You can run it from the dashboard or call it through the API with the scraper slug x.com-posts-extract-by-url.

Use this scraper when you already have status URLs from monitoring tools, exports, alerts, saved lists, or another X collection workflow. The scraper expects direct post URLs, not search terms or profile handles.

Step 1. Find the X post URL

Open x.com.

On the Explore page, search for an account or keyword, such as `music`.
Figure: Type your keyword in the X search bar to find posts to extract.

On the results page, click the post you want to scrape.
Figure: Review search results and select the post you want to extract.

Copy the URL from the address bar. It should look like this:
https://x.com/taylorswift13/status/2019758757723422893
Figure: Copy the post URL from the address bar for use as scraper input.

Use the full status URL. The scraper reads the post ID from the URL and returns the post plus related author fields.

Remove tracking parameters before you store the URL. A copied X link often includes ref_src, s, or other query parameters that create duplicate inputs.

Step 2. Run the scraper with the API

Use this Python script with Python 3.10 or newer. Replace YOUR_API_KEY with your ScrapeNow API key.

"""
Configuration:
    - Set SCRAPER_SLUG to the scraper you want to run.
    - Set SCRAPER_INPUTS to the input dicts for that scraper.
    - Set API_KEY to your ScrapeNow API key.
"""

import sys
import time
import json
import requests
import os

API_KEY = "YOUR_API_KEY"

SCRAPER_SLUG = "x.com-posts-extract-by-url"

SCRAPER_INPUTS = [
    {
        "url": "https://x.com/taylorswift13/status/2019758757723422893"
    }
]



BASE_URL = "https://api.scrapenow.io/api/v1/scraping"
TIMEOUT_SECONDS = 3600  # overall deadline for the job
POLL_INTERVAL = 5  # seconds between status checks
REQUEST_TIMEOUT = 30  # per-request timeout (assumed value) so a stalled connection cannot hang the script
SPINNER = "|/-\\"


def build_headers(api_key: str, content_type: str | None = None) -> dict:
    """Build the auth header, plus Content-Type when a body is sent."""
    headers = {"Authorization": f"Bearer {api_key}"}
    if content_type:
        headers["Content-Type"] = content_type
    return headers


def trigger_scrape(slug: str, inputs: list[dict]) -> str:
    """Start a scraping job and return its job ID."""
    url = f"{BASE_URL}/scrape?scraper={slug}"
    response = requests.post(
        url,
        headers=build_headers(API_KEY, "application/json"),
        json={"inputs": inputs},
        timeout=REQUEST_TIMEOUT,
    )
    response.raise_for_status()
    return response.json()["data"]["job_id"]


def poll_until_done(job_id: str) -> str:
    """Poll the job until it completes or fails; exit after TIMEOUT_SECONDS."""
    start = time.time()
    i = 0
    while True:
        elapsed = time.time() - start
        if elapsed > TIMEOUT_SECONDS:
            print(f"\nTimeout after {TIMEOUT_SECONDS}s")
            sys.exit(1)
        response = requests.get(
            f"{BASE_URL}/jobs/{job_id}",
            headers=build_headers(API_KEY),
            timeout=REQUEST_TIMEOUT,
        )
        response.raise_for_status()
        data = response.json()
        status = data["data"]["status"]
        mins, secs = divmod(int(elapsed), 60)
        sys.stdout.write(
            f"\r[{SPINNER[i % 4]}] Waiting... {status} ({mins}m {secs:02d}s)  "
        )
        sys.stdout.flush()
        if status in ("completed", "failed"):
            print()
            return status
        time.sleep(POLL_INTERVAL)
        i += 1


def fetch_results(job_id: str) -> dict:
    """Download the job results as parsed JSON."""
    response = requests.get(
        f"{BASE_URL}/jobs/{job_id}/results?format=json",
        headers=build_headers(API_KEY),
        timeout=REQUEST_TIMEOUT,
    )
    response.raise_for_status()
    return response.json()


def save_results(data: dict, slug: str) -> str:
    os.makedirs("output", exist_ok=True)
    filename = os.path.join("output", f"{slug}.json")
    with open(filename, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2, ensure_ascii=False)
    return filename


def main() -> None:
    print(f"Triggering scraper: {SCRAPER_SLUG}")
    job_id = trigger_scrape(SCRAPER_SLUG, SCRAPER_INPUTS)
    print(f"Job started: {job_id}")
    final_status = poll_until_done(job_id)
    if final_status != "completed":
        print(f"Job failed with status: {final_status}")
        sys.exit(1)
    print("Fetching results...")
    results = fetch_results(job_id)
    output_file = save_results(results, SCRAPER_SLUG)
    print(f"Results saved to: {output_file}")


if __name__ == "__main__":
    main()

The script starts a scraping job, polls every 5 seconds, waits up to 3600 seconds, and saves the result as JSON. It also creates an output directory before writing the file.

For batches, add more objects to SCRAPER_INPUTS:

SCRAPER_INPUTS = [
    {"url": "https://x.com/taylorswift13/status/2019758757723422893"},
    {"url": "https://x.com/username/status/1234567890123456789"}
]

Use the same input shape for 1 URL or 10,000 URLs. Each returned post maps to one result row.

For large batches, keep the input list stable during retries. Store the original input URL, normalized URL, scrape status, and job ID together.

Step 3. Read the output

The API returns an array of result objects. This trimmed response shows the main fields returned by the X posts scraper:

[
  {
    "inputs": {
      "url": "https://x.com/taylorswift13/status/2019758757723422893"
    },
    "scrape_status": "success",
    "id": "2019758757723422893",
    "user_posted": "taylorswift13",
    "name": "Taylor Swift",
    "description": "My favorite part about writing is that first spark of an idea. It can happen at any time, for any reason. The idea for the Opalite music video crash landed into my imagination when I was doing promo for The Life of a Showgirl. I was a guest on one of my favorite shows, @TheGNShow. For those of you who aren’t familiar, it’s a UK late night show where Graham Norton (the insanely charismatic and lovable host) invites a random group of actors, entertainers, musicians, etc to be on his show and we all sit there and chat like it’s a dinner party. They even serve wine. Anyway. I remember thinking I got ridiculously lucky with the group I was paired with. Cillian Murphy, Domhnall Gleeson, Greta Lee, Jodie Turner-Smith, and @LewisCapaldi. All people whose work I’ve admired from afar. When we were all talking during the broadcast, Domhnall made a light hearted joke about wanting to be in one of my music videos. He’s Irish! He was joking! Except that in that moment during the interview, I was instantly struck with an *idea*. And so a week later he received an email script I’d written for the Opalite video, where he was playing the starring role. I had this thought that it would be wild if all of our fellow guests on the Graham Norton show that night, including Graham himself, could be a part of it too. Like a school group project but for adults and it isn’t mandatory. To my delight, everyone from the show made the effort to time travel back to the 90’s with us and help with this video. You might even recognize some friendly faces from The Eras Tour. I got to work with one of my favorite people in the world, Rodrigo Prieto, again! I had more fun than I ever imagined - Made new friends, metaphors, and fashion choices. It was an absolute thrill to create this story and these characters. Shot on film. The Opalite video is out now on Spotify & Apple Music.\n",
    "date_posted": "2026-02-06T13:03:07.000Z",
    "photos": [
      "https://pbs.twimg.com/media/HAef4F9XsAAuwrv.jpg",
      "https://pbs.twimg.com/media/HAef4F6WcAAYnXN.jpg",
      "https://pbs.twimg.com/media/HAef4F7WoAAFt21.jpg",
      "https://pbs.twimg.com/media/HAef4F8WAAAGLbA.jpg"
    ],
    "url": "https://x.com/taylorswift13/status/2019758757723422893",
    "quoted_post": {
      "photos": null,
      "videos": null
    },
    "tagged_users": null,
    "replies": 6323,
    "reposts": 43954,
    "likes": 198414,
    "views": 7984339,
    "external_url": null,
    "hashtags": null,
    "followers": 79831025,
    "biography": "And, baby, that’s show business for you. New album The Life of a Showgirl. Available Now ❤️‍🔥",
    "posts_count": 884,
    "profile_image_link": "https://pbs.twimg.com/profile_images/2019199672955387904/KoSJY5W-_normal.jpg",
    "fo": "... truncated"
  }
]

Use scrape_status before writing records to production tables. A completed job means the job ended, while each row still needs its own status check.
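
A quick way to see how a batch went is to tally the row-level statuses before any insert. This is a small sketch; the only status value confirmed by the sample response is "success", so the "missing" label below is an assumption used for rows that carry no status at all.

```python
from collections import Counter


def summarize_statuses(results: list[dict]) -> Counter:
    """Count rows per scrape_status; rows without the field are counted as 'missing'."""
    return Counter(row.get("scrape_status", "missing") for row in results)
```

A tally with anything other than "success" is a signal to route those rows away from production tables.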

Engagement counts represent the values seen during the scrape. If you scrape the same post later, likes, replies, reposts, and views can change.

What data you get back

Figure: X Posts Scraper output fields grouped by category.

The response is built for post-level storage. You get the input URL, scrape status, post identity, author identity, content, media, engagement counts, and profile fields in one record.

Field | Type | What it gives you
inputs.url | string | The URL you submitted
scrape_status | string | success or the scrape result state
id | string | X post ID from the status URL
user_posted | string | Author username
name | string | Author display name
description | string | Full post text
date_posted | string | ISO timestamp
photos | array | Media image URLs attached to the post
url | string | Canonical post URL
quoted_post | object | Quoted post media fields when present
tagged_users | array or null | Users tagged in the post
replies | number | Reply count
reposts | number | Repost count
likes | number | Like count
views | number | View count
external_url | string or null | Outbound link in the post
hashtags | array or null | Hashtags in the post
followers | number | Author follower count
biography | string | Author bio
posts_count | number | Author post count
profile_image_link | string | Author profile image URL
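
The date_posted value in the sample response is an ISO 8601 string ending in Z. The sketch below converts it to a timezone-aware datetime; parse_date_posted is a hypothetical helper name. The Z is replaced with an explicit offset because datetime.fromisoformat only accepts a trailing Z from Python 3.11 onward, and the script in this article targets 3.10 or newer.

```python
from datetime import datetime


def parse_date_posted(value: str) -> datetime:
    """Parse an ISO timestamp such as 2026-02-06T13:03:07.000Z into an aware datetime."""
    # Swap the trailing Z for +00:00 so the call also works on Python 3.10.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))
```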

Use id as the primary key for the post table. X status IDs stay stable across copied URLs, tracking parameters, and repeated scrapes.

Use user_posted as the foreign key into an author table if you also run ScrapeNow’s Get X profile data scraper. If you start with handles, the Look up X profiles by username scraper returns profile records from usernames.

Store engagement counts separately if you track post growth over time. A single post row works for audits, while a time-series table works better for trend tracking.

Production tips

Figure: How the X Posts Scraper normalizes inputs before deduplication.

Validate the fields you write, allow nullable fields where the schema allows them, and route failed rows outside your main insert path.

Validate input URLs before sending jobs

Reject invalid URLs before they hit the API. This saves credits and makes failed jobs faster to debug.

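import re

# Match the status URL up to the end of the numeric ID. The lookahead rejects
# IDs followed by other characters while allowing suffixes such as /photo/1.
POST_URL_RE = re.compile(r"^https://x\.com/[^/]+/status/\d+(?=/|$)")


def validate_post_urls(urls: list[str]) -> list[dict]:
    """Return scraper inputs for valid status URLs; raise on the first invalid one."""
    valid_inputs = []

    for url in urls:
        # Drop the fragment and query string before matching.
        clean_url = url.strip().split("#")[0].split("?")[0]

        match = POST_URL_RE.match(clean_url)
        if not match:
            raise ValueError(f"Invalid X post URL: {url}")

        # Keep only the matched prefix so every input has the same canonical shape.
        valid_inputs.append({"url": match.group(0)})

    return valid_inputs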


SCRAPER_INPUTS = validate_post_urls([
    "https://x.com/taylorswift13/status/2019758757723422893?ref_src=twsrc"
])

Strip query strings before storage. Normalize twitter.com links to x.com before validation, because the pattern above accepts only x.com URLs.

def normalize_x_post_url(url: str) -> str:
    clean_url = url.strip().split("?")[0]
    return clean_url.replace("https://twitter.com/", "https://x.com/")

Deduplicate by post ID

Store one row per id.

def dedupe_results(results: list[dict]) -> list[dict]:
    seen = set()
    rows = []

    for row in results:
        post_id = row.get("id")
        if not post_id:
            continue

        if post_id in seen:
            continue

        seen.add(post_id)
        rows.append(row)

    return rows

Use id for idempotent upserts. If a post gets more likes or views later, update the engagement fields on the existing row.

For historical analytics, write each scrape to a separate metrics table. Use (post_id, scraped_at) as the key for engagement snapshots.

Split post fields from author fields

Store post data and author data in separate tables.

CREATE TABLE x_posts (
    id TEXT PRIMARY KEY,
    url TEXT NOT NULL,
    user_posted TEXT NOT NULL,
    description TEXT,
    date_posted TIMESTAMP,
    replies INTEGER,
    reposts INTEGER,
    likes INTEGER,
    views INTEGER,
    external_url TEXT,
    scraped_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE x_authors (
    username TEXT PRIMARY KEY,
    name TEXT,
    biography TEXT,
    followers INTEGER,
    posts_count INTEGER,
    profile_image_link TEXT,
    scraped_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

This prevents repeated writes of author profile fields across every post row.
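
One way to split a result row along those table boundaries is sketched below. The split_row helper and the two field tuples are illustrative; the field names come from the sample response and the table definitions above.

```python
POST_FIELDS = (
    "id", "url", "user_posted", "description", "date_posted",
    "replies", "reposts", "likes", "views", "external_url",
)
AUTHOR_FIELDS = ("name", "biography", "followers", "posts_count", "profile_image_link")


def split_row(row: dict) -> tuple[dict, dict]:
    """Return (post, author) dicts shaped like x_posts and x_authors."""
    post = {field: row.get(field) for field in POST_FIELDS}
    # x_authors is keyed by username, which the result row calls user_posted.
    author = {"username": row.get("user_posted")}
    author.update({field: row.get(field) for field in AUTHOR_FIELDS})
    return post, author
```

Missing fields come back as None, which maps to NULL in the tables rather than an empty string.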

Track engagement history when counts matter

Post engagement changes after publication. Store snapshots if your reporting needs growth curves, hourly movement, or campaign summaries.

CREATE TABLE x_post_metrics (
    post_id TEXT NOT NULL,
    replies INTEGER,
    reposts INTEGER,
    likes INTEGER,
    views INTEGER,
    scraped_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (post_id, scraped_at)
);

Write one row to x_posts for the post body. Write one row to x_post_metrics for each scrape run.
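
The two writes can be sketched with Python's built-in sqlite3 module. This is an illustration under assumptions: SQLite stands in for whatever database you use, the schema is a trimmed copy of the tables above, write_scrape is a hypothetical helper, and the ON CONFLICT upsert syntax needs SQLite 3.24 or newer.

```python
import sqlite3

# Trimmed copies of the x_posts and x_post_metrics tables from this article.
SCHEMA = """
CREATE TABLE IF NOT EXISTS x_posts (
    id TEXT PRIMARY KEY,
    url TEXT NOT NULL,
    user_posted TEXT NOT NULL,
    description TEXT,
    replies INTEGER,
    reposts INTEGER,
    likes INTEGER,
    views INTEGER,
    scraped_at TIMESTAMP
);
CREATE TABLE IF NOT EXISTS x_post_metrics (
    post_id TEXT NOT NULL,
    replies INTEGER,
    reposts INTEGER,
    likes INTEGER,
    views INTEGER,
    scraped_at TIMESTAMP,
    PRIMARY KEY (post_id, scraped_at)
);
"""

METRICS = ("replies", "reposts", "likes", "views")


def write_scrape(conn: sqlite3.Connection, row: dict, scraped_at: str) -> None:
    """Upsert the post row, then append one metrics snapshot for this run."""
    params = {
        "id": row["id"],
        "url": row["url"],
        "user_posted": row["user_posted"],
        "description": row.get("description"),
        "scraped_at": scraped_at,
    }
    params.update({name: row.get(name) for name in METRICS})

    conn.execute(
        """
        INSERT INTO x_posts
            (id, url, user_posted, description, replies, reposts, likes, views, scraped_at)
        VALUES
            (:id, :url, :user_posted, :description, :replies, :reposts, :likes, :views, :scraped_at)
        ON CONFLICT(id) DO UPDATE SET
            replies = excluded.replies,
            reposts = excluded.reposts,
            likes = excluded.likes,
            views = excluded.views,
            scraped_at = excluded.scraped_at
        """,
        params,
    )
    # OR IGNORE keeps a re-run with the same (post_id, scraped_at) idempotent.
    conn.execute(
        """
        INSERT OR IGNORE INTO x_post_metrics
            (post_id, replies, reposts, likes, views, scraped_at)
        VALUES (:id, :replies, :reposts, :likes, :views, :scraped_at)
        """,
        params,
    )
    conn.commit()
```

Scraping the same post twice leaves one row in x_posts with the latest counts and two rows in x_post_metrics.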

Route failed rows into a retry queue

Check scrape_status for every result.

def split_success_and_retry(results: list[dict]) -> tuple[list[dict], list[dict]]:
    success = []
    retry = []

    for row in results:
        if row.get("scrape_status") == "success" and row.get("id"):
            success.append(row)
        else:
            retry.append(row.get("inputs", {}))

    return success, retry

Retry failed inputs with backoff. Move the URL to a dead-letter table after the final retry.

Keep nullable fields nullable

Fields like hashtags, tagged_users, external_url, and quoted_post.photos can be null. Store them as null values instead of empty strings. Array fields such as photos can go in a child table, so a post without media has no rows there.

CREATE TABLE x_post_media (
    post_id TEXT NOT NULL,
    media_type TEXT NOT NULL,
    media_url TEXT NOT NULL,
    position INTEGER NOT NULL,
    PRIMARY KEY (post_id, media_url)
);
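
A sketch of how the photos array could be flattened into rows for that table follows. The photo_rows helper and the "photo" media type label are assumptions. A null photos value yields no rows, and repeated URLs are skipped because (post_id, media_url) is the primary key.

```python
def photo_rows(row: dict) -> list[tuple[str, str, str, int]]:
    """Return (post_id, media_type, media_url, position) tuples for a result row."""
    post_id = row["id"]
    rows = []
    seen = set()
    # `or []` treats both a missing key and an explicit null as "no media".
    for url in row.get("photos") or []:
        if url in seen:
            continue
        seen.add(url)
        rows.append((post_id, "photo", url, len(rows)))
    return rows
```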

Keep schema checks close to ingestion

Validate required fields before database writes.

REQUIRED_SUCCESS_FIELDS = ("id", "url", "user_posted", "scrape_status")


def validate_success_row(row: dict) -> None:
    missing = [field for field in REQUIRED_SUCCESS_FIELDS if not row.get(field)]

    if missing:
        raise ValueError(f"Missing fields on result row: {missing}")

Run this check after you split successful rows from retry rows.

Where this fits with other X scrapers

Use the post scraper when you already have post URLs. Use profile scrapers when your input is a username, profile URL, or account list.

Job | Input | Scraper
Extract one or more posts | X status URL | Extract X post data
Extract a known profile | X profile URL | Get X profile data
Find a profile from a handle | Username | Look up X profiles by username

A common pipeline starts with profile search, expands to profile extraction, then collects post URLs for post extraction. Keep each step separate so retries stay small and observable.

The full ScrapeNow catalog has 86+ pre-built scrapers across 14 platforms. For a broader X scraping setup, the X scraper guide covers account discovery, post collection, and run planning.

Pricing

ScrapeNow charges per returned row. One row costs one credit, starting at $0.04 per credit for small runs and dropping with volume. No monthly contracts, no proxy fees, no charges for failed rows. See the pricing page for current rates.
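
For budgeting, the starting rate gives an upper bound, since the per-credit price drops with volume. The sketch below only multiplies returned rows by the $0.04 starting price quoted above; estimate_max_cost is a hypothetical helper and the volume tiers are not modeled.

```python
from decimal import Decimal

STARTING_PRICE = Decimal("0.04")  # dollars per credit, from the paragraph above


def estimate_max_cost(returned_rows: int, price_per_credit: Decimal = STARTING_PRICE) -> Decimal:
    """Upper-bound cost in dollars: one credit per returned row at the starting rate."""
    return returned_rows * price_per_credit
```

At that rate, a 100-row test batch costs at most $4.00.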

Run the Extract X post data scraper with one post URL first. Confirm the schema against your storage table, then batch 100 URLs.

After the first batch, dedupe by id, split post and author fields, and route failed rows to a retry queue. That gives you a production-ready path before you move to larger runs.
