
How to extract Amazon product data at scale

Amazon global product scraper for extracting 30+ Amazon fields from product URLs into clean catalog and pricing pipelines.

Scrapers | Amazon | June 13, 2026

The Amazon global product scraper extracts 30+ fields from any Amazon product URL, including title, ASIN, brand, seller, price, availability, rating, review count, categories, images, variations, and Buy Box data.

How to use this scraper

[Figure: The Amazon Global Product Scraper job pipeline, from input to stored output.]

The product URL scraper takes one required input and one optional filter:

| Input | Required | Type | Notes |
| --- | --- | --- | --- |
| url | Yes | string | Direct Amazon product URL. It must start with https://www.amazon.com/ |
| bought_past_month | No | integer | Minimum recent purchase count during the last 30 days. Valid range is 0 to 1000000 |

Use the Extract global Amazon product data scraper when you already have product URLs or ASIN-derived URLs. If you need product discovery first, start with Search Amazon products by keyword.

The normal flow has two jobs. Search scrapers build the URL list. The product scraper turns each known product page into a structured product record.

This split keeps your pipeline easier to debug. Search failures belong in one queue, and product extraction failures belong in another.

Get the product URL input

  1. Open amazon.com.

    [Figure: Typing a product keyword into the Amazon search bar to find a seed product.]

  2. Search for a seed product, such as headphones.

  3. Open the product page by clicking the product result.

    [Figure: Amazon search results with a product selected.]

  4. Copy the URL from the browser address bar.

    [Figure: Amazon product page with the /dp/ URL highlighted in the browser address bar.]

A valid input looks like this:
{
  "url": "https://www.amazon.com/dp/B0CYZD22FB",
  "bought_past_month": 10
}

Short /dp/ASIN URLs work well because they remove tracking parameters. Amazon URLs with query strings also work when they still point to a valid product page on amazon.com.

Prefer canonical ASIN URLs for scheduled jobs. They make deduplication easier, reduce URL churn, and remove session parameters that add no value downstream.
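A minimal sketch of that normalization, under stated assumptions: the ASIN is taken to be a 10-character uppercase alphanumeric token that follows /dp/ or /gp/product/ in the URL path. The function name canonical_product_url and the regular expression are illustrative and are not part of the scraper API.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Assumption: an ASIN is 10 uppercase letters or digits and sits right after
# /dp/ or /gp/product/ in the path. Adjust the pattern if your feed differs.
ASIN_IN_PATH = re.compile(r"/(?:dp|gp/product)/([A-Z0-9]{10})(?:/|$)")


def canonical_product_url(url: str) -> Optional[str]:
    """Return https://www.amazon.com/dp/<ASIN>, or None if no ASIN is found."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or parsed.netloc != "www.amazon.com":
        return None
    # parsed.path excludes the query string, so tag, ref, th, and psc
    # parameters are dropped automatically.
    match = ASIN_IN_PATH.search(parsed.path)
    if match is None:
        return None
    return f"https://www.amazon.com/dp/{match.group(1)}"


print(canonical_product_url("https://www.amazon.com/dp/B0CYZD22FB?th=1&psc=1"))
```

Running this over a URL list before submitting a job lets two differently decorated links to the same product collapse into one input.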

Run the scraper through the API

Use this Python script. Replace YOUR_API_KEY, install the requests package, then run it with Python 3.10 or later (the type hints use the `str | None` syntax).

"""
Configuration:
    - Set SCRAPER_SLUG to the scraper you want to run.
    - Set SCRAPER_INPUTS to the list of input dicts matching that scraper schema.
    - Set API_KEY to your scraper API key.
"""

import sys
import time
import json
import requests
from pathlib import Path

API_KEY = "YOUR_API_KEY"

SCRAPER_SLUG = "amazon-global-products-extract-by-url"

SCRAPER_INPUTS = [
    {
        "url": "https://www.amazon.com/dp/B0CYZD22FB",
        "bought_past_month": 10
    }
]



BASE_URL = "https://api.scrapenow.io/api/v1/scraping"
TIMEOUT_SECONDS = 3600
POLL_INTERVAL = 5
SPINNER = "|/-\\"


def build_headers(api_key: str, content_type: str | None = None) -> dict:
    headers = {"Authorization": f"Bearer {api_key}"}
    if content_type:
        headers["Content-Type"] = content_type
    return headers


def trigger_scrape(slug: str, inputs: list[dict]) -> str:
    url = f"{BASE_URL}/scrape?scraper={slug}"
    response = requests.post(
        url,
        headers=build_headers(API_KEY, "application/json"),
        json={"inputs": inputs},
    )
    response.raise_for_status()
    return response.json()["data"]["job_id"]


def poll_until_done(job_id: str) -> str:
    start = time.time()
    i = 0
    while True:
        elapsed = time.time() - start
        if elapsed > TIMEOUT_SECONDS:
            print(f"\nTimeout after {TIMEOUT_SECONDS}s")
            sys.exit(1)
        response = requests.get(
            f"{BASE_URL}/jobs/{job_id}",
            headers=build_headers(API_KEY),
        )
        response.raise_for_status()
        data = response.json()
        status = data["data"]["status"]
        mins, secs = divmod(int(elapsed), 60)
        sys.stdout.write(
            f"\r[{SPINNER[i % 4]}] Waiting... {status} ({mins}m {secs:02d}s)  "
        )
        sys.stdout.flush()
        if status in ("completed", "failed"):
            print()
            return status
        time.sleep(POLL_INTERVAL)
        i += 1


def fetch_results(job_id: str) -> dict:
    response = requests.get(
        f"{BASE_URL}/jobs/{job_id}/results?format=json",
        headers=build_headers(API_KEY),
    )
    response.raise_for_status()
    return response.json()


def save_results(data: dict, slug: str) -> str:
    # Use pathlib (imported at the top of the script) to create the output folder.
    output_dir = Path("output")
    output_dir.mkdir(exist_ok=True)
    filename = output_dir / f"{slug}.json"
    with open(filename, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2, ensure_ascii=False)
    return str(filename)


def main() -> None:
    print(f"Triggering scraper: {SCRAPER_SLUG}")
    job_id = trigger_scrape(SCRAPER_SLUG, SCRAPER_INPUTS)
    print(f"Job started: {job_id}")
    final_status = poll_until_done(job_id)
    if final_status != "completed":
        print(f"Job failed with status: {final_status}")
        sys.exit(1)
    print("Fetching results...")
    results = fetch_results(job_id)
    output_file = save_results(results, SCRAPER_SLUG)
    print(f"Results saved to: {output_file}")


if __name__ == "__main__":
    main()

The same API pattern works for the other scrapers in this group. Change the scraper slug and input values in the code for each scraper.

For batch jobs, send multiple input dictionaries in SCRAPER_INPUTS. Keep your batch size aligned with your retry plan.

A 500 URL batch is easier to replay than a 50,000 URL batch after an upstream data issue. Smaller batches also make failed inputs easier to isolate when a source feed includes deleted listings.
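As a rough illustration of the batching advice, the sketch below splits a long input list into fixed-size batches that can each be submitted as one job. The helper name chunk_inputs and the default size of 500 are assumptions chosen to match the example above, not limits documented by the API.

```python
from collections.abc import Iterator


def chunk_inputs(inputs: list[dict], batch_size: int = 500) -> Iterator[list[dict]]:
    """Yield consecutive slices of at most batch_size inputs."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(inputs), batch_size):
        yield inputs[start:start + batch_size]


# Hypothetical feed of 1,200 inputs (placeholder URLs, for counting only).
feed = [{"url": f"https://www.amazon.com/dp/EXAMPLE{i:04d}"} for i in range(1200)]
for number, batch in enumerate(chunk_inputs(feed), start=1):
    print(f"Batch {number}: {len(batch)} inputs")
```

Each batch can then be assigned to SCRAPER_INPUTS in turn, so a failed run only requires replaying that batch.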

Use a job ID as the replay boundary. Store the job ID, input batch hash, run time, and result file path in your own run log.
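One possible shape for such a run log entry is sketched below. The field names (input_batch_hash, run_time, result_path) and the choice of SHA-256 over a key-sorted JSON dump are assumptions for illustration; any stable hash of the batch would serve the same purpose.

```python
import hashlib
import json
from datetime import datetime, timezone


def batch_hash(inputs: list[dict]) -> str:
    """Hash a batch so the same inputs in the same order give the same digest."""
    # sort_keys makes the digest independent of key order inside each input;
    # the order of inputs within the batch still matters.
    canonical = json.dumps(inputs, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_run_log_entry(job_id: str, inputs: list[dict], result_path: str) -> dict:
    """Assemble one run log row (hypothetical schema)."""
    return {
        "job_id": job_id,
        "input_batch_hash": batch_hash(inputs),
        "input_count": len(inputs),
        "run_time": datetime.now(timezone.utc).isoformat(),
        "result_path": result_path,
    }


entry = build_run_log_entry(
    "example-job-id",  # placeholder, not a real job ID
    [{"url": "https://www.amazon.com/dp/B0CYZD22FB", "bought_past_month": 10}],
    "output/amazon-global-products-extract-by-url.json",
)
print(json.dumps(entry, indent=2))
```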

Use brand search inputs when you need a storefront URL

The brand search scraper accepts a URL to a brand or seller page. The URL must start with https://www.amazon.com/.

  1. Open amazon.com.

    [Figure: Searching for a product keyword on Amazon to find the target brand's seller.]

  2. Search for the desired product, such as headphones.

    [Figure: Amazon search page with the headphones query.]

  3. Open a product from the target brand.

    [Figure: Amazon product page from the target brand.]

  4. Click the seller link in the right panel below the Buy Now button.

  5. Amazon opens the seller page. Copy the Seller ID from the URL after the seller= parameter.

    [Figure: Amazon seller page URL containing the seller ID parameter.]

  6. Build the URL using this format:

https://www.amazon.com/s?me=SELLER_ID

Example:

https://www.amazon.com/s?me=A11H2172ZZKORR
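Steps 5 and 6 can be sketched in code as follows. The helper names are hypothetical, and the seller page URL in the usage line is only an illustrative shape that carries a seller= parameter; the real address Amazon shows may contain other parameters as well.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse


def seller_id_from_url(seller_page_url: str) -> Optional[str]:
    """Read the value of the seller= query parameter, if present."""
    query = parse_qs(urlparse(seller_page_url).query)
    values = query.get("seller")
    return values[0] if values else None


def storefront_url(seller_id: str) -> str:
    """Build the https://www.amazon.com/s?me=SELLER_ID input URL."""
    return "https://www.amazon.com/s?" + urlencode({"me": seller_id})


# Illustrative seller page URL; only the seller= parameter matters here.
seller_id = seller_id_from_url("https://www.amazon.com/sp?ie=UTF8&seller=A11H2172ZZKORR")
if seller_id is not None:
    print(storefront_url(seller_id))
```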

For URL-based product discovery, Search Amazon products by URL works well when you already have a category, storefront, or filtered Amazon results page. Use it for saved searches, category pages, and filtered result URLs that your team already tracks.

Keep the seller ID with every product URL returned from the discovery step. That gives you a clean join key when you compare seller coverage against the extracted product record.

How to search by keyword

Start with keyword-based discovery when you have a search term but no product URLs yet.

Open amazon.com and type your keyword into the search bar.

[Figure: Typing a product keyword into the Amazon search bar for discovery.]

Amazon returns a results page with product listings matching your query.

[Figure: Search results page with matching products.]

If you need results from a specific country, click the delivery location dropdown and select your target marketplace.

[Figure: Selecting the target country or delivery location, with Italy highlighted.]
[Figure: Amazon updates to show results for the selected country.]
[Figure: Filtered search results for the selected marketplace, here Amazon Italy.]

Copy the search results URL from the address bar. This URL becomes the input for the keyword search scraper.

[Figure: Amazon Italy headphones results, with the search results URL in the address bar ready to copy as scraper input.]

Save the query, country context, and result page URL with each batch. Those fields explain why a product entered your pipeline.

How to search by seller

Use seller search when your input starts from a seller profile or seller-specific Amazon page. This works well for marketplace monitoring, unauthorized seller checks, and catalog audits.

Search for any product on Amazon to find a seller you want to track.

[Figure: Searching for a product on Amazon to locate a target seller.]

Open any product listing and scroll down to find the seller information section.

[Figure: The seller link below the Buy Now button.]
[Figure: Clicking the seller name to open their profile.]

The seller profile page shows all products from that seller. Copy the seller ID from the URL.

[Figure: Seller profile page with their full product catalog.]
[Figure: Seller product listing ready for extraction.]

After seller search returns product URLs, run the product scraper for price, availability, Buy Box seller, and catalog fields. For unauthorized seller checks, compare the seller ID from the search against the Buy Box seller from extraction. A mismatch gives your monitoring job a concrete review target.
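A hedged sketch of that comparison follows. It assumes, based on the response sample in this article, that seller_id holds the seller's ID while buybox_seller holds a display name, so the check compares each against its own expected value. The function name and the reasons list are illustrative.

```python
def find_buybox_mismatches(
    records: list[dict],
    expected_seller_id: str,
    expected_seller_name: str,
) -> list[dict]:
    """Flag successful records whose seller fields differ from the expected seller."""
    flagged = []
    wanted_name = expected_seller_name.strip().casefold()
    for record in records:
        if record.get("scrape_status") != "success":
            continue  # failed rows carry no seller data to compare
        reasons = []
        if record.get("seller_id") != expected_seller_id:
            reasons.append("seller_id differs")
        # Assumption: buybox_seller is a display name, compared case-insensitively.
        buybox_name = (record.get("buybox_seller") or "").strip().casefold()
        if buybox_name != wanted_name:
            reasons.append("buybox_seller differs")
        if reasons:
            flagged.append({"asin": record.get("asin"), "reasons": reasons})
    return flagged


sample = [{
    "scrape_status": "success",
    "asin": "B0CYZD22FB",
    "seller_id": "A11H2172ZZKORR",
    "buybox_seller": "AILIHEN",
}]
print(find_buybox_mismatches(sample, "A11H2172ZZKORR", "AILIHEN"))
```

Flagged rows are review targets, not verdicts: a name mismatch can also come from a renamed storefront.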

API response sample

A completed job returns an array of result objects. Each object includes the original input, scrape status, normalized product fields, pricing fields, seller fields, and variant data.

[
  {
    "inputs": {
      "url": "https://www.amazon.com/dp/B0CYZD22FB",
      "bought_past_month": 10
    },
    "scrape_status": "success",
    "title": "AILIHEN Kids Headphones Bulk 10-Pack for K-12 School Classroom, On-Ear Wired Headset with Microphone for Students Children with 93dB Volume Limited, 3.5mm Jack for Chromebooks Tablets Laptop Computer",
    "seller_name": "AILIHEN",
    "brand": "AILIHEN",
    "description": "About this item Safe Sound Protection(<93dB): The World Health Organisation (WHO) recommends 93dB as the maximum safe volume level for kids and teens during their daily use. Here we introduce AILIHEN wholesale headphones, specially designed for teenagers to a safe sound to prevent damage to their hearing in daily life Built-in Mic: The headphones come with a built-in microphone, making them a suitable choice during study or leisure time. They can chat easily with teachers, friends, and parents while they’re busy learning, or with friends and family during downtime Designed for Students: the on-ear headphones have an adjustable headband and lightweigt design that can adjust to a perfect fit. With soft memory-protein cushioned earmuffs and pillow soft headband for ultra comfort, minimizes the pressure on the ears while wearing Durable and Foldable: Premium build quality with tangle-free nylon fabric cables which can withstand pulling and tangling, the standard audio jack will be compatible with most 3.5mm enabled audio cables like cellphones, laptops, kindle, tablets and etc. The foldable design would make the headphones more portable and storage Stereo Sound: They feature dynamic 40mm drivers that deliver deep clear sound, with audio clarity that makes listening to music, playing a game, or watching a show a pure pleasure on home use or airplane travels › See more product details",
    "initial_price": 78.84,
    "currency": "USD",
    "availability": "In Stock",
    "reviews_count": 983,
    "categories": [
      "Electronics",
      "Headphones, Earbuds & Accessories",
      "Headphones & Earbuds",
      "On-Ear Headphones"
    ],
    "parent_asin": "B09N76B4RD",
    "asin": "B0CYZD22FB",
    "buybox_seller": "AILIHEN",
    "number_of_sellers": 1,
    "root_bs_rank": 19118,
    "answered_questions": 0,
    "domain": "https://www.amazon.com/",
    "images_count": 8,
    "url": "https://www.amazon.com/dp/B0CYZD22FB?th=1&psc=1",
    "video_count": 1,
    "image_url": "https://m.media-amazon.com/images/I/81HDB2yrP-L._AC_SL1500_.jpg",
    "item_weight": "2.01 Kilograms",
    "rating": 4.4,
    "seller_id": "A11H2172ZZKORR",
    "discount": "-5%",
    "model_number": "I35PACK",
    "manufacturer": "AILIHEN",
    "department": "Electronics",
    "plus_content": true,
    "video": false,
    "final_price_high": null,
    "final_price": 74.89,
    "variations": [
      {
        "name": "Multi Color",
        "asin": "B09TR1Y3MZ",
        "price": null,
        "currency": null,
        "unit": null,
        "unit_price": null
      }
    ]
  }
]

Treat the response as an event record from the scrape time. Prices, seller ownership, stock state, ratings, review counts, and Buy Box data change often.

Do not treat a product response as a permanent catalog truth. Store the scrape timestamp beside every row that can change.

What data you get back

[Figure: Amazon Global Product Scraper output fields grouped by category.]

The scraper returns enough product detail to build a catalog row, price history row, seller snapshot, or product monitoring event.

| Field | Use it for |
| --- | --- |
| scrape_status | Separate successful records from failed inputs before loading data |
| title | Product catalog title and matching logic |
| asin | Primary product key for Amazon item-level records |
| parent_asin | Grouping variants under one parent listing |
| brand | Brand-level reporting and product grouping |
| seller_name | Seller display name from the listing |
| seller_id | Stable seller key, useful for joins |
| buybox_seller | Buy Box monitoring |
| initial_price | Pre-discount or listed price |
| final_price | Current purchasable price |
| currency | Price normalization across countries |
| availability | Stock state, such as In Stock |
| rating | Average star rating |
| reviews_count | Review volume at scrape time |
| categories | Product taxonomy path |
| root_bs_rank | Root category Best Sellers Rank |
| images_count | Media depth check |
| image_url | Main product image |
| video_count | Count of listing videos |
| variations | Variant ASINs, names, prices, and units |

Ready to get this data? Extract global Amazon product data.

For review-level extraction, use Extract Amazon reviews after product extraction. For seller inventory checks, pair the product scraper with Get Amazon seller data.

Store fields by workload. Catalog systems usually need asin, parent_asin, title, brand, manufacturer, model_number, categories, and image_url.

Price monitoring jobs usually need asin, seller_id, buybox_seller, initial_price, final_price, currency, discount, and availability. Seller monitoring jobs usually need asin, seller_id, seller_name, buybox_seller, number_of_sellers, and availability.

Variation tracking needs a separate table. Child ASINs can carry different prices, colors, sizes, package counts, and stock states under one parent listing.

Production tips for clean product data

Amazon product data has inconsistent edges. Treat every record as semi-structured data, even when the scraper returns a stable schema.

Amazon changes page modules, seller widgets, and variation layouts across categories. Headphones, grocery items, apparel, and replacement parts expose different combinations of price, unit price, variant, and seller data.

Your loader should accept missing fields. A missing final_price_high value on a single-price product is normal, and a missing unit price on headphones is normal.
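A minimal sketch of a loader step that tolerates missing fields is shown below. It projects a record onto the price monitoring fields named earlier in this article and fills absent keys with None instead of raising. The PRICE_FIELDS tuple and the function name are illustrative.

```python
# Price monitoring fields taken from this article, plus final_price_high.
PRICE_FIELDS = (
    "asin",
    "seller_id",
    "buybox_seller",
    "initial_price",
    "final_price",
    "final_price_high",
    "currency",
    "discount",
    "availability",
)


def to_price_row(record: dict) -> dict:
    """Copy the price fields; a missing key becomes None rather than a KeyError."""
    return {field: record.get(field) for field in PRICE_FIELDS}


partial = {"asin": "B0CYZD22FB", "final_price": 74.89, "currency": "USD"}
print(to_price_row(partial))
```

Keeping None as the "not present" marker lets the database column stay nullable, which is simpler than inventing placeholder values.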

Validate inputs before sending jobs

Reject malformed URLs before you spend credits. The scraper expects URLs that start with https://www.amazon.com/.

from urllib.parse import urlparse

def validate_amazon_product_input(item: dict) -> tuple[bool, str | None]:
    url = item.get("url")
    bought_past_month = item.get("bought_past_month", 0)

    if not isinstance(url, str) or not url.startswith("https://www.amazon.com/"):
        return False, "url must start with https://www.amazon.com/"

    if not isinstance(bought_past_month, int):
        return False, "bought_past_month must be an integer"

    if bought_past_month < 0 or bought_past_month > 1000000:
        return False, "bought_past_month must be between 0 and 1000000"

    parsed = urlparse(url)
    if not parsed.netloc.endswith("amazon.com"):
        return False, "url must use amazon.com"

    return True, None


inputs = [
    {
        "url": "https://www.amazon.com/dp/B0CYZD22FB",
        "bought_past_month": 10
    }
]

valid_inputs = []
for item in inputs:
    ok, error = validate_amazon_product_input(item)
    if ok:
        valid_inputs.append(item)
    else:
        print(f"Skipping input: {error}")

Validate locally before large runs. Normalize URLs by removing marketing parameters like tag, ref, and psc when a clean ASIN URL is available.

Deduplicate by ASIN before loading

Use asin as the product-level key. Use parent_asin when you want variant groups.

import json
from pathlib import Path

def dedupe_products(records: list[dict]) -> list[dict]:
    seen = {}
    for record in records:
        if record.get("scrape_status") != "success":
            continue

        asin = record.get("asin")
        if not asin:
            continue

        seen[asin] = record

    return list(seen.values())


# Read the file written by save_results() in the API script above.
data = json.loads(Path("output/amazon-global-products-extract-by-url.json").read_text(encoding="utf-8"))
clean_records = dedupe_products(data)

print(f"Loaded {len(data)} raw records")
print(f"Kept {len(clean_records)} unique ASIN records")

If you scrape the same ASIN from multiple URLs, keep the newest record by scrape timestamp. Do not dedupe variation rows too early since a parent listing can contain multiple child ASINs with different prices and stock states.
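The "keep the newest record" rule can be sketched like this. It assumes your pipeline has already added a scraped_at field to every record as an ISO 8601 string in one consistent format (for example with a +00:00 offset); that field is not part of the sample response shown in this article.

```python
from datetime import datetime


def dedupe_keep_newest(records: list[dict]) -> list[dict]:
    """Keep one record per ASIN, choosing the latest scraped_at value."""
    newest: dict[str, dict] = {}
    for record in records:
        asin = record.get("asin")
        stamp = record.get("scraped_at")  # assumption: added by your own pipeline
        if not asin or not stamp:
            continue
        current = newest.get(asin)
        if current is None or (
            datetime.fromisoformat(stamp)
            > datetime.fromisoformat(current["scraped_at"])
        ):
            newest[asin] = record
    return list(newest.values())


rows = [
    {"asin": "B0CYZD22FB", "final_price": 78.84, "scraped_at": "2026-06-12T08:00:00+00:00"},
    {"asin": "B0CYZD22FB", "final_price": 74.89, "scraped_at": "2026-06-13T08:00:00+00:00"},
]
print(dedupe_keep_newest(rows))
```

Unlike a plain "last one wins" loop, this does not depend on the order in which records arrive.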

Store failed records with the original input

Keep failures. Failed inputs show deleted listings, blocked product pages, malformed URLs, unavailable marketplace pages, and schema changes.

def partition_results(records: list[dict]) -> tuple[list[dict], list[dict]]:
    successes = []
    failures = []

    for record in records:
        if record.get("scrape_status") == "success":
            successes.append(record)
        else:
            failures.append({
                "inputs": record.get("inputs"),
                "scrape_status": record.get("scrape_status"),
                "raw": record
            })

    return successes, failures

Retry failures once. If a URL fails twice, send it to a dead-letter table. Store the error payload as JSON since schema changes often show up first in failed rows.
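The retry rule above might be sketched as follows. The attempts dictionary stands in for whatever store you use to count failures per URL, and the function name is illustrative; the input shape matches the failure rows produced by partition_results.

```python
def route_failures(
    failures: list[dict],
    attempts: dict[str, int],
    max_failures: int = 2,
) -> tuple[list[dict], list[dict]]:
    """Split failures into inputs to retry and rows for the dead-letter table."""
    retry_inputs = []
    dead_letters = []
    for failure in failures:
        inputs = failure.get("inputs") or {}
        url = inputs.get("url", "")
        # Count this failure against the URL (attempts is mutated in place).
        attempts[url] = attempts.get(url, 0) + 1
        if attempts[url] >= max_failures:
            dead_letters.append(failure)  # failed twice: stop retrying
        else:
            retry_inputs.append(inputs)  # first failure: retry once
    return retry_inputs, dead_letters


attempts: dict[str, int] = {}
failed = [{"inputs": {"url": "https://www.amazon.com/dp/B0CYZD22FB"}, "scrape_status": "failed"}]
first_retry, first_dead = route_failures(failed, attempts)
second_retry, second_dead = route_failures(failed, attempts)
print(len(first_retry), len(first_dead), len(second_retry), len(second_dead))
```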

Track field freshness

Use scrape time as part of your record identity. Add a scraped_at timestamp and a source_job_id to each table.

| Data type | Suggested storage pattern |
| --- | --- |
| Catalog fields | Latest row per ASIN plus daily snapshot |
| Price fields | Append-only event table |
| Availability | Append-only event table |
| Reviews and rating | Daily metric event |
| Buy Box seller | Append-only seller event |
| Variations | Latest row per child ASIN plus change history |
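One way to attach the scraped_at and source_job_id columns before loading is sketched below. The function name is illustrative, and using the time of stamping as the scrape time is an assumption; if the API exposes its own job completion time, that would be the better source.

```python
from datetime import datetime, timezone
from typing import Optional


def stamp_records(
    records: list[dict],
    job_id: str,
    scraped_at: Optional[str] = None,
) -> list[dict]:
    """Return copies of the records with scraped_at and source_job_id added."""
    # Assumption: when no timestamp is passed in, "now" in UTC stands in
    # for the scrape time.
    stamp = scraped_at or datetime.now(timezone.utc).isoformat()
    return [
        {**record, "scraped_at": stamp, "source_job_id": job_id}
        for record in records
    ]


stamped = stamp_records(
    [{"asin": "B0CYZD22FB", "final_price": 74.89}],
    job_id="example-job-id",  # placeholder, not a real job ID
    scraped_at="2026-06-13T08:00:00+00:00",
)
print(stamped[0])
```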

Handle variant data as its own workload

Store each variation as a child row with the parent ASIN, child ASIN, name, price, currency, unit, and unit price. Tie the variation array to the same scrape time as the parent product record.
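A minimal sketch of that flattening step, using the field names from the response sample in this article. The function name and the fallback to asin when parent_asin is missing are assumptions; the scraped_at value is expected to come from your own pipeline.

```python
def flatten_variations(record: dict, scraped_at: str) -> list[dict]:
    """Turn the variations array into one child row per variant."""
    # Assumption: fall back to the record's own ASIN if parent_asin is absent.
    parent_asin = record.get("parent_asin") or record.get("asin")
    rows = []
    for variation in record.get("variations") or []:
        rows.append({
            "parent_asin": parent_asin,
            "child_asin": variation.get("asin"),
            "name": variation.get("name"),
            "price": variation.get("price"),
            "currency": variation.get("currency"),
            "unit": variation.get("unit"),
            "unit_price": variation.get("unit_price"),
            "scraped_at": scraped_at,  # same scrape time as the parent record
        })
    return rows


product = {
    "asin": "B0CYZD22FB",
    "parent_asin": "B09N76B4RD",
    "variations": [
        {"name": "Multi Color", "asin": "B09TR1Y3MZ", "price": None,
         "currency": None, "unit": None, "unit_price": None},
    ],
}
print(flatten_variations(product, "2026-06-13T08:00:00+00:00"))
```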

Set retry rules before the first large run

Retry once for transient failures. Send repeated failures to a dead-letter table with the input URL, error payload, job ID, and first failure time.

Which Amazon scraper to use

[Figure: How the Amazon Global Product Scraper routes each input type to the right scraper.]

Use the global product URL scraper when URL coverage matters more than search discovery. Use search scrapers when you need to build the product URL list first.

| Job | Scraper |
| --- | --- |
| Extract one known product page | Extract global Amazon product data |
| Find products from a keyword | Search Amazon products by keyword |
| Extract details from known Amazon product URLs | Extract Amazon product data |
| Extract products from a results URL | Search Amazon products by URL |
| Pull review records for a product | Extract Amazon reviews |
| Pull seller data from seller URLs | Get Amazon seller data |

The full scraper catalog is in the Browse all 86+ scrapers hub. It includes Amazon, Google, LinkedIn, TikTok, Instagram, Facebook, YouTube, Zillow, Indeed, Glassdoor, Flipkart, Crunchbase, Yelp, and X scrapers.

Pick the scraper from the shape of your input. A keyword belongs in a search scraper. A product URL belongs in an extract scraper. A seller URL belongs in a seller scraper.

If your pipeline starts with search terms, run discovery first and store the returned URLs. If your pipeline starts with a product feed, skip discovery and send the product URLs straight to extraction.

Pricing

ScrapeNow charges per returned row. One row costs one credit, starting at $0.04 per credit for small runs and dropping with volume. No monthly contracts, no proxy fees, no charges for failed rows. See the pricing page for current rates.

Start with 10 product URLs. Run the script above against the Extract global Amazon product data scraper.

Inspect asin, final_price, availability, seller_id, and variations. Then load the valid records into your product table.

After the first test run, add input validation, ASIN deduplication, failure storage, and separate product and price tables. Those four pieces stop most production data issues before they reach your application.

For the second run, use a batch that matches your real workload. If your production feed has 5,000 URLs, test with 500 URLs before moving to the full set.

Keep the test output, run log, and loader logs together. That gives you a complete trail from input URL to structured product record.
