Skip to main content
Blog

How to extract Amazon product details at scale

Use the amazon products scraper to extract title, price, ASIN, availability, seller, and variations from Amazon detail pages.

ScrapersAmazonJune 11, 2026
How to extract Amazon product details at scale

Amazon product detail pages contain title, ASIN, brand, pricing, seller info, ratings, and variant data. The ScrapeNow Amazon products scraper extracts 20+ fields from each product URL and returns structured JSON.

The Amazon Products Scraper extracts title, brand, price, availability, ASIN, seller, rating, review count, images, variations, categories, and product metadata. It returns one structured row per product page, which is the shape most warehouses and monitoring jobs need.

Pricing teams use it to track competitor prices and Buy Box changes. Catalog teams use it to fill missing product attributes. Data engineers use it to collect product detail pages without maintaining selectors, proxy rotation, browser sessions, or parsing code.

Start with product URLs or ASIN-based URLs. If you need product discovery, run a search scraper first. Then send the returned product URLs into the product extraction scraper.

How to use this scraper

Amazon Products Scraper job pipeline
The Amazon Products Scraper job pipeline, from input to stored output.

Use Extract Amazon product data when you already have product URLs or ASIN-based product pages. This scraper reads the product detail page and returns one structured result per input URL.

Use Search Amazon products by keyword when you need products from a keyword like headphones. This scraper starts from Amazon search results and returns product URLs you can feed into the extract-by-URL scraper.

Use Search Amazon products by URL when you already have an Amazon search results URL. This works well when your team builds marketplace URLs with filters, category paths, or query parameters.

A typical production flow has three stages. Search returns product URLs, product extraction returns detail-page rows, and your warehouse stores typed fields plus the raw response.

Step 1. Get the product URL

Input variables for the Amazon Products Extract by URL scraper:

Field Required Example Notes
url Yes https://www.amazon.com/.../dp/B0BS1QCFHX Direct Amazon product URL. It must start with https://www.amazon.com/.
zipcode No 10001 ZIP or postal code for local product results.
language No EN Product listing language.

Open amazon.com.

Amazon search bar autocompleting headphones queries on homepage
Type your product keyword in the Amazon search bar to find listings
Search for a product. This example uses `headphones`.
Amazon headphones search results showing Amazon Basics noise cancelling models
Click a product from the search results to open its detail page
Open the product page by clicking a product result. Copy the URL from the browser address bar.
Amazon Basics Hybrid Active Noise Cancelling Headphones detail page
Amazon product detail page with URL in the address bar
For keyword search, the input payload is smaller. Pass `keyword`, then add `zipcode` when local delivery status matters.
Amazon homepage autocomplete for headphones keyword search by keyword
ScrapeNow keyword input panel for Amazon product search by keyword
Use [Search Amazon products by URL](/scraper/amazon-products-search-by-url) when you already have an Amazon search results URL. This tells ScrapeNow to parse a specific search results page instead of starting from a plain keyword.
Amazon search box suggesting headphones terms before building results URL
Type a product keyword in the Amazon search bar to start
Copy the search results URL after you set the marketplace, keyword, filters, and delivery location.
Amazon Mother's Day homepage with language and region selector open
Copy the full Amazon search URL from the browser address bar
The search-by-URL scraper keeps the query structure you send. That includes category filters, price filters, brand filters, and sort order when Amazon exposes them in the URL.
Amazon region panel with Change country/region link highlighted
Amazon search result page with filters applied
Choose the country or marketplace before copying the URL. Marketplace changes affect currency, language, delivery options, shipping promises, and seller availability.
Amazon country picker listing marketplaces from Australia to United States
Select the country or marketplace before copying the Amazon search URL
The examples below show a country selection flow. Use this only when your collection job needs that marketplace.
Amazon region dialog with Italy selected and Go to website button
Amazon country selection after choosing Italy
After changing country, reload the search page and confirm the URL still points to the search results you want.
Amazon Italy homepage showing Home and Garden Week deals
Amazon search results after country selection
Then paste that URL into the scraper input. Store the exact URL with the job so you can reproduce the run later.
Amazon Italy headphones results with pagination controls highlighted
Amazon search by URL input with selected marketplace
Store source URLs before the job starts. Raw URLs help you debug marketplace drift, tracking parameters, invalid ASINs, and local delivery changes.

Step 2. Run the scraper from Python

Set API_KEY, keep the scraper slug as the Amazon Products Extract by URL scraper, and replace the URL in SCRAPER_INPUTS.

The script below starts a scrape job, polls until the job finishes, downloads JSON results, and writes the file to output/amazon-products-extract-by-url.json.

"""
Configuration:
    - Set SCRAPER_SLUG to the scraper you want to run.
    - Set SCRAPER_INPUTS to the list of input dicts matching that scraper's schema.
    - Set API_KEY to your scraper API key.
"""

import sys
import time
import json
import requests
import os

API_KEY = "YOUR_API_KEY"

SCRAPER_SLUG = "amazon-products-extract-by-url"

SCRAPER_INPUTS = [
    {
        "url": "https://www.amazon.com/Sony-WH-CH720N-Canceling-Headphones-Microphone/dp/B0BS1QCFHX/ref=sr_1_1?dib=eyJ2IjoiMSJ9.gHAnVEZa5rFUsKuVZw3s1Ola0T-OmGjy0_VEjjd60LT5FCbyBj7mJw9_Td0RZWLEVhmlTVl3ZxR35rjHE3iizvpMM0KYRCPzXaaBcG3a7drPip0anuHhnZTSXrybn4bViygJ4kinpWsr7aDK8OatNgR1zKd5cgoSf3zoxtJaGu8rovNZ_TZWg8mhXqiqAp5GSnXpDIa3v4p-5QThqkn7dY64b1BcdIL9MrEA2J_ihJA.niQEc1mNoYBFUFkpSiKIIYDukhN6j3R4dqgA3q5RwBA&dib_tag=se&keywords=headphones&qid=1774097206&sr=8-1",
        "language": "EN",
        "zipcode": "10001"
    }
]



BASE_URL = "https://api.scrapenow.io/api/v1/scraping"
TIMEOUT_SECONDS = 3600
POLL_INTERVAL = 5
SPINNER = "|/-\\"


def build_headers(api_key: str, content_type: str | None = None) -> dict:
    headers = {"Authorization": f"Bearer {api_key}"}
    if content_type:
        headers["Content-Type"] = content_type
    return headers


def trigger_scrape(slug: str, inputs: list[dict]) -> str:
    url = f"{BASE_URL}/scrape?scraper={slug}"
    response = requests.post(
        url,
        headers=build_headers(API_KEY, "application/json"),
        json={"inputs": inputs},
    )
    response.raise_for_status()
    return response.json()["data"]["job_id"]


def poll_until_done(job_id: str) -> str:
    start = time.time()
    i = 0
    while True:
        elapsed = time.time() - start
        if elapsed > TIMEOUT_SECONDS:
            print(f"\nTimeout after {TIMEOUT_SECONDS}s")
            sys.exit(1)
        response = requests.get(
            f"{BASE_URL}/jobs/{job_id}",
            headers=build_headers(API_KEY),
        )
        response.raise_for_status()
        data = response.json()
        status = data["data"]["status"]
        mins, secs = divmod(int(elapsed), 60)
        sys.stdout.write(
            f"\r[{SPINNER[i % 4]}] Waiting... {status} ({mins}m {secs:02d}s)  "
        )
        sys.stdout.flush()
        if status in ("completed", "failed"):
            print()
            return status
        time.sleep(POLL_INTERVAL)
        i += 1


def fetch_results(job_id: str) -> dict:
    response = requests.get(
        f"{BASE_URL}/jobs/{job_id}/results?format=json",
        headers=build_headers(API_KEY),
    )
    response.raise_for_status()
    return response.json()


def save_results(data: dict, slug: str) -> str:
    os.makedirs("output", exist_ok=True)
    filename = os.path.join("output", f"{slug}.json")
    with open(filename, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2, ensure_ascii=False)
    return filename


def main() -> None:
    print(f"Triggering scraper: {SCRAPER_SLUG}")
    job_id = trigger_scrape(SCRAPER_SLUG, SCRAPER_INPUTS)
    print(f"Job started: {job_id}")
    final_status = poll_until_done(job_id)
    if final_status != "completed":
        print(f"Job failed with status: {final_status}")
        sys.exit(1)
    print("Fetching results...")
    results = fetch_results(job_id)
    output_file = save_results(results, SCRAPER_SLUG)
    print(f"Results saved to: {output_file}")


if __name__ == "__main__":
    main()

The same API pattern works for the other scrapers in this group. Change the scraper slug and input values in the code for each scraper.

Keep batches small while you test. A batch of 5 URLs exposes schema issues faster than a batch of 5,000 URLs.

Use one product first when you wire the endpoint into a new pipeline. Check the status, response shape, file path, and downstream loader before increasing batch size.

Step 3. Read the output file

The script saves results to:

output/amazon-products-extract-by-url.json

A completed job returns one result object per input. If you submit 100 product URLs, expect up to 100 result objects back.

Each result includes the original input fields and the extracted Amazon product data. Keep both. The input fields tell you which URL, language, and ZIP code produced that row.

Do not throw away failed rows. They carry the original input and status, which you need for retry logic and audit trails.

API response sample

This is a trimmed response from the Amazon Products Extract by URL scraper.

[
  {
    "inputs": {
      "url": "https://www.amazon.com/Sony-WH-CH720N-Canceling-Headphones-Microphone/dp/B0BS1QCFHX/ref=sr_1_1?dib=eyJ2IjoiMSJ9.gHAnVEZa5rFUsKuVZw3s1Ola0T-OmGjy0_VEjjd60LT5FCbyBj7mJw9_Td0RZWLEVhmlTVl3ZxR35rjHE3iizvpMM0KYRCPzXaaBcG3a7drPip0anuHhnZTSXrybn4bViygJ4kinpWsr7aDK8OatNgR1zKd5cgoSf3zoxtJaGu8rovNZ_TZWg8mhXqiqAp5GSnXpDIa3v4p-5QThqkn7dY64b1BcdIL9MrEA2J_ihJA.niQEc1mNoYBFUFkpSiKIIYDukhN6j3R4dqgA3q5RwBA&dib_tag=se&keywords=headphones&qid=1774097206&sr=8-1",
      "language": "EN",
      "zipcode": "10001"
    },
    "scrape_status": "success",
    "title": "Sony WH-CH720N Noise Canceling Wireless Headphones Bluetooth Over The Ear Headset with Microphone and Alexa Built-in, Black New",
    "seller_name": "Amazon.com",
    "brand": "Sony",
    "description": "Turn down the world's noise with the long-lasting noise cancellation performance. Featuring Dual Noise Sensor technology and an Integrated Processor V1, the WH-CH720N allows you to fully immerse yourself in music without any distractions. Ergonomically designed to be lightweight, comfortable, and with up to 35 hours of battery life, you’ll almost forget you’re wearing it.",
    "initial_price": 179.99,
    "currency": "USD",
    "availability": "In Stock",
    "reviews_count": 15650,
    "categories": [
      "Electronics",
      "Headphones, Earbuds & Accessories",
      "Headphones & Earbuds",
      "Over-Ear Headphones"
    ],
    "parent_asin": "B0DJRY1KZT",
    "asin": "B0BS1QCFHX",
    "buybox_seller": "Amazon.com",
    "number_of_sellers": 1,
    "root_bs_rank": 412,
    "answered_questions": 0,
    "domain": "https://www.amazon.com/",
    "images_count": 13,
    "url": "https://www.amazon.com/Sony-WH-CH720N-Canceling-Headphones-Microphone/dp/B0BS1QCFHX?th=1&psc=1&language=en_US&currency=USD",
    "video_count": 7,
    "image_url": "https://m.media-amazon.com/images/I/51rpbVmi9XL._AC_SL1200_.jpg",
    "item_weight": "3.5 Ounces",
    "rating": 4.4,
    "seller_id": "ATVPDKIKX0DER",
    "image": "https://m.media-amazon.com/images/I/51rpbVmi9XL._AC_SL1200_.jpg",
    "model_number": "WHCH720N/B",
    "manufacturer": "SONY",
    "department": "Electronics",
    "plus_content": true,
    "upc": "027242925397",
    "video": true,
    "final_price_high": null,
    "variations": [
      {
        "name": "White",
        "asin": "B0BS74M665",
        "price": 149,
        "currency": "USD",
        "unit_price": null,
        "image": "https://m.media-amazon.com/images/I/31PFiCIw3WL._SS64_.jpg",
        "color": "White",
        "size": null
      },
      {
        "name": "Pink",
        "asin": "B0DY8TS92J",
        "price": 178,
        "currency": "USD",
        "unit_price": null,
        "image": "https://m.media-amazon.com/images/I/31wZqFBwjZL._SS64_.jpg",
        "color": "Pink",
        "size": null
      },
      {
        "name": "Black",
        "asin": "B0BS1QCFHX",
        "price": null,
        "currency": "USD",
        "unit_price": null,
        "image": "...",
        "color": "Black",
        "size": null
      }
    ]
  }
]

What data you get back, key fields in the API response

Amazon Products Scraper output schema
Amazon Products Scraper output fields grouped by category.

Each result includes inputs, scrape_status, and the extracted product fields. Store inputs in your warehouse because it ties the returned row back to the URL, language, and ZIP code you sent.

Treat scrape_status as a required field in downstream jobs. Load successful rows into product tables, and route failed rows into retry or review queues.

Plan the table shape before you schedule large runs. Product pages return identity fields, pricing fields, seller fields, media fields, and variation arrays.

Product identity fields

Use these fields as primary product identifiers:

Field Type Use
asin string Amazon item ID for the specific product page.
parent_asin string Parent product ID for variation groups.
title string Product title shown on the detail page.
brand string Brand name extracted from the page.
manufacturer string Manufacturer field when Amazon exposes it.
model_number string Model or part number.
upc string UPC when present.

For most pipelines, store asin and domain together as the unique key. The same ASIN can appear across marketplaces with different pricing, language, and availability.

Keep parent_asin when you monitor apparel, electronics, beauty, or home goods. Those categories often group color, size, count, and bundle options under one parent listing.

Keep model_number and upc when you match Amazon data to internal catalog records. ASINs work inside Amazon, while UPCs and model numbers help across retailers.

Price and availability fields

The price fields are numeric, so you can load them into Postgres, BigQuery, Snowflake, or a pandas dataframe without string cleanup.

Field Type Example
initial_price number 179.99
final_price_high number or null null
currency string USD
availability string In Stock

Use zipcode for local availability checks. The same product can show different delivery status by postal code.

Keep currency with every price. Amazon marketplaces can share similar product titles and ASINs, and currency mistakes break price comparison reports.

Store null prices as null. Replacing null with zero corrupts margin reports, alerting rules, and price history charts.

Seller and Buy Box fields

The seller fields tell you who owns the listing placement at scrape time.

Field Type Example
seller_name string Amazon.com
seller_id string ATVPDKIKX0DER
buybox_seller string Amazon.com
number_of_sellers number 1

Buy Box ownership changes during the day. If you monitor sellers, run scheduled jobs and store scraped_at with every row.

If you need seller-level extraction, pair product scraping with Get Amazon seller data. Product pages give seller signals, and seller pages give seller-specific profile data.

Track seller_id when it exists. Seller names can change, while seller IDs give you a better join key for historical reporting.

Reviews, ranking, and media fields

These fields work well for catalog scoring and product monitoring.

Field Type Example
rating number 4.4
reviews_count number 15650
answered_questions number 0
root_bs_rank number 412
images_count number 13
video_count number 7
plus_content boolean true

Use rating and review count for trend monitoring. A rating change from 4.6 to 4.2 matters more when review count grows at the same time.

For review text, ratings by reviewer, and review-level metadata, use Extract Amazon reviews after you collect product URLs.

Use images_count, video_count, and plus_content for listing quality checks. Thin media coverage often correlates with weaker conversion in catalog audits.

Variations

variations returns child ASINs for product options like color and size. Each variation object can include name, asin, price, currency, image, color, and size.

Treat variations as child rows in your database. A single product page can return 3, 10, or 50 variation records depending on the category.

Variation pricing can differ from the main product price. Store the variation ASIN and variation price instead of applying the parent price to every child row.

Do not flatten variations into comma-separated strings. That format breaks joins, price comparisons, and change detection.

A simple variation table should include parent ASIN, child ASIN, option fields, price, currency, image, and scraped_at. That shape keeps product rows and child rows separate.

Ready to get this data? Extract Amazon product data.

Production tips, validation, deduplication, schema, error handling

Amazon Products Scraper row ingestion logic
The Amazon Products Scraper row ingestion logic that shapes raw results into warehouse tables.

Validate inputs before you trigger jobs, dedupe by ASIN after extraction, and store scrape failures as rows.

Validate URLs before sending jobs

Reject non-Amazon URLs before they hit the API. This catches invalid queue messages and copied marketplace URLs that this scraper does not support.

from urllib.parse import urlparse

def validate_amazon_product_input(item: dict) -> None:
    url = item.get("url", "")
    parsed = urlparse(url)

    if parsed.scheme != "https":
        raise ValueError(f"URL must use https: {url}")

    if parsed.netloc != "www.amazon.com":
        raise ValueError(f"URL must start with https://www.amazon.com/: {url}")

    if "/dp/" not in parsed.path and "/gp/product/" not in parsed.path:
        raise ValueError(f"URL does not look like an Amazon product page: {url}")

    zipcode = item.get("zipcode")
    if zipcode is not None and not str(zipcode).strip():
        raise ValueError("zipcode cannot be blank")

    language = item.get("language")
    if language is not None and len(language) > 5:
        raise ValueError(f"language looks invalid: {language}")


for input_item in SCRAPER_INPUTS:
    validate_amazon_product_input(input_item)

Run this before trigger_scrape(). Invalid inputs should fail locally instead of waiting for a scrape job.

Dedupe by marketplace and ASIN

Amazon URLs contain noise. Search parameters, tracking parameters, and variation parameters can create multiple URLs for the same ASIN.

Use the returned domain and asin as your stable key.

import json

def dedupe_products(rows: list[dict]) -> list[dict]:
    seen = set()
    output = []

    for row in rows:
        if row.get("scrape_status") != "success":
            output.append(row)
            continue

        key = (row.get("domain"), row.get("asin"))

        if key in seen:
            continue

        seen.add(key)
        output.append(row)

    return output


with open("output/amazon-products-extract-by-url.json", "r", encoding="utf-8") as f:
    rows = json.load(f)

clean_rows = dedupe_products(rows)

with open("output/amazon-products-extract-by-url-deduped.json", "w", encoding="utf-8") as f:
    json.dump(clean_rows, f, indent=2, ensure_ascii=False)

Keep failed rows in the output for retry. Deduplicate after extraction since the returned ASIN is a better key than the raw input URL.

Store a typed schema

Do not store product data in one untyped JSON column unless you are prototyping. At minimum, split common fields into typed columns and keep the raw object for fields you do not query daily.

A practical product table shape looks like this:

Column Type
asin text
parent_asin text
domain text
url text
title text
brand text
seller_name text
buybox_seller text
initial_price numeric
currency text
availability text
rating numeric
reviews_count integer
root_bs_rank integer
images_count integer
video_count integer
scraped_at timestamp
raw_json jsonb

Store variations in a separate table keyed by parent_asin, asin, and scraped_at.

Use numeric columns for price, rating, rank, and counts. Keep raw_json for fields you have not modeled yet.

Add a scrape timestamp

Add scraped_at when you write rows, even if the API response includes timing metadata elsewhere. A timestamp in your warehouse makes price history, availability checks, and seller changes queryable.

from datetime import datetime, timezone

def add_scraped_at(rows: list[dict]) -> list[dict]:
    timestamp = datetime.now(timezone.utc).isoformat()

    for row in rows:
        row["scraped_at"] = timestamp

    return rows

Use UTC and create the timestamp once per batch so every row shares the same collection time.

Retry failed jobs with a cap

A failed job should enter a retry queue with a hard cap. I use 3 attempts for product detail pages, then write the failure to a dead-letter table.

import time

MAX_ATTEMPTS = 3

def run_with_retries(slug: str, inputs: list[dict]) -> dict:
    last_error = None

    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            job_id = trigger_scrape(slug, inputs)
            status = poll_until_done(job_id)

            if status == "completed":
                return fetch_results(job_id)

            last_error = f"Job ended with status {status}"
        except requests.HTTPError as exc:
            last_error = str(exc)

        sleep_seconds = attempt * 10
        print(f"Attempt {attempt} failed. Retrying in {sleep_seconds}s")
        time.sleep(sleep_seconds)

    raise RuntimeError(f"Scrape failed after {MAX_ATTEMPTS} attempts: {last_error}")

Use small batches for retries so one invalid input does not block the run. Log the input payload, job ID, and error message with each attempt.

Monitor row counts and status rates

Track submitted inputs, successful rows, failed rows, and duplicate rows. Alert when row counts fall outside your expected range.

Which Amazon product scraper to use

ScrapeNow has different Amazon product scrapers because product discovery and product detail extraction are separate jobs.

Scraper Use it when Main input
Extract Amazon product data You already have product URLs and need detail-page data. url
Search Amazon products by keyword You need products returned for a keyword search. keyword
Search Amazon products by URL You already built or copied an Amazon search URL. url
Extract Amazon reviews You need review-level data for a product. product review URL
Get Amazon seller data You need seller profile data. seller URL

For a product monitoring pipeline, start with keyword search to discover ASINs. Then run product extract by URL on the product detail pages.

Add reviews and sellers only when you need those separate datasets. Review extraction adds row volume fast, and seller extraction belongs in its own table.

The broader ScrapeNow extractor catalog is at Browse all 86+ scrapers. It includes extractors for Amazon, Google, LinkedIn, TikTok, Instagram, YouTube, Zillow, Indeed, Glassdoor, Flipkart, Crunchbase, Yelp, Facebook, and X.

Use one scraper for one job. Mixing search results, product rows, reviews, and sellers in one table creates fragile reporting and expensive cleanup.

Pricing

ScrapeNow charges per returned row. One row costs one credit, starting at $0.04 per credit for small runs and dropping with volume. No monthly contracts, no proxy fees, no charges for failed rows. See the pricing page for current rates.

Related articles

View all

Start collecting data in under five minutes.

Free credits included - no credit card required.

Free credits included - no credit card required