How to extract Amazon product data at scale

The Amazon global product scraper extracts 30+ fields from any Amazon product URL, including title, ASIN, brand, seller, price, availability, rating, review count, categories, images, variations, and Buy Box data.

How to use this scraper

The Amazon Global Product Scraper job pipeline, from input to stored output.

The product URL scraper takes one required input and one optional filter:

Input	Required	Type	Notes
`url`	Yes	string	Direct Amazon product URL. It must start with `https://www.amazon.com/`
`bought_past_month`	No	integer	Minimum recent purchase count during the last 30 days. Valid range is `0` to `1000000`

Use the Extract global Amazon product data scraper when you already have product URLs or ASIN-derived URLs. If you need product discovery first, start with Search Amazon products by keyword.

The normal flow has two jobs. Search scrapers build the URL list. The product scraper turns each known product page into a structured product record.

This split keeps your pipeline easier to debug. Search failures belong in one queue, and product extraction failures belong in another.

Get the product URL input

Open amazon.com.
Search for a seed product, such as headphones.
Open the product page by clicking the product result.
Copy the URL from the browser address bar.

A valid input looks like this:

{
  "url": "https://www.amazon.com/dp/B0CYZD22FB",
  "bought_past_month": 10
}

Short /dp/ASIN URLs work well because they remove tracking parameters. Amazon URLs with query strings also work when they still point to a valid product page on amazon.com.

Prefer canonical ASIN URLs for scheduled jobs. They make deduplication easier, reduce URL churn, and remove session parameters that add no value downstream.
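One way to produce canonical URLs before scheduling a job is sketched below. It assumes an ASIN is a 10-character run of uppercase letters and digits that follows `/dp/` or `/gp/product/` in the path. The helper name `canonical_product_url` is illustrative and not part of the scraper API.

```python
from __future__ import annotations

import re

# Assumption: an ASIN is 10 uppercase letters or digits after /dp/ or /gp/product/.
ASIN_PATTERN = re.compile(r"/(?:dp|gp/product)/([A-Z0-9]{10})(?:[/?#]|$)")


def canonical_product_url(url: str) -> str | None:
    """Return https://www.amazon.com/dp/<ASIN>, or None when no ASIN is found."""
    match = ASIN_PATTERN.search(url)
    if match is None:
        return None
    return f"https://www.amazon.com/dp/{match.group(1)}"


print(canonical_product_url(
    "https://www.amazon.com/Some-Title/dp/B0CYZD22FB/ref=sr_1_1?th=1&psc=1"
))
```

URLs that return None here are probably search or category pages and belong in a discovery scraper instead.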

Run the scraper through the API

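Use this Python script. Replace YOUR_API_KEY, install the requests package, then run it with Python 3.10 or later. The type hints use the `str | None` syntax, which older versions reject.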

"""
Configuration:
    - Set SCRAPER_SLUG to the scraper you want to run.
    - Set SCRAPER_INPUTS to the list of input dicts matching that scraper schema.
    - Set API_KEY to your scraper API key.
"""

import json
import os  # used by save_results to create the output directory
import sys
import time

import requests

API_KEY = "YOUR_API_KEY"

SCRAPER_SLUG = "amazon-global-products-extract-by-url"

SCRAPER_INPUTS = [
    {
        "url": "https://www.amazon.com/dp/B0CYZD22FB",
        "bought_past_month": 10
    }
]



BASE_URL = "https://api.scrapenow.io/api/v1/scraping"
TIMEOUT_SECONDS = 3600
POLL_INTERVAL = 5
SPINNER = "|/-\\"


def build_headers(api_key: str, content_type: str | None = None) -> dict:
    headers = {"Authorization": f"Bearer {api_key}"}
    if content_type:
        headers["Content-Type"] = content_type
    return headers


def trigger_scrape(slug: str, inputs: list[dict]) -> str:
    url = f"{BASE_URL}/scrape?scraper={slug}"
    response = requests.post(
        url,
        headers=build_headers(API_KEY, "application/json"),
        json={"inputs": inputs},
    )
    response.raise_for_status()
    return response.json()["data"]["job_id"]


def poll_until_done(job_id: str) -> str:
    start = time.time()
    i = 0
    while True:
        elapsed = time.time() - start
        if elapsed > TIMEOUT_SECONDS:
            print(f"\nTimeout after {TIMEOUT_SECONDS}s")
            sys.exit(1)
        response = requests.get(
            f"{BASE_URL}/jobs/{job_id}",
            headers=build_headers(API_KEY),
        )
        response.raise_for_status()
        data = response.json()
        status = data["data"]["status"]
        mins, secs = divmod(int(elapsed), 60)
        sys.stdout.write(
            f"\r[{SPINNER[i % 4]}] Waiting... {status} ({mins}m {secs:02d}s)  "
        )
        sys.stdout.flush()
        if status in ("completed", "failed"):
            print()
            return status
        time.sleep(POLL_INTERVAL)
        i += 1


def fetch_results(job_id: str) -> dict:
    response = requests.get(
        f"{BASE_URL}/jobs/{job_id}/results?format=json",
        headers=build_headers(API_KEY),
    )
    response.raise_for_status()
    return response.json()


def save_results(data: dict, slug: str) -> str:
    os.makedirs("output", exist_ok=True)
    filename = os.path.join("output", f"{slug}.json")
    with open(filename, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2, ensure_ascii=False)
    return filename


def main() -> None:
    print(f"Triggering scraper: {SCRAPER_SLUG}")
    job_id = trigger_scrape(SCRAPER_SLUG, SCRAPER_INPUTS)
    print(f"Job started: {job_id}")
    final_status = poll_until_done(job_id)
    if final_status != "completed":
        print(f"Job failed with status: {final_status}")
        sys.exit(1)
    print("Fetching results...")
    results = fetch_results(job_id)
    output_file = save_results(results, SCRAPER_SLUG)
    print(f"Results saved to: {output_file}")


if __name__ == "__main__":
    main()

The same API pattern works for the other scrapers in this group. Change the scraper slug and input values in the code for each scraper.

For batch jobs, send multiple input dictionaries in SCRAPER_INPUTS. Keep your batch size aligned with your retry plan.

A 500 URL batch is easier to replay than a 50,000 URL batch after an upstream data issue. Smaller batches also make failed inputs easier to isolate when a source feed includes deleted listings.

Use a job ID as the replay boundary. Store the job ID, input batch hash, run time, and result file path in your own run log.
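The batching and run-log advice above could look roughly like this. It is a sketch, not a prescribed format: the helper names, the log field names, and the `job-123` ID are all made up for illustration. The hash covers the batch in order, so reordering the same inputs produces a different hash.

```python
import hashlib
import json
from datetime import datetime, timezone


def chunked(items: list[dict], size: int) -> list[list[dict]]:
    """Split the input list into batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def batch_hash(batch: list[dict]) -> str:
    """Hash a batch deterministically (dict keys sorted, list order preserved)."""
    payload = json.dumps(batch, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_log_entry(job_id: str, batch: list[dict], result_path: str) -> dict:
    """Build one run-log row for a submitted batch."""
    return {
        "job_id": job_id,
        "input_batch_hash": batch_hash(batch),
        "input_count": len(batch),
        "run_time": datetime.now(timezone.utc).isoformat(),
        "result_file": result_path,
    }


# Placeholder URLs for illustration only.
inputs = [{"url": f"https://www.amazon.com/dp/B0000000{i:02d}"} for i in range(5)]
batches = chunked(inputs, 2)
entry = run_log_entry("job-123", batches[0], "output/job-123.json")
print(len(batches), entry["input_count"], entry["input_batch_hash"][:12])
```

Each batch would be passed as SCRAPER_INPUTS for its own job, so one job ID maps to one hash in the log.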

Use brand search inputs when you need a storefront URL

The brand search scraper accepts a URL to a brand or seller page. The URL must start with https://www.amazon.com/.

Open amazon.com.
Search for the desired product, such as headphones.
Open a product from the target brand.
Click the seller link in the right panel below the Buy Now button.
Amazon opens the seller page. Copy the Seller ID from the URL after the seller= parameter.
Build the URL using this format:

https://www.amazon.com/s?me=SELLER_ID

Example:

https://www.amazon.com/s?me=A11H2172ZZKORR
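If you collect many seller pages, the copy step can be scripted. The sketch below reads the `seller` query parameter and builds the storefront URL in the format shown above. The seller page URL in the example is an assumed shape; only the `seller=` parameter is taken from the steps above.

```python
from __future__ import annotations

from urllib.parse import parse_qs, urlparse


def storefront_url_from_seller_page(seller_page_url: str) -> str | None:
    """Build https://www.amazon.com/s?me=<SELLER_ID> from a seller page URL."""
    query = parse_qs(urlparse(seller_page_url).query)
    seller_ids = query.get("seller")
    if not seller_ids:
        return None  # no seller= parameter in this URL
    return f"https://www.amazon.com/s?me={seller_ids[0]}"


# Assumed example of a seller page URL; the path may differ on the live site.
print(storefront_url_from_seller_page(
    "https://www.amazon.com/sp?ie=UTF8&seller=A11H2172ZZKORR"
))
```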

For URL-based product discovery, Search Amazon products by URL works well when you already have a category, storefront, or filtered Amazon results page. Use it for saved searches, category pages, and filtered result URLs that your team already tracks.

Keep the seller ID with every product URL returned from the discovery step. That gives you a clean join key when you compare seller coverage against the extracted product record.

How to search by keyword

Start with keyword-based discovery when you have a search term but no product URLs yet.

Open amazon.com and type your keyword into the search bar.

Amazon returns a results page with product listings matching your query.

If you need results from a specific country, click the delivery location dropdown and select your target marketplace. Amazon updates to show results for the selected country.

Copy the search results URL from the address bar. This URL becomes the input for the keyword search scraper.

Save the query, country context, and result page URL with each batch. Those fields explain why a product entered your pipeline.

How to search by seller

Use seller search when your input starts from a seller profile or seller-specific Amazon page. This works well for marketplace monitoring, unauthorized seller checks, and catalog audits.

Search for any product on Amazon to find a seller you want to track.

Open any product listing and scroll down to the seller information section. The seller link sits below the Buy Now button.

Click the seller name to open their profile.

The seller profile page shows all products from that seller. Copy the seller ID from the URL.

After seller search returns product URLs, run the product scraper for price, availability, Buy Box seller, and catalog fields. For unauthorized seller checks, compare the seller ID from the search against the Buy Box seller from extraction. A mismatch gives your monitoring job a concrete review target.
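A minimal sketch of that comparison follows. In the sample response further down, `seller_id` is an ID while `buybox_seller` is a display name, so this version checks both against expected values that you supply. The function name and the case-insensitive name match are assumptions, not scraper behavior.

```python
def flag_seller_mismatches(
    records: list[dict],
    expected_seller_id: str,
    expected_seller_name: str,
) -> list[dict]:
    """Return review targets where the seller ID or Buy Box seller differs."""
    expected_name = expected_seller_name.strip().casefold()
    flagged = []
    for record in records:
        if record.get("scrape_status") != "success":
            continue  # failed rows are handled by the failure pipeline
        seller_id = record.get("seller_id")
        buybox = (record.get("buybox_seller") or "").strip().casefold()
        if seller_id != expected_seller_id or buybox != expected_name:
            flagged.append({
                "asin": record.get("asin"),
                "seller_id": seller_id,
                "buybox_seller": record.get("buybox_seller"),
            })
    return flagged


records = [
    {"scrape_status": "success", "asin": "B0CYZD22FB",
     "seller_id": "A11H2172ZZKORR", "buybox_seller": "AILIHEN"},
    {"scrape_status": "success", "asin": "B09TR1Y3MZ",
     "seller_id": "A11H2172ZZKORR", "buybox_seller": "Other Seller"},
]
print(flag_seller_mismatches(records, "A11H2172ZZKORR", "AILIHEN"))
```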

API response sample

A completed job returns an array of result objects. Each object includes the original input, scrape status, normalized product fields, pricing fields, seller fields, and variant data.

[
  {
    "inputs": {
      "url": "https://www.amazon.com/dp/B0CYZD22FB",
      "bought_past_month": 10
    },
    "scrape_status": "success",
    "title": "AILIHEN Kids Headphones Bulk 10-Pack for K-12 School Classroom, On-Ear Wired Headset with Microphone for Students Children with 93dB Volume Limited, 3.5mm Jack for Chromebooks Tablets Laptop Computer",
    "seller_name": "AILIHEN",
    "brand": "AILIHEN",
    "description": "About this item Safe Sound Protection(<93dB): The World Health Organisation (WHO) recommends 93dB as the maximum safe volume level for kids and teens during their daily use. Here we introduce AILIHEN wholesale headphones, specially designed for teenagers to a safe sound to prevent damage to their hearing in daily life Built-in Mic: The headphones come with a built-in microphone, making them a suitable choice during study or leisure time. They can chat easily with teachers, friends, and parents while they’re busy learning, or with friends and family during downtime Designed for Students: the on-ear headphones have an adjustable headband and lightweigt design that can adjust to a perfect fit. With soft memory-protein cushioned earmuffs and pillow soft headband for ultra comfort, minimizes the pressure on the ears while wearing Durable and Foldable: Premium build quality with tangle-free nylon fabric cables which can withstand pulling and tangling, the standard audio jack will be compatible with most 3.5mm enabled audio cables like cellphones, laptops, kindle, tablets and etc. The foldable design would make the headphones more portable and storage Stereo Sound: They feature dynamic 40mm drivers that deliver deep clear sound, with audio clarity that makes listening to music, playing a game, or watching a show a pure pleasure on home use or airplane travels › See more product details",
    "initial_price": 78.84,
    "currency": "USD",
    "availability": "In Stock",
    "reviews_count": 983,
    "categories": [
      "Electronics",
      "Headphones, Earbuds & Accessories",
      "Headphones & Earbuds",
      "On-Ear Headphones"
    ],
    "parent_asin": "B09N76B4RD",
    "asin": "B0CYZD22FB",
    "buybox_seller": "AILIHEN",
    "number_of_sellers": 1,
    "root_bs_rank": 19118,
    "answered_questions": 0,
    "domain": "https://www.amazon.com/",
    "images_count": 8,
    "url": "https://www.amazon.com/dp/B0CYZD22FB?th=1&psc=1",
    "video_count": 1,
    "image_url": "https://m.media-amazon.com/images/I/81HDB2yrP-L._AC_SL1500_.jpg",
    "item_weight": "2.01 Kilograms",
    "rating": 4.4,
    "seller_id": "A11H2172ZZKORR",
    "discount": "-5%",
    "model_number": "I35PACK",
    "manufacturer": "AILIHEN",
    "department": "Electronics",
    "plus_content": true,
    "video": false,
    "final_price_high": null,
    "final_price": 74.89,
    "variations": [
      {
        "name": "Multi Color",
        "asin": "B09TR1Y3MZ",
        "price": null,
        "currency": null,
        "unit": null,
        "unit_price": null
      }
    ]
  }
]

Treat the response as an event record from the scrape time. Prices, seller ownership, stock state, ratings, review counts, and Buy Box data change often.

Do not treat a product response as a permanent catalog truth. Store the scrape timestamp beside every row that can change.

What data you get back

Amazon Global Product Scraper output fields grouped by category.

The scraper returns enough product detail to build a catalog row, price history row, seller snapshot, or product monitoring event.

Field	Use it for
`scrape_status`	Separate successful records from failed inputs before loading data
`title`	Product catalog title and matching logic
`asin`	Primary product key for Amazon item-level records
`parent_asin`	Grouping variants under one parent listing
`brand`	Brand-level reporting and product grouping
`seller_name`	Seller display name from the listing
`seller_id`	Stable seller key, useful for joins
`buybox_seller`	Buy Box monitoring
`initial_price`	Pre-discount or listed price
`final_price`	Current purchasable price
`currency`	Price normalization across countries
`availability`	Stock state, such as `In Stock`
`rating`	Average star rating
`reviews_count`	Review volume at scrape time
`categories`	Product taxonomy path
`root_bs_rank`	Root category Best Sellers Rank
`images_count`	Media depth check
`image_url`	Main product image
`video_count`	Count of listing videos
`variations`	Variant ASINs, names, prices, and units

Ready to get this data? Extract global Amazon product data.

For review-level extraction, use Extract Amazon reviews after product extraction. For seller inventory checks, pair the product scraper with Get Amazon seller data.

Store fields by workload. Catalog systems usually need asin, parent_asin, title, brand, manufacturer, model_number, categories, and image_url.

Price monitoring jobs usually need asin, seller_id, buybox_seller, initial_price, final_price, currency, discount, and availability. Seller monitoring jobs usually need asin, seller_id, seller_name, buybox_seller, number_of_sellers, and availability.
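The field lists above can be expressed as a small projection step. This is a sketch: the workload names and the `project` helper are illustrative, and the field names come from the lists above.

```python
WORKLOAD_FIELDS = {
    "catalog": ["asin", "parent_asin", "title", "brand", "manufacturer",
                "model_number", "categories", "image_url"],
    "price": ["asin", "seller_id", "buybox_seller", "initial_price",
              "final_price", "currency", "discount", "availability"],
    "seller": ["asin", "seller_id", "seller_name", "buybox_seller",
               "number_of_sellers", "availability"],
}


def project(record: dict, workload: str) -> dict:
    """Keep only the fields a workload needs; missing fields become None."""
    return {field: record.get(field) for field in WORKLOAD_FIELDS[workload]}


record = {
    "asin": "B0CYZD22FB",
    "seller_id": "A11H2172ZZKORR",
    "buybox_seller": "AILIHEN",
    "initial_price": 78.84,
    "final_price": 74.89,
    "currency": "USD",
    "discount": "-5%",
    "availability": "In Stock",
    "title": "AILIHEN Kids Headphones",
}
print(project(record, "price"))
```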

Variation tracking needs a separate table. Child ASINs can carry different prices, colors, sizes, package counts, and stock states under one parent listing.

Production tips for clean product data

Amazon product data has inconsistent edges. Treat every record as semi-structured data, even when the scraper returns a stable schema.

Amazon changes page modules, seller widgets, and variation layouts across categories. Headphones, grocery items, apparel, and replacement parts expose different combinations of price, unit price, variant, and seller data.

Your loader should accept missing fields. A missing final_price_high value on a single-price product is normal, and a missing unit price on headphones is normal.
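A tolerant loader step might look like the sketch below. It reads optional fields with `.get` and turns the `discount` string from the sample response (such as `-5%`) into a number. The helper names and the `discount_pct` output column are illustrative, and the percent-string format is assumed from that one sample.

```python
from __future__ import annotations


def parse_discount(value: object) -> float | None:
    """Convert a string like '-5%' to -5.0; return None for anything else."""
    if not isinstance(value, str):
        return None
    try:
        return float(value.strip().rstrip("%"))
    except ValueError:
        return None


def to_price_fields(record: dict) -> dict:
    """Extract price fields without failing when optional keys are absent."""
    return {
        "asin": record.get("asin"),
        "final_price": record.get("final_price"),
        "final_price_high": record.get("final_price_high"),  # often None
        "discount_pct": parse_discount(record.get("discount")),
    }


print(to_price_fields({"asin": "B0CYZD22FB", "final_price": 74.89, "discount": "-5%"}))
```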

Validate inputs before sending jobs

Reject malformed URLs before you spend credits. The scraper expects URLs that start with https://www.amazon.com/.

from urllib.parse import urlparse

def validate_amazon_product_input(item: dict) -> tuple[bool, str | None]:
    url = item.get("url")
    bought_past_month = item.get("bought_past_month", 0)

    if not isinstance(url, str) or not url.startswith("https://www.amazon.com/"):
        return False, "url must start with https://www.amazon.com/"

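    # bool is a subclass of int in Python, so reject True/False explicitly.
    if isinstance(bought_past_month, bool) or not isinstance(bought_past_month, int):
        return False, "bought_past_month must be an integer"

    if bought_past_month < 0 or bought_past_month > 1000000:
        return False, "bought_past_month must be between 0 and 1000000"

    # Compare the host exactly; a suffix check would also accept hosts like "notamazon.com".
    parsed = urlparse(url)
    if parsed.netloc != "www.amazon.com":
        return False, "url must use www.amazon.com"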

    return True, None


inputs = [
    {
        "url": "https://www.amazon.com/dp/B0CYZD22FB",
        "bought_past_month": 10
    }
]

valid_inputs = []
for item in inputs:
    ok, error = validate_amazon_product_input(item)
    if ok:
        valid_inputs.append(item)
    else:
        print(f"Skipping input: {error}")

Validate locally before large runs. Normalize URLs by removing marketing parameters like tag, ref, and psc when a clean ASIN URL is available.
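When no clean ASIN URL is available, you can still drop the marketing parameters from the query string. The sketch below removes the three parameters named above and leaves everything else in place. It only touches query parameters; a `ref=` segment embedded in the URL path is not handled here.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse

# Parameter names taken from the text above; extend the set for your own feeds.
TRACKING_PARAMS = {"tag", "ref", "psc"}


def strip_tracking_params(url: str) -> str:
    """Remove tracking query parameters and keep the rest of the URL intact."""
    parts = urlparse(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_PARAMS
    ]
    return urlunparse(parts._replace(query=urlencode(kept)))


print(strip_tracking_params(
    "https://www.amazon.com/dp/B0CYZD22FB?th=1&psc=1&tag=example-20"
))
```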

Deduplicate by ASIN before loading

Use asin as the product-level key. Use parent_asin when you want variant groups.

import json
from pathlib import Path

def dedupe_products(records: list[dict]) -> list[dict]:
    seen = {}
    for record in records:
        if record.get("scrape_status") != "success":
            continue

        asin = record.get("asin")
        if not asin:
            continue

        seen[asin] = record

    return list(seen.values())


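# Path written by save_results() in the API script above; adjust it if you saved elsewhere.
# This assumes the file holds a JSON array of result objects, as in the response sample.
data = json.loads(
    Path("output/amazon-global-products-extract-by-url.json").read_text(encoding="utf-8")
)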
clean_records = dedupe_products(data)

print(f"Loaded {len(data)} raw records")
print(f"Kept {len(clean_records)} unique ASIN records")

If you scrape the same ASIN from multiple URLs, keep the newest record by scrape timestamp. Do not dedupe variation rows too early since a parent listing can contain multiple child ASINs with different prices and stock states.
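A sketch of the keep-newest rule follows. It assumes each record carries a `scraped_at` ISO 8601 string that your loader added, since the sample response does not include a timestamp field. All timestamps should share the same timezone convention, because Python cannot compare naive and timezone-aware datetimes.

```python
from datetime import datetime


def keep_newest_per_asin(records: list[dict]) -> list[dict]:
    """Keep one record per ASIN, choosing the latest scraped_at value."""
    newest: dict[str, dict] = {}
    for record in records:
        asin = record.get("asin")
        scraped_at = record.get("scraped_at")  # assumed field added at load time
        if not asin or not scraped_at:
            continue
        current = newest.get(asin)
        if current is None or (
            datetime.fromisoformat(scraped_at)
            > datetime.fromisoformat(current["scraped_at"])
        ):
            newest[asin] = record
    return list(newest.values())


rows = [
    {"asin": "B0CYZD22FB", "final_price": 74.89, "scraped_at": "2024-05-01T08:00:00+00:00"},
    {"asin": "B0CYZD22FB", "final_price": 72.50, "scraped_at": "2024-05-02T08:00:00+00:00"},
    {"asin": "B09TR1Y3MZ", "final_price": 19.99, "scraped_at": "2024-05-01T08:00:00+00:00"},
]
print(keep_newest_per_asin(rows))
```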

Store failed records with the original input

Keep failures. Failed inputs show deleted listings, blocked product pages, malformed URLs, unavailable marketplace pages, and schema changes.

def partition_results(records: list[dict]) -> tuple[list[dict], list[dict]]:
    successes = []
    failures = []

    for record in records:
        if record.get("scrape_status") == "success":
            successes.append(record)
        else:
            failures.append({
                "inputs": record.get("inputs"),
                "scrape_status": record.get("scrape_status"),
                "raw": record
            })

    return successes, failures

Retry failures once. If a URL fails twice, send it to a dead-letter table. Store the error payload as JSON since schema changes often show up first in failed rows.

Track field freshness

Use scrape time as part of your record identity. Add a scraped_at timestamp and a source_job_id to each table.

Data type	Suggested storage pattern
Catalog fields	Latest row per ASIN plus daily snapshot
Price fields	Append-only event table
Availability	Append-only event table
Reviews and rating	Daily metric event
Buy Box seller	Append-only seller event
Variations	Latest row per child ASIN plus change history
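Stamping can happen once, right after results are fetched. The sketch below adds `scraped_at` and `source_job_id` to every record. The sample response shows no timestamp field, so this version falls back to the fetch time in UTC, which is an approximation of the true scrape time.

```python
from __future__ import annotations

from datetime import datetime, timezone


def stamp_records(
    records: list[dict],
    job_id: str,
    scraped_at: str | None = None,
) -> list[dict]:
    """Return copies of the records with scraped_at and source_job_id added."""
    timestamp = scraped_at or datetime.now(timezone.utc).isoformat()
    return [
        {**record, "scraped_at": timestamp, "source_job_id": job_id}
        for record in records
    ]


stamped = stamp_records(
    [{"asin": "B0CYZD22FB", "final_price": 74.89}],
    job_id="job-123",  # illustrative job ID
    scraped_at="2024-05-01T08:00:00+00:00",
)
print(stamped)
```

Append-only tables can then use the ASIN plus `scraped_at` as the row identity.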

Handle variant data as its own workload

Store each variation as a child row with the parent ASIN, child ASIN, name, price, currency, unit, and unit price. Tie the variation array to the same scrape time as the parent product record.
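Flattening the `variations` array into child rows could be sketched like this. The variation keys follow the sample response. The output column names and the fallback to `asin` when `parent_asin` is missing are assumptions.

```python
def variation_rows(record: dict, scraped_at: str) -> list[dict]:
    """Turn one product record into one row per variation."""
    parent = record.get("parent_asin") or record.get("asin")
    rows = []
    for variation in record.get("variations") or []:
        rows.append({
            "parent_asin": parent,
            "child_asin": variation.get("asin"),
            "name": variation.get("name"),
            "price": variation.get("price"),
            "currency": variation.get("currency"),
            "unit": variation.get("unit"),
            "unit_price": variation.get("unit_price"),
            "scraped_at": scraped_at,  # same scrape time as the parent record
        })
    return rows


product = {
    "asin": "B0CYZD22FB",
    "parent_asin": "B09N76B4RD",
    "variations": [
        {"name": "Multi Color", "asin": "B09TR1Y3MZ", "price": None,
         "currency": None, "unit": None, "unit_price": None},
    ],
}
print(variation_rows(product, "2024-05-01T08:00:00+00:00"))
```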

Set retry rules before the first large run

Retry once for transient failures. Send repeated failures to a dead-letter table with the input URL, error payload, job ID, and first failure time.
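One way to encode that rule is sketched below. It keeps a per-URL failure state in a plain dictionary, where a real pipeline would use a database table. The function name, the state layout, and the dead-letter column names are illustrative.

```python
import json
from datetime import datetime, timezone

MAX_FAILURES = 2  # first failure is retried, second goes to the dead-letter table


def route_failures(
    failures: list[dict],
    state: dict[str, dict],
    job_id: str,
) -> tuple[list[dict], list[dict]]:
    """Split failed records into inputs to retry and dead-letter rows."""
    retry_inputs, dead_letters = [], []
    now = datetime.now(timezone.utc).isoformat()
    for failure in failures:
        inputs = failure.get("inputs") or {}
        url = inputs.get("url")
        entry = state.setdefault(url, {"count": 0, "first_failed_at": now})
        entry["count"] += 1
        if entry["count"] < MAX_FAILURES:
            retry_inputs.append(inputs)
        else:
            dead_letters.append({
                "url": url,
                "error_payload": json.dumps(failure, ensure_ascii=False),
                "job_id": job_id,
                "first_failed_at": entry["first_failed_at"],
            })
    return retry_inputs, dead_letters


state: dict[str, dict] = {}
failed = [{"inputs": {"url": "https://www.amazon.com/dp/B0CYZD22FB"}, "scrape_status": "failed"}]
first_retry, first_dead = route_failures(failed, state, "job-1")
second_retry, second_dead = route_failures(failed, state, "job-2")
print(len(first_retry), len(first_dead), len(second_retry), len(second_dead))
```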

Which Amazon scraper to use

How the Amazon Global Product Scraper routes each input type to the right scraper.

Use the global product URL scraper when URL coverage matters more than search discovery. Use search scrapers when you need to build the product URL list first.

Job	Scraper
Extract one known product page	Extract global Amazon product data
Find products from a keyword	Search Amazon products by keyword
Extract details from known Amazon product URLs	Extract Amazon product data
Extract products from a results URL	Search Amazon products by URL
Pull review records for a product	Extract Amazon reviews
Pull seller data from seller URLs	Get Amazon seller data

The full scraper catalog is in the Browse all 86+ scrapers hub. It includes Amazon, Google, LinkedIn, TikTok, Instagram, Facebook, YouTube, Zillow, Indeed, Glassdoor, Flipkart, Crunchbase, Yelp, and X scrapers.

Pick the scraper from the shape of your input. A keyword belongs in a search scraper.

A product URL belongs in an extract scraper. A seller URL belongs in a seller scraper.

If your pipeline starts with search terms, run discovery first and store the returned URLs. If your pipeline starts with a product feed, skip discovery and send the product URLs straight to extraction.
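The routing rule can be approximated in code. The sketch below uses URL-shape heuristics that are assumptions, not documented scraper behavior: a `/dp/` or `/gp/product/` path means a product page, a `seller` query parameter means a seller page, any other amazon.com URL is treated as a results page, and anything else is treated as a keyword. It returns the scraper names from the table above, not API slugs.

```python
from urllib.parse import parse_qs, urlparse


def pick_scraper(value: str) -> str:
    """Map an input string to a scraper name based on its shape."""
    if not value.startswith("https://www.amazon.com/"):
        return "Search Amazon products by keyword"
    parts = urlparse(value)
    if "/dp/" in parts.path or "/gp/product/" in parts.path:
        return "Extract global Amazon product data"
    if "seller" in parse_qs(parts.query):
        return "Get Amazon seller data"
    return "Search Amazon products by URL"


for example in [
    "headphones",
    "https://www.amazon.com/dp/B0CYZD22FB",
    "https://www.amazon.com/s?me=A11H2172ZZKORR",
]:
    print(example, "->", pick_scraper(example))
```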

Pricing

ScrapeNow charges per returned row. One row costs one credit, starting at $0.04 per credit for small runs and dropping with volume. No monthly contracts, no proxy fees, no charges for failed rows. See the pricing page for current rates.

Start with 10 product URLs. Run the script above against the Extract global Amazon product data scraper.

Inspect asin, final_price, availability, seller_id, and variations. Then load the valid records into your product table.

After the first test run, add input validation, ASIN deduplication, failure storage, and separate product and price tables. Those four pieces stop most production data issues before they reach your application.

For the second run, use a batch that matches your real workload. If your production feed has 5,000 URLs, test with 500 URLs before moving to the full set.

Keep the test output, run log, and loader logs together. That gives you a complete trail from input URL to structured product record.

Start collecting data in under five minutes.

Free credits included - no credit card required.
