How to scrape Amazon product, review, and seller data

Amazon blocks plain Python requests before most product pages load.

An Amazon scraper turns product titles, ASINs, prices, ratings, reviews, sellers, availability, and category ranks into rows your pipeline can store. That sounds simple until Amazon changes markup, hides a price block, returns a regional redirect, or sends a soft block that still returns HTTP 200.

Production Amazon scraping needs browser-grade requests, page-specific parsers, retries, and validation around every run. The data has value only when your job can tell the difference between a real empty result, a blocked page, and a product with no current offer.

Why teams scrape Amazon

Amazon data changes fast. Prices move daily, reviews arrive hourly on high-volume products, and sellers rotate through Buy Box positions throughout the day.

Most Amazon scraping jobs fall into five data groups:

Data type	Common use case	Typical input
Product search results	Track rankings, discover SKUs, monitor keyword coverage	Keyword or search URL
Product detail pages	Extract price, ASIN, title, rating, images, availability	Product URL
Reviews	Run sentiment analysis, track defects, measure launch feedback	Product URL
Sellers	Monitor marketplace competition and seller coverage	Product URL
Global search pages	Compare listings across Amazon marketplaces	Amazon search URL

Manual scraping with Python works for small tests. A script that pulls 50 pages from one marketplace can give you a quick sample.

Production traffic hits a different failure pattern. A 2026 Amazon scraping field report found that vanilla Python requests returned about 2% success on Amazon. Chrome-like TLS impersonation reached about 94% on the same workload.

That gap explains why teams stop maintaining raw HTML scrapers. Request fingerprints, retries, parsing changes, and marketplace behavior take more time than the extraction code.

What makes Amazon hard to scrape

Amazon fails in several ways. You get soft blocks, CAPTCHAs, empty result pages, regional redirects, hidden prices, and markup changes.

The blockers are predictable:

Challenge	What happens in production	What breaks
TLS fingerprinting	Amazon flags non-browser clients before page content loads	Python `requests`, basic HTTP clients
IP reputation	Repeated search traffic burns datacenter IPs faster	High-volume crawlers
Rate limits	Pages return incomplete or repeated data after bursts	Rank tracking jobs
Login walls	Reviews and seller views require browser-like behavior	Review extraction
Marketplace redirects	`.com`, `.co.uk`, `.de`, and other sites behave differently	Global product monitoring
DOM changes	Price, rating, and offer blocks move often	CSS selector scrapers
Localization	Currency, delivery promises, and availability depend on region	Price comparison jobs
Offer volatility	Sellers appear and disappear as inventory changes	Buy Box monitoring

The review layer tightened on 2024-11-05. Amazon reviews started moving behind login gates in more workflows, based on the same field report.

AWS WAF added native JA4 TLS fingerprinting on 2025-03-06. That change made low-level client fingerprints harder to fake across high-volume jobs.

These failures hit page types in different ways. Search pages, product pages, reviews, and seller pages need separate parsers, retry rules, and validation checks.

A search result parser cares about rank position, sponsored slots, pagination, and category filters. A product detail parser cares about title, price, availability, offer state, images, and variation metadata.

A review parser cares about sort order, pagination, reviewer fields, review dates, and verified purchase status. A seller parser cares about offer price, shipping terms, seller name, inventory state, and timestamp.

ScrapeNow splits Amazon extraction into separate scrapers for that reason. You send the input that matches the data you need, and the scraper runs the matching extraction path.

Raw Amazon scraping breaks fast

A basic Amazon scraper usually starts like this:

import requests
from bs4 import BeautifulSoup

url = "https://www.amazon.com/s?k=wireless+earbuds"

html = requests.get(
    url,
    headers={
        "User-Agent": "Mozilla/5.0"
    },
    timeout=30
).text

soup = BeautifulSoup(html, "html.parser")

for item in soup.select("[data-component-type='s-search-result']"):
    title = item.select_one("h2 span")
    asin = item.get("data-asin")
    price = item.select_one(".a-price .a-offscreen")

    print({
        "asin": asin,
        "title": title.get_text(strip=True) if title else None,
        "price": price.get_text(strip=True) if price else None
    })

This works until Amazon returns a different page shape. It also breaks when the request fingerprint fails before the parser runs.

The parser has no context for a CAPTCHA page, an empty search page, or a regional redirect. All three can produce an empty result set.

That creates silent data loss. Your pipeline sees zero rows and treats the keyword as empty.

A production job needs page classification before extraction. It should label responses as valid page, blocked page, redirect, unavailable product, empty search, or parser miss.

Here is the minimum behavior your job needs:

Step	Check	Why it matters
Fetch	Browser-like request fingerprint	Amazon blocks basic HTTP clients early
Classify	Detect block, redirect, empty page, valid page	Empty rows need a reason
Parse	Use page-specific extractors	Search, product, review, and seller pages differ
Validate	Check required fields and row counts	Missing price and missing product are different states
Retry	Retry only recoverable failures	Blind retries waste credits and IP reputation
Store	Save input, output, timestamp, marketplace	You need history for debugging and reports

ScrapeNow handles that path inside each Amazon scraper. Your application sends inputs and stores returned rows.

ScrapeNow's Amazon scrapers

ScrapeNow has Amazon scrapers for keyword search, product URLs, review URLs, seller data, and search URLs. Each scraper targets one Amazon page type.

That split keeps routing logic out of your application code. Your job sends a keyword, product URL, or search URL, then receives structured rows for that page type.

This also keeps validation simpler. A keyword job expects many product rows, while a product URL job usually expects one row per input.

Amazon Products Search by Keyword

The Search Amazon products by keyword takes a keyword like wireless earbuds or standing desk. It returns products from Amazon search results.

Typical output includes product title, ASIN, price, rating, review count, image URL, product URL, and search position. That data works for rank tracking, keyword monitoring, catalog discovery, and share-of-search reporting.

Use keyword search when the exact search term matters. standing desk and adjustable desk can return overlapping products with different rank positions.

A normal rank tracking pipeline stores these fields:

Field	Why you store it
Keyword	Keeps the ranking tied to the query
Marketplace	Separates `.com`, `.de`, `.co.uk`, and other sites
ASIN	Lets you join search rows to product detail rows
Position	Tracks organic movement over time
Price	Captures search-visible pricing
Timestamp	Makes rank changes measurable

Run keyword extraction first when you need discovery. Store ASINs from search results, then send selected product URLs to the product detail scraper.

The detailed Amazon products scraper guide with code examples covers keyword-based extraction, pagination behavior, and rank tracking output. It also shows how to store ASINs from search results for later product detail runs.

Amazon Products Extract by URL

The Extract Amazon product data takes direct product URLs. It returns structured product detail data for each listing.

Use it when you already have ASINs or product URLs. Common fields include current price, title, availability, rating, images, product metadata, and listing identifiers.

This scraper fits price tracking and catalog monitoring jobs. Feed it a list of known products, run it on a schedule, and compare snapshots.

A product detail job has tighter cost control than keyword search. Each input normally maps to one product row.

Direct URL extraction also gives cleaner history. You compare the same product across runs without adding search rank noise.

Use product URL extraction for workflows like these:

Workflow	Input	Output you care about
Price tracking	Known product URLs	Current price and availability
Catalog monitoring	ASIN list	Title, images, attributes, listing state
Competitor tracking	Competitor product URLs	Price, rating, reviews, offer state
Retail audit	Brand product list	Missing listings, unavailable products, stale content

The Amazon products scraper guide also covers product URL extraction. It explains when direct URLs give lower cost and cleaner data than keyword searches.

Amazon Reviews Extract by URL

The Extract Amazon reviews takes a product URL. It extracts review data tied to that listing.

Typical fields include review text, rating, reviewer name, review date, verified purchase status, and review title. Those fields support sentiment analysis, product defect tracking, launch monitoring, and competitor review mining.

Treat reviews as their own workload. Review pagination, login gates, sort order, and volume behave differently from product pages.

A product detail run returns one row per product URL. A review run can return hundreds or thousands of rows for one popular product.

Review jobs also need stronger deduplication. Store review ID when available, plus product URL, reviewer name, rating, date, title, and text hash.

That structure lets you detect new reviews without reprocessing the full history. It also lets you track edited reviews when the text hash changes.

Use reviews extraction when the row is the review itself. Use product extraction when you need only review count and rating summary.

Amazon Sellers Extract by URL

The Get Amazon seller data extracts seller data from product pages and offer views. It returns seller names, prices, shipping details, availability, and marketplace offer signals where Amazon exposes them.

Use this scraper when marketplace competition matters. It works for Buy Box monitoring, reseller tracking, unauthorized seller checks, and offer coverage analysis.

Seller pages change often because offers depend on inventory, shipping promises, location, and account-specific display logic. A seller extraction job needs repeat runs and consistent timestamps.

Store seller rows as event data. Each row should include product URL, ASIN, seller name, offer price, shipping cost, availability, marketplace, and observed time.

That schema makes offer movement queryable. You can see when a seller entered the listing, changed price, disappeared, or lost offer visibility.

The detailed Amazon sellers scraper guide with code examples covers seller extraction and offer monitoring. It also shows how to track marketplace competition across repeated runs.

Amazon Products Search by URL

The Search Amazon products by URL takes an Amazon search URL instead of a raw keyword. Use it when filters, category paths, sort order, or query parameters need to stay intact.

This matters for category-specific monitoring. A keyword alone cannot preserve every filter Amazon puts into a search URL.

Search URLs also help with marketplace-specific work. A .de search URL, a .co.uk search URL, and a .com search URL can produce different products, prices, and availability.

URL-based search is the right choice for filtered pages. Examples include category paths, brand filters, price ranges, Prime filters, sort orders, and department-specific searches.

Keep the full URL as the input key. Do not reduce it to the visible keyword, because Amazon often stores ranking context in query parameters.

The Amazon global product scraper guide covers URL-based search extraction across Amazon marketplaces. It explains how to keep marketplace context inside your input URLs.

Which Amazon scraper to use

Amazon Scraper input type routing — How the Amazon Scraper routes each input type to the right scraper.

Pick the scraper by input type. That keeps your pipeline simple and avoids converting URLs into keywords or keywords into URLs.

You have	You want	Use this scraper
A keyword	Search result products and rank positions	Amazon Products Search by Keyword
A product URL	Product details for one listing	Amazon Products Extract by URL
A product URL	Customer reviews for that listing	Amazon Reviews Extract by URL
A product URL	Marketplace sellers and offers	Amazon Sellers Extract by URL
A filtered search URL	Search results with Amazon filters preserved	Amazon Products Search by URL

For 10,000 keyword searches per day, start with keyword search extraction and store ASINs. Run product URL extraction as a second step for ASINs that matter.

For 50,000 product URLs, feed the product URLs directly. That avoids paying for search result rows you will discard.

For review mining, run reviews as a separate job. Review pages have different block patterns, pagination rules, and row volume.

For seller monitoring, store product URL, seller name, offer price, shipping terms, and timestamp on every run. That gives you a clean history of offer movement.

For global monitoring, keep marketplace in the input. Do not collapse amazon.com, amazon.de, and amazon.co.uk into a single normalized URL list.

Example Amazon scraping workflow

Amazon Scraper discovery to extraction loop — The Amazon Scraper discovery-to-extraction loop, from search results to product records.

A practical Amazon pipeline usually runs in two phases. Search extraction discovers products, then product URL extraction tracks the products that matter.

The job can look like this:

import requests

SCRAPENOW_API_KEY = "YOUR_API_KEY"

def run_scrapenow_scraper(scraper_slug, payload):
    response = requests.post(
        f"https://api.scrapenow.io/api/v1/scraping/scrape?scraper={scraper_slug}",
        headers={
            "Authorization": f"Bearer {SCRAPENOW_API_KEY}",
            "Content-Type": "application/json"
        },
        json=payload,
        timeout=120
    )
    response.raise_for_status()
    return response.json()

search_rows = run_scrapenow_scraper(
    "amazon-products-search-by-keyword",
    {
        "keywords": ["wireless earbuds", "standing desk"],
        "marketplace": "amazon.com"
    }
)

asin_urls = [
    row["product_url"]
    for row in search_rows["data"]
    if row.get("asin") and row.get("position", 999) <= 20
]

product_rows = run_scrapenow_scraper(
    "amazon-products-extract-by-url",
    {
        "urls": asin_urls
    }
)

for row in product_rows["data"]:
    print(row["asin"], row.get("price"), row.get("availability"))

Use this pattern when you need both discovery and monitoring. Search rows tell you which products rank, and product rows tell you what changed on each listing.

For scheduled jobs, store the raw input with every output row. Save the scraper slug, marketplace, timestamp, and run ID as separate columns.

That metadata pays for itself during debugging. When a report looks wrong, you can trace the exact input and scraper path that produced the row.

Pricing

ScrapeNow charges per returned row. One row costs one credit, starting at $0.04 per credit for small runs and dropping with volume. No monthly contracts, no proxy fees, no charges for failed rows. See the pricing page for current rates.

Build versus buy for Amazon scraping

DIY scraping makes sense for one-off checks, internal experiments, and low-volume research. A script that pulls 100 product pages once is fine.

Production Amazon scraping becomes a maintenance job. You deal with TLS fingerprints, rotating IPs, blocked sessions, parser changes, retries, marketplace redirects, and data validation.

Use this decision matrix:

Scenario	Better choice	Reason
One-time pull of 100 public product pages	DIY script	Low risk and low repeat cost
Daily price tracking for 5,000 ASINs	Product URL scraper	Direct inputs and predictable row counts
Keyword rank tracking across 200 search terms	Keyword search scraper	Position data belongs to search pages
Review mining for 1,000 products	Reviews scraper	Review pagination needs a separate path
Seller monitoring across marketplace offers	Sellers scraper	Offer rows change faster than product rows
Filtered category monitoring	Search URL scraper	URL filters must stay intact
Custom fields across multiple Amazon page types	Managed custom pipeline	Shared fields need cross-page joins

A DEV benchmark of large-scale Amazon scraping reported self-built scrapers at 71.4% product page success. The same benchmark reported about 60% of engineering time spent on anti-bot maintenance.

That matches what we see in production. Parser code is one part of the job, and anti-bot maintenance consumes the calendar.

The build path works when the data volume stays low and the field list stays stable. It also works when missed rows do not break downstream reports.

The buy path works when the job runs daily, feeds paid dashboards, or supports pricing decisions. At that point, failed runs cost more than scraper credits.

Use a simple test. If a blocked run creates an engineer ticket, treat Amazon scraping as production infrastructure.

Operational checks before you scale

Amazon Scraper response classification — How the Amazon Scraper classifies each API response before processing.

Start every Amazon scraping workflow with a small batch. Run 50 to 100 inputs, inspect the rows, and verify the fields your pipeline expects.

Check row counts before you schedule a full run. A search scraper returns many rows per keyword, while a product scraper usually returns one row per URL.

Store the raw input beside every returned row. That makes debugging easier when Amazon redirects a URL, removes a product, or changes availability.

Add timestamps to every run. Price, seller, and availability data lose value when you cannot tell when the scraper observed them.

Keep marketplace as a first-class field. A product can have the same ASIN across regions and different prices, sellers, delivery terms, and review counts.

Validate empty results separately from blocked results. An empty category page, unavailable product, and failed request require different handling.

Track parser health with field-level checks. For product pages, alert on missing ASIN, missing title, or a sudden drop in price coverage.

Track search health with row-count checks. If a keyword usually returns 200 rows and suddenly returns 12, classify the run before updating reports.

For review jobs, deduplicate rows before loading them into analysis tables. Review pages often overlap across pagination and sort modes.

For seller jobs, treat every run as a snapshot. Offer data changes often enough that overwriting old rows removes useful history.

Data model for Amazon scraper results

Store Amazon scraper output in separate tables by page type. Search rows, product rows, review rows, and seller rows have different grain.

A search result row represents one product at one position for one query. The natural key is keyword or search URL, marketplace, ASIN, position, and timestamp.

A product detail row represents one product observation. The natural key is product URL or ASIN, marketplace, and timestamp.

A review row represents one customer review. The natural key is review ID when available, or a hash of reviewer, date, title, rating, and text.

A seller row represents one offer observation. The natural key is ASIN, seller name, marketplace, offer price, shipping terms, and timestamp.

This separation keeps joins clean. It also prevents review volume from bloating product tables and seller volatility from overwriting product history.

A simple warehouse layout works well:

Table	Grain	Common fields
`amazon_search_results`	One product per query position	keyword, marketplace, asin, position, price, timestamp
`amazon_products`	One product per run	asin, url, title, price, rating, availability, timestamp
`amazon_reviews`	One review per row	asin, rating, title, text, reviewer, review_date
`amazon_sellers`	One offer per seller snapshot	asin, seller, offer_price, shipping, availability, timestamp

Keep the run ID in every table. It gives you one handle for retries, audits, and failed batch cleanup.

Next step

Choose the scraper that matches your input and run a small batch of 50 to 100 records. Use keyword search for discovery, product URL extraction for known ASINs, reviews extraction for review rows, sellers extraction for offers, and search URL extraction for filtered pages.

Check returned fields, row counts, marketplace behavior, and timestamps before you schedule larger jobs. Store the input, scraper type, run ID, and observed time with every row.

Start with the Amazon scraper page that matches your job: Search Amazon products by keyword, Extract Amazon product data, Extract Amazon reviews, Get Amazon seller data, or Search Amazon products by URL.