Skip to main content
Blog

How to scrape Flipkart reviews by SKU

Use the flipkart reviews scraper to extract ratings, text, author names, dates, and review links from a Flipkart SKU.

ScrapersFlipkartJune 27, 2026
How to scrape Flipkart reviews by SKU

Flipkart product reviews contain rating, review text, reviewer name, date, upvotes, and product context. The ScrapeNow Flipkart reviews scraper extracts all of these from a product SKU and returns structured JSON.

Where this scraper fits

Use the Flipkart Reviews Search by SKU scraper when you already have Flipkart product SKUs. If your pipeline starts earlier in the product discovery flow, use a product or category scraper first.

Starting point Scraper to use Output you get
You have a Flipkart SKU Extract Flipkart reviews by SKU Review rows for that product
You have a product URL Extract Flipkart product data Product metadata and identifiers
You have a category path Flipkart products by category Product list for a category
You have a search URL Search Flipkart results Product results from a search page

A production review pipeline usually has five steps:

  1. Extract products from category pages, search pages, or product URLs.
  2. Store each product SKU in your product table.
  3. Send those SKUs to the reviews scraper on a schedule.
  4. Normalize review rows into your warehouse.
  5. Deduplicate records by review_id.

The reviews scraper starts at step 3. It assumes your system already knows the product SKU and needs review data for that SKU.

How to use this scraper

Flipkart Reviews Scraper job pipeline
The Flipkart Reviews Scraper job pipeline, from input to stored output.

The ScrapeNow scraper slug is the Flipkart Reviews Search by SKU scraper. It takes one input field named sku.

The input must be the SKU value only. Send itm5309cefa035da, not a full Flipkart product URL.

A valid request input looks like this:

{
  "sku": "itm5309cefa035da"
}

Step 1. Find the Flipkart SKU

Open flipkart.com/search.

Flipkart search dropdown suggesting violin instrument categories above listings
Search for the product whose reviews you want to extract
Search for a product, such as `violin`.
Flipkart violin search results grid with KADENCE first listing highlighted
Click the product listing to open its detail page and find the SKU
Click the product you want from the search results.
Flipkart Kadence violin product page with SKU in URL
Extract the SKU from the product URL, starting with the itm prefix
Extract the SKU from the product page URL. The SKU starts with `itm` and appears as the last path parameter before optional query parameters.

Example SKU:

itm5309cefa035da

Use that value as the sku input. Keep only the SKU value.

For example, this URL contains the SKU in the product path:

https://www.flipkart.com/example-product/p/itm5309cefa035da?pid=ACGGX9M23RZ2UPZE

The scraper input should contain this value:

{
  "sku": "itm5309cefa035da"
}

If you already store Flipkart product URLs, extract the SKU once during ingestion. Store the SKU as its own column, then reuse it for review jobs.

Step 2. Run the API request

Install the dependency:

pip install requests

Then run this Python script with your ScrapeNow API key:

"""
Configuration:
    - Set SCRAPER_SLUG to the scraper you want to run.
    - Set SCRAPER_INPUTS to the list of input dicts matching that scraper's schema.
    - Set API_KEY to your scraper API key.
"""

import sys
import time
import json
import requests
import os

API_KEY = "YOUR_API_KEY"

SCRAPER_SLUG = "flipkart-reviews-search-by-sku"

SCRAPER_INPUTS = [
    {
        "sku": "itm5309cefa035da"
    }
]



BASE_URL = "https://api.scrapenow.io/api/v1/scraping"
TIMEOUT_SECONDS = 3600
POLL_INTERVAL = 5
SPINNER = "|/-\\"


def build_headers(api_key: str, content_type: str | None = None) -> dict:
    headers = {"Authorization": f"Bearer {api_key}"}
    if content_type:
        headers["Content-Type"] = content_type
    return headers


def trigger_scrape(slug: str, inputs: list[dict]) -> str:
    url = f"{BASE_URL}/scrape?scraper={slug}"
    response = requests.post(
        url,
        headers=build_headers(API_KEY, "application/json"),
        json={"inputs": inputs},
    )
    response.raise_for_status()
    return response.json()["data"]["job_id"]


def poll_until_done(job_id: str) -> str:
    start = time.time()
    i = 0
    while True:
        elapsed = time.time() - start
        if elapsed > TIMEOUT_SECONDS:
            print(f"\nTimeout after {TIMEOUT_SECONDS}s")
            sys.exit(1)
        response = requests.get(
            f"{BASE_URL}/jobs/{job_id}",
            headers=build_headers(API_KEY),
        )
        response.raise_for_status()
        data = response.json()
        status = data["data"]["status"]
        mins, secs = divmod(int(elapsed), 60)
        sys.stdout.write(
            f"\r[{SPINNER[i % 4]}] Waiting... {status} ({mins}m {secs:02d}s)  "
        )
        sys.stdout.flush()
        if status in ("completed", "failed"):
            print()
            return status
        time.sleep(POLL_INTERVAL)
        i += 1


def fetch_results(job_id: str) -> dict:
    response = requests.get(
        f"{BASE_URL}/jobs/{job_id}/results?format=json",
        headers=build_headers(API_KEY),
    )
    response.raise_for_status()
    return response.json()


def save_results(data: dict, slug: str) -> str:
    os.makedirs("output", exist_ok=True)
    filename = os.path.join("output", f"{slug}.json")
    with open(filename, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2, ensure_ascii=False)
    return filename


def main() -> None:
    print(f"Triggering scraper: {SCRAPER_SLUG}")
    job_id = trigger_scrape(SCRAPER_SLUG, SCRAPER_INPUTS)
    print(f"Job started: {job_id}")
    final_status = poll_until_done(job_id)
    if final_status != "completed":
        print(f"Job failed with status: {final_status}")
        sys.exit(1)
    print("Fetching results...")
    results = fetch_results(job_id)
    output_file = save_results(results, SCRAPER_SLUG)
    print(f"Results saved to: {output_file}")


if __name__ == "__main__":
    main()

The script performs these steps:

  1. Starts a scrape job with POST /scrape?scraper=flipkart-reviews-search-by-sku.
  2. Polls the job every 5 seconds.
  3. Stops after 3600 seconds if the job does not finish.
  4. Saves the JSON output to output/flipkart-reviews-search-by-sku.json.

Run the script:

python flipkart_reviews.py

For production runs, keep POLL_INTERVAL at 5 seconds or higher. Faster polling adds API traffic and does not make Flipkart extraction finish faster.

Set limit_per_input based on the number of reviews you need per SKU. The sample uses 1 so you can test the request and inspect one row before running a larger job.

For scheduled jobs, treat the script as a starting point. Move API_KEY, BASE_URL, batch size, and output paths into environment variables or your job configuration.

Step 3. Read the JSON output

A successful result looks like this:

[
  {
    "inputs": {
      "sku": "itm5309cefa035da"
    },
    "scrape_status": "success",
    "rootdomain": "flipkart.com",
    "sku": "itm5309cefa035da",
    "variant_sku": null,
    "review_id": "2d8f0692-24bf-4030-afb5-e1ff620d7090",
    "author": "Santu  Roy",
    "rating": "4",
    "date": "2026-04-13T00:00:00.000Z",
    "purchased_date": null,
    "location": "Birpara Tea Garden, West Bengal",
    "attributes": "Verified Buyer",
    "title": "Wonderful",
    "text": "Good product 👌",
    "product_name": "KADENCE KAD-BB01-NAT(with Online Classes) Acoustic Guitar Hard Wood Rosewood Right Hand Orientation",
    "recommended_review": null,
    "product_has_been_tried": null,
    "is_verified_purchase": true,
    "is_verified_buyer": true,
    "is_incentivized": null,
    "sold_and_shipped_by": null,
    "helpful_count": {
      "up": 0,
      "down": null
    },
    "brand_response": null,
    "syndicated": null,
    "program": null,
    "link": "https://flipkart.com/reviews/ACGGX9M23RZ2UPZE:146?reviewId=2d8f0692-24bf-4030-afb5-e1ff620d7090",
    "review_images_url": [],
    "seller_id": null,
    "variant_attributes": null,
    "url": "https://flipkart.com/",
    "timestamp": "2026-05-13T09:30:46.910Z",
    "input": {
      "url": "https://flipkart.com/",
      "sku": "itm5309cefa035da"
    },
    "discovery_input": {
      "sku": "itm5309cefa035da"
    },
    "scrape_error": null,
    "scrape_error_code": null
  }
]

Store the full raw row for audit and debugging. Write a normalized row into analytics tables after type conversion, deduplication, and error checks.

Keep raw JSON in object storage or a raw warehouse table. Raw rows help when Flipkart changes a field, your downstream schema changes, or analysts ask why a metric moved.

A raw row also preserves fields you do not model today. That matters when a product team asks for review images or badge history six months later.

What data you get back

Flipkart Reviews Scraper output schema
Flipkart Reviews Scraper output fields grouped by category.

Each result row is one Flipkart review. The API returns normalized fields, so your code does not parse review cards, star icons, badges, or dates from the page.

Field Type What it means
sku string Flipkart product identifier used as the input
review_id string Stable review identifier from the review URL
author string Reviewer display name
rating string Rating value, usually 1 through 5
date ISO string Review date in UTC format
location string or null Reviewer location shown by Flipkart
attributes string or null Review badges, for example Verified Buyer
title string Review headline
text string Full review body
product_name string Product name attached to the review
is_verified_purchase boolean or null Purchase verification flag
is_verified_buyer boolean or null Buyer verification flag
helpful_count.up number or null Upvotes on the review
helpful_count.down number or null Downvotes on the review
review_images_url array Image URLs attached to the review
link string Direct review URL
scrape_status string success for completed rows
scrape_error_code string or null Error code if extraction failed

Ready to get this data? Extract Flipkart reviews by SKU.

Use review_id as your primary key. It is safer than author + date + text because names repeat, text changes, and dates are not unique.

For product monitoring, join review rows back to product rows by sku. The Flipkart products scraper guide covers product extraction when your pipeline needs product metadata before reviews.

Keep both sku and product_name in your warehouse. Product names change, and the SKU gives you the stable join key.

Treat nullable fields as expected output. Flipkart does not show every badge, image, location, seller response, or verification marker on every review.

Production tips

Validate SKUs before you send jobs

Invalid SKU inputs waste credits and add unnecessary retry entries. Reject any value that does not start with itm.

import re

SKU_RE = re.compile(r"^itm[a-zA-Z0-9]+$")

def validate_sku(sku: str) -> str:
    sku = sku.strip()

    if not SKU_RE.match(sku):
        raise ValueError(f"Invalid Flipkart SKU: {sku}")

    return sku


raw_skus = [
    "itm5309cefa035da",
    "https://www.flipkart.com/some-product/p/itm5309cefa035da",
    "5309cefa035da"
]

valid_skus = []

for sku in raw_skus:
    try:
        valid_skus.append(validate_sku(sku))
    except ValueError as exc:
        print(exc)

print(valid_skus)

Expected output:

["itm5309cefa035da"]

If your source data contains full Flipkart URLs, extract the SKU before validation. This keeps the validation function strict and keeps URL parsing in one place.

from urllib.parse import urlparse

def extract_sku_from_flipkart_url(url: str) -> str:
    path = urlparse(url).path
    parts = [part for part in path.split("/") if part]

    for part in reversed(parts):
        if part.startswith("itm"):
            return part

    raise ValueError(f"No SKU found in URL: {url}")


url = "https://www.flipkart.com/example-product/p/itm5309cefa035da?pid=ACGGX9M23RZ2UPZE"
print(extract_sku_from_flipkart_url(url))

Expected output:

itm5309cefa035da

Add this check before you create API jobs.

Deduplicate by review ID

Run review scraping on a schedule and store new rows only. The right key is review_id.

def dedupe_reviews(reviews: list[dict]) -> list[dict]:
    seen = set()
    output = []

    for review in reviews:
        review_id = review.get("review_id")

        if not review_id:
            continue

        if review_id in seen:
            continue

        seen.add(review_id)
        output.append(review)

    return output

For warehouse tables, use a unique index on review_id. If you track multiple marketplaces in the same table, use (rootdomain, review_id).

Use a warehouse upsert keyed by review_id for cleaner scheduled runs.

Store ratings as integers

The response returns rating as a string. Cast it before writing analytics tables.

def normalize_review(row: dict) -> dict:
    return {
        "rootdomain": row.get("rootdomain"),
        "sku": row.get("sku"),
        "review_id": row.get("review_id"),
        "author": row.get("author"),
        "rating": int(row["rating"]) if row.get("rating") else None,
        "date": row.get("date"),
        "title": row.get("title"),
        "text": row.get("text"),
        "product_name": row.get("product_name"),
        "is_verified_buyer": row.get("is_verified_buyer"),
        "helpful_up": (row.get("helpful_count") or {}).get("up"),
        "helpful_down": (row.get("helpful_count") or {}).get("down"),
        "link": row.get("link"),
        "scrape_timestamp": row.get("timestamp"),
    }

A compact normalized schema works well:

Column Suggested type
rootdomain text
sku text
review_id text unique
author text
rating integer
date timestamp
title text
text text
product_name text
is_verified_buyer boolean
helpful_up integer
helpful_down integer
link text
scrape_timestamp timestamp

Keep date and scrape_timestamp separate. The review date tells you when the customer posted the review.

The scrape timestamp tells you when your system observed the review. That difference matters when you backfill old products or compare daily snapshots.

Use integer ratings for averages, distribution charts, and alert rules. Keep the raw rating value in your raw table if you need replayable transformations.

Handle row-level scrape errors

Treat job success and row success as separate checks. A job can complete while individual inputs return scrape_error_code.

def split_success_and_errors(rows: list[dict]) -> tuple[list[dict], list[dict]]:
    success = []
    errors = []

    for row in rows:
        if row.get("scrape_status") == "success" and not row.get("scrape_error_code"):
            success.append(row)
        else:
            errors.append({
                "input": row.get("input") or row.get("inputs"),
                "scrape_status": row.get("scrape_status"),
                "scrape_error": row.get("scrape_error"),
                "scrape_error_code": row.get("scrape_error_code"),
            })

    return success, errors

Retry only the failed SKUs, and cap retries at 2 attempts. Write failed inputs to a separate table with the input, job ID, and error code.

Batch inputs by product list

The sample sends 1 SKU with limit_per_input set to 1. In production, build inputs from product or search pipelines and send a list.

skus = [
    "itm5309cefa035da",
    "itm123456789abcd",
    "itmabcdef1234567"
]

SCRAPER_INPUTS = [{"sku": sku} for sku in skus]

If your SKU source starts from search pages, use the Search Flipkart results scraper to build the product list first. Browse all 86+ scrapers to see all available Flipkart scrapers.

Set a batch size that matches your downstream retry model, such as one batch per category or brand.

Monitor review volume changes

Review count changes tell you when a product gains traction or receives negative attention. Store daily counts by sku, then compare today’s count with the previous scrape.

A small table with sku, scrape_date, and review_count is enough for most teams. Add rating averages after you normalize the rating field to an integer.

For QA pipelines, alert when a product receives multiple 1-star reviews in a short period. For catalog monitoring, alert when a product stops returning reviews after previous successful runs.

Track rating distribution as buckets from 1 to 5. Averages hide movement when both 1-star and 5-star counts rise at the same time.

A basic monitoring table looks like this:

Column Suggested type Use
sku text Product join key
scrape_date date Daily snapshot date
review_count integer Total reviews returned
avg_rating numeric Mean rating after casting
one_star_count integer QA alert input
five_star_count integer Positive trend input

This table gives analysts a stable daily series. It also keeps your alert code away from raw scraper output.

Use the daily table for alerting, not raw review rows. Raw review rows work for audit and debugging, while aggregates work for monitoring.

Keep review ingestion idempotent

Scheduled review scraping reruns the same SKU many times. Your write path should handle duplicate rows without manual cleanup.

Use an upsert keyed by review_id or (rootdomain, review_id). Update mutable fields such as helpful_up, helpful_down, scrape_timestamp, and product_name.

Avoid delete-and-reload for review tables. It creates gaps in dashboards when a scheduled run fails halfway through the product list.

A simple warehouse pattern looks like this:

  1. Load raw scraper output into flipkart_review_raw.
  2. Split successful rows and failed rows.
  3. Normalize successful rows into a staging table.
  4. Upsert staging rows into flipkart_review.
  5. Insert failed rows into flipkart_review_error.

This pattern keeps reruns safe. It also gives your team a clear place to inspect failures without mixing them into analytics rows.

Pick a review limit per use case

limit_per_input controls how many review rows the API returns for each SKU. Use a small value for smoke tests and a larger value for production backfills.

Use case Suggested limit_per_input
API smoke test 1
Daily monitoring 25 to 100
New product backfill 500 or higher
Sentiment dataset build Match your dataset target

The right limit depends on how many SKUs you run and how much history you need. A daily monitor usually needs recent reviews, while a sentiment dataset often needs a larger sample.

Start with 1 row while testing credentials, schema, and storage. Increase the limit after your pipeline writes raw rows and normalized rows correctly.

Pricing

ScrapeNow charges per returned row. One row costs one credit, starting at $0.04 per credit for small runs and dropping with volume. No monthly contracts, no proxy fees, no charges for failed rows. See the pricing page for current rates.

Related articles

View all

Start collecting data in under five minutes.

Free credits included - no credit card required.

Free credits included - no credit card required