How to extract Flipkart product data by URL

Flipkart product pages contain title, brand, category, images, pricing, seller details, ratings, and stock status. The ScrapeNow Flipkart products scraper extracts all of these from a product URL and returns clean JSON for SKU monitoring, catalog enrichment, and price tracking.

Use this scraper when you already have Flipkart product URLs. It works well for scheduled monitoring jobs, internal catalog joins, and pipelines that need the current state of a known product page.

How to use this scraper

The Flipkart Products Scraper job pipeline, from input to stored output.

The Extract Flipkart product data scraper takes one input field.

Input	Type	Required	Rule
`url`	string	yes	Must start with `https://www.flipkart.com/`

Use URL extraction when you already have product pages and need full product detail. If you need product discovery from categories first, run the Flipkart Products Search by Category scraper and pass the returned product URLs into this scraper.

This two-step flow keeps discovery separate from detail extraction. Category pages change often, while product pages hold the fields your warehouse usually needs.

Step 1. Find the product URL

Open flipkart.com/search.

Flipkart search dropdown suggesting violin instruments under 1000 — Search for your product keyword on Flipkart to find listings

Search for a keyword or seller in the search bar. For example, search for `violin`.

Flipkart violin search results grid with Kadence violin highlighted — Browse the results and click the product you want to extract

Open the product you want from the search results. Copy the full URL from the browser address bar.

Flipkart Kadence Vivaldi 4/4 violin product page with URL highlighted — Copy the full product URL from the address bar for scraper input

The copied value goes into `SCRAPER_INPUTS[0]["url"]`. Keep the full URL, including the query string, when you are testing.

Flipkart URLs often include pid, lid, marketplace, and tracking parameters. The scraper returns the resolved product URL in the output, so you can store both input and resolved URLs.

Step 2. Run the API job

Use this Python script. Replace YOUR_API_KEY with your ScrapeNow API key.

"""
Configuration:
    - Set SCRAPER_SLUG to the scraper you want to run.
    - Set SCRAPER_INPUTS to the list of input dicts matching that scraper's schema.
    - Set API_KEY to your scraper API key.
"""

import sys
import time
import json
import requests
import os

API_KEY = "YOUR_API_KEY"

SCRAPER_SLUG = "flipkart-products-unified-extract-by-url"

SCRAPER_INPUTS = [
    {
        "url": "https://www.flipkart.com/yamaha-f280tbs-in-acoustic-guitar-rosewood-right-hand-orientation/p/itm0b10981c8d907?pid=ACGGF22AGSWKRK4N&lid=LSTACGGF22AGSWKRK4NYFFBL9&hl_lid=&marketplace=FLIPKART&fm=eyJ3dHAiOiJyZWNvIiwicHJwdCI6ImNscCIsIm1pZCI6InBlcnNvbmFsaXNlZFJlY29tbWVuZGF0aW9uL3AycC1zYW1lIn0%3D&pageUID=1778510618204"
    }
]



BASE_URL = "https://api.scrapenow.io/api/v1/scraping"
TIMEOUT_SECONDS = 3600
POLL_INTERVAL = 5
SPINNER = "|/-\\"


def build_headers(api_key: str, content_type: str | None = None) -> dict:
    headers = {"Authorization": f"Bearer {api_key}"}
    if content_type:
        headers["Content-Type"] = content_type
    return headers


def trigger_scrape(slug: str, inputs: list[dict]) -> str:
    url = f"{BASE_URL}/scrape?scraper={slug}"
    response = requests.post(
        url,
        headers=build_headers(API_KEY, "application/json"),
        json={"inputs": inputs},
    )
    response.raise_for_status()
    return response.json()["data"]["job_id"]


def poll_until_done(job_id: str) -> str:
    start = time.time()
    i = 0
    while True:
        elapsed = time.time() - start
        if elapsed > TIMEOUT_SECONDS:
            print(f"\nTimeout after {TIMEOUT_SECONDS}s")
            sys.exit(1)
        response = requests.get(
            f"{BASE_URL}/jobs/{job_id}",
            headers=build_headers(API_KEY),
        )
        response.raise_for_status()
        data = response.json()
        status = data["data"]["status"]
        mins, secs = divmod(int(elapsed), 60)
        sys.stdout.write(
            f"\r[{SPINNER[i % 4]}] Waiting... {status} ({mins}m {secs:02d}s)  "
        )
        sys.stdout.flush()
        if status in ("completed", "failed"):
            print()
            return status
        time.sleep(POLL_INTERVAL)
        i += 1


def fetch_results(job_id: str) -> dict:
    response = requests.get(
        f"{BASE_URL}/jobs/{job_id}/results?format=json",
        headers=build_headers(API_KEY),
    )
    response.raise_for_status()
    return response.json()


def save_results(data: dict, slug: str) -> str:
    os.makedirs("output", exist_ok=True)
    filename = os.path.join("output", f"{slug}.json")
    with open(filename, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2, ensure_ascii=False)
    return filename


def main() -> None:
    print(f"Triggering scraper: {SCRAPER_SLUG}")
    job_id = trigger_scrape(SCRAPER_SLUG, SCRAPER_INPUTS)
    print(f"Job started: {job_id}")
    final_status = poll_until_done(job_id)
    if final_status != "completed":
        print(f"Job failed with status: {final_status}")
        sys.exit(1)
    print("Fetching results...")
    results = fetch_results(job_id)
    output_file = save_results(results, SCRAPER_SLUG)
    print(f"Results saved to: {output_file}")


if __name__ == "__main__":
    main()

This script starts a job, polls every 5 seconds, waits up to 3600 seconds, downloads JSON, and writes the result to disk. It also creates the output directory before writing the file.

For production, pass inputs from a file or database instead of hardcoding them. Keep the polling interval stable unless your scheduler has strict runtime limits.

Step 3. Read the JSON output

A successful run returns an array of product records. This trimmed response comes from the URL in the script above.

[
  {
    "inputs": {
      "url": "https://www.flipkart.com/yamaha-f280tbs-in-acoustic-guitar-rosewood-right-hand-orientation/p/itm0b10981c8d907?pid=ACGGF22AGSWKRK4N&lid=LSTACGGF22AGSWKRK4NYFFBL9&hl_lid=&marketplace=FLIPKART&fm=eyJ3dHAiOiJyZWNvIiwicHJwdCI6ImNscCIsIm1pZCI6InBlcnNvbmFsaXNlZFJlY29tbWVuZGF0aW9uL3AycC1zYW1lIn0%3D&pageUID=1778510618204"
    },
    "scrape_status": "success",
    "url": "https://www.flipkart.com/yamaha-f280tbs-in-acoustic-guitar-rosewood-right-hand-orientation/p/itm0b10981c8d907?pid=ACGGF22AGSWKRK4N&lid=LSTACGGF22AGSWKRK4NYFFBL9&hl_lid=&marketplace=FLIPKART&fm=eyJ3dHAiOiJyZWNvIiwicHJwdCI6ImNscCIsIm1pZCI6InBlcnNvbmFsaXNlZFJlY29tbWVuZGF0aW9uL3AycC1zYW1lIn0%3D&pageUID=1778510618204",
    "item_id": "ACGGF22AGSWKRK4N",
    "variant_id": "ACGGF22AGSWKRK4N",
    "title": "YAMAHA F280TBS//IN Acoustic Guitar Rosewood Rosewood Right Hand Orientation",
    "description": "Buy YAMAHA F280TBS//IN Acoustic Guitar Rosewood Rosewood Right Hand Orientation for Rs.7990.0 online. YAMAHA F280TBS//IN Acoustic Guitar Rosewood Rosewood Right Hand Orientation at best prices with FREE shipping & cash on delivery. Only Genuine Products. 30 Day Replacement Guarantee.",
    "product_category": "Musical Instruments > String Instruments > Acoustic Guitars > YAMAHA Acoustic Guitars > See other sellers",
    "category_tree": [
      {
        "name": "Musical Instruments",
        "url": "https://www.flipkart.com/musical-instruments/pr?sid=ypu&marketplace=FLIPKART"
      },
      {
        "name": "String Instruments",
        "url": "https://www.flipkart.com/musical-instruments/string-instruments/pr?sid=ypu,ujd&marketplace=FLIPKART"
      },
      {
        "name": "Acoustic Guitars",
        "url": "https://www.flipkart.com/musical-instruments/string-instruments/acoustic-guitars/pr?sid=ypu,ujd,7wz&marketplace=FLIPKART"
      },
      {
        "name": "YAMAHA Acoustic Guitars",
        "url": "https://www.flipkart.com/musical-instruments/string-instruments/acoustic-guitars/yamaha~brand/pr?sid=ypu,ujd,7wz&marketplace=FLIPKART"
      },
      {
        "name": "See other sellers",
        "url": "https://www.flipkart.com/sellers?pid=ACGGF22AGSWKRK4N"
      }
    ],
    "brand": "YAMAHA",
    "image_url": "https://rukmini1.flixcart.com/image/1500/1500/xif0q/acoustic-guitar/e/w/j/-original-imahffkzqecs5egh.jpeg?q=70",
    "price": "₹7,990.00",
    "sale_price": "₹7,191.00",
    "availability": "in_stock",
    "availability_date": "2026-05-13",
    "group_id": "ACGGF22AGSWKRK4N",
    "listing_has_variations": true,
    "variant_attributes": [
      {
        "name": "color",
        "value": "Brown"
      }
    ],
    "variants": [
      {
        "variant_type": "Color",
        "variant_options": [
          {
            "option_id": "ACGG2A3MAMCWUB9W",
            "option_name": "Beige",
            "option_price": null,
            "in_stock": true,
            "image": null
          },
          {
            "option_id": "ACGGF22AGSWKRK4N",
   
... (truncated)

Keep one raw copy of every response row. Typed columns make queries faster, and raw JSON protects you when Flipkart changes page markup or adds a field.

What data you get back, key API response fields

Flipkart Products Scraper output schema — Flipkart Products Scraper output fields grouped by category.

inputs.url is the original URL you sent. Keep it in your warehouse because it gives you a stable join key between your input batch and ScrapeNow output.

scrape_status tells you whether extraction worked for that row. Treat success as loadable data and route every other status to a retry queue or review table.

url is the resolved Flipkart product URL returned by the scraper. Store it separately from inputs.url if your pipeline tracks redirects, canonical URLs, or URL cleanup.

item_id is the product identifier from Flipkart. In the sample, item_id is ACGGF22AGSWKRK4N.

variant_id identifies the selected variant. For products with color, size, storage, pack size, or model changes, the variant ID tracks the exact listing state.

title is the product title shown on the product page. Use it for matching, search indexing, catalog display, and manual review screens.

description contains the product description text. In the sample, it includes the product name, listed price, shipping text, cash-on-delivery text, and replacement guarantee copy.

product_category is a flattened category path. The sample value is Musical Instruments > String Instruments > Acoustic Guitars > YAMAHA Acoustic Guitars > See other sellers.

category_tree returns the same path as structured objects with name and url. Use this field when you build category tables or breadcrumb navigation.

brand is the detected brand. In the sample, the value is YAMAHA.

image_url is the primary product image. Download it later from your own worker if you store media, rather than doing that inside the scraper job flow.

price and sale_price are formatted strings with the currency symbol. Parse them into numeric columns before loading them into analytics, dashboards, or alerting jobs.

availability gives stock status. The sample returns in_stock.

availability_date gives the detected availability date when Flipkart exposes one. The sample value is 2026-05-13.

group_id groups variants for the same product family. Use it to connect color or size variants under one parent product.

listing_has_variations is a boolean flag. If it is true, read variant_attributes and variants.

variant_attributes describes the selected variant. The sample product has color set to Brown.

variants lists available variant groups and options. In the sample, the Color group includes options like Beige and Brown, each with an option_id, stock flag, and image field.

For review data on the same SKU, use the Extract Flipkart reviews by SKU scraper. That scraper uses product identifiers instead of product page URLs.

Field mapping for warehouses

Most teams need two tables. Store the latest state in one table, then store every scrape as history in another table.

Use item_id and variant_id as the natural product key. Use scraped_at as the time key for historical price and stock tracking.

A current-state table answers operational questions. A history table answers price movement, stockout duration, and catalog drift questions.

For example, a stock monitor reads the current-state table. A pricing analyst reads the history table and groups rows by item_id, variant_id, and date.

Ready to get this data? Extract Flipkart product data.

Choose the right Flipkart scraper for the job

Flipkart Products Scraper scraper selection — How the Flipkart Products Scraper routes each input type to the right scraper.

Use the URL extractor when product URLs already exist in your database. Use search and category scrapers when you need to discover products first.

Job	Scraper
Extract full detail from known product URLs	Extract Flipkart product data
Find products inside a category	Flipkart Products Search by Category
Extract products from a search results URL	Search Flipkart results
Search Flipkart by category input	Search Flipkart results
Collect reviews for a SKU	Extract Flipkart reviews by SKU

The full ScrapeNow catalog has 86+ pre-built scrapers across 14 platforms. Use the Browse all 86+ scrapers hub when you need the broader Flipkart scraper set or want to pair Flipkart extraction with Amazon, Google, LinkedIn, YouTube, Zillow, Indeed, or other sources.

A common flow starts with a category scraper, stores every discovered URL, and then runs URL extraction on a schedule. That keeps discovery frequency separate from detail refresh frequency.

For example, refresh categories weekly and refresh known product URLs daily. This cuts duplicate category crawling while keeping price and stock data current.

Production tips, validation, deduplication, schema, error handling

Validate input URLs before sending jobs

Reject invalid inputs before they hit the API. Flipkart product URLs must start with https://www.flipkart.com/.

from urllib.parse import urlparse

def validate_flipkart_url(url: str) -> None:
    parsed = urlparse(url)

    if parsed.scheme != "https":
        raise ValueError(f"Flipkart URL must use https: {url}")

    if parsed.netloc != "www.flipkart.com":
        raise ValueError(f"Flipkart URL must use www.flipkart.com: {url}")

    if "/p/" not in parsed.path:
        raise ValueError(f"URL does not look like a product page: {url}")


urls = [
    "https://www.flipkart.com/yamaha-f280tbs-in-acoustic-guitar-rosewood-right-hand-orientation/p/itm0b10981c8d907?pid=ACGGF22AGSWKRK4N"
]

for url in urls:
    validate_flipkart_url(url)

Run this validation where you build the batch, not after the API call.

Deduplicate by item ID and variant ID

For product detail tracking, dedupe on item_id plus variant_id. If variant_id is missing, fall back to item_id and normalized URL.

def dedupe_products(records: list[dict]) -> list[dict]:
    seen = set()
    output = []

    for record in records:
        item_id = record.get("item_id")
        variant_id = record.get("variant_id")
        url = record.get("url")

        key = (item_id, variant_id or url)

        if key in seen:
            continue

        seen.add(key)
        output.append(record)

    return output

Run dedupe after every batch. Search and category flows can rediscover the same product through multiple paths.

Parse money fields into integers or decimals

price and sale_price arrive as formatted strings like ₹7,990.00. Store raw text and parsed numeric values.

import re
from decimal import Decimal

def parse_inr(value: str | None) -> Decimal | None:
    if not value:
        return None

    cleaned = re.sub(r"[^\d.]", "", value)

    if cleaned == "":
        return None

    return Decimal(cleaned)


record = {
    "price": "₹7,990.00",
    "sale_price": "₹7,191.00"
}

record["price_amount"] = parse_inr(record["price"])
record["sale_price_amount"] = parse_inr(record["sale_price"])
record["currency"] = "INR"

Keep the currency as its own column. Do not infer it later from the symbol if you mix sources.

For analytics warehouses, decimals are safer than floats. A float can introduce small rounding differences that break exact price comparisons.

If you store paise, multiply by 100 after parsing and store an integer. That works well for alerting systems that compare exact values.

Normalize availability values

Stock fields drive alerts, so treat them as controlled values. Store the raw availability value and map it into your internal status list.

def normalize_availability(value: str | None) -> str:
    if value == "in_stock":
        return "available"

    if value == "out_of_stock":
        return "unavailable"

    if value is None:
        return "unknown"

    return "review"


record = {
    "availability": "in_stock"
}

record["availability_normalized"] = normalize_availability(record["availability"])

This gives dashboards a stable field even when source text changes. Keep the raw field for debugging and audit trails.

Use the normalized value for alerts. Use the raw value when you need to inspect a specific page result.

Use a stable warehouse schema

A practical table for product monitoring needs raw JSON plus typed columns. Raw JSON saves you when Flipkart adds a field and you want to backfill later.

Column	Type	Notes
`input_url`	text	From `inputs.url`
`resolved_url`	text	From `url`
`scrape_status`	text	`success`, `failed`, or other row state
`item_id`	text	Flipkart product ID
`variant_id`	text	Selected variant
`group_id`	text	Parent product group
`title`	text	Product title
`brand`	text	Product brand
`category_path`	text	From `product_category`
`price_amount`	numeric	Parsed from `price`
`sale_price_amount`	numeric	Parsed from `sale_price`
`currency`	text	Use `INR`
`availability`	text	Stock status
`availability_date`	date	If present
`listing_has_variations`	boolean	Variant flag
`raw_json`	jsonb	Full API row
`scraped_at`	timestamp	Your load time

Use item_id, variant_id, and scraped_at for history. Use item_id and variant_id without timestamp for the current-state table.

Add a unique constraint on the current-state table. That lets you upsert the latest row without creating duplicate product records.

For the history table, add an index on item_id, variant_id, and scraped_at. Price charts and stock timelines read faster with that access pattern.

Split success rows from retry rows

Do not treat job completion as row success. A completed job can still contain failed rows.

def split_results(records: list[dict]) -> tuple[list[dict], list[dict]]:
    good = []
    retry = []

    for record in records:
        if record.get("scrape_status") == "success":
            good.append(record)
        else:
            retry.append({
                "input": record.get("inputs"),
                "status": record.get("scrape_status"),
                "raw": record,
            })

    return good, retry

Retry failed rows in smaller batches. If the same URL fails 3 times, send it to a dead-letter table and move on.

Track input batches

Batch IDs make scraper output easier to audit. Add your own batch_id before loading results into the warehouse.

from datetime import datetime, timezone
from uuid import uuid4

batch_id = str(uuid4())
scraped_at = datetime.now(timezone.utc).isoformat()

for record in records:
    record["batch_id"] = batch_id
    record["scraped_at"] = scraped_at

Store the same batch_id on every row created from one API run for easier reruns and audits.

Common use cases

Price tracking

Run the URL extractor on a fixed schedule and store price_amount, sale_price_amount, and scraped_at. Compare the latest row against the previous row for the same item_id and variant_id.

Alert only on real changes after parsing the money fields. String comparisons fail when formatting changes but the numeric price stays the same.

Stock monitoring

Use availability, availability_date, and scraped_at to track inventory state. A product moving from in_stock to another state should update the current-state table and append a history row.

For reporting, calculate stockout duration from the history table. That gives buyers and catalog teams a defensible timeline.

Catalog enrichment

Use title, brand, category_tree, image_url, and description to enrich internal SKU records. Match against your own SKU table with item_id, variant_id, or a stored product URL.

Keep enrichment writes separate from price and stock writes. Catalog fields change less often, while price and stock change frequently.

Variant audits

Use group_id, variant_attributes, and variants to monitor available options. This works for color, size, storage, pack size, and model families.

Store variant options as nested JSON or split them into a child table. Use a child table if analysts need to query option-level stock.

Pricing

ScrapeNow charges per returned row. One row costs one credit, starting at $0.04 per credit for small runs and dropping with volume. No monthly contracts, no proxy fees, no charges for failed rows. See the pricing page for current rates.

Start with the Extract Flipkart product data scraper if you already have product URLs. If you need product discovery first, Browse all 86+ scrapers and pick the Flipkart search or category scraper before running URL extraction.

For a small test, run ten known product URLs and inspect the raw JSON before building the warehouse loader. Once the field mapping looks right, move the same script into your scheduler and add batch IDs, retry handling, and dedupe.