Zillow listings scraper for property search data

One Zillow output row costs one credit, starting at $0.04 and dropping to $0.012 at volume.

This Zillow listings scraper extracts price, beds, baths, address, ZPID, coordinates, Zestimate, rent Zestimate, lot size, listing status, and description. Use it to pull one property by URL or run filtered Zillow searches through the API.

ScrapeNow handles Zillow selectors, browser sessions, retries, and response parsing. Your application sends a property URL or search filters, then stores structured JSON.

How to use this scraper

Figure: How the Zillow Listings Scraper routes each input type to the right scraper.

Use Extract Zillow listing data when you already have a property URL. Use Zillow Listings Search by Filters when your application stores search criteria as fields.

The filters endpoint accepts location, listing category, home type, price range, beds, baths, and days-on-Zillow values. That makes it a better fit for scheduled jobs, saved searches, and lead generation queues.

Use the URL extractor for known properties. Use filtered search for lead lists, market scans, rental tracking, and monitoring jobs that run daily or weekly.

Pick the right Zillow listings scraper

Scraper	Input type	Use it for	Credit math
Extract Zillow listing data	Property URL	Pull one known listing	1 row costs 1 credit
Zillow Listings Search by Filters	Structured filters	Search Zillow without building URLs	Each returned listing costs 1 credit
Zillow Listings Search by URL	Search results URL	Re-run a Zillow search URL copied from the browser	Each returned listing costs 1 credit
Get Zillow property details	Property URL	Pull deeper property detail fields	1 row costs 1 credit

ScrapeNow pre-built scrapers return 1 row per credit. Pricing starts at $0.04 per credit for 1 to 250 credits and drops to $0.012 per credit at 100K+ credits.

The listing endpoints cover fields needed for search results, comparisons, maps, and monitoring. The property details endpoint goes deeper on one property when your pipeline needs more than listing-level data.

A 5,000-row search export costs 5,000 credits. At the entry rate of $0.04 per credit, that is $200. At the 100K+ rate of $0.012 per credit, the same row count uses $60 of credit balance.
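The arithmetic above can be checked in a few lines. This is a minimal sketch: `estimate_cost` is a hypothetical helper, and the two rates are the only ones quoted on this page. Intermediate tiers are not listed here, so the rate is passed in explicitly instead of being looked up.

```python
from decimal import Decimal

# Rates quoted on this page; intermediate tiers are not listed here.
ENTRY_RATE = Decimal("0.04")    # 1 to 250 credits
VOLUME_RATE = Decimal("0.012")  # 100K+ credits


def estimate_cost(rows: int, rate_per_credit: Decimal) -> Decimal:
    """Return the cost in USD for `rows` output rows at one credit per row."""
    if rows < 0:
        raise ValueError("rows must be non-negative")
    return rows * rate_per_credit


print(estimate_cost(5000, ENTRY_RATE))   # 200.00
print(estimate_cost(5000, VOLUME_RATE))  # 60.000
```

Decimal avoids float rounding when the result feeds a budget report.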

Get the property URL input

The url input is the Zillow property listing URL. It must start with https://www.zillow.com/.

Figure: Zillow homepage with New York, NY homes entered in the location search box.

Open zillow.com.

In the search bar, type a location or keyword like New York, NY homes.

Choose a listing from the cards on the right and click it.

Figure: Zillow New York homes for sale, with the map beside the property listing cards.

Copy the URL from the address bar.

Figure: Zillow property page for 338 E 69th St with the URL highlighted in the browser address bar.

Use that URL in `SCRAPER_INPUTS`.

Use the canonical property page URL when the browser gives you one. A URL under /homedetails/ gives the scraper a clean input and avoids search-map state parameters.

Search result URLs often contain viewport, map zoom, pagination, and filter state. Property URLs give your pipeline a stable target for one listing.
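If a copied property URL still carries a query string or fragment, a small helper can strip it before the URL goes into `SCRAPER_INPUTS`. This is a sketch under one assumption: the scraper only needs the /homedetails/ path, which this page suggests but does not state outright. `canonical_property_url` is a hypothetical name.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_property_url(url: str) -> str:
    """Drop the query string and fragment from a Zillow property URL."""
    parts = urlsplit(url)
    if parts.netloc != "www.zillow.com" or "/homedetails/" not in parts.path:
        raise ValueError("expected a Zillow /homedetails/ URL")
    # Keep scheme, host, and path; discard browser state after "?" or "#".
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


print(canonical_property_url(
    "https://www.zillow.com/homedetails/1420-Moraga-Dr-Los-Angeles-CA-90049/20530504_zpid/?view=public"
))
```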

Run the API request

Install requests if your environment does not already have it.

pip install requests

Use this API script as the starting point.

"""
Configuration:
    - Set SCRAPER_SLUG to the scraper you want to run.
    - Set SCRAPER_INPUTS to the list of input dicts matching that scraper's schema.
    - Set API_KEY to your scraper API key.
"""

import sys
import time
import json
import requests
import os

API_KEY = "YOUR_API_KEY"

SCRAPER_SLUG = "zillow-listings-extract-by-url"

SCRAPER_INPUTS = [
    {
        "url": "https://www.zillow.com/homedetails/1420-Moraga-Dr-Los-Angeles-CA-90049/20530504_zpid/"
    }
]

BASE_URL = "https://api.scrapenow.io/api/v1/scraping"
TIMEOUT_SECONDS = 3600
POLL_INTERVAL = 5
SPINNER = "|/-\\"


def build_headers(api_key: str, content_type: str | None = None) -> dict:
    """Build headers using your API key."""
    headers = {"Authorization": f"Bearer {api_key}"}
    if content_type:
        headers["Content-Type"] = content_type
    return headers


def trigger_scrape(slug: str, inputs: list[dict]) -> str:
    """POST to the scrape endpoint and return the job_id."""
    url = f"{BASE_URL}/scrape?scraper={slug}"

    response = requests.post(
        url,
        headers=build_headers(API_KEY, "application/json"),
        json={"inputs": inputs},
        timeout=30,  # fail instead of hanging on a stalled connection
    )
    response.raise_for_status()
    return response.json()["data"]["job_id"]


def poll_until_done(job_id: str) -> str:
    """Poll the job status until it reaches a terminal state."""
    start = time.time()
    i = 0

    while True:
        elapsed = time.time() - start
        if elapsed > TIMEOUT_SECONDS:
            print(f"\nTimeout after {TIMEOUT_SECONDS}s")
            sys.exit(1)

        response = requests.get(
            f"{BASE_URL}/jobs/{job_id}",
            headers=build_headers(API_KEY),
            timeout=30,  # a stalled request would otherwise bypass TIMEOUT_SECONDS
        )
        response.raise_for_status()

        data = response.json()
        status = data["data"]["status"]

        mins, secs = divmod(int(elapsed), 60)
        sys.stdout.write(f"\r[{SPINNER[i % len(SPINNER)]}] Waiting... {status} ({mins}m {secs:02d}s)  ")
        sys.stdout.flush()

        if status in ("completed", "failed"):
            print()
            return status

        time.sleep(POLL_INTERVAL)
        i += 1


def fetch_results(job_id: str) -> dict:
    """Download the completed job results as JSON."""
    response = requests.get(
        f"{BASE_URL}/jobs/{job_id}/results?format=json",
        headers=build_headers(API_KEY),
        timeout=60,  # fail instead of hanging on a stalled download
    )
    response.raise_for_status()
    return response.json()


def save_results(data: dict, filename: str) -> str:
    """Write results to a JSON file and return the filename."""
    os.makedirs(os.path.dirname(filename) or ".", exist_ok=True)

    with open(filename, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2)

    return filename


def main() -> None:
    print(f"Triggering scraper: {SCRAPER_SLUG}")
    job_id = trigger_scrape(SCRAPER_SLUG, SCRAPER_INPUTS)
    print(f"Job started: {job_id}")

    final_status = poll_until_done(job_id)

    if final_status != "completed":
        print(f"Job failed with status: {final_status}")
        sys.exit(1)

    print("Fetching results...")
    results = fetch_results(job_id)

    output_path = os.path.join("output", f"{SCRAPER_SLUG}.json")
    output_file = save_results(results, output_path)
    print(f"Results saved to: {output_file}")


if __name__ == "__main__":
    main()

The same API pattern works for the other scrapers in this group. That includes zillow-listings-search-by-filters and zillow-listings-search-by-url.

Change the SCRAPER_SLUG and SCRAPER_INPUTS values for each scraper. Keep the polling and result-download code unchanged unless your pipeline needs different timeout rules.

For batch jobs, send multiple input objects in SCRAPER_INPUTS. Each input produces one or more result rows, based on the scraper and Zillow results.

For URL extraction, one input usually produces one row. For search scrapers, one input can produce many listing rows.
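For a batch of known properties, the input list can be built from plain URL strings. The helper below is a hypothetical sketch, not part of the ScrapeNow API. It only wraps each URL in the `{"url": ...}` shape used by the sample script and skips blanks and repeats so they never become inputs.

```python
def build_url_inputs(urls: list[str]) -> list[dict]:
    """Turn property URLs into input objects, skipping blanks and repeats."""
    seen: set[str] = set()
    inputs: list[dict] = []
    for url in urls:
        url = url.strip()
        if not url or url in seen:
            continue
        seen.add(url)
        inputs.append({"url": url})
    return inputs


SCRAPER_INPUTS = build_url_inputs([
    "https://www.zillow.com/homedetails/1420-Moraga-Dr-Los-Angeles-CA-90049/20530504_zpid/",
    "",
    "https://www.zillow.com/homedetails/1420-Moraga-Dr-Los-Angeles-CA-90049/20530504_zpid/",
])
print(len(SCRAPER_INPUTS))  # 1
```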

Run a search by filters

Use zillow-listings-search-by-filters when your input comes from a form, database row, or scheduled job. The scraper searches Zillow directly with your criteria and returns matching listings.

The input variables are:

location: where to search on Zillow, such as New York, NY or San Francisco, CA
listingCategory: listing type to search for, with options such as House for sale, House for rent, and Sold
HomeType: home type filter, such as Apartments, Houses, Condos, and Townhomes
days_on_zillow: recency filter, with options such as 1 day, 7 days, 14 days, 30 days, 90 days, 6 months, 12 months, 24 months, and 36 months
minPrice: optional minimum listing price as an integer
maxPrice: optional maximum listing price as an integer
beds_min: optional minimum bedrooms as an integer
baths_min: optional minimum bathrooms as an integer

For API usage, pass the exact dropdown text as a string for listingCategory, HomeType, and days_on_zillow. Treat those fields as enums in your application.

To confirm available filter values, open zillow.com, use the search and filter controls, and copy the exact text shown in each dropdown. Small text differences can make a request fail validation.

A typical payload looks like this.

[
  {
    "location": "New York, NY",
    "listingCategory": "House for sale",
    "HomeType": "Condos",
    "days_on_zillow": "7 days",
    "minPrice": 500000,
    "maxPrice": 1500000,
    "beds_min": 2,
    "baths_min": 2
  }
]

This scraper fits scheduled searches. For example, run it every morning for House for sale, Condos, 7 days, and your target ZIP codes.
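A scheduled run like that can generate one input per target location. The sketch below assumes the `location` field accepts a ZIP code string, which this page implies but does not confirm. `build_daily_inputs` and the sample ZIP codes are illustrative.

```python
def build_daily_inputs(locations: list[str]) -> list[dict]:
    """Build one search-by-filters input per location for a scheduled run."""
    return [
        {
            "location": location,
            "listingCategory": "House for sale",
            "HomeType": "Condos",
            "days_on_zillow": "7 days",
        }
        for location in locations
    ]


# Hypothetical target ZIP codes.
inputs = build_daily_inputs(["10001", "10002"])
print(len(inputs))  # 2
```

Pass the resulting list as `SCRAPER_INPUTS` with `SCRAPER_SLUG` set to zillow-listings-search-by-filters.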

Use the filters endpoint when your application controls the query. It keeps search inputs typed, reviewable, and easy to diff in code reviews.

The filters endpoint also avoids storing long Zillow URLs full of browser state. Your job record stores clear fields like location, maxPrice, and beds_min.

Run a search by URL

Use zillow-listings-search-by-url when someone already built the Zillow search in the browser. Copy the full Zillow search results URL and pass it as the scraper input.

Figure: Running a search from the Zillow homepage.

Open zillow.com and run the search.

Figure: Zillow New York, NY homes for sale results page with filter dropdowns.

Copy the search URL from the address bar after the page loads.

Figure: Filtered Zillow New York results with the search URL highlighted in the browser address bar.

Use this mode when product, operations, or research users create searches manually and developers run them through the API. It preserves complex browser-created filters that users do not want to rebuild as structured API fields.

Save the exact URL alongside the job ID. That gives you a reproducible input when a requester asks why a result set changed.

Search-by-URL is faster to adopt for internal teams. Search-by-filters is easier to validate, test, and schedule from application code.

What data you get back

Figure: Zillow Listings Scraper output fields grouped by category.

The response is JSON. Each result contains the original input, scrape status, listing identifiers, address fields, pricing fields, property attributes, coordinates, estimates, tax fields, and listing metadata.

A trimmed response looks like this.

[
  {
    "inputs": {
      "url": "https://www.zillow.com/homedetails/1420-Moraga-Dr-Los-Angeles-CA-90049/20530504_zpid/"
    },
    "scrape_status": "success",
    "zpid": 20530504,
    "city": "Los Angeles",
    "state": "CA",
    "homeStatus": "FOR_SALE",
    "address": {
      "city": "Los Angeles",
      "streetAddress": "1420 Moraga Dr",
      "zipcode": "90049",
      "state": "CA"
    },
    "bedrooms": 6,
    "bathrooms": 8,
    "price": 12495000,
    "yearBuilt": 1982,
    "streetAddress": "1420 Moraga Dr",
    "zipcode": "90049",
    "isVerifiedClaimedByCurrentSignedInUser": "No",
    "listingDataSource": "Phoenix",
    "longitude": -118.46802,
    "latitude": 34.092167,
    "livingArea": 8938,
    "homeType": "SINGLE_FAMILY",
    "lotSize": 261442,
    "lotAreaValue": 6.0019,
    "lotAreaUnits": "Acres",
    "livingAreaValue": 8938,
    "isUndisclosedAddress": "false",
    "zestimate": 11290800,
    "rentZestimate": 40229,
    "currency": "USD",
    "dateSoldString": "2016-06-21",
    "taxAssessedValue": 8159593,
    "taxAssessedYear": 2025,
    "country": "USA",
    "propertyTaxRate": 1.18,
    "photoCount": 29,
    "isPremierBuilder": "false",
    "ssid": 17327,
    "hdpUrl": "https://www.zillow.com/homedetails/1420-Moraga-Dr-Los-Angeles-CA-90049/20530504_zpid/",
    "tourViewCount": 0,
    "lastSoldPrice": 7600000,
    "hasApprovedThirdPartyVirtualTourUrl": false,
    "zestimateLowPercent": "5",
    "zestimateHighPercent": "6",
    "description": "One of the largest homes within the exclusive gates of Moraga Estates... (truncated)"
  }
]

Fields that usually become database columns

Use zpid as the stable listing identifier. Zillow URLs change format, while zpid gives you a clean key for dedupe and updates.

Use price, bedrooms, bathrooms, livingArea, lotAreaValue, lotAreaUnits, and homeType for property comparison. Keep currency with price because downstream systems should read the currency from the row.

Use latitude and longitude for map views, clustering, and distance calculations. Store both as numeric fields.

Use zestimate, rentZestimate, taxAssessedValue, taxAssessedYear, propertyTaxRate, lastSoldPrice, and dateSoldString as optional fields. These fields are missing on some listings, so treat them as nullable.

Store scrape_status with every row. This lets your loader separate successful records from failed inputs before writing to production tables.

Keep inputs in your raw table. When a listing changes or disappears, the original input shows which URL or filter produced the row.

Keep hdpUrl as a secondary reference. It helps analysts open the same Zillow page from internal tools.

Field coverage by endpoint

The three listing scrapers return the same listing-level shape once they reach a property result. Their main difference is input control.

Endpoint	Best input source	Typical output shape
Extract by URL	Known property URL	One listing record with price, address, attributes, estimates, and metadata
Search by Filters	App-owned filters	Multiple listing records from a typed Zillow search
Search by URL	Browser-created search URL	Multiple listing records from a copied Zillow search page


Use property details extraction when a downstream workflow needs deeper fields for one known property. Use listing extraction when the job needs search coverage, monitoring, or a row-per-listing feed.

This split matters in production. Search jobs create candidate lists, and detail jobs enrich selected records after dedupe.

Production tips

Figure: The Zillow Listings Scraper job pipeline, from input to stored output.

Validate inputs before creating jobs

Reject URLs that do not start with https://www.zillow.com/, and for property inputs also require /homedetails/ in the path. This catches invalid rows before they create a job.

def validate_zillow_url(url: str) -> None:
    if not isinstance(url, str):
        raise TypeError("url must be a string")

    if not url.startswith("https://www.zillow.com/"):
        raise ValueError("url must start with https://www.zillow.com/")

    if "/homedetails/" not in url:
        raise ValueError("url must be a Zillow property details URL")


validate_zillow_url(
    "https://www.zillow.com/homedetails/1420-Moraga-Dr-Los-Angeles-CA-90049/20530504_zpid/"
)

For filtered search jobs, validate enums before sending the request. Keep the accepted values in code so invalid UI values fail before the API call.

VALID_LISTING_CATEGORIES = {"House for sale", "House for rent", "Sold"}
VALID_HOME_TYPES = {"Apartments", "Houses", "Condos", "Townhomes"}
VALID_DAYS_ON_ZILLOW = {
    "1 day",
    "7 days",
    "14 days",
    "30 days",
    "90 days",
    "6 months",
    "12 months",
    "24 months",
    "36 months",
}


def validate_filter_inputs(payload: dict) -> None:
    # .get() makes a missing field fail the same check as an invalid value.
    if payload.get("listingCategory") not in VALID_LISTING_CATEGORIES:
        raise ValueError("invalid listingCategory")

    if payload.get("HomeType") not in VALID_HOME_TYPES:
        raise ValueError("invalid HomeType")

    if payload.get("days_on_zillow") not in VALID_DAYS_ON_ZILLOW:
        raise ValueError("invalid days_on_zillow")

    for key in ("minPrice", "maxPrice", "beds_min", "baths_min"):
        if key not in payload:
            continue
        value = payload[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"{key} must be an integer")


validate_filter_inputs({
    "location": "New York, NY",
    "listingCategory": "House for sale",
    "HomeType": "Condos",
    "days_on_zillow": "7 days",
    "minPrice": 500000,
    "maxPrice": 1500000,
    "beds_min": 2,
    "baths_min": 2
})

Add range checks for prices and bed counts in your application. A typo such as 50000000 instead of 500000 can produce a valid request with useless results.

Validate location as well. Empty locations and internal test strings waste credits because Zillow receives a search request that your team never intended to run.
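The range and location checks described above might look like the sketch below. The ceilings are hypothetical and should be tuned per market. `validate_ranges` is an illustrative name, and the function assumes the type checks have already passed.

```python
MAX_REASONABLE_PRICE = 20_000_000  # hypothetical ceiling; tune per market
MAX_REASONABLE_ROOMS = 20          # hypothetical ceiling for beds and baths


def validate_ranges(payload: dict) -> None:
    """Raise ValueError when location or numeric filters look unintended."""
    location = payload.get("location")
    if not isinstance(location, str) or not location.strip():
        raise ValueError("location must be a non-empty string")

    min_price = payload.get("minPrice")
    max_price = payload.get("maxPrice")
    for name, value in (("minPrice", min_price), ("maxPrice", max_price)):
        if value is not None and not 0 <= value <= MAX_REASONABLE_PRICE:
            raise ValueError(f"{name} is outside the expected range")
    if min_price is not None and max_price is not None and min_price > max_price:
        raise ValueError("minPrice must not exceed maxPrice")

    for name in ("beds_min", "baths_min"):
        value = payload.get(name)
        if value is not None and not 0 <= value <= MAX_REASONABLE_ROOMS:
            raise ValueError(f"{name} is outside the expected range")


validate_ranges({
    "location": "New York, NY",
    "minPrice": 500000,
    "maxPrice": 1500000,
    "beds_min": 2,
    "baths_min": 2,
})
```

With these ceilings, the 50000000 typo fails before a job is created.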

Deduplicate on ZPID

Use zpid as the primary dedupe key. Fall back to hdpUrl only when a row has no zpid.

def dedupe_listings(rows: list[dict]) -> list[dict]:
    seen = set()
    deduped = []

    for row in rows:
        key = row.get("zpid") or row.get("hdpUrl")

        if not key:
            continue

        if key in seen:
            continue

        seen.add(key)
        deduped.append(row)

    return deduped

This matters when you combine search jobs. The same listing can appear in nearby city searches, broad county searches, ZIP searches, and price-band searches.

Run dedupe before writes to your main listing table. Store duplicates in a separate audit table if you need source attribution for each search.

Keep the dedupe step after result fetch and before normalization. That keeps the raw row intact while you decide which record becomes the current version.

Normalize the schema before loading

Zillow fields mix integers, floats, strings, booleans, and nested objects. Normalize the fields you query often, then store the full raw JSON for replay and audits.

def normalize_listing(row: dict) -> dict:
    address = row.get("address") or {}

    return {
        "zpid": row.get("zpid"),
        "scrape_status": row.get("scrape_status"),
        "home_status": row.get("homeStatus"),
        "street_address": row.get("streetAddress") or address.get("streetAddress"),
        "city": row.get("city") or address.get("city"),
        "state": row.get("state") or address.get("state"),
        "zipcode": row.get("zipcode") or address.get("zipcode"),
        "price": row.get("price"),
        "currency": row.get("currency"),
        "bedrooms": row.get("bedrooms"),
        "bathrooms": row.get("bathrooms"),
        "living_area": row.get("livingArea"),
        "home_type": row.get("homeType"),
        "latitude": row.get("latitude"),
        "longitude": row.get("longitude"),
        "zestimate": row.get("zestimate"),
        "rent_zestimate": row.get("rentZestimate"),
        "hdp_url": row.get("hdpUrl"),
        "raw": row
    }

Do not cast nullable fields blindly. A missing rentZestimate should stay None. Once it is cast to 0, the warehouse stores a real numeric value and queries can no longer tell a missing estimate from a zero.
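One way to keep missing values as None while still coercing numeric strings is a small cast helper. This is a sketch, not part of the scraper output contract, and `to_int_or_none` is a hypothetical name. It truncates fractional values, so use it only for fields you store as integers.

```python
from typing import Optional


def to_int_or_none(value: object) -> Optional[int]:
    """Cast to int when possible; keep missing or unparseable values as None."""
    if value is None or isinstance(value, bool):
        return None
    if isinstance(value, int):
        return value
    if isinstance(value, str) and not value.strip():
        return None
    try:
        return int(float(value))
    except (TypeError, ValueError, OverflowError):
        return None


print(to_int_or_none("40229"))  # 40229
print(to_int_or_none(None))     # None
print(to_int_or_none(0))        # 0
```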

Use explicit types in your destination schema. Store prices and counts as integers, coordinates as decimal or double fields, and descriptions as text.

Keep raw booleans as booleans when the source returns them that way. Convert string flags only when your warehouse schema requires strict boolean columns.
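The trimmed response shows string flags such as "false" and "No" next to real booleans. If your warehouse needs strict boolean columns, a converter along these lines can help. It is a sketch: the "true" and "yes" spellings are assumptions, since the sample only shows the negative forms, and `parse_flag` is a hypothetical name.

```python
from typing import Optional

TRUE_STRINGS = {"true", "yes"}   # assumed spellings; not shown in the sample
FALSE_STRINGS = {"false", "no"}  # both appear in the sample response


def parse_flag(value: object) -> Optional[bool]:
    """Map string flags to booleans and pass real booleans through."""
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        text = value.strip().lower()
        if text in TRUE_STRINGS:
            return True
        if text in FALSE_STRINGS:
            return False
    return None  # unknown or missing values stay null


print(parse_flag("No"))     # False
print(parse_flag("false"))  # False
print(parse_flag(None))     # None
```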

Handle failed rows separately

Check scrape_status per row. A completed job can still contain rows that failed input validation or were unavailable at scrape time.

def split_success_and_failed(rows: list[dict]) -> tuple[list[dict], list[dict]]:
    success = []
    failed = []

    for row in rows:
        if row.get("scrape_status") == "success":
            success.append(row)
        else:
            failed.append(row)

    return success, failed

Retry failed rows in a separate job. Cap retries at 2 attempts so invalid URLs do not loop forever.

Log the original input with each failed row. That makes cleanup faster when a property was removed, redirected, or entered with the wrong domain.

Add a failure reason column if your loader stores error details. It saves time when support asks whether a failure came from input validation or source availability.
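The retry cap can be enforced with an attempt counter keyed by the original input. The sketch below keeps the counter in a plain dict; a real pipeline would likely persist it. `select_retry_inputs` is a hypothetical helper that relies only on the `inputs` field shown in the sample response.

```python
import json

MAX_RETRIES = 2


def select_retry_inputs(failed_rows: list[dict], attempts: dict[str, int]) -> list[dict]:
    """Return inputs worth retrying and bump their attempt counters in place."""
    retry_inputs = []
    for row in failed_rows:
        original = row.get("inputs")
        if not original:
            continue  # nothing to resend
        key = json.dumps(original, sort_keys=True)
        if attempts.get(key, 0) >= MAX_RETRIES:
            continue  # cap reached; leave the row for manual review
        attempts[key] = attempts.get(key, 0) + 1
        retry_inputs.append(original)
    return retry_inputs
```

Pass the returned list as `SCRAPER_INPUTS` for the retry job, and keep `attempts` between runs so the cap holds.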

Store job metadata

Store the ScrapeNow job ID, scraper slug, input payload, start time, final status, and output file location. This gives you a clean trail when someone asks why a listing changed.

The sample script polls every 5 seconds and times out after 3600 seconds. For scheduled pipelines, keep those values configurable per job type.

For nightly searches, a longer timeout is fine. For a user-facing workflow, fail sooner and show the job ID so support can inspect it later.
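Per-job-type polling settings can live in a small lookup table. The job type names and the interactive values below are hypothetical. Only the batch entry mirrors the sample script's 3600-second timeout and 5-second interval.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PollConfig:
    timeout_seconds: int
    poll_interval: int


# Hypothetical job types; only "batch" mirrors the sample script.
POLL_CONFIGS = {
    "batch": PollConfig(timeout_seconds=3600, poll_interval=5),
    "interactive": PollConfig(timeout_seconds=120, poll_interval=2),
}


def poll_config_for(job_type: str) -> PollConfig:
    """Look up polling settings, falling back to the batch defaults."""
    return POLL_CONFIGS.get(job_type, POLL_CONFIGS["batch"])


print(poll_config_for("interactive").timeout_seconds)  # 120
```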

A basic metadata table should include:

job_id
scraper_slug
input_hash
input_payload
started_at
finished_at
final_status
result_count
failed_count
output_path

Use an input_hash to detect accidental duplicate jobs. Hash the normalized input JSON so key order does not create different hashes for the same request.

import hashlib
import json


def input_hash(payload: dict | list[dict]) -> str:
    normalized = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

Store the hash before you create the job. Then your queue can reject duplicate work before it reaches the scraper API.
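A duplicate check built on that hash might look like this sketch. `JobQueue` is a hypothetical in-memory stand-in for whatever queue or table you use, and `input_hash` is repeated so the block runs on its own.

```python
import hashlib
import json


def input_hash(payload) -> str:
    normalized = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class JobQueue:
    """In-memory stand-in for a queue table keyed by input hash."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def submit(self, payload) -> bool:
        """Return True if the payload is new, False if it was already queued."""
        digest = input_hash(payload)
        if digest in self._seen:
            return False
        self._seen.add(digest)
        return True


queue = JobQueue()
print(queue.submit([{"location": "New York, NY", "maxPrice": 1500000}]))  # True
print(queue.submit([{"maxPrice": 1500000, "location": "New York, NY"}]))  # False
```

The second call is rejected because sorting keys makes both payloads hash to the same value.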

Keep raw and normalized data

Store normalized columns for queries and dashboards. Store raw JSON for debugging, reprocessing, and field backfills.

This pattern saves time when you add a new column later. You can backfill from raw records without running the same Zillow jobs again.

For example, you can add photoCount or tourViewCount to your warehouse after the first load. The raw row already contains those values when Zillow returned them.

Keep raw payloads in object storage if your warehouse charges heavily for semi-structured columns. Store the object path in your normalized table.

Treat listings as changing records

Zillow listing data changes over time. Price, status, photos, descriptions, and estimates can change between runs.

Keep a current table and a history table if you track market changes. The current table stores the latest row by zpid, and the history table stores each observed version with a scrape timestamp.

This structure makes price-drop detection straightforward. Compare the latest price with the previous price for the same zpid.

A minimal current table key is zpid. A minimal history table key is (zpid, scraped_at).

-- Assumes zillow_current also stores scraped_at.
-- Aliases avoid "current", which is a reserved word in some SQL dialects.
select
  latest.zpid,
  earlier.price as previous_price,
  latest.price as current_price,
  earlier.price - latest.price as price_drop
from zillow_current latest
join zillow_history earlier
  on earlier.zpid = latest.zpid
where earlier.scraped_at = (
  select max(h.scraped_at)
  from zillow_history h
  where h.zpid = latest.zpid
    and h.scraped_at < latest.scraped_at
)
and latest.price < earlier.price;

Run that check after each scheduled load. Send the output to your CRM, alerting system, or analyst queue.
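If the comparison runs in application code instead of the warehouse, the same idea can be sketched in Python. This assumes the history rows include the latest observation and carry `zpid`, `price`, and an ISO 8601 `scraped_at` string in a single timezone. `find_price_drops` is a hypothetical helper.

```python
def find_price_drops(history: list[dict]) -> list[dict]:
    """Report zpids whose latest observed price is below the one before it."""
    by_zpid: dict[int, list[dict]] = {}
    for row in history:
        by_zpid.setdefault(row["zpid"], []).append(row)

    drops = []
    for zpid, rows in by_zpid.items():
        # ISO 8601 timestamps in one timezone sort correctly as plain strings.
        ordered = sorted(rows, key=lambda r: r["scraped_at"])
        if len(ordered) < 2:
            continue
        previous, latest = ordered[-2], ordered[-1]
        if previous.get("price") is None or latest.get("price") is None:
            continue  # a missing price is not a drop
        if latest["price"] < previous["price"]:
            drops.append({
                "zpid": zpid,
                "previous_price": previous["price"],
                "current_price": latest["price"],
                "price_drop": previous["price"] - latest["price"],
            })
    return drops
```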

Keep search inputs small enough to review

Large search jobs produce useful coverage, and they also make debugging harder. Split broad regions into ZIP codes, cities, or price bands that your team can inspect.

A search for all homes in a major metro area gives you a large output file. A set of smaller jobs gives you clearer retry behavior and better source attribution.

Smaller jobs also make cost review simpler. Each job record shows how many rows came from one location, filter set, or saved search.
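Splitting one broad search into price bands can be done by copying a base filter payload. The sketch below assumes `minPrice` and `maxPrice` are inclusive bounds, which this page does not confirm. `split_price_bands` is a hypothetical helper.

```python
def split_price_bands(base: dict, min_price: int, max_price: int, step: int) -> list[dict]:
    """Copy `base` once per price band covering min_price through max_price."""
    if step <= 0 or min_price > max_price:
        raise ValueError("step must be positive and min_price <= max_price")
    inputs = []
    low = min_price
    while low <= max_price:
        high = min(low + step - 1, max_price)
        inputs.append({**base, "minPrice": low, "maxPrice": high})
        low = high + 1
    return inputs


base = {
    "location": "New York, NY",
    "listingCategory": "House for sale",
    "HomeType": "Condos",
    "days_on_zillow": "7 days",
}
bands = split_price_bands(base, 500_000, 1_499_999, 500_000)
print([(b["minPrice"], b["maxPrice"]) for b in bands])
# [(500000, 999999), (1000000, 1499999)]
```

Each band becomes its own input, so a failed band can be retried without re-running the others.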

Pricing

ScrapeNow charges per returned row. One row costs one credit, starting at $0.04 per credit for small runs and dropping with volume. No monthly contracts, no proxy fees, no charges for failed rows. See the pricing page for current rates.