How to scrape LinkedIn job listings

LinkedIn job pages contain title, company, location, description, and posting metadata. The LinkedIn Jobs Extract by URL scraper returns normalized job records from LinkedIn job pages.

How to use this scraper

The LinkedIn Jobs Scraper job pipeline, from input to stored output.

ScrapeNow has three LinkedIn jobs scrapers in this group.

Scraper	Use it when	Input style
Pull structured LinkedIn job listings	You already have one or more LinkedIn job URLs	Job URL
Search LinkedIn jobs by keyword	You want to search jobs by title, company, location, filters, or radius	Keyword and filters
Search LinkedIn jobs by URL	You already built a LinkedIn Jobs search URL	Search URL

Use Extract by URL for records from known job posts. Use Search by Keyword for discovery across titles, companies, locations, and job filters.

A production flow usually uses both. Search by Keyword finds new job posts, Extract by URL refreshes known posts, and your database deduplicates records by job_posting_id.

Use Search by URL when a human already built the exact LinkedIn filter state. The search URL preserves LinkedIn filters, which keeps QA runs repeatable.

The fastest test path has four steps. Submit one known LinkedIn job URL, poll the job, download JSON, then verify job_posting_id and scrape_status.

Step 1. Get the job URL

The Extract by URL scraper takes one input field named url.

LinkedIn search suggesting Universal Music Group companies — Search the company on LinkedIn to reach its job postings

Open `linkedin.com`.

Universal Music Group company card in LinkedIn results — Type the company name or job keyword in the LinkedIn search bar

In the search bar, type a keyword or company name, such as `Universal Music Group`.

Universal Music Group LinkedIn page with the Jobs tab — Click the company card to open its LinkedIn profile page

On the results page, click the company card.

Recently posted jobs on the Universal Music Group page — Open the Jobs tab on the company page to find open positions

On the profile page, click the Jobs option in the navigation bar. Click the job position you want from the job list.

Click the arrow button in the top right corner of the job post. Then click Copy link.

Copying a LinkedIn job post link via the share menu — Click the share arrow and select Copy link to get the job post URL

The input should look like this:

[
  {
    "url": "https://www.linkedin.com/jobs/view/4387478109"
  }
]

Send one object per job URL. For a batch run, add more objects to the same inputs array.

[
  {
    "url": "https://www.linkedin.com/jobs/view/4387478109"
  },
  {
    "url": "https://www.linkedin.com/jobs/view/4387478110"
  }
]

Keep the original URL in your database. It gives your pipeline a stable audit trail when a job expires, redirects, or returns a partial record.

Store the LinkedIn URL exactly as your system received it. Tracking parameters help audits because they show which alert, search, or source created the request.

For large batches, keep an input file with one row per job URL. Add internal source fields, such as account_id, search_name, or requested_by, before you call the API.

Use one input object for each job post. If your queue stores 10,000 URLs, split them into batches that match your worker retry policy.

Step 2. Run the API request

Use this Python script with your ScrapeNow API key.

"""
Configuration:
    - Set SCRAPER_SLUG to the scraper you want to run.
    - Set SCRAPER_INPUTS to the list of input dicts matching that scraper's schema.
    - Set API_KEY to your scraper API key.
"""

import sys
import time
import json
import requests
import os

API_KEY = "YOUR_API_KEY"

SCRAPER_SLUG = "linkedin-jobs-extract-by-url"

SCRAPER_INPUTS = [
    {
        "url": "https://www.linkedin.com/jobs/view/4387478109"
    }
]



BASE_URL = "https://api.scrapenow.io/api/v1/scraping"
TIMEOUT_SECONDS = 3600
POLL_INTERVAL = 5
SPINNER = "|/-\\"


def build_headers(api_key: str, content_type: str | None = None) -> dict:
    headers = {"Authorization": f"Bearer {api_key}"}
    if content_type:
        headers["Content-Type"] = content_type
    return headers


def trigger_scrape(slug: str, inputs: list[dict]) -> str:
    url = f"{BASE_URL}/scrape?scraper={slug}"
    response = requests.post(
        url,
        headers=build_headers(API_KEY, "application/json"),
        json={"inputs": inputs},
    )
    response.raise_for_status()
    return response.json()["data"]["job_id"]


def poll_until_done(job_id: str) -> str:
    start = time.time()
    i = 0
    while True:
        elapsed = time.time() - start
        if elapsed > TIMEOUT_SECONDS:
            print(f"\nTimeout after {TIMEOUT_SECONDS}s")
            sys.exit(1)
        response = requests.get(
            f"{BASE_URL}/jobs/{job_id}",
            headers=build_headers(API_KEY),
        )
        response.raise_for_status()
        data = response.json()
        status = data["data"]["status"]
        mins, secs = divmod(int(elapsed), 60)
        sys.stdout.write(
            f"\r[{SPINNER[i % 4]}] Waiting... {status} ({mins}m {secs:02d}s)  "
        )
        sys.stdout.flush()
        if status in ("completed", "failed"):
            print()
            return status
        time.sleep(POLL_INTERVAL)
        i += 1


def fetch_results(job_id: str) -> dict:
    response = requests.get(
        f"{BASE_URL}/jobs/{job_id}/results?format=json",
        headers=build_headers(API_KEY),
    )
    response.raise_for_status()
    return response.json()


def save_results(data: dict, slug: str) -> str:
    os.makedirs("output", exist_ok=True)
    filename = os.path.join("output", f"{slug}.json")
    with open(filename, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2, ensure_ascii=False)
    return filename


def main() -> None:
    print(f"Triggering scraper: {SCRAPER_SLUG}")
    job_id = trigger_scrape(SCRAPER_SLUG, SCRAPER_INPUTS)
    print(f"Job started: {job_id}")
    final_status = poll_until_done(job_id)
    if final_status != "completed":
        print(f"Job failed with status: {final_status}")
        sys.exit(1)
    print("Fetching results...")
    results = fetch_results(job_id)
    output_file = save_results(results, SCRAPER_SLUG)
    print(f"Results saved to: {output_file}")


if __name__ == "__main__":
    main()

The script does four things:

Starts a scraping job with POST /scrape?scraper=linkedin-jobs-extract-by-url
Polls /jobs/{job_id} every 5 seconds
Downloads JSON results from /jobs/{job_id}/results?format=json
Saves the response to output/linkedin-jobs-extract-by-url.json

The timeout is set to 3600 seconds. Keep that value for bulk runs because LinkedIn response times vary by job page and geo.

The polling interval is set to 5 seconds. That interval keeps status traffic low and gives your worker quick feedback when a run finishes.

The same API pattern works for the other scrapers in this group. Change the scraper slug and input values in the code for each scraper.

For production workers, store the returned job_id before polling. If your process restarts, resume polling with the stored ID.

Treat the scrape job as asynchronous work. Submit once, persist the job_id, and let a worker poll until the API returns a terminal status.

Poll every 5 seconds in a production queue. A 1-second interval adds status traffic without improving batch throughput.

Log the request payload, job_id, final status, and result count. Those four values usually answer the first debugging question after a failed batch.

Step 3. Use keyword search when you need discovery

The keyword scraper takes search filters instead of a single job URL.

LinkedIn Jobs location panel set to New York — Set the location filter in LinkedIn Jobs before searching

The main inputs are:

location is the city, region, metro area, or location text used for the search.
keyword is the search term, such as Music Teacher.
country is optional. For API usage, send a 2-letter ISO 3166-1 country code, such as US.
date_posted controls how old posts can be, such as Past 24 hours.
employment_type filters full-time, part-time, temporary, and related job types.
experience_level filters jobs by required experience, such as Entry-Level.
work_location filters remote, on-site, or hybrid jobs.
company narrows results to one company name.
selective_search excludes jobs that do not contain the exact keyword in the title when set to true.
location_radius controls the radius around the chosen location, such as 5 miles (8km).

LinkedIn Jobs recent location suggestions dropdown — Set the location field to the city or metro area for your job search

Set `location` to the city, region, or metro area you want. Use the same spelling your team uses in downstream reports.

LinkedIn Jobs date posted filter options — Add a country code to disambiguate cities with the same name

Use `country` when the same location name exists in multiple countries. `Springfield` and `London` are common examples where country disambiguation cuts noisy results.

LinkedIn Jobs employment type filter options — Use ISO 3166-1 alpha-2 country codes like US, GB, or CA in the API

For API calls, pass the country as a string. Use `US`, `GB`, `CA`, `AU`, or another valid ISO 3166-1 alpha-2 code.

LinkedIn Jobs experience level filter options — Store allowed filter values in configuration for consistent job searches

Keep filter values exactly as the scraper expects them. Store allowed values in configuration rather than letting users type free-form labels.

LinkedIn Jobs company filter options — Combine keyword, location, and employment type for targeted results

A typical keyword input looks like this:

[
  {
    "keyword": "Music Teacher",
    "location": "Philadelphia, PA",
    "country": "US",
    "date_posted": "Past 24 hours",
    "employment_type": "Full-time",
    "experience_level": "Entry-Level",
    "work_location": "On-site",
    "company": "",
    "selective_search": "true",
    "location_radius": "5 miles (8km)"
  }
]

Set selective_search to true when title precision matters more than total volume. This setting works well for alerts like Music Teacher, Data Engineer, or Account Executive.

Leave company empty when you want a market scan. Set it when your pipeline tracks hiring activity for a named account list.

Keyword search is the right starting point for net-new monitoring. Run it on a schedule, store the returned job URLs, then pass each job URL into Extract by URL.

Use tight filters when the downstream team receives alerts. Broad searches create more review work, especially for common titles like Engineer, Manager, or Analyst.

A daily Past 24 hours search works well for alert queues. A weekly search with broader filters works better for market reports.

Step 4. Use search URL when your filter state already exists in LinkedIn

The Search LinkedIn jobs by URL scraper works when a teammate sends you a filtered LinkedIn Jobs URL. It also works when your app already builds that URL.

LinkedIn Jobs home page with the Jobs tab — Open the LinkedIn Jobs tab to search by URL

Start from the search terms and filters in LinkedIn Jobs.

Typing a LinkedIn Jobs search for music teacher roles — Configure your job search filters on LinkedIn before copying the URL

Copy the final search URL and pass it into the scraper.

LinkedIn Jobs results with the search URL highlighted — Copy the LinkedIn search results URL with your filters applied

Search URL inputs are useful for reproducibility. Your team can save the exact LinkedIn filter state, rerun it later, and compare new results against old records.

This pattern also works well for QA. If a keyword input returns unexpected results, copy the equivalent LinkedIn URL and run the URL scraper against that state.

Search URLs help during handoffs between recruiters and data teams. A recruiter can send the exact search they trust, and your pipeline can process it without rebuilding filters.

Keep copied search URLs in raw storage. They give you a record of the filter state that produced each batch.

Use URL search when your QA team cares about exact filter parity with LinkedIn. Use keyword search when your application owns the filter configuration.

API response sample

Here is a trimmed response from the LinkedIn Jobs Extract by URL scraper.

[
  {
    "inputs": {
      "url": "https://www.linkedin.com/jobs/view/4387478109"
    },
    "scrape_status": "success",
    "url": "https://www.linkedin.com/jobs/view/4387478109?_l=en",
    "job_posting_id": "4387478109",
    "job_title": "Video Editor",
    "company_name": "Universal Music Group",
    "company_id": "3007",
    "job_location": "Philadelphia, PA",
    "job_summary": "We are UMG, the Universal Music Group. We are the world's leading music company. We own and operate businesses in recorded music, music publishing, merchandising, and audiovisual content in more than 60 countries. Famehouse, a division of UMG, supports direct-to-consumer commerce for labels, artists, and third-party clients. The Video Editor creates video content across e-commerce, marketing, social, and live-stream formats. The role sits within the Post Production team and reports to the Post Production Supervisor. The editor works with Creative Directors, Producers, Motion Designers, and Post Production leadership. Responsibilities include editing campaign assets, assembling rough cuts, refining pacing, and delivering final edits for multiple platforms. The role requires platform-specific knowledge, strong storytelling judgment, and the ability to work within production timelines. (truncated)"
  }
]

What data you get back, field guide

LinkedIn Jobs Scraper output schema — LinkedIn Jobs Scraper output fields grouped by category.

Each result object includes the original input, scrape status, normalized URL, job identifiers, company identifiers, location, and text content.

inputs.url is the job URL you submitted. Store this field with the output so your pipeline can trace each result back to the source request.

scrape_status tells you whether extraction worked for that input. Treat any value other than success as a failed item and send it to a retry queue.

url is the resolved LinkedIn job URL. LinkedIn can append parameters such as _l=en, so use job_posting_id for stable deduplication.

job_posting_id is the LinkedIn job ID. For https://www.linkedin.com/jobs/view/4387478109, the ID is 4387478109.

job_title is the title shown on the job post. This field is the first value most teams index for search, alerts, and matching logic.

company_name is the hiring company name on the job page. If you need company-level records, Extract LinkedIn company data returns structured company data from LinkedIn company pages.

company_id is the LinkedIn company identifier when available. Use it to join multiple jobs from the same company without relying on company name text.

job_location is the location text from the posting. Keep it as raw text in your first table, then parse it into city, region, and country.

job_summary is the full description text. Expect long values here, often 2,000 to 8,000 characters depending on the job post.

For analytics, keep both the raw job_location and parsed location fields. Raw text preserves LinkedIn's source value, while parsed fields support grouping by metro, state, country, or region.

For search, index job_title, company_name, job_location, and job_summary. Keep job_posting_id as the document ID so updates replace existing records.

For account intelligence, join jobs to companies on company_id when LinkedIn returns it. Fall back to normalized company names only when the company ID is missing.

For compliance and QA, store the full response payload. A raw payload gives your team evidence when a job disappears or LinkedIn changes visible fields.

Store scraped_at beside every result. LinkedIn job pages change, and a timestamp tells you which version your downstream users saw.

Ready to get this data? Pull structured LinkedIn job listings.

Production tips, validation, deduplication, schema, and errors

LinkedIn Jobs Scraper ID verification — How the LinkedIn Jobs Scraper deduplicates records before storage.

Validate before sending requests and after receiving results. Load raw data before applying business rules so you have replay support when your schema changes.

Validate LinkedIn job URLs before submission

Reject non-LinkedIn URLs before you spend credits. Also reject LinkedIn company pages, profile pages, feed URLs, and search URLs when using Extract by URL.

from urllib.parse import urlparse
import re

JOB_ID_PATTERN = re.compile(r"/jobs/view/(\d+)")

def extract_linkedin_job_id(url: str) -> str | None:
    parsed = urlparse(url)

    if parsed.netloc not in {"www.linkedin.com", "linkedin.com"}:
        return None

    match = JOB_ID_PATTERN.search(parsed.path)
    if not match:
        return None

    return match.group(1)


urls = [
    "https://www.linkedin.com/jobs/view/4387478109",
    "https://example.com/jobs/view/123",
    "https://www.linkedin.com/company/universal-music-group"
]

valid_inputs = []
for url in urls:
    job_id = extract_linkedin_job_id(url)
    if job_id:
        valid_inputs.append({"url": url, "expected_job_posting_id": job_id})

print(valid_inputs)

Expected output:

[
  {
    "url": "https://www.linkedin.com/jobs/view/4387478109",
    "expected_job_posting_id": "4387478109"
  }
]

The expected_job_posting_id field is optional. It gives your validation step a second check after extraction.

Compare expected_job_posting_id with the returned job_posting_id. If they differ, send the record to review before loading it into your clean table.

This check catches redirects, copied search URLs, and malformed inputs. It also catches rows where a user pasted a company URL into a job extraction queue.

Run this validation before batching inputs. There is no reason to send a URL that fails a local path check.

Normalize the host before validation if your source system stores uppercase domains. Keep the original URL separately for audits.

Deduplicate on job_posting_id

Use job_posting_id as the primary key. Job titles change, company names vary, and URLs include tracking parameters.

def dedupe_jobs(rows: list[dict]) -> list[dict]:
    seen = set()
    clean = []

    for row in rows:
        job_id = row.get("job_posting_id")
        if not job_id:
            continue

        if job_id in seen:
            continue

        seen.add(job_id)
        clean.append(row)

    return clean

For search scrapers, deduplication matters more. The same job can appear across multiple keyword, company, location, and radius combinations.

Deduplicate after every batch before writing to your clean table. Use job_posting_id as the conflict key and keep both first_seen_at and last_seen_at timestamps.

Store raw output and normalized output

Keep two tables.

Table	Purpose	Example fields
`linkedin_jobs_raw`	Preserve the API response	`input_url`, `scrape_status`, `payload_json`, `scraped_at`
`linkedin_jobs`	Query clean records	`job_posting_id`, `job_title`, `company_id`, `company_name`, `job_location`, `job_summary`

Raw storage protects you when LinkedIn changes a page field or your parser needs a fix. Clean storage gives your app stable columns.

A minimal SQL schema looks like this:

CREATE TABLE linkedin_jobs (
  job_posting_id TEXT PRIMARY KEY,
  job_title TEXT,
  company_id TEXT,
  company_name TEXT,
  job_location TEXT,
  job_summary TEXT,
  source_url TEXT,
  scrape_status TEXT,
  scraped_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

For raw storage, use a JSON column if your database supports it. PostgreSQL teams usually store the payload in JSONB and index job_posting_id separately.

CREATE TABLE linkedin_jobs_raw (
  id BIGSERIAL PRIMARY KEY,
  input_url TEXT,
  scrape_status TEXT,
  payload_json JSONB,
  scraped_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

Keep raw records even when extraction fails. Use raw storage as your replay source when a new field becomes useful.

Retry failed inputs with a cap

Cap retries at 2 attempts per URL. Store attempt_count per input and record the final failure reason after the last attempt so your team can separate invalid URLs from temporary extraction failures.

Truncate summaries only after storage

Store the full job_summary first. Truncate later for previews, dashboards, embeddings, or search snippets. Job summaries can run several thousand characters, so use a text column designed for long values and keep everything in UTF-8.

Track scrape status separately from pipeline status

scrape_status tells you what happened during extraction. Your pipeline status tells you what happened after the API returned data. Keep those states separate because a row can have scrape_status = success and still fail your database load due to a schema error.

Field	Example values	Meaning
`scrape_status`	`success`, `failed`	Extraction result from the scraper
`load_status`	`pending`, `loaded`, `rejected`	Result of your database write
`attempt_count`	`0`, `1`, `2`	Number of retry attempts
`last_error`	`missing job_posting_id`	Last pipeline error message

Schedule refreshes by job age

Refresh recent jobs more often than older ones. A practical schedule is daily for jobs posted in the last 7 days, weekly after that, and no refresh after your retention window ends. Track first_seen_at, last_seen_at, and last_scraped_at separately since those fields answer different questions during reporting.

Normalize locations after ingestion

Store the original job_location string first, then parse it into normalized fields in a separate job. LinkedIn location strings vary by market (Philadelphia, PA, Greater London, Remote, Berlin, Germany).

Field	Example
`job_location_raw`	`Philadelphia, PA`
`city`	`Philadelphia`
`region`	`PA`
`country`	`US`
`work_location_type`	`On-site`

Keep both raw and normalized location fields. Raw values support audits, while normalized fields support grouping and joins.

Common workflows

Recruiting operations

Recruiting operations teams use this scraper to monitor competitor hiring, track role growth, and build reports for talent planning.

A common workflow starts with keyword searches for target roles. The pipeline extracts each job, deduplicates by job_posting_id, and groups records by company, location, and posting age.

For alerts, index job_title and company_name. Send a notification when a watched company posts a matching role in a target market.

Add date_posted filters when you want daily hiring alerts. A Past 24 hours search keeps the alert queue small and reduces duplicate review.

For weekly reports, group by company and normalized function. A talent team can see which competitors are hiring sales, engineering, support, or marketing roles.

Keep alert thresholds explicit. For example, notify recruiters when a competitor posts three engineering roles in the same metro within 7 days.

Sales and account intelligence

Sales teams use LinkedIn job posts as account signals. Hiring for implementation, migration, security, data, or revenue roles often maps to active budget.

A typical account workflow starts with a company list. Your system searches jobs for each company, extracts matching posts, and adds the result to the account record.

Use company_id when available. Company names change, abbreviations vary, and subsidiaries can share similar names.

Map job titles to buying signals with a controlled ruleset. For example, Salesforce Administrator, RevOps Manager, and CRM Migration Lead belong to different signal categories.

Store the matched rule beside the job record. Sales teams trust a signal more when they can see why the system flagged it.

Add the source job URL to the CRM note or account timeline. A rep can open the posting and read the signal without asking data engineering.

Data engineering

Data teams use the scraper to load job records into warehouses, search indexes, and internal datasets.

The safest pipeline writes raw results first, validates required fields, then upserts clean records. That gives you replay support when schema rules change.

Required fields usually include job_posting_id, job_title, company_name, job_location, source_url, and scrape_status. Treat missing job_posting_id as a rejection for the clean table.

Add data quality checks after every batch. At minimum, count total inputs, successful extractions, failed extractions, missing IDs, duplicate IDs, and loaded rows.

Publish those counts with the batch ID. When a downstream dashboard looks wrong, those numbers show where the loss happened.

Track cost beside batch size. ScrapeNow charges per returned row, so a 25,000-row successful extraction maps directly to 25,000 credits.

Pricing

ScrapeNow charges per returned row. One row costs one credit, starting at $0.04 per credit for small runs and dropping with volume. No monthly contracts, no proxy fees, no charges for failed rows. See the pricing page for current rates.

Next step

Start with Pull structured LinkedIn job listings if you already have job URLs. Submit one LinkedIn job URL and verify scrape_status, job_posting_id, company_id, and job_summary.

Use Search LinkedIn jobs by keyword when you need to find jobs first. Store the returned URLs, then extract and deduplicate records by job_posting_id.

For a clean test, run one URL through Pull structured LinkedIn job listings today. Save the JSON response, confirm the primary key, and wire the same request into your batch worker.