How to scrape LinkedIn profile data

The ScrapeNow LinkedIn profiles scraper extracts public profile data from any LinkedIn /in/ URL. One input returns the profile name, headline, location, about text, current company, work history, and recent posts as structured JSON.

Use the Get LinkedIn profile data scraper when you already have profile URLs. Feed it clean URLs, store the response, and join the fields to your CRM, ATS, or data warehouse.

Starting input	Use this scraper
`https://www.linkedin.com/in/snoopdogg/`	LinkedIn Profiles Extract by URL
`https://www.linkedin.com/company/scrapenow/`	LinkedIn Companies Extract by URL
`https://www.linkedin.com/jobs/view/...`	LinkedIn Jobs Extract by URL
LinkedIn post URL	LinkedIn Posts Extract by URL
Keyword or job title	Search scraper first, then profile scraper

How to use this scraper

The LinkedIn Profiles Scraper job pipeline, from input to stored output.

This scraper takes one input field.

Field	Required	Format	Example
`url`	Yes	Must start with `https://www.linkedin.com`	`https://www.linkedin.com/in/snoopdogg/`

The input is a public LinkedIn profile URL. Remove tracking parameters like ?trk=public_profile before submitting a batch. For a 10,000 URL run, even a 3% duplicate rate creates 300 wasted credits.

Search the person on LinkedIn to open their profile

Step 1. Open LinkedIn

Open linkedin.com.

Type the person's name or keyword in the LinkedIn search bar to find profiles

Step 2. Search for the profile

Use the LinkedIn search bar and type the keyword, full name, or user name.

For example, type Snoop Dogg, open the profile result, and copy the URL from the address bar. The URL must start with https://www.linkedin.com.

The path should point to a person profile (/in/snoopdogg/). Company pages, job pages, and post URLs need different scrapers. A valid profile URL follows this shape:

https://www.linkedin.com/in/{profile-handle}/

Strip tracking parameters like ?trk=public_profile before creating the API job.

Step 3. Put the URL into the API input

In the code below, the scraper slug is the LinkedIn Profiles Extract by URL scraper.

The input payload contains one profile URL:

[
  {
    "url": "https://www.linkedin.com/in/snoopdogg/"
  }
]

For batch jobs, add more objects to the same array. Keep each object in the same shape, with one url field per profile.

A batch with three profiles looks like this:

[
  {
    "url": "https://www.linkedin.com/in/snoopdogg/"
  },
  {
    "url": "https://www.linkedin.com/in/example-one/"
  },
  {
    "url": "https://www.linkedin.com/in/example-two/"
  }
]

Keep one profile URL per object. Use a fixed input shape for every batch.

Step 4. Run the API job

Use this Python script.

"""
Configuration:
    - Set SCRAPER_SLUG to the scraper you want to run.
    - Set SCRAPER_INPUTS to the list of input dicts matching that scraper's schema.
    - Set API_KEY to your scraper API key.
"""

import sys
import time
import json
import requests
import os

API_KEY = "YOUR_API_KEY"

SCRAPER_SLUG = "linkedin-profiles-extract-by-url"

SCRAPER_INPUTS = [
    {
        "url": "https://www.linkedin.com/in/snoopdogg/"
    }
]



BASE_URL = "https://api.scrapenow.io/api/v1/scraping"
TIMEOUT_SECONDS = 3600
POLL_INTERVAL = 5
SPINNER = "|/-\\"


def build_headers(api_key: str, content_type: str | None = None) -> dict:
    headers = {"Authorization": f"Bearer {api_key}"}
    if content_type:
        headers["Content-Type"] = content_type
    return headers


def trigger_scrape(slug: str, inputs: list[dict]) -> str:
    url = f"{BASE_URL}/scrape?scraper={slug}"
    response = requests.post(
        url,
        headers=build_headers(API_KEY, "application/json"),
        json={"inputs": inputs},
    )
    response.raise_for_status()
    return response.json()["data"]["job_id"]


def poll_until_done(job_id: str) -> str:
    start = time.time()
    i = 0
    while True:
        elapsed = time.time() - start
        if elapsed > TIMEOUT_SECONDS:
            print(f"\nTimeout after {TIMEOUT_SECONDS}s")
            sys.exit(1)
        response = requests.get(
            f"{BASE_URL}/jobs/{job_id}",
            headers=build_headers(API_KEY),
        )
        response.raise_for_status()
        data = response.json()
        status = data["data"]["status"]
        mins, secs = divmod(int(elapsed), 60)
        sys.stdout.write(
            f"\r[{SPINNER[i % 4]}] Waiting... {status} ({mins}m {secs:02d}s)  "
        )
        sys.stdout.flush()
        if status in ("completed", "failed"):
            print()
            return status
        time.sleep(POLL_INTERVAL)
        i += 1


def fetch_results(job_id: str) -> dict:
    response = requests.get(
        f"{BASE_URL}/jobs/{job_id}/results?format=json",
        headers=build_headers(API_KEY),
    )
    response.raise_for_status()
    return response.json()


def save_results(data: dict, slug: str) -> str:
    os.makedirs("output", exist_ok=True)
    filename = os.path.join("output", f"{slug}.json")
    with open(filename, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2, ensure_ascii=False)
    return filename


def main() -> None:
    print(f"Triggering scraper: {SCRAPER_SLUG}")
    job_id = trigger_scrape(SCRAPER_SLUG, SCRAPER_INPUTS)
    print(f"Job started: {job_id}")
    final_status = poll_until_done(job_id)
    if final_status != "completed":
        print(f"Job failed with status: {final_status}")
        sys.exit(1)
    print("Fetching results...")
    results = fetch_results(job_id)
    output_file = save_results(results, SCRAPER_SLUG)
    print(f"Results saved to: {output_file}")


if __name__ == "__main__":
    main()

The script starts a scraping job with POST /scrape?scraper=linkedin-profiles-extract-by-url, polls every 5 seconds, times out after 3600 seconds, and downloads the result as JSON.

For production, move API_KEY into an environment variable:

export SCRAPENOW_API_KEY="YOUR_API_KEY"
python linkedin_profiles.py

Step 5. Read the JSON response

A successful run returns one record per input URL.

[
  {
    "inputs": {
      "url": "https://www.linkedin.com/in/snoopdogg/"
    },
    "scrape_status": "success",
    "id": "snoopdogg",
    "name": "Snoop Dogg",
    "city": "Los Angeles Metropolitan Area",
    "country_code": "US",
    "position": "Coach Snoop, Team USA I CEO - Death Row Records | Founder - Death Row Pictures",
    "about": "An entertainment industry mogul, Snoop Dogg has reigned for nearly three decades as a globally recognized performer, producer, and entrepreneur. Snoop Dogg is an American rapper, singer, songwriter, actor, record producer, DJ, media personality, businessman, and icon. His work spans music, Web 3.0, technology, entertainment, lifestyle, global consumer brands, food and beverage, and cannabis. Brand partnerships and endorsements: Snoop Dogg's brand is widely recognized, and companies work with him to promote products and services. Music and entertainment collaborations: Snoop Dogg works with brands on content, sponsored events, and marketing campaigns. Cannabis and related industries: Snoop Dogg has launched several cannabis businesses and has direct market experience. Social media marketing and influencer partnerships: Snoop Dogg has a large social following, and brands partner with him to reach that audience.",
    "posts": [
      {
        "title": "Building A Personal Brand",
        "attribution": "I’ll be honest I didn’t even know what LinkedIn was a year ago, and now being on the platform I realize this is the…",
        "link": "https://www.linkedin.com/pulse/building-personal-brand-snoop-dogg",
        "created_at": "2023-08-08T00:00:00.000Z",
        "interaction": "12,94 - 1,347 Comments",
        "id": "7094669212786311168"
      },
      {
        "title": "Life Lessons",
        "attribution": "One of my goals here on LinkedIn is to give back to the LinkedIn community by sharing some of the useful lessons I…",
        "link": "https://www.linkedin.com/pulse/life-lessons-snoop-dogg",
        "created_at": "2023-08-07T00:00:00.000Z",
        "interaction": "7,11 - 864 Comments",
        "id": "7094346011845808128"
      }
    ],
    "current_company": {
      "name": "United States Olympic & Paralympic Committee",
      "company_id": "united-states-olympic-and-paralympic-committee",
      "title": "Coach Snoop, Team USA",
      "location": "Greater Milan Metropolitan Area"
    },
    "experience": [
      {
        "title": "Coach Snoop, Team USA",
        "location": "Greater Milan Metropolitan Area",
        "description_html": null,
        "start_date": "Dec"
      }
    ]
  }
]

Public profiles vary across accounts, regions, and visibility settings. Treat nullable fields as expected data, not parser failures.

What data you get back

LinkedIn Profiles Scraper output schema — LinkedIn Profiles Scraper output fields grouped by category.

The profile scraper returns a flat profile object with nested arrays for posts and experience.

Use scrape_status as the first filter. Store successful rows, route failed rows into a retry table, and keep the original inputs.url for traceability.

Field	Type	Use it for
`inputs.url`	string	Original URL submitted to the scraper
`scrape_status`	string	Job-level result for that input
`id`	string	LinkedIn public profile handle
`name`	string	Full profile name
`city`	string	Public location text
`country_code`	string	2-letter country code when available
`position`	string	Current headline
`about`	string	Profile summary text
`posts`	array	Recent LinkedIn articles or posts found on the profile
`current_company`	object	Current employer name, ID, title, and location
`experience`	array	Work history entries

Ready to get this data? Get LinkedIn profile data.

Use the id field as your primary person key for deduplication. The public handle stays stable even when URLs include different tracking parameters or trailing slashes.

The current_company.company_id field pairs well with the Extract LinkedIn company data scraper when you need company-level fields after extracting a person profile.

Keep the full response in storage even if your application uses only a few fields today. Raw JSON saves a backfill job when your team adds post analysis, company matching, or experience filters later.

Field mapping for common workflows

Recruiting teams usually care about name, position, city, current_company, and experience. These fields support candidate matching, hiring manager enrichment, and account research.

Sales teams usually care about name, position, current_company.company_id, and recent posts. Those fields help connect a person record to an account record.

Data teams should store every field and add typed columns only for fields they query. This keeps ingestion simple and leaves room for new downstream models.

Workflow	Fields to index
Recruiting enrichment	`id`, `name`, `position`, `city`, `experience`
CRM enrichment	`id`, `name`, `current_company.company_id`, `position`
Account research	`current_company.company_id`, `posts`, `position`
Data warehouse joins	`id`, `inputs.url`, `scrape_status`, `country_code`

Do not build business logic around display text alone. Use profile IDs and company IDs wherever the response provides them.

Production tips

A production pipeline follows this sequence: normalize URLs, remove duplicates, submit batches, poll jobs, store raw results, flatten stable fields, retry failed rows.

Validate LinkedIn profile URLs before sending jobs

Invalid inputs waste credits. Use a strict URL validator before calling the API:

from urllib.parse import urlparse

def is_valid_linkedin_profile_url(url: str) -> bool:
    parsed = urlparse(url)

    if parsed.scheme != "https":
        return False

    if parsed.netloc != "www.linkedin.com":
        return False

    if not parsed.path.startswith("/in/"):
        return False

    profile_id = parsed.path.strip("/").split("/")[-1]
    return len(profile_id) > 0


urls = [
    "https://www.linkedin.com/in/snoopdogg/",
    "http://www.linkedin.com/in/example/",
    "https://linkedin.com/in/example/",
    "https://www.linkedin.com/company/scrapenow/"
]

valid_urls = [url for url in urls if is_valid_linkedin_profile_url(url)]
print(valid_urls)

Expected output:

[
  "https://www.linkedin.com/in/snoopdogg/"
]

If your input list contains company pages, route those to the Extract LinkedIn company data scraper instead.

Normalize URLs before submission

Normalize URLs to remove query strings and trailing slash differences before batching:

from urllib.parse import urlparse, urlunparse

def normalize_linkedin_url(url: str) -> str:
    parsed = urlparse(url)
    path = parsed.path.rstrip("/") + "/"

    return urlunparse((
        "https",
        "www.linkedin.com",
        path,
        "",
        "",
        ""
    ))


raw_urls = [
    "https://www.linkedin.com/in/snoopdogg/",
    "https://www.linkedin.com/in/snoopdogg/?trk=public_profile",
    "https://www.linkedin.com/in/snoopdogg"
]

deduped = sorted({normalize_linkedin_url(url) for url in raw_urls})
print(deduped)

Expected output:

[
  "https://www.linkedin.com/in/snoopdogg/"
]

Run normalization before splitting work into batches. If you dedupe inside each batch only, the same profile still lands in multiple jobs.

Dedupe by profile handle after extraction

The response id field catches profile-level duplicates that URL normalization misses. Use the profile handle as your person key:

def profile_key(record: dict) -> str:
    profile_id = record.get("id")
    source_url = record.get("inputs", {}).get("url")

    if profile_id:
        return f"linkedin_id:{profile_id}"

    return f"source_url:{source_url}"

Apply this key before inserting into your person table. Do not use name as a unique key since names change and duplicates are common.

Store results with a stable schema

Store stable fields in typed columns and keep the full JSON payload for backfills:

def flatten_profile(record: dict) -> dict:
    current_company = record.get("current_company") or {}

    return {
        "profile_id": record.get("id"),
        "source_url": record.get("inputs", {}).get("url"),
        "name": record.get("name"),
        "city": record.get("city"),
        "country_code": record.get("country_code"),
        "position": record.get("position"),
        "about": record.get("about"),
        "current_company_id": current_company.get("company_id"),
        "current_company_name": current_company.get("name"),
        "scrape_status": record.get("scrape_status"),
        "raw_json": record
    }

Keep posts and experience in separate child tables if you query them often. For PostgreSQL, store raw_json as jsonb. For BigQuery, use a JSON column or keep it in object storage with a row pointer.

Handle partial data

Public LinkedIn profiles expose different fields across accounts. Some profiles have no about, no visible posts, or sparse experience data. Write nullable parsing from day one:

def extract_current_company_id(record: dict) -> str | None:
    company = record.get("current_company") or {}
    return company.get("company_id")

Treat missing arrays as empty lists and missing objects as empty dicts. One sparse profile should never stop a 5,000 URL batch.

Track and retry failed inputs

Store failed records with the original URL, job ID, and timestamp. Cap retries at 3 attempts. Separate scraper failures from import failures since retrying the scraper will not fix a database constraint error.

Combine profile data with other LinkedIn scrapers

The LinkedIn Profiles Scraper multi scraper architecture.

Profile URLs are one part of a LinkedIn data pipeline. Most production setups combine people, companies, jobs, and posts.

Starting point	Scraper to use	Output
Person profile URL	Get LinkedIn profile data	Name, headline, company, experience, posts
Company URL	Extract LinkedIn company data	Company name, industry, size, location
Job URL	Pull structured LinkedIn job listings	Job title, company, location, description
Keyword search	Search LinkedIn jobs by keyword	Job listings from search terms
Post URL	Extract LinkedIn post data	Post text, author, engagement fields

A recruiting pipeline typically starts with job URLs, extracts company data, then adds public people profiles for founders and hiring managers. A sales pipeline often starts with company pages, then enriches decision makers by profile URL. The LinkedIn Jobs scraper guide and LinkedIn Posts scraper guide cover those steps.

Join current_company.company_id from a profile record to the company scraper output for company size, industry, and location fields. Keep each scraper output in its own table and use separate refresh schedules since company data changes slower than profile headlines.

Browse all 86+ scrapers across 14 platforms.

Pricing

ScrapeNow charges per returned row. One row costs one credit, starting at $0.04 per credit for small runs and dropping with volume. No monthly contracts, no proxy fees, no charges for failed rows. See the pricing page for current rates.

Open the Get LinkedIn profile data scraper, paste one known public /in/ URL, and confirm the JSON fields match your table schema. Then submit your normalized batch and route company, job, or post URLs through the matching scraper from the Browse all 86+ scrapers.