Raw Job Records
This is the core of our data. Models are generally built from this data or from aggregates of it. When using any other file, left join it onto this file so that reference information without a job record (i.e., companies with no job records) is dropped.
Daily File | CSV | Parquet |
---|---|---|
Location | Standard/Feeds/Raw Daily Job Records/ | Standard/Feeds/Parquet/Raw Daily Job Records/ |
File Name | raw_daily_records_YYYY-MM-DD.csv.gz | raw_daily_records_YYYY-MM-DD-part-###.parquet |
Monthly File | CSV | Parquet |
Location | Standard/Feeds/Raw Full Job Records v2/ | Standard/Feeds/Parquet/Raw Full Job Records v2/ |
File Name | raw_job_archive_v2_full_YYYY-MM-DD.tar.gz | raw_job_archive_v2_YYYY-MM-DD-part-####.parquet |
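As a minimal sketch, the daily CSV naming pattern above can be turned into a path for a given feed date and the gzip-compressed file read with the standard library. The helper names, the sample columns, and the UTF-8 encoding are assumptions for illustration; only the directory and file name pattern come from the table.

```python
import csv
import gzip
import io
from datetime import date

DAILY_CSV_DIR = "Standard/Feeds/Raw Daily Job Records/"

def daily_records_csv_path(day):
    """Build the daily CSV path from the raw_daily_records_YYYY-MM-DD.csv.gz pattern."""
    return f"{DAILY_CSV_DIR}raw_daily_records_{day.isoformat()}.csv.gz"

def read_gzip_csv(fileobj):
    """Read a gzip-compressed CSV into a list of dicts (UTF-8 is an assumption)."""
    with gzip.open(fileobj, mode="rt", encoding="utf-8", newline="") as handle:
        return list(csv.DictReader(handle))

# Hypothetical two-column sample standing in for a real daily file.
sample = gzip.compress(b"hash,title\nh1,Cashier\n")
rows = read_gzip_csv(io.BytesIO(sample))
path = daily_records_csv_path(date(2024, 1, 15))
```

The same reader accepts an open binary file handle for a downloaded daily file in place of the in-memory sample.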
Job Records Table
Field Name | Data Type | Description |
hash | string/varchar | The unique identifier for job records. This is used to join job descriptions to job records. |
title | string/varchar | Job title scraped from the unique url for the job post |
company_id | string/varchar | Unique identifier for a company-scrape. This is used to join reference files to job records. |
company_name | string/varchar | The name of the company pertaining to the company_id. |
city | string/varchar | City location for the job posting. |
state | string/varchar | State or region for the job posting. |
zip | string/varchar | Postal code for the job posting. |
country | string/varchar | Country for the job posting. |
created | timestamp | The first time this job was observed and scraped. |
last_checked | timestamp | The most recent time this site was scraped and this job posting was observed. |
last_updated | timestamp | The most recent time that this site was scraped and observed a change in the job posting. |
delete_date | timestamp | The most recent time this site was scraped and this job posting was not found. |
unmapped_location | boolean | If True, the job posting location could not be accurately identified. |
url | string | The unique URL for the job posting. |
base_hash | string/varchar | The unique identifier for job records on employer websites. This differs from hash as one base_hash might have multiple “hashes” as our dataset splits out jobs for each location listed on a job posting. |
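The left join described for this file, and the relationship between base_hash and hash, can be sketched as follows. The sample rows and the simplified (not point-in-time) reference table are hypothetical; only the field names come from the Job Records Table.

```python
from collections import Counter

# Hypothetical in-memory rows; field names follow the Job Records Table.
job_records = [
    {"hash": "h1", "base_hash": "b1", "title": "Cashier", "company_id": "c1"},
    {"hash": "h2", "base_hash": "b1", "title": "Cashier", "company_id": "c1"},
    {"hash": "h3", "base_hash": "b2", "title": "Analyst", "company_id": "c2"},
]
# Simplified reference rows (not point-in-time), for illustration only.
reference = [
    {"company_id": "c1", "company_url": "https://example.com"},
    {"company_id": "c3", "company_url": "https://example.org"},  # no job records
]

def left_join(job_rows, reference_rows, key="company_id"):
    """Keep every job record; attach reference fields when the key matches."""
    lookup = {row[key]: row for row in reference_rows}
    joined = []
    for job in job_rows:
        extra = {k: v for k, v in lookup.get(job[key], {}).items() if k != key}
        joined.append({**job, **extra})
    return joined

joined = left_join(job_records, reference)
# Number of per-location hashes behind each employer posting (base_hash).
locations_per_posting = Counter(job["base_hash"] for job in job_records)
```

Company c3 never appears in the result because it has no job records, while the job for c2 is kept without reference fields.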
Raw Job Record Structured Fields
This table provides several additional fields typically found on a job portal or job record for any one job listing. Join this data to the job records table by hash.
What is a structured field?
A structured field is a data point found on a job portal or job record and is defined by the employer consistently across their job listings. The consistency of structured fields allows our web scrapes to capture additional data points not found in the job title or description.
Will every employer have all structured fields?
The structured fields on any job listing are provided by the employer at their discretion. It is not expected that all employers will provide all structured field data.
Daily File | CSV | Parquet |
---|---|---|
Location | Standard/Feeds/Raw Daily Structured Fields/ | Standard/Feeds/Parquet/Raw Daily Structured Fields/ |
File Name | raw_daily_structured_fields_YYYY-MM-DD.csv.gz | raw_daily_structured_fields_YYYY-MM-DD-part-###.parquet |
Monthly File | CSV | Parquet |
Location | Standard/Feeds/Raw Full Structured Fields/ | Standard/Feeds/Parquet/Raw Full Structured Fields/ |
File Name | raw_full_structured_fields_YYYY-MM-DD.tar.gz | raw_full_extended_fields_YYYY-MM-DD-part-####.parquet |
Structured Fields Table
Field Name | Data Type | Description |
hash | string/varchar | The unique identifier for job records. This is used to join structured fields to job records. |
reqid | string/varchar | The job ID or requisition ID for the job opening defined by the employer. This field can be used to track a job opening across multiple employer-owned career portals. |
category | string/varchar | A classification or group into which the employer categorizes the job on the career portal. A category could be a team, department, or brand defined by the employer on the job listing or career portal. |
subcategory | string/varchar | A secondary classification that the employer defines for its open roles, further specifying the role within the primary category. For example, where the category is Retail, the sub-category could be Shift-Supervisor. |
address | string/varchar | The address for the role provided on the job listing |
certifications | string/varchar | Any required certifications or licenses for the role provided on the job listing. |
posted_date | timestamp | The date shown on the job listing as the posted date or the date the job became available on the career page in YYYY-MM-DD format. |
close_date | timestamp | The date shown on the job listing as the close date or the date the employer stops accepting applications, in YYYY-MM-DD format. |
commission_eligible | string/varchar | Information provided on the job listing about commission eligibility. |
compensation | string/varchar | The compensation amount or range provided on the job listing. |
contract_length | string/varchar | Information detailing the length of a contracted position. |
division | string/varchar | The brand or division of a company the job listing falls under. |
education_requirements | string/varchar | Educational requirements provided on the job listing. |
employment_type | string/varchar | Describes the role as permanent, temporary, internship, contract, seasonal, or project-based work. |
experience_required | string/varchar | Minimum experience required for the role. |
signing_bonus | string/varchar | Provides any signing bonus offered by the employer on the job listing. |
shift | string/varchar | What shift or shifts the job listing is hiring for. |
site_id | string/varchar | The store number or facility associated with the job listing. |
time_type | string/varchar | Describes the work schedule. Typically full-time and/or part-time. |
travel_requirements | string/varchar | Any travel requirements for the role provided on the job listing. |
vacancy_count | string/varchar | The count of hires being made for the job listing. |
work_location | string/varchar | Describes where the work for this role takes place, such as on-site, hybrid, or work from home. |
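Because employers populate structured fields at their discretion, it can help to measure how often each field is filled before relying on it. The following is a rough sketch over a hypothetical CSV sample; the sample values and the function name are assumptions, and only the column names come from the Structured Fields Table.

```python
import csv
import io

# Hypothetical sample with three of the structured fields.
STRUCTURED_FIELDS_CSV = """hash,reqid,category,time_type
h1,R-100,Retail,Full-time
h2,,Retail,
h3,R-300,,
"""

def fill_rates(rows, fields):
    """Return the share of rows with a non-empty value for each field."""
    total = len(rows)
    rates = {}
    for field in fields:
        filled = sum(1 for row in rows if (row.get(field) or "").strip())
        rates[field] = filled / total if total else 0.0
    return rates

rows = list(csv.DictReader(io.StringIO(STRUCTURED_FIELDS_CSV)))
rates = fill_rates(rows, ["reqid", "category", "time_type"])
```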
Raw Job Descriptions
This contains only the job descriptions. To use job descriptions, join them to job records on hash. NLP applications typically use a combination of the job title from the job records file and the job description.
Daily File | XML | Parquet |
---|---|---|
Location | /Feeds/Raw Daily Job Descriptions/ | /Feeds/Parquet/Raw Daily Job Descriptions/ |
File Name | raw_daily_descriptions_YYYY-MM-DD.xml | raw_daily_descriptions_YYYY-MM-DD.parquet |
Monthly File | XML | Parquet |
Location | /Feeds/Raw Full Job Descriptions/ | |
File Name | linkup_job_descriptions_YYYY-MM-DD.tar.gz | |
Raw Job Descriptions Table
Field Name | Data Type | Description |
hash | string/varchar | The unique identifier for job records. This is used to join job descriptions to job records. |
description | string/varchar | The full job description scraped from the website for the job posting. |
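For NLP use, one possible way to combine the job title with the description is to join the two files on hash and concatenate the text. The sketch below assumes both files have already been parsed into dictionaries; the sample rows and the function name are hypothetical.

```python
# Hypothetical parsed rows; field names follow the two tables.
job_records = [
    {"hash": "h1", "title": "Data Analyst"},
    {"hash": "h2", "title": "Cashier"},
]
descriptions = [
    {"hash": "h1", "description": "Analyze hiring trends."},
]

def build_nlp_text(job_rows, description_rows):
    """Pair each job title with its description, joined on hash."""
    description_by_hash = {d["hash"]: d["description"] for d in description_rows}
    texts = {}
    for job in job_rows:
        description = description_by_hash.get(job["hash"], "")
        # Title first, then description; jobs without a description keep the title only.
        texts[job["hash"]] = f'{job["title"]}\n{description}'.strip()
    return texts

texts = build_nlp_text(job_records, descriptions)
```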
Full-time/Part-time
The fulltime_parttime table is a point-in-time table used in the LinkUp dataset to identify if a job has language in its posting that indicates if the job is full-time, part-time or both.
The full-time/part-time logic uses a robust keyword analysis which is applied to a job’s title, description and structured fields for jobs written in English to determine how to categorize the job.
Daily File | CSV | Parquet |
---|---|---|
Location | Standard/Feeds/Raw Daily Fulltime Parttime/ | Standard/Feeds/Parquet/Raw Daily Fulltime Parttime/ |
File Name | raw_daily_fulltime_parttime_YYYY-MM-DD.csv.gz | raw_daily_records_YYYY-MM-DD-part-###.parquet |
Monthly File | CSV | Parquet |
Location | Standard/Feeds/Raw Full Fulltime Parttime/ | Standard/Feeds/Parquet/Raw Full Fulltime Parttime/ |
File Name | raw_full_fulltime_parttime_YYYY-MM-DD-part-###.csv.gz | raw_full_fulltime_parttime_YYYY-MM-DD-part-###.parquet |
Full-time/Part-time Table
Field Name | Data Type | Description |
job_hash | string/varchar | The unique identifier for job records. This is used to join job descriptions to job records. |
fulltime_parttime | string | fulltime = The job title, description, or time_type structured field has language that the position is a full-time position and/or indicates at least 40 hours of work a week. |
start_date | date | The start date for this row. This is used for joining point-in-time information. |
end_date | date | The end date for this row. A NULL end date indicates that the row is the current or most recent full-time/part-time status. |
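A point-in-time lookup on this table might look like the sketch below. It assumes that each row is valid from start_date up to but not including end_date, which the source does not state, and it uses None to stand in for a NULL end_date. The value "parttime" and the function name are also assumptions.

```python
from datetime import date

# Hypothetical rows; "parttime" is an assumed value, only "fulltime" is documented.
fulltime_parttime_rows = [
    {"job_hash": "h1", "fulltime_parttime": "parttime",
     "start_date": date(2024, 1, 1), "end_date": date(2024, 3, 1)},
    {"job_hash": "h1", "fulltime_parttime": "fulltime",
     "start_date": date(2024, 3, 1), "end_date": None},  # None stands in for NULL
]

def status_as_of(rows, job_hash, as_of):
    """Return the status whose [start_date, end_date) window contains as_of."""
    for row in rows:
        if row["job_hash"] != job_hash:
            continue
        started = row["start_date"] <= as_of
        still_open = row["end_date"] is None or as_of < row["end_date"]
        if started and still_open:
            return row["fulltime_parttime"]
    return None  # no row covers this date

february_status = status_as_of(fulltime_parttime_rows, "h1", date(2024, 2, 1))
```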
Company Scrape Log
This is a record of when scrapes ran and when scrapes changed. Its primary use is classifying outliers as either legitimate or caused by a scrape break, which can eliminate noise and false signals in job data.
Beyond reducing noise, scrape changes can carry meaningful information. For example, after an influx of financing, a company may change Applicant Tracking Systems (e.g., Charming Charlie coming out of bankruptcy). Such a switch is recognized because the scrape had to be modified to keep working, and the date of that modification is documented in the scrape log.
Daily File | CSV | Parquet |
---|---|---|
Location | /Feeds/Raw Daily Company Scrape Log/ | /Feeds/Parquet/Raw Daily Company Scrape Log/ |
File Name | raw_company_scrape_log_daily_YYYY-MM-DD.csv | raw_company_scrape_log_daily_YYYY-MM-DD.parquet |
Monthly File | CSV | Parquet |
Location | /Feeds/Raw Full Company Scrape Log/ | /Feeds/Parquet/Raw Full Company Scrape Log/ |
File Name | raw_company_scrape_log_full_YYYY-MM-DD.csv | raw_company_scrape_log_daily_YYYY-MM-DD.parquet |
Company Scrape Log Table
Field Name | Data Type | Description |
company_id | string/varchar | The unique identifier for a company-scrape. This is used to join to reference files or aggregates. |
date | date | The date the change occurred for the scrape. |
scrape_run_complete | boolean | If True, this indicates the company_id was scraped on this date. |
scrape_changed | boolean | If True, this indicates that the code was modified for the scrape on this date. |
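One way to use the scrape log when reviewing an outlier in job counts is to look up the log entry for that company and date. This is only a sketch: the labels returned and the function name are hypothetical, and how an outlier is detected in the first place is left out.

```python
from datetime import date

# Hypothetical log rows; field names follow the Company Scrape Log Table.
scrape_log = [
    {"company_id": "c1", "date": date(2024, 5, 1),
     "scrape_run_complete": True, "scrape_changed": False},
    {"company_id": "c1", "date": date(2024, 5, 2),
     "scrape_run_complete": True, "scrape_changed": True},
    {"company_id": "c1", "date": date(2024, 5, 3),
     "scrape_run_complete": False, "scrape_changed": False},
]

def classify_outlier(log_rows, company_id, day):
    """Label an outlier day using the scrape log entry for that company and date."""
    for row in log_rows:
        if row["company_id"] == company_id and row["date"] == day:
            if not row["scrape_run_complete"]:
                return "scrape did not run"
            if row["scrape_changed"]:
                return "scrape changed"
            return "possibly legitimate"
    return "no log entry"

label = classify_outlier(scrape_log, "c1", date(2024, 5, 2))
```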
Company Ticker Reference
This file shows point-in-time ticker information for company ids. This can be joined to job records to understand mergers and acquisitions.
Daily File | CSV | Parquet |
---|---|---|
Location | /Feeds/Company Ticker Reference/ | /Feeds/Parquet/Company Ticker Reference/ |
File Name | company_ticker_YYYY-MM-DD.csv | company_ticker_YYYY-MM-DD.parquet |
Company Ticker Reference Table
Field Name | Data Type | Description |
company_id | string/varchar | The unique identifier for a company-scrape. This is used to join to reference files or aggregates. |
start_date | date | The start date for this row. This is used for joining point-in-time information. Please see the join tutorial for assistance with the join. |
end_date | date | The end date for this row. This is used for joining point-in-time information. Please see the join tutorial for assistance with the join. |
ticker_symbol | string/varchar | The ticker symbol |
stock_exchange_country | string/varchar | The country of the stock exchange that the ticker symbol is traded on. |
stock_exchange_name | string/varchar | The stock exchange symbol that the ticker is traded on. |
primary_flag | boolean | If True, this is the primary ticker for the company_id. |
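A point-in-time join of tickers to job records could be sketched as below, matching each job's created timestamp against the start_date and end_date of the ticker rows and keeping only the primary ticker. The half-open date window, the use of None for an open-ended row, and all sample values are assumptions; the join tutorial remains the authoritative guide.

```python
from datetime import date, datetime

# Hypothetical ticker rows; field names follow the Company Ticker Reference Table.
ticker_rows = [
    {"company_id": "c1", "start_date": date(2020, 1, 1), "end_date": date(2022, 6, 1),
     "ticker_symbol": "OLD", "primary_flag": True},
    {"company_id": "c1", "start_date": date(2022, 6, 1), "end_date": None,
     "ticker_symbol": "NEW", "primary_flag": True},
    {"company_id": "c1", "start_date": date(2022, 6, 1), "end_date": None,
     "ticker_symbol": "NEW.B", "primary_flag": False},
]

def primary_ticker_at(rows, company_id, when):
    """Return the primary ticker valid for company_id at the given timestamp."""
    day = when.date()
    for row in rows:
        if row["company_id"] != company_id or not row["primary_flag"]:
            continue
        # Assumed window: start_date inclusive, end_date exclusive, None = open-ended.
        if row["start_date"] <= day and (row["end_date"] is None or day < row["end_date"]):
            return row["ticker_symbol"]
    return None

job = {"hash": "h1", "company_id": "c1", "created": datetime(2021, 3, 5, 8, 30)}
ticker = primary_ticker_at(ticker_rows, job["company_id"], job["created"])
```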
Employer Type Reference
Employer Type can be used to filter the raw dataset down to public employers, private employers, or specific levels of government and education, such as post-secondary or K-12 education and federal or local government.
The Employer Type table is point-in-time, starting in 2024 when we reviewed each active company_id and assigned an employer type. If a change is detected, a new row for the company_id will be created and denoted with a new start_date. An end_date is applied to the previous row to delineate that the row is not the most current.
Daily File | CSV | Parquet |
---|---|---|
Location | /Feeds/Raw Daily Employer Type Reference/ | /Feeds/Parquet/Raw Daily Employer Type Reference/ |
File Name | raw_daily_employer_type_YYYY-MM-DD.csv | raw_daily_employer_type_YYYY-MM-DD.parquet |
Monthly File | CSV | Parquet |
Location | /Feeds/Raw Full Employer Type Reference/ | /Feeds/Parquet/Raw Full Employer Type Reference/ |
File Name | raw_full_employer_type_YYYY-MM-DD.csv | raw_full_employer_type_YYYY-MM-DD.parquet |
Employer Type Table
Field Name | Data Type | Description |
company_id | string/varchar | The unique identifier for a company-scrape. This is used to join to reference files or aggregates. |
employer_type_id | integer | The numerical ID for employer type. |
employer_type | string/varchar | The description of the employer_type_id. |
start_date | date | The start date for this row. This is used for joining point-in-time information. |
end_date | date | The end date for this row. This is used for joining point-in-time information. |
Employer Types
Value | employer_type_id | Description |
Public Company | 1 | For-profit, publicly traded corporations. |
Private Company | 2 | For-profit, non-public corporations. |
Post-secondary Education | 3 | Non-profit or governmental post-secondary educational institutions. |
K-12 Education | 4 | Non-profit or governmental K-12 educational institutions. |
Non-Profit | 5 | Non-profit corporations. |
Federal Government | 6 | Federal government, military, and non-U.S. equivalent entities. |
Local Government | 7 | Non-federal government employers at the state, county, territorial, city, township, or parish level. |
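Filtering job records by employer type can be sketched as follows. The example treats a row with no end_date (None here) as the current assignment for a company_id, which follows from the description of the table but is still an assumption about how NULL values arrive; the sample rows and function name are hypothetical.

```python
from datetime import date

# Hypothetical rows; field names follow the Employer Type Table.
job_records = [
    {"hash": "h1", "company_id": "c1"},
    {"hash": "h2", "company_id": "c2"},
    {"hash": "h3", "company_id": "c3"},
]
employer_type_rows = [
    {"company_id": "c1", "employer_type_id": 2,
     "start_date": date(2024, 1, 1), "end_date": date(2024, 9, 1)},
    {"company_id": "c1", "employer_type_id": 1,
     "start_date": date(2024, 9, 1), "end_date": None},
    {"company_id": "c2", "employer_type_id": 7,
     "start_date": date(2024, 1, 1), "end_date": None},
    {"company_id": "c3", "employer_type_id": 2,
     "start_date": date(2024, 1, 1), "end_date": None},
]

def current_company_ids(rows, wanted_type_ids):
    """Return company_ids whose current row (no end_date) has a wanted type."""
    return {
        row["company_id"]
        for row in rows
        if row["end_date"] is None and row["employer_type_id"] in wanted_type_ids
    }

PUBLIC_COMPANY = 1  # employer_type_id from the Employer Types table
public_ids = current_company_ids(employer_type_rows, {PUBLIC_COMPANY})
public_jobs = [job for job in job_records if job["company_id"] in public_ids]
```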
Remote Tag
The remote tag is a point-in-time table used to find all remote and non-remote work in the LinkUp dataset. Hybrid roles are considered remote work.
Remote tag is determined using a robust keyword analysis which is applied to job records written in English.
LinkUp is working on providing an additional data element to distinguish between remote and hybrid positions.
Daily File | CSV | Parquet |
---|---|---|
Location | /Feeds/Raw Daily Remote Tag/ | /Feeds/Parquet/Raw Daily Remote Tag/ |
File Name | raw_daily_remote_tag_YYYY-MM-DD.csv | raw_daily_remote_tag_YYYY-MM-DD.parquet |
Monthly File | CSV | Parquet |
Location | /Feeds/Raw Full Remote Tag/ | /Feeds/Parquet/Raw Full Remote Tag/ |
File Name | raw_full_remote_tag_YYYY-MM-DD.csv | raw_full_remote_tag_YYYY-MM-DD.parquet |
Remote Tag Table
Field Name | Data Type | Description |
hash | string/varchar | The unique identifier for job records. This is used to join to job record data. |
remote_status | boolean | TRUE = The job title, description, or a structured field on the job listing contains keywords or phrases that indicate the role, in some way, is a remote position. This includes hybrid, telecommute, or work from home roles. FALSE = The job title, description, or a structured field on the job listing either contains keywords that identify the job as a non-remote position, or contains no keywords/language in the description that indicates the role is remote, work from home, or hybrid capable. |
remote_detail | string | The classification of Hybrid or Remote, where applicable. |
start_date | date | The start date for this row. This is used for joining point-in-time information. |
end_date | date | The end date for this row. A NULL end date indicates that the row is the current or most recent remote_status. |
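As an illustration, the current remote status of each job can be summarized by keeping only rows with a NULL end_date (None here) and counting by remote_detail. The sample rows, the "non-remote" and "unspecified" labels, and the function name are assumptions made for this sketch.

```python
from collections import Counter
from datetime import date

# Hypothetical rows; field names follow the Remote Tag Table.
remote_rows = [
    {"hash": "h1", "remote_status": False, "remote_detail": None,
     "start_date": date(2024, 1, 1), "end_date": date(2024, 2, 1)},
    {"hash": "h1", "remote_status": True, "remote_detail": "Remote",
     "start_date": date(2024, 2, 1), "end_date": None},
    {"hash": "h2", "remote_status": True, "remote_detail": "Hybrid",
     "start_date": date(2024, 1, 5), "end_date": None},
    {"hash": "h3", "remote_status": False, "remote_detail": None,
     "start_date": date(2024, 1, 9), "end_date": None},
]

def current_remote_breakdown(rows):
    """Count current rows (no end_date) by remote classification."""
    counts = Counter()
    for row in rows:
        if row["end_date"] is not None:
            continue  # superseded row, not the current status
        if not row["remote_status"]:
            counts["non-remote"] += 1
        else:
            counts[row["remote_detail"] or "unspecified"] += 1
    return counts

breakdown = current_remote_breakdown(remote_rows)
```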
Company ISIN* Reference
This file shows point-in-time ISIN (International Securities Identification Number), mapped via FactSet concordance. Please note, a license is required to receive this file.
Daily File | CSV | Parquet |
---|---|---|
Location | /Feeds/Company ISIN Reference/ | /Feeds/Parquet/Company ISIN Reference/ |
File Name | company_isin_YYYY-MM-DD.csv | company_isin_YYYY-MM-DD.parquet |
Company ISIN Reference Table
Field Name | Data Type | Description |
company_id | string/varchar | The unique identifier for a company-scrape. This is used to join to reference files or aggregates. |
start_date | date | The start date for this row. This is used for joining point-in-time information. Please see the join tutorial for assistance with the join. |
end_date | date | The end date for this row. This is used for joining point-in-time information. Please see the join tutorial for assistance with the join. |
isin | string/varchar | International Securities Identification Number, mapped using FactSet |
primary_flag | boolean | If True, this is the primary ISIN for the company_id. |
Company CUSIP* Reference
This file shows point-in-time CUSIP (Committee on Uniform Securities Identification Procedures), mapped via FactSet concordance. This identifier is primarily used for publicly traded organizations in the United States. Please note, a license is required to receive this file.
Daily File | CSV | Parquet |
---|---|---|
Location | /Feeds/Company CUSIP Reference/ | /Feeds/Parquet/Company CUSIP Reference/ |
File Name | company_cusip_YYYY-MM-DD.csv | company_cusip_YYYY-MM-DD.parquet |
Company CUSIP Reference Table
Field Name | Data Type | Description |
company_id | string/varchar | The unique identifier for a company-scrape. This is used to join to reference files or aggregates. |
start_date | date | The start date for this row. This is used for joining point-in-time information. Please see the join tutorial for assistance with the join. |
end_date | date | The end date for this row. This is used for joining point-in-time information. Please see the join tutorial for assistance with the join. |
cusip | string/varchar | Committee on Uniform Securities Identification Procedures number, mapped via FactSet. |
primary_flag | boolean | If True, this is the primary CUSIP for the company_id. |
Company SEDOL Reference
This file shows point-in-time SEDOL (Stock Exchange Daily Official List), mapped via FactSet concordance. This identifier is managed by the London Stock Exchange. Please note, a license is required to receive this file.
Daily File | CSV | Parquet |
---|---|---|
Location | /Feeds/Company Sedol Reference/ | /Feeds/Parquet/Company Sedol Reference/ |
File Name | company_sedol_YYYY-MM-DD.csv | company_sedol_YYYY-MM-DD.parquet |
Company SEDOL Reference Table
Field Name | Data Type | Description |
company_id | string/varchar | The unique identifier for a company-scrape. This is used to join to reference files or aggregates. |
start_date | date | The start date for this row. This is used for joining point-in-time information. Please see the join tutorial for assistance with the join. |
end_date | date | The end date for this row. This is used for joining point-in-time information. Please see the join tutorial for assistance with the join. |
sedol | string/varchar | Stock Exchange Daily Official List number, mapped via FactSet. |
primary_flag | boolean | If True, this is the primary SEDOL for the company_id. |
PIT Company Reference
This file shows point-in-time company information for a company_id. This can be joined to job records to understand some corporate changes (e.g., corporate name changes, URL changes). To use it, join it to job records or aggregates, dropping any company ids with no job records.
Daily File | CSV | Parquet |
---|---|---|
Location | /Feeds/Raw Daily PIT Company Reference/ | /Feeds/Parquet/Raw Daily PIT Company Reference/ |
File Name | raw_pit_company_reference_daily_YYYY-MM-DD.csv | raw_pit_company_reference_daily_YYYY-MM-DD.parquet |
Monthly File | CSV | Parquet |
Location | /Feeds/Raw Full PIT Company Reference/ | /Feeds/Parquet/Raw Full PIT Company Reference/ |
File Name | raw_pit_company_reference_full_YYYY-MM-DD.csv | raw_pit_company_reference_full_YYYY-MM-DD.parquet |
PIT Company Reference Table
Field Name | Data Type | Description |
company_id | string/varchar | The unique identifier for a company-scrape. This is used to join to reference files or aggregates. |
start_date | date | The start date for this row. This is used for joining point-in-time information. Please see the join tutorial for assistance with the join. |
end_date | date | The end date for this row. This is used for joining point-in-time information. Please see the join tutorial for assistance with the join. |
company_name | string/varchar | The name for the company_id. |
company_url | string/varchar | The URL for the company_id. |
lei | string/varchar | Legal Entity Identifier |
open_perm_id | string/varchar | Open source company identifier used for joining to other data. |
naics_code | string/varchar | Industry classification. This can be used to join to the Bureau of Labor Statistics salary data. |
O*Net-SOC Taxonomy 2019 Reference
This file provides the ONET-SOC code for each job hash. Codes are currently available in the 2010 and 2019 taxonomies; the 2019 taxonomy supports ONET databases 25.1 through the latest release, 26.1. ONET is the primary source of standardized occupation information in the US, covering over 1,000 occupations across the entire US economy.
NOTE: 2010 ONet data is available up to October 1st, 2022.
Daily File | CSV | Parquet |
---|---|---|
Location | /Feeds/ONet Taxonomy 2019 Daily v2/ | /Feeds/Parquet/ONet Taxonomy 2019 Daily v2/ |
File Name | onet_taxonomy_2019_daily_v2_YYYY-MM-DD.csv | onet_taxonomy_2019_daily_YYYY-MM-DD.csv |
Monthly File | CSV | Parquet |
Location | /Feeds/ONet Taxonomy 2019 Full v2/ | /Feeds/Parquet/ONet Taxonomy 2019 Full v2/ |
File Name | onet_taxonomy_2019_full_v2_YYYY-MM-DD.csv | onet_taxonomy_2019_full_v2_YYYY-MM-DD-part-###.parquet |
ONET Taxonomy Table
Field Name | Data Type | Description |
hash | string/varchar | The unique identifier for job records. This is used to join job descriptions to job records. |
onet_occupation_code | string | The ONET classification of the job. It is one way of normalizing job titles. |
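To show how the occupation code can normalize job titles, the sketch below groups jobs by SOC major group, the two digits before the hyphen. It assumes onet_occupation_code arrives in the standard format such as 15-1252.00, which should be confirmed against the actual files; the sample rows and function name are hypothetical.

```python
from collections import Counter

# Hypothetical rows; codes assumed to use the standard NN-NNNN.NN format.
onet_rows = [
    {"hash": "h1", "onet_occupation_code": "15-1252.00"},
    {"hash": "h2", "onet_occupation_code": "15-1211.00"},
    {"hash": "h3", "onet_occupation_code": "29-1141.00"},
]

def soc_major_group(onet_code):
    """Return the two-digit SOC major group from a code such as 15-1252.00."""
    return onet_code.split("-", 1)[0]

# Count jobs per major group, regardless of how each job title is worded.
jobs_per_major_group = Counter(
    soc_major_group(row["onet_occupation_code"]) for row in onet_rows
)
```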
*Copyright © 2020, American Bankers Association CUSIP Database provided by S&P Global Market Intelligence LLC. All rights reserved.