Data Dictionary - RAW Data Package Feeds

Connor Schlehuber

Apr 08, 2025

Raw Job Records

This is the core of our data. Models are generally built from this data or from aggregates of it. When using any other file, left join it onto this file (job records on the left), which drops any reference information that has no job record (i.e., companies with no job records).
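
The join pattern described above might look roughly like the following sketch, which uses only the Python standard library and in-memory rows. The sample values and the `attach_reference` helper are hypothetical; only the `hash` and `company_id` field names come from this dictionary, and the sketch assumes one reference row per `company_id` (point-in-time reference files also need a date condition).

```python
# Illustrative rows only; real rows come from the feed files.
job_records = [
    {"hash": "h1", "company_id": "c1", "title": "Cashier"},
    {"hash": "h2", "company_id": "c1", "title": "Store Manager"},
    {"hash": "h3", "company_id": "c9", "title": "Analyst"},
]
# A reference file keyed by company_id (hypothetical values).
reference = [
    {"company_id": "c1", "company_url": "https://example.com"},
    {"company_id": "c2", "company_url": "https://example.org"},  # no job records
]

def attach_reference(jobs, ref_rows, key="company_id"):
    """Left join: keep every job record, add reference fields when present."""
    lookup = {row[key]: row for row in ref_rows}
    ref_fields = [name for name in ref_rows[0] if name != key] if ref_rows else []
    joined = []
    for job in jobs:
        match = lookup.get(job[key])
        row = dict(job)
        for name in ref_fields:
            # Unmatched job records keep their row and get None for reference fields.
            row[name] = match[name] if match else None
        joined.append(row)
    return joined

joined = attach_reference(job_records, reference)
```

Because job records sit on the left side of the join, company `c2` (which has no job records) never appears in `joined`, while job `h3` is kept with an empty reference field.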

Daily File	CSV	Parquet
Location	Standard/Feeds/Raw Daily Job Records/	Standard/Feeds/Parquet/Raw Daily Job Records/
File Name	raw_daily_records_YYYY-MM-DD.csv.gz	raw_daily_records_YYYY-MM-DD-part-###.parquet
Monthly File	CSV	Parquet
Location	Standard/Feeds/Raw Full Job Records v2/	Standard/Feeds/Parquet/Raw Full Job Records v2/
File Name	raw_job_archive_v2_full_YYYY-MM-DD.tar.gz	raw_job_archive_v2_YYYY-MM-DD-part-####.parquet
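
As an illustration of the daily CSV naming convention in the table above, the sketch below builds the expected relative path for a given date and reads a locally downloaded copy. The helper names are hypothetical, and the sketch assumes the file is UTF-8 encoded with a header row, which this dictionary does not state.

```python
import csv
import gzip
from datetime import date
from pathlib import PurePosixPath

# Location taken from the table above.
DAILY_CSV_DIR = PurePosixPath("Standard/Feeds/Raw Daily Job Records")

def daily_csv_path(day):
    """Return the relative path of the daily job records CSV for a date."""
    return DAILY_CSV_DIR / f"raw_daily_records_{day.isoformat()}.csv.gz"

def read_daily_csv(local_path):
    """Yield one dict per row from a local gzip-compressed CSV copy.

    Assumes UTF-8 text and a header row (not confirmed by this document).
    """
    with gzip.open(local_path, "rt", encoding="utf-8", newline="") as handle:
        yield from csv.DictReader(handle)

path = daily_csv_path(date(2025, 4, 8))
```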

Job Records Table

Field Name	Data Type	Description
hash	string/varchar	The unique identifier for job records. This is used to join job descriptions to job records.
title	string/varchar	Job title scraped from the unique url for the job post
company_id	string/varchar	Unique identifier for a company-scrape. This is used to join reference files with job records.
company_name	string/varchar	The name of the company associated with the company_id.
city	string/varchar	City location for the job posting.
state	string/varchar	State or region for the job posting.
zip	string/varchar	Postal code for the job posting.
country	string/varchar	Country for the job posting.
created	timestamp	The first time this job was observed and scraped.
last_checked	timestamp	The most recent time this site was scraped and this job posting was observed.
last_updated	timestamp	The most recent time that this site was scraped and observed a change in the job posting.
delete_date	timestamp	The most recent time this site was scraped and this job posting was not found.
unmapped_location	boolean	If True, this indicates that the job posting location was not able to be accurately identified.
url	string	The unique URL for the job posting.
base_hash	string/varchar	The unique identifier for job records on employer websites. This differs from hash as one base_hash might have multiple “hashes” as our dataset splits out jobs for each location listed on a job posting.
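
To illustrate the relationship between hash and base_hash, the following sketch groups location-level records back into employer postings. The sample rows and the `group_by_base_hash` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows: one posting (b1) listed in two cities, one (b2) in a single city.
rows = [
    {"hash": "a1", "base_hash": "b1", "city": "Austin"},
    {"hash": "a2", "base_hash": "b1", "city": "Dallas"},
    {"hash": "a3", "base_hash": "b2", "city": "Boston"},
]

def group_by_base_hash(records):
    """Map each base_hash to the list of location-level hashes under it."""
    groups = defaultdict(list)
    for record in records:
        groups[record["base_hash"]].append(record["hash"])
    return dict(groups)

groups = group_by_base_hash(rows)
location_level_count = len(rows)   # counts every hash
posting_level_count = len(groups)  # counts each employer posting once
```

Counting by hash measures openings per location, while counting by base_hash measures distinct postings on the employer site; which one is appropriate depends on the analysis.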

Raw Job Record Structured Fields

This table provides several additional fields typically found on a job portal or job record for any one job listing. Join this data to the job records table by hash.

What is a structured field?

A structured field is a data point found on a job portal or job record and is defined by the employer consistently across their job listings. The consistency of structured fields allows our web scrapes to capture additional data points not found in the job title or description.

Will every employer have all structured fields?

The structured fields on any job listing are provided by the employer at their discretion. It is not expected that all employers will provide all structured field data.

Daily File	CSV	Parquet
Location	Standard/Feeds/Raw Daily Structured Fields/	Standard/Feeds/Parquet/Raw Daily Structured Fields/
File Name	raw_daily_structured_fields_YYYY-MM-DD.csv.gz	raw_daily_structured_fields_YYYY-MM-DD-part-###.parquet
Monthly File	CSV	Parquet
Location	Standard/Feeds/Raw Full Structured Fields/	Standard/Feeds/Parquet/Raw Full Structured Fields/
File Name	raw_full_structured_fields_YYYY-MM-DD.tar.gz	raw_full_extended_fields_YYYY-MM-DD-part-####.parquet

Structured Fields Table

Field Name	Data Type	Description
hash	string/varchar	The unique identifier for job records. This is used to join job descriptions to job records.
reqid	string/varchar	The job ID or requisition ID for the job opening defined by the employer. This field can be used to track a job opening across multiple employer-owned career portals.
category	string/varchar	A classification or group which the job is categorized into by the employer on the career portal. Category could be a team, department, or brand defined by the employer on the job listing or career portal.
subcategory	string/varchar	Sub-category is a secondary or additional category to the primary category. The sub-category field is defined by the employer as a secondary classification for their open roles. An example of sub-category would further define the role under a category. Where a category may be Retail the sub-category could be Shift-Supervisor.
address	string/varchar	The address for the role provided on the job listing
certifications	string/varchar	Any required certifications or licenses for the role provided on the job listing.
posted_date	timestamp	The date shown on the job listing as the posted date or the date the job became available on the career page in YYYY-MM-DD format.
close_date	timestamp	The date shown on the job listing as the close date or the date the employer stops accepting applications, in YYYY-MM-DD format.
commission_eligible	string/varchar	Information provided on the job listing about commission eligibility.
compensation	string/varchar	The compensation amount or range provided on the job listing.
contract_length	string/varchar	Information detailing the length of a contracted position.
division	string/varchar	The brand or division of a company the job listing falls under.
education_requirements	string/varchar	Educational requirements provided on the job listing.
employment_type	string/varchar	Describes the role as permanent, temporary, Internship, Contract, Seasonal or Project based work.
experience_required	string/varchar	Minimum experience required for the role.
signing_bonus	string/varchar	Provides any signing bonus offered by the employer on the job listing.
shift	string/varchar	What shift or shifts the job listing is hiring for.
site_id	string/varchar	The store number or facility associated with the job listing.
time_type	string/varchar	Describes the work schedule. Typically full-time and/or part-time.
travel_requirements	string/varchar	Any travel requirements for the role provided on the job listing.
vacancy_count	string/varchar	The count of hires being made for the job listing.
work_location	string/varchar	Describes where the work for this role takes place, such as on-site, hybrid, or work from home.

Raw Job Descriptions

This file contains only the job descriptions. To use them, join with job records on hash. NLP applications typically join job descriptions to job records and use a combination of the job title (from the job records file) and the job description.

Daily File	XML	Parquet
Location	/Feeds/Raw Daily Job Descriptions/	/Feeds/Parquet/Raw Daily Job Descriptions/
File Name	raw_daily_descriptions_YYYY-MM-DD.xml	raw_daily_descriptions_YYYY-MM-DD.parquet
Monthly File	XML	Parquet
Location	/Feeds/Raw Full Job Descriptions/
File Name	linkup_job_descriptions_YYYY-MM-DD.tar.gz

Raw Job Descriptions Table

Field Name	Data Type	Description
hash	string/varchar	The unique identifier for job records. This is used to join job descriptions to job records.
description	string/varchar	The full job description scraped from the website for the job posting.
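
For NLP use, combining the job title with the description could be sketched as follows. The `build_nlp_text` helper, the sample rows, and the choice of a blank line as separator are assumptions made for illustration.

```python
def build_nlp_text(job_records, descriptions):
    """Return {hash: "title + description"} for jobs that have a description."""
    description_by_hash = {row["hash"]: row["description"] for row in descriptions}
    documents = {}
    for job in job_records:
        description = description_by_hash.get(job["hash"])
        if description is None:
            continue  # no description row for this job; skip it
        documents[job["hash"]] = f'{job["title"]}\n\n{description}'
    return documents

# Hypothetical sample rows.
jobs = [
    {"hash": "h1", "title": "Data Engineer"},
    {"hash": "h2", "title": "Barista"},
]
descs = [{"hash": "h1", "description": "Build and maintain data pipelines."}]

documents = build_nlp_text(jobs, descs)
```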

Full-time/Part-time

The fulltime_parttime table is a point-in-time table used in the LinkUp dataset to identify if a job has language in its posting that indicates if the job is full-time, part-time or both.

The full-time/part-time logic uses a robust keyword analysis which is applied to a job’s title, description and structured fields for jobs written in English to determine how to categorize the job.

Daily File	CSV	Parquet
Location	Standard/Feeds/Raw Daily Fulltime Parttime/	Standard/Feeds/Parquet/Raw Daily Fulltime Parttime/
File Name	raw_daily_fulltime_parttime_YYYY-MM-DD.csv.gz	raw_daily_records_YYYY-MM-DD-part-###.parquet
Monthly File	CSV	Parquet
Location	Standard/Feeds/Raw Full Fulltime Parttime/	Standard/Feeds/Parquet/Raw Full Fulltime Parttime/
File Name	raw_full_fulltime_parttime_YYYY-MM-DD-part-###.csv.gz	raw_full_fulltime_parttime_YYYY-MM-DD-part-###.parquet

Full-time/Part-time Table

Field Name	Data Type	Description
job_hash	string/varchar	The unique identifier for job records. This is used to join job descriptions to job records.
fulltime_parttime	string	fulltime = The job title, description, or time_type structured field has language that the position is a full-time position and/or indicates at least 40 hours of work a week. parttime = The job title, description, or time_type structured field has language that the position is a part-time position and/or indicates less than 40 hours of work a week. fulltime_parttime = The job title, description, or time_type structured field has language that the position is either full-time or part-time.
start_date	date	The start date for this row. This is used for joining point-in-time information.
end_date	date	The end date for this row. A NULL end date indicates that the row is the current or most recent full-time/part-time status.
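
Since a NULL end_date marks the current row, selecting the latest status per job can be sketched as below. The sketch assumes NULL arrives as an empty string when the CSV is read as text; the sample rows and the `current_status` helper are hypothetical.

```python
# Hypothetical rows as they might look after reading the CSV as text.
ftpt_rows = [
    {"job_hash": "j1", "fulltime_parttime": "parttime",
     "start_date": "2024-01-01", "end_date": "2024-03-01"},
    {"job_hash": "j1", "fulltime_parttime": "fulltime",
     "start_date": "2024-03-01", "end_date": ""},
    {"job_hash": "j2", "fulltime_parttime": "fulltime_parttime",
     "start_date": "2024-02-15", "end_date": ""},
]

def current_status(rows):
    """Map job_hash to its current status (rows whose end_date is empty)."""
    return {
        row["job_hash"]: row["fulltime_parttime"]
        for row in rows
        if not row["end_date"]  # assumption: NULL is read as "" or None
    }

current = current_status(ftpt_rows)
```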

Company Scrape Log

This is a record of when scrapes run and when scrapes change. Its primary use is identifying outliers: an outlier can be classified as legitimate or as the result of a scrape break, which eliminates noise and false signals in job data.

Beyond reducing noise, scrape changes can carry meaningful information of their own. For example, after an influx of financing, a company may change Applicant Tracking Systems (e.g., Charming Charlie coming out of bankruptcy). Such a change is detectable because the scrape had to be modified to keep working, and the date of that modification is documented in the scrape log.

Daily File	CSV	Parquet
Location	/Feeds/Raw Daily Company Scrape Log/	/Feeds/Parquet/Raw Daily Company Scrape Log/
File Name	raw_company_scrape_log_daily_YYYY-MM-DD.csv	raw_company_scrape_log_daily_YYYY-MM-DD.parquet
Monthly File	CSV	Parquet
Location	/Feeds/Raw Full Company Scrape Log/	/Feeds/Parquet/Raw Full Company Scrape Log/
File Name	raw_company_scrape_log_full_YYYY-MM-DD.csv	raw_company_scrape_log_daily_YYYY-MM-DD.parquet

Company Scrape Log Table

Field Name	Data Type	Description
company_id	string/varchar	The unique identifier for a company-scrape. This is used to join to reference files or aggregates.
date	date	The date the change occurred for the scrape.
scrape_run_complete	boolean	If True, this indicates that the company_id was scraped on this date.
scrape_changed	boolean	If True, this indicates that the scrape code was modified on this date.
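
One way to use the scrape log for outlier review is to list the dates on which a company's job counts deserve a second look. The sketch below is illustrative only: the `suspect_dates` helper and the sample rows are hypothetical, and it assumes the log can contain rows where scrape_run_complete is False.

```python
from datetime import date

# Hypothetical log rows for one company.
scrape_log = [
    {"company_id": "c1", "date": date(2025, 4, 1),
     "scrape_run_complete": True, "scrape_changed": False},
    {"company_id": "c1", "date": date(2025, 4, 2),
     "scrape_run_complete": False, "scrape_changed": False},
    {"company_id": "c1", "date": date(2025, 4, 3),
     "scrape_run_complete": True, "scrape_changed": True},
]

def suspect_dates(log, company_id):
    """Dates where the scrape did not complete or its code was modified."""
    return sorted(
        row["date"]
        for row in log
        if row["company_id"] == company_id
        and (not row["scrape_run_complete"] or row["scrape_changed"])
    )

flagged = suspect_dates(scrape_log, "c1")
```

A jump or drop in job counts on a flagged date is a candidate scrape artifact rather than a hiring signal; a change on an unflagged date is more likely to be legitimate.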

Company Ticker Reference

This file shows point-in-time ticker information for company ids. This can be joined to job records to understand mergers and acquisitions.

Daily File	CSV	Parquet
Location	/Feeds/Company Ticker Reference/	/Feeds/Parquet/Company Ticker Reference/
File Name	company_ticker_YYYY-MM-DD.csv	company_ticker_YYYY-MM-DD.parquet

Company Ticker Reference Table

Field Name	Data Type	Description
company_id	string/varchar	The unique identifier for a company-scrape. This is used to join to reference files or aggregates.
start_date	date	The start date for this row. This is used for joining point-in-time information. Please see the join tutorial for assistance with the join.
end_date	date	The end date for this row. This is used for joining point-in-time information. Please see the join tutorial for assistance with the join.
ticker_symbol	string/varchar	The ticker symbol
stock_exchange_country	string/varchar	The country of the stock exchange that the ticker symbol is traded on.
stock_exchange_name	string/varchar	The stock exchange symbol that the ticker is traded on.
primary_flag	boolean	If True, this is the primary ticker for the company_id.
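
A point-in-time lookup against this table could be sketched as follows. This is not the join tutorial's method: the `ticker_as_of` helper and sample rows are hypothetical, and the sketch assumes that start_date is inclusive, end_date is exclusive, and an open-ended row has no end_date.

```python
from datetime import date

# Hypothetical rows; None stands in for an open-ended end_date.
ticker_rows = [
    {"company_id": "c1", "start_date": date(2020, 1, 1), "end_date": date(2023, 6, 1),
     "ticker_symbol": "OLD", "primary_flag": True},
    {"company_id": "c1", "start_date": date(2023, 6, 1), "end_date": None,
     "ticker_symbol": "NEW", "primary_flag": True},
]

def ticker_as_of(rows, company_id, as_of):
    """Return the primary ticker valid for company_id on the as_of date."""
    for row in rows:
        if row["company_id"] != company_id or not row["primary_flag"]:
            continue
        started = row["start_date"] <= as_of
        # Assumption: end_date is exclusive and None means still current.
        not_ended = row["end_date"] is None or as_of < row["end_date"]
        if started and not_ended:
            return row["ticker_symbol"]
    return None

ticker_2022 = ticker_as_of(ticker_rows, "c1", date(2022, 1, 1))
ticker_2024 = ticker_as_of(ticker_rows, "c1", date(2024, 1, 1))
```

In practice the as_of date would usually be a date from the job record, such as created, so that each job is matched to the ticker that applied at that time.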

Employer Type Reference

Employer Type can be used to filter the RAW dataset to public companies, private companies, non-profits, or specific categories of education and government employers, such as post-secondary or K-12 education and federal or local government.

The Employer Type table is point-in-time, starting in 2024 when we reviewed each active company_id and assigned an employer type. If a change is detected, a new row for the company_id will be created and denoted with a new start_date. An end_date is applied to the previous row to delineate that the row is not the most current.

Daily File	CSV	Parquet
Location	/Feeds/Raw Daily Employer Type Reference/	/Feeds/Parquet/Raw Daily Employer Type Reference/
File Name	raw_daily_employer_type_YYYY-MM-DD.csv	raw_daily_employer_type_YYYY-MM-DD.parquet
Monthly File	CSV	Parquet
Location	/Feeds/Raw Full Employer Type Reference/	/Feeds/Parquet/Raw Full Employer Type Reference/
File Name	raw_full_employer_type_YYYY-MM-DD.csv	raw_full_employer_type_YYYY-MM-DD.parquet

Employer Type Table

Field Name	Data Type	Description
company_id	string/varchar	The unique identifier for a company-scrape. This is used to join to reference files or aggregates.
employer_type_id	integer	The numerical ID for employer type.
employer_type	string/varchar	The description of the employer_type_id.
start_date	date	The start date for this row. This is used for joining point-in-time information.
end_date	date	The end date for this row. This is used for joining point-in-time information.

Employer Types

Value	employer_type_id	Description
Public Company	1	For-profit, publicly traded corporations.
Private Company	2	For-profit, non-public corporations.
Post-secondary Education	3	Non-profit or governmental post-secondary educational institutions.
K-12 Education	4	Non-profit or governmental K-12 educational institutions.
Non-Profit	5	Non-profit corporations.
Federal Government	6	Federal government, military, and non-U.S. equivalent entities.
Local Government	7	Non-federal government employers at the state, county, territorial, city, township, or parish level.
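
Filtering job records by employer type could be sketched as below, here keeping only government employers (IDs 6 and 7 from the table above). The `government_jobs` helper and the sample rows are hypothetical, and the sketch assumes the current row for a company_id is the one with no end_date.

```python
GOVERNMENT_TYPE_IDS = {6, 7}  # Federal Government, Local Government

# Hypothetical rows; None stands in for a missing end_date.
employer_types = [
    {"company_id": "c1", "employer_type_id": 1, "end_date": None},
    {"company_id": "c2", "employer_type_id": 7, "end_date": None},
    {"company_id": "c3", "employer_type_id": 6, "end_date": "2024-09-01"},
]
job_records = [
    {"hash": "h1", "company_id": "c1"},
    {"hash": "h2", "company_id": "c2"},
    {"hash": "h3", "company_id": "c3"},
]

def government_jobs(jobs, type_rows):
    """Keep jobs whose company is currently typed as a government employer."""
    current_government = {
        row["company_id"]
        for row in type_rows
        if row["employer_type_id"] in GOVERNMENT_TYPE_IDS
        and row["end_date"] is None  # assumption: current rows have no end_date
    }
    return [job for job in jobs if job["company_id"] in current_government]

gov_jobs = government_jobs(job_records, employer_types)
```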

Remote Tag

The remote tag is a point-in-time table used to find all remote and non-remote work in the LinkUp dataset. Hybrid roles are considered remote work.

Remote tag is determined using a robust keyword analysis which is applied to job records written in English.

LinkUp is working on providing an additional data element to distinguish between remote and hybrid positions.

Daily File	CSV	Parquet
Location	/Feeds/Raw Daily Remote Tag/	/Feeds/Parquet/Raw Daily Remote Tag/
File Name	raw_daily_remote_tag_YYYY-MM-DD.csv	raw_daily_remote_tag_YYYY-MM-DD.parquet
Monthly File	CSV	Parquet
Location	/Feeds/Raw Full Remote Tag/	/Feeds/Parquet/Raw Full Remote Tag/
File Name	raw_full_remote_tag_YYYY-MM-DD.csv	raw_full_remote_tag_YYYY-MM-DD.parquet

Remote Tag Table

Field Name	Data Type	Description
hash	string/varchar	The unique identifier for job records. This is used to join to job record data.
remote_status	boolean	TRUE = The job title, description, or a structured field on the job listing contains keywords or phrases that indicate the role, in some way, is a remote position. This includes hybrid, telecommute, or work from home roles. FALSE = The job title, description, or a structured field on the job listing either contains keywords that identify the job as a non-remote position, or contains no keywords/language in the description that indicates the role is remote, work from home, or hybrid capable.
remote_detail	string	The classification of Hybrid or Remote, where applicable.
start_date	date	The start date for this row. This is used for joining point-in-time information.
end_date	date	The end date for this row. A NULL end date indicates that the row is the current or most recent remote_status.
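
Because hybrid roles carry remote_status = TRUE, separating them from fully remote roles relies on remote_detail. The sketch below shows one possible way to do that; the `work_mode` helper, the label strings, and the sample rows are hypothetical, and the exact casing of remote_detail values in the feed is an assumption.

```python
from collections import Counter

def work_mode(row):
    """Classify a remote tag row as non-remote, hybrid, remote, or unspecified."""
    if not row["remote_status"]:
        return "non-remote"
    # Assumption: remote_detail holds "Hybrid" or "Remote" (any casing) or is empty.
    detail = (row.get("remote_detail") or "").strip().lower()
    return detail if detail in {"hybrid", "remote"} else "remote (unspecified)"

# Hypothetical sample rows.
remote_rows = [
    {"hash": "h1", "remote_status": True, "remote_detail": "Hybrid"},
    {"hash": "h2", "remote_status": True, "remote_detail": "Remote"},
    {"hash": "h3", "remote_status": True, "remote_detail": None},
    {"hash": "h4", "remote_status": False, "remote_detail": None},
]

counts = Counter(work_mode(row) for row in remote_rows)
```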

Company ISIN* Reference

This file shows point-in-time ISIN (International Securities Identification Number), mapped via FactSet concordance. Please note, a license is required to receive this file.

Daily File	CSV	Parquet
Location	/Feeds/Company ISIN Reference/	/Feeds/Parquet/Company ISIN Reference/
File Name	company_isin_YYYY-MM-DD.csv	company_isin_YYYY-MM-DD.parquet

Company ISIN Reference Table

Field Name	Data Type	Description
company_id	string/varchar	The unique identifier for a company-scrape. This is used to join to reference files or aggregates.
start_date	date	The start date for this row. This is used for joining point-in-time information. Please see the join tutorial for assistance with the join.
end_date	date	The end date for this row. This is used for joining point-in-time information. Please see the join tutorial for assistance with the join.
isin	string/varchar	International Securities Identification Number, mapped using FactSet
primary_flag	boolean	If True, this is the primary ISIN for the company_id.

Company CUSIP* Reference

This file shows point-in-time CUSIP (Committee on Uniform Securities Identification Procedures), mapped via FactSet concordance. This identifier is primarily used for publicly traded organizations in the United States. Please note, a license is required to receive this file.

Daily File	CSV	Parquet
Location	/Feeds/Company CUSIP Reference/	/Feeds/Parquet/Company CUSIP Reference/
File Name	company_cusip_YYYY-MM-DD.csv	company_cusip_YYYY-MM-DD.parquet

Company CUSIP Reference Table

Field Name	Data Type	Description
company_id	string/varchar	The unique identifier for a company-scrape. This is used to join to reference files or aggregates.
start_date	date	The start date for this row. This is used for joining point-in-time information. Please see the join tutorial for assistance with the join.
end_date	date	The end date for this row. This is used for joining point-in-time information. Please see the join tutorial for assistance with the join.
cusip	string/varchar	Committee on Uniform Securities Identification Procedures number, mapped via FactSet.
primary_flag	boolean	If True, this is the primary CUSIP for the company_id.

Company SEDOL Reference

This file shows point-in-time SEDOL (Stock Exchange Daily Official List), mapped via FactSet concordance. This identifier is managed by the London Stock Exchange. Please note, a license is required to receive this file.

Daily File	CSV	Parquet
Location	/Feeds/Company Sedol Reference/	/Feeds/Parquet/Company Sedol Reference/
File Name	company_sedol_YYYY-MM-DD.csv	company_sedol_YYYY-MM-DD.parquet

Company SEDOL Reference Table

Field Name	Data Type	Description
company_id	string/varchar	The unique identifier for a company-scrape. This is used to join to reference files or aggregates.
start_date	date	The start date for this row. This is used for joining point-in-time information. Please see the join tutorial for assistance with the join.
end_date	date	The end date for this row. This is used for joining point-in-time information. Please see the join tutorial for assistance with the join.
sedol	string/varchar	Stock Exchange Daily Official List number, mapped via FactSet.
primary_flag	boolean	If True, this is the primary SEDOL for the company_id.

PIT Company Reference

This file shows point-in-time company information for a company_id. It can be joined to job records to understand corporate changes (e.g., corporate name changes, URL changes). Join it to job records or aggregates, dropping any company_ids with no job records.

Daily File	CSV	Parquet
Location	/Feeds/Raw Daily PIT Company Reference/	/Feeds/Parquet/Raw Daily PIT Company Reference/
File Name	raw_pit_company_reference_daily_YYYY-MM-DD.csv	raw_pit_company_reference_daily_YYYY-MM-DD.parquet
Monthly File	CSV	Parquet
Location	/Feeds/Raw Full PIT Company Reference/	/Feeds/Parquet/Raw Full PIT Company Reference/
File Name	raw_pit_company_reference_full_YYYY-MM-DD.csv	raw_pit_company_reference_full_YYYY-MM-DD.parquet

PIT Company Reference Table

Field Name	Data Type	Description
company_id	string/varchar	The unique identifier for a company-scrape. This is used to join to reference files or aggregates.
start_date	date	The start date for this row. This is used for joining point-in-time information. Please see the join tutorial for assistance with the join.
end_date	date	The end date for this row. This is used for joining point-in-time information. Please see the join tutorial for assistance with the join.
company_name	string/varchar	The name for the company_id.
company_url	string/varchar	The URL for the company_id.
lei	string/varchar	Legal Entity Identifier
open_perm_id	string/varchar	Open source company identifier used for joining to other data.
naics_code	string/varchar	Industry classification. This can be used to join to the Bureau of Labor Statistics salary data.

O*Net-SOC Taxonomy 2019 Reference

This file provides the O*NET-SOC code for each job hash. It is currently available in the 2010 and 2019 taxonomies; the 2019 taxonomy supports O*NET databases 25.1 through the latest release, 26.1. O*NET is the primary source of standardized occupation information in the US, covering over 1,000 occupations across the entire US economy.

NOTE: 2010 O*NET data is available up to October 1st, 2022.

Daily File	CSV	Parquet
Location	/Feeds/ONet Taxonomy 2019 Daily v2/	/Feeds/Parquet/ONet Taxonomy 2019 Daily v2/
File Name	onet_taxonomy_2019_daily_v2_YYYY-MM-DD.csv	onet_taxonomy_2019_daily_YYYY-MM-DD.csv
Monthly File	CSV	Parquet
Location	/Feeds/ONet Taxonomy 2019 Full v2/	/Feeds/Parquet/ONet Taxonomy 2019 Full v2/
File Name	onet_taxonomy_2019_full_v2_YYYY-MM-DD.csv	onet_taxonomy_2019_full_v2_YYYY-MM-DD-part-###.parquet

ONET Taxonomy Table

Field Name	Data Type	Description
hash	string/varchar	The unique identifier for job records. This is used to join job descriptions to job records.
onet_occupation_code	string	The ONET classification of the job. It is one way of normalizing job titles.
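
O*NET-SOC codes conventionally take the form XX-XXXX.XX, where the part before the period is the SOC code and its first two digits are the SOC major group. Assuming onet_occupation_code is stored in that standard form (not confirmed by this dictionary), a code could be split as in the following sketch; the `split_onet_code` helper is hypothetical.

```python
import re

# Standard O*NET-SOC layout: two digits, hyphen, four digits, period, two digits.
ONET_PATTERN = re.compile(r"^(\d{2})-(\d{4})\.(\d{2})$")

def split_onet_code(code):
    """Split an O*NET-SOC code into SOC code, major group, and extension.

    Returns None when the value does not match the assumed layout.
    """
    match = ONET_PATTERN.match(code.strip())
    if match is None:
        return None
    major, detail, extension = match.groups()
    return {
        "soc_code": f"{major}-{detail}",
        "major_group": major,
        "onet_extension": extension,
    }

parts = split_onet_code("15-1252.00")
```

Grouping by `major_group` or `soc_code` gives coarser occupation buckets than the full code, which can be useful when normalizing job titles at different levels of detail.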