We are often asked why the number of jobs for a company in LinkUp Data does not match the number of jobs on the company's website. Many factors can produce a difference of anywhere from a few jobs to a great many. The most common reasons are listed below.
The "Point in Time" Nature of Data
The most obvious reason for a job count difference is that our data reflects the website as it was at an earlier point in time than the one you are looking at now. From the moment one of our scrapes finds information on a website, that information has to be parsed, stored, included in file generation, and delivered. Because scrapes typically run only every 24-48 hours (see Scrape Runs: Frequency, Duration, and Timing for more on this), the company has plenty of time between our capture and your observation to add and/or remove job listings, thereby changing the total count.
Multiple Vacancies Per Job Listing
When a company's jobs portal shows the number of jobs to be found (e.g. Viewing 1-50 of 2,356 jobs), that number usually represents job listings, not job vacancies. We commonly see a single listing that represents multiple vacancies, usually because it applies to multiple locations. In these cases, where possible, we treat each individual vacancy/location as a unique job record. As a result, our job count can be higher than the count on the company's website, because we're counting vacancies, not just listings.
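The difference between counting listings and counting vacancies can be sketched in a few lines of Python. This is a minimal illustration only, not our actual pipeline: the `Listing` class, its fields, and the `expand_to_vacancies` function are hypothetical names, and the URLs and locations are made-up sample data.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Listing:
    """Hypothetical shape of one listing as shown on a jobs portal."""
    url: str
    title: str
    locations: Tuple[str, ...]


def expand_to_vacancies(listings: List[Listing]) -> List[Tuple[str, str, str]]:
    """Return one (url, title, location) record per listing/location pair."""
    records = []
    for listing in listings:
        # A listing with no parsed location still counts as one vacancy.
        locations = listing.locations or ("",)
        for location in locations:
            records.append((listing.url, listing.title, location))
    return records


# Made-up sample data: two listings, one of which covers three locations.
listings = [
    Listing("https://example.com/jobs/1", "Cashier",
            ("Austin, TX", "Dallas, TX", "Houston, TX")),
    Listing("https://example.com/jobs/2", "Store Manager", ("Austin, TX",)),
]
vacancies = expand_to_vacancies(listings)
print(len(listings), "listings ->", len(vacancies), "vacancy records")
```

In this sample the portal would advertise two jobs, while a per-vacancy count yields four records.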
Multiple Job Portals
Especially with larger, international companies, job listings may not all be collected in a single job portal. It's not uncommon to see jobs separated by global region, department, skill level, or any number of other categories. These other portals might also sit on a part of the company's website where you would not expect to find additional job listings. Wherever possible and reasonable, we try to capture all of these different job portals under a single scrape/company_id. If you notice a discrepancy in job count, it could be that what you're looking at on the website is only the North American jobs (even if that's not obviously stated anywhere), whereas our scrape is collecting all the jobs globally.
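To see how several portals rolled up under one company_id can exceed the count on any single portal, consider the following sketch. The function name, the tuple layout, the company IDs, and all of the counts are assumptions made up for illustration; they do not describe real data or the real file format.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def count_jobs_by_company(
    portal_counts: Iterable[Tuple[int, str, int]]
) -> Dict[int, int]:
    """Sum per-portal job counts into one total per company_id."""
    totals: Dict[int, int] = defaultdict(int)
    for company_id, _portal_name, job_count in portal_counts:
        totals[company_id] += job_count
    return dict(totals)


# Made-up sample: company 101 splits its jobs across three regional portals.
portal_counts = [
    (101, "North America", 1200),
    (101, "EMEA", 800),
    (101, "APAC", 356),
    (202, "Global", 45),
]
totals = count_jobs_by_company(portal_counts)
print(totals[101])  # total across all three portals for company 101
```

A reader looking only at the "North America" portal in this sample would see 1,200 jobs, while the combined total for the same company_id is 2,356.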
The Company's Job Portal Reports a False Count or Has Buggy Functionality
It may be strange to consider, but sometimes LinkUp Data is a better source of truth than the company's own site when it comes to the number of job listings. It is far from unheard of to come across a job portal that advertises a certain number of jobs but, on closer inspection, is over- or under-reporting the number of listings actually available. There could be duplicate listings, where the same job URL shows up multiple times. The "jobs per page" might change from one page to the next, throwing off the accuracy of the overall count. Some Applicant Tracking Systems (ATS) stop counting jobs beyond a certain threshold. Some of the jobs may turn out to be expired if you click on them. These reasons and more can result in us reporting a different number than the website shows.
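If you want to check a portal's advertised count yourself, one simple approach is to collect the job URLs it actually lists and compare the number of unique URLs against the advertised figure. The sketch below assumes you have already gathered the URLs by hand or by other means; `audit_portal_count` and the sample values are hypothetical.

```python
from typing import Dict, List


def audit_portal_count(advertised_count: int, job_urls: List[str]) -> Dict[str, int]:
    """Compare a portal's advertised job count with the URLs actually listed."""
    unique_urls = set(job_urls)
    return {
        "advertised": advertised_count,
        "collected": len(job_urls),
        "unique": len(unique_urls),
        # The same job URL appearing more than once inflates the raw count.
        "duplicates": len(job_urls) - len(unique_urls),
        # Positive: the portal over-reports; negative: it under-reports.
        "difference": advertised_count - len(unique_urls),
    }


# Made-up sample: the portal claims 5 jobs but lists 4 rows, one a repeat.
report = audit_portal_count(
    5,
    [
        "https://example.com/jobs/a",
        "https://example.com/jobs/b",
        "https://example.com/jobs/b",
        "https://example.com/jobs/c",
    ],
)
print(report)
```

This check only catches duplicates and miscounts. It does not detect listings that turn out to be expired when opened.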
There Are Jobs We Won't Capture
On occasion, there are jobs we deliberately skip when scraping a job portal. "Test" listings, "general application" listings, and jobs that explicitly state they're closed and/or have expired are all examples of listings we tend to pass over when scraping.
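A rule of this kind might look something like the sketch below. The regular expressions and the `should_skip` function are hypothetical and deliberately narrow; they illustrate the idea and are not the rules our scrapes actually apply.

```python
import re

# Hypothetical patterns, chosen only for illustration.
# The title must consist entirely of a placeholder phrase, so that a real
# job such as "Test Engineer" is not skipped by mistake.
PLACEHOLDER_TITLE = re.compile(
    r"^\s*(?:test(?: listing| job| posting)?|general application)\s*$",
    re.IGNORECASE,
)
# Matches explicit notices such as "This position is closed".
CLOSED_NOTICE = re.compile(
    r"\b(?:position|job|posting|listing) (?:is|has) (?:now |been )?(?:closed|expired)\b",
    re.IGNORECASE,
)


def should_skip(title: str, description: str) -> bool:
    """Return True for placeholder listings and explicitly closed jobs."""
    if PLACEHOLDER_TITLE.match(title):
        return True
    return bool(CLOSED_NOTICE.search(description))


# Made-up sample listings as (title, description) pairs.
sample = [
    ("Test", "Please ignore."),
    ("General Application", "Submit your resume for future openings."),
    ("Test Engineer", "Design and maintain test rigs."),
    ("Line Cook", "This position is closed."),
    ("Nurse", "Full-time role."),
]
kept = [title for title, description in sample if not should_skip(title, description)]
print(kept)
```

In this sample five listings appear on the portal, but only "Test Engineer" and "Nurse" would be kept, so a count based on the kept jobs is lower than the count the website shows.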
There Are Jobs We Can't Capture
Consistency is key when the collection of jobs data relies so heavily on pattern-matching. When a company keeps changing the formatting of things like locations and job descriptions (particularly where "WYSIWYG" editors are in use), it can be difficult to capture job information consistently. Other examples of things we can't capture include, but are not limited to, jobs with "external" descriptions (e.g. embedded PDFs, Word Docs, Google Sheets, Flash animations), jobs with difficult-to-parse locations, some remote/work-from-home jobs, and some jobs whose URLs and/or information contain uncommon character sets.
The Careers Page or Applicant Tracking System May Be Unstable
Some ATSs are simply volatile by nature. While we take every step we can to overcome this, sometimes a careers portal is too buggy and/or unstable to scrape effectively. One example is an ATS that cuts out seemingly at random or returns a server error, leaving us unable to finish parsing all the jobs we had queued.
The Scrape Might Need an Update/Maintenance
Of course, the issue may simply be that the scrape is no longer capturing things as accurately as it once did. The nature of the beast when it comes to scraping is that any change made by an employer or ATS can throw off the scrape we had set up. With tens of thousands of scrapes in our system, we are perpetually reviewing, repairing, and rebuilding existing scrapes in addition to the new ones we add every day. It's never a question of if a scrape will break; it's only a question of when. As such, if you've identified a company of particular importance to you whose scrape you believe needs updating, feel free to notify the support team and we'll work to get that scrape prioritized. That said, if your list of "important" companies is extensive, it quickly becomes indistinguishable from our standing goal of "fix everything", so it helps to be mindful of that.