Frequency: How Often Do Scrapes Run?
By default, a scrape runs again 48 hours after its previous run has completed. There are, however, a few specific exceptions to this rule.
- Scrapes for S&P 500 companies and their subsidiaries run every 24 hours
- A very small number of scrapes run at a reduced frequency of once every 168 hours (one week) to avoid volatility issues
If a scrape does not fall under one of the exceptions listed above, assume it has the default 48-hour frequency.
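The frequency rules above can be sketched as a small lookup. This is only an illustration, not the actual scheduler: the category labels (`default`, `sp500`, `weekly`) and the function name `next_run_start` are hypothetical, and the sketch assumes the next run starts exactly when the interval elapses, with no queueing delay.

```python
from datetime import datetime, timedelta

# Interval in hours between the end of one run and the start of the next.
# The hour values come from the rules above; the category labels are
# hypothetical names chosen for this sketch.
INTERVAL_HOURS = {
    "default": 48,   # every scrape that is not an exception
    "sp500": 24,     # S&P 500 companies and their subsidiaries
    "weekly": 168,   # the small number of scrapes slowed to one week
}


def next_run_start(previous_run_completed: datetime, category: str = "default") -> datetime:
    """Return the earliest start of the next run.

    The interval is counted from when the previous run completed,
    not from when it started. Unknown categories fall back to the default.
    """
    hours = INTERVAL_HOURS.get(category, INTERVAL_HOURS["default"])
    return previous_run_completed + timedelta(hours=hours)
```

For example, a default scrape whose previous run completed at noon on 1 January would be due again at noon on 3 January under these assumptions.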
Duration: How Long Does It Take A Scrape To Run?
How long a scrape takes to run from start to finish varies enormously, and most of the factors involved are outside our control. The following is a list of some, but not all, of the variables that can affect scrape run duration.
- How many jobs are there?
- Are the jobs embedded in the page itself or being loaded in via API?
- Are all the jobs on one page or multiple pages?
- If multiple pages, how many pages are there?
- If multiple pages, how many jobs are on each page?
- Are the jobs split between multiple "portals" based on industry/category/location?
- If so, how many different ways are the jobs split?
- If so, are the different portals using the same ATS/solution or different ones?
- Are we capturing the jobs with pattern matching or decoding another format such as XML or JSON?
- Is the scrape running Full each time or Delta?
- Does scraping the site require us to use a proxy?
- Are the jobs on the employer's website itself or are they using an ATS?
- If they're using an ATS, which one are they using?
- Does the site have issues with stability or receiving lots of page requests in a short period of time?
- Is the site under maintenance?
- What's the turnover rate on the company's listings?
Timing: When Do Scrapes Run?
There isn't a specific time when any given scrape will run. As the two previous sections imply, the precise time a scrape starts is likely to change from one run to the next. The next run starts a set number of hours after the previous run completes, and a run can take anywhere from a matter of seconds to multiple hours, so the start time may vary widely.
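To see why the start time drifts, here is a minimal simulation under stated assumptions: the function name `simulate_start_times`, the starting timestamp, and the run durations are all made up for illustration, and the interval is assumed to be counted strictly from the completion of the previous run.

```python
from datetime import datetime, timedelta


def simulate_start_times(first_start, interval_hours, durations_minutes):
    """Return the start time of each run, given how long each run took.

    Each run's duration pushes the following run later, because the
    interval is counted from completion rather than from the start.
    """
    starts = [first_start]
    current = first_start
    for minutes in durations_minutes:
        completed = current + timedelta(minutes=minutes)
        current = completed + timedelta(hours=interval_hours)
        starts.append(current)
    return starts


# Hypothetical 24-hour scrape whose runs take 5, 90, and 30 minutes.
starts = simulate_start_times(datetime(2024, 1, 1, 9, 0), 24, [5, 90, 30])
for start in starts:
    print(start.isoformat(sep=" ", timespec="minutes"))
# 2024-01-01 09:00
# 2024-01-02 09:05
# 2024-01-03 10:35
# 2024-01-04 11:05
```

Even with a nominal 24-hour frequency, the start time in this example moves from 09:00 to 11:05 within three runs, which is why a scrape cannot be expected at a fixed time of day.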