3 things you should know about crawler queuing times
Summary
Crawler queuing time depends on when a site enters the queue, the number of simultaneous crawls allowed on an account, and the availability of global crawl slots, which together determine how quickly a crawl can begin.
Overview
Before your site is crawled, it is added to a queue of sites waiting their turn. How long it stays in the queue depends on several factors, including when it was added to the queue, how many simultaneous crawls your account supports, and how many crawl slots are available globally.
What is crawler queuing time?
Crawler queuing time is the time your site spends waiting in the queue before its crawl begins. It depends on the following:
- The time at which it's added to the queue
- The maximum simultaneous crawls allowed
- The number of crawl slots globally available
The time at which it's added to the queue
This is the specific time at which the site was added to the crawl queue. Sites that have been in the queue longer receive a crawl slot before sites that were added later.
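The ordering rule above can be pictured as a simple first-in, first-out sort. The following is only a minimal sketch: the site names, timestamps, and the `crawl_order` function are hypothetical and are not part of any Siteimprove API.

```python
from datetime import datetime

# Hypothetical queue entries: (site name, time the site was added to the queue).
queued_sites = [
    ("shop.example.com", datetime(2024, 1, 10, 9, 30)),
    ("blog.example.com", datetime(2024, 1, 10, 9, 5)),
    ("docs.example.com", datetime(2024, 1, 10, 9, 45)),
]


def crawl_order(entries):
    """Return site names so that the longest-waiting site comes first."""
    return [site for site, added_at in sorted(entries, key=lambda e: e[1])]


print(crawl_order(queued_sites))
```

In this sketch, blog.example.com was added first, so it would be the first to receive a crawl slot.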
Maximum simultaneous crawls
The number of crawl slots available on your account is called its maximum simultaneous crawls. This limit is configured for each account.
By default, we crawl a maximum of 2 sites simultaneously on an account to avoid overloading your servers. Multiple websites are often hosted on the same server, so increasing the maximum simultaneous crawls increases the number of requests and therefore the load on that server.
Each site is crawled with one request at a time and a delay of 200 milliseconds by default. Read more about "How do requests to my website affect crawl speed?"
If your servers can handle more simultaneous requests from our crawler without getting overloaded, then the number of simultaneously crawled sites on your account can be increased by Siteimprove Technical Support. Information on your server's capacities should be available from your hosting provider or IT department.
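To get a feel for what the defaults mean for server load, the rough calculation below estimates an upper bound on crawler requests per second. It is a sketch under stated assumptions, not a description of how the crawler is implemented: it assumes all simultaneously crawled sites share one server, and that the 200 millisecond delay is applied after each response before the next request is sent. The function name and the `avg_response_ms` parameter are hypothetical.

```python
def max_requests_per_second(simultaneous_crawls=2, delay_ms=200, avg_response_ms=0):
    """Estimate an upper bound on crawler requests per second for one server.

    Assumptions (not confirmed by the article):
    - every simultaneously crawled site is hosted on the same server
    - each site gets one request at a time, and the crawler waits delay_ms
      after a response before sending the next request
    """
    ms_per_request = delay_ms + avg_response_ms
    if ms_per_request <= 0:
        raise ValueError("delay_ms + avg_response_ms must be positive")
    return simultaneous_crawls * 1000 / ms_per_request


# Default settings, ignoring the server's response time (worst case).
print(max_requests_per_second())

# Doubling the simultaneous crawls doubles the worst-case request rate.
print(max_requests_per_second(simultaneous_crawls=4))
```

Under these assumptions, the default of 2 simultaneous crawls gives at most 10 requests per second, and 4 simultaneous crawls gives at most 20. Real rates will be lower, because each response also takes time.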
The number of crawl slots globally available
If your account has an empty slot available (i.e. it has not reached maximum simultaneous crawls), a queued crawl will start as soon as there is a free crawl slot available globally. Siteimprove has thousands of simultaneous crawl slots available, allowing crawls to start soon after entering the queue.
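The start condition described above combines two checks: a free slot on the account and a free slot globally. A minimal sketch of that logic follows. The function and parameter names are hypothetical, and the real scheduler may take other factors into account.

```python
def can_start_crawl(running_on_account, max_simultaneous_crawls, free_global_slots):
    """Return True if a queued crawl could start right now.

    Hypothetical model of the rule in the article: the account must be below
    its maximum simultaneous crawls, and at least one global slot must be free.
    """
    account_has_slot = running_on_account < max_simultaneous_crawls
    global_has_slot = free_global_slots > 0
    return account_has_slot and global_has_slot


# One crawl running, limit of 2, global slots free: the next crawl can start.
print(can_start_crawl(running_on_account=1, max_simultaneous_crawls=2, free_global_slots=500))

# Account already at its limit: the crawl keeps waiting, even with global slots free.
print(can_start_crawl(running_on_account=2, max_simultaneous_crawls=2, free_global_slots=500))
```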
Additional note
Why do my previous scans not show a queue time?
We started tracking the queue time of scans on August 12th, 2020, and only for scans using the new Siteimprove Crawler. Read more about why the scan history might show "not available" for certain queue, crawl, or processing times.
Key takeaways
- Every site must wait in a queue before being crawled
- Queue time is primarily influenced by timing, account limits, and global capacity
- Increasing simultaneous crawls may improve speed but can impact server load
- Global crawl availability helps ensure most crawls start quickly
- Queue time tracking is only available for newer crawler scans