With Halloween approaching, we felt it was relevant to talk about spiders – Google spiders, that is. Google's spiders regularly crawl websites to rebuild the search index. How often your site is crawled depends on multiple factors, such as PageRank, links to a page and crawling constraints.
Having a crawl-friendly website matters because if Google's spiders can't access your web pages, you'll have a hard time ranking well in the search results. Two of the most important signals Google uses to decide whether to slow or stop crawling your site are connect time and HTTP status codes.
Connect Time and HTTP Status Codes
Connect time is how long it takes Googlebot to connect to your server and fetch a web page. If that time keeps growing, Google will back off and either slow or stop crawling your pages. This is because Google doesn't want to overload your server and interrupt the experience for your users.
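To make the idea concrete, here is a toy back-off heuristic in Python. This is purely illustrative – Google does not publish its actual algorithm – but it captures the behavior described above: if recent fetch times trend upward past a threshold, the crawler slows down. The function name and threshold are our own invention.

```python
def should_back_off(response_times, threshold=2.0):
    """Return True if the average of the last few fetches is too slow.

    response_times: list of fetch durations in seconds, oldest first.
    threshold: average duration (seconds) above which we back off.
    (Illustrative values only; not Google's real logic.)
    """
    recent = response_times[-3:]  # look at the three most recent fetches
    return sum(recent) / len(recent) > threshold

# Example: fetch times that keep climbing trigger a back-off
times = [0.4, 0.9, 1.8, 2.6, 3.5]
print(should_back_off(times))  # True: the last three average ~2.6s
```

A real crawler would feed this from measured connection times; the point is that the decision is based on a trend, not a single slow response.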
A second major signal is HTTP status codes. Google will slow or stop crawling your site if it receives server status codes in the 5xx range. Anything in this range typically indicates the server is having trouble responding, so Google steps back to avoid causing additional problems for your site.
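The same pattern can be sketched for status codes. The snippet below is a hypothetical example of how a polite crawler might widen its delay between requests whenever it sees a 5xx response; the statuses are simulated in a list rather than read from real HTTP responses, and the doubling rule is an assumption for illustration.

```python
def crawl_delay(current_delay, status):
    """Double the wait between requests on a 5xx server error,
    otherwise leave it unchanged. (Illustrative rule, not Google's.)"""
    if 500 <= status < 600:
        return current_delay * 2  # exponential back-off on server errors
    return current_delay

# Simulated crawl: two healthy pages, then two server errors
delay = 1.0
for status in [200, 200, 503, 500, 200]:
    delay = crawl_delay(delay, status)
print(delay)  # 4.0 seconds between requests after two 5xx responses
```

Once the server recovers and returns 2xx codes again, a real crawler would gradually shrink the delay back toward normal.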
Connect time and HTTP status codes aren't the only signals Google uses to set its crawl rate. It also respects explicit directives such as disavow files, robots.txt rules and nofollow tags.
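Of these, robots.txt is the one you control most directly. A minimal example, with hypothetical paths, might look like this:

```
# robots.txt at the site root (paths are examples only)
User-agent: *
Disallow: /admin/

# Rules specific to Google's crawler
User-agent: Googlebot
Disallow: /staging/
```

A nofollow directive, by contrast, lives in the page itself, e.g. `<meta name="robots" content="nofollow">` in the page's head, or `rel="nofollow"` on an individual link.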
When Will Google Return to Crawling?
Generally speaking, Googlebot will return later to crawl your site if it runs into any of the above problems. However, if the issues persist on your end, Google won't be able to move through your site and give it the visibility it deserves in the search results. And if spam is detected, your site may stop showing up in the search results altogether.
Like all of Google's algorithms, the crawling system is automatic, so you must ensure your site follows best practices that help Google find, crawl and index it. Many factors are involved, but pay close attention to your connect time and HTTP status codes. For more tips on creating a crawl-friendly website, see the Webmaster Guidelines.