Location: Columbus, Ohio, United States

Monday, September 25, 2006

www and no-www domain names

Suppose you have a properly configured website. Non-canonical hostnames (e.g. example.com) redirect to the canonical web host (e.g. www.example.com), and the internal hyperlinks are utterly consistent. In this case, the search engines need no back-end canonical processing. The spider only needs to "discover" each URL in the website once; it does not need to follow every possible linking path through the website.
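
As a rough illustration of the redirect rule described above, here is a minimal Python sketch. It assumes www.example.com is the canonical host; the function name canonical_redirect is hypothetical, and a real site would normally do this in the web server configuration rather than in application code.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption for illustration: the canonical host is the www form.
CANONICAL_HOST = "www.example.com"


def canonical_redirect(url, canonical_host=CANONICAL_HOST):
    """Return the URL a request should be 301-redirected to,
    or None if the request already uses the canonical host."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host == canonical_host:
        return None  # already canonical, serve the page normally
    # Keep scheme, path, query and fragment; swap only the hostname.
    # Ports are ignored in this sketch.
    return urlunsplit(
        (parts.scheme, canonical_host, parts.path or "/", parts.query, parts.fragment)
    )
```

With this rule in place, every non-canonical URL answers with a single permanent redirect, so the spider ends up with one URL per page.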

Now take a misconfigured website. Multiple hostnames resolve directly to content, without canonical redirects, and internal linking is inconsistent. Here, the spider has to traverse all linking paths. It has to maintain a count of the various hostnames and page names used in linking, while comparing page contents along the way. Next, it has to use some sort of voting algorithm to determine the "probable" canonical hostname. Add inconsistent backlinks from other websites, and a bad dream becomes a nightmare.
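
To make the counting and voting idea concrete, here is a toy Python sketch. It is only a guess at the general shape of such processing, not how any search engine actually does it. The function name probable_canonical_host and both input formats are assumptions made for illustration.

```python
import hashlib
from collections import Counter, defaultdict
from urllib.parse import urlsplit


def probable_canonical_host(crawled_pages, observed_links):
    """Guess the canonical hostname by majority vote.

    crawled_pages: iterable of (url, body_text) pairs the spider fetched.
    observed_links: iterable of link target URLs seen while crawling
                    (internal links and backlinks alike).
    """
    # Step 1: compare page contents. Hostnames that return identical
    # content for the same path are treated as aliases of one site.
    hosts_by_content = defaultdict(set)
    for url, body in crawled_pages:
        parts = urlsplit(url)
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        hosts_by_content[(parts.path or "/", digest)].add(parts.hostname)

    aliases = set()
    for hosts in hosts_by_content.values():
        if len(hosts) > 1:
            aliases |= hosts

    # Step 2: count how often each alias hostname is used in links.
    votes = Counter()
    for link in observed_links:
        host = urlsplit(link).hostname
        if host in aliases:
            votes[host] += 1

    # Step 3: the most-linked alias is the "probable" canonical host.
    if not votes:
        return None
    return votes.most_common(1)[0][0]
```

Even this toy version needs every page fetched under every hostname plus a full link inventory before it can vote, which hints at why the real thing is so expensive.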

Processing requirements for the two cases differ by several orders of magnitude. In the second case, it is likely that multiple crawl cycles (possibly taking weeks or months) are required to determine a probable canonical hostname, even for a small website. For a website with 100,000 frequently changing product web pages, the process may never finish.
