Solving the canonical problem
In February of this year, Microsoft, Yahoo and Google jointly announced their support for a new tag you can put in your page to tell the bots what the true canonical URL of that page is. What is a canonical URL, and why does it matter enough that the engines came up with a joint solution? Fair questions. Let's explore the answers.
First off, a canonical URL is the single preferred address for any particular Web document. Due to tracking parameters, session IDs, server configuration and other factors, a Web page can exist under several different URLs. Consider the following URLs. They are all different, but they can easily all point at the same content:
- www.example.com
- example.com
- www.example.com/
- example.com/
- www.example.com/index.html
- example.com/index.html
- www.example.com/Home.aspx
- example.com/Home.aspx
That list can grow much bigger if you add in session IDs and other tracking parameters. Why is this a problem? First, we all know that search engines base their results very heavily on inbound links these days. Having multiple versions of a Web page can split the power of your inbound links, because people find and link to different versions. If all those links pointed to one version, it would have more ranking potential.
Secondly, the search engines have been very clear that they don't have infinite crawling resources, so they use various signals to determine how often and how deeply to crawl Web sites. To ensure optimal indexing, you want to be sure that the bots are crawling unique content and not the same content under slightly different URLs.
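
To make the duplication concrete, here is a minimal Python sketch that collapses the eight home-page variants listed earlier onto a single URL. The preferred host, the http scheme and the set of default document names are assumptions made for illustration, not rules the engines publish.

```python
from urllib.parse import urlsplit

# Hypothetical choices for this sketch: which host is preferred and which
# file names count as the default document at the site root.
PREFERRED_HOST = "www.example.com"
DEFAULT_DOCUMENTS = {"index.html", "home.aspx"}


def normalize(url: str) -> str:
    """Collapse the common home-page variants onto one URL."""
    # The list in the article omits the scheme, so assume http before parsing.
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    host = parts.netloc.lower()
    # Treat the bare domain and the www host as the same site.
    if host == PREFERRED_HOST[len("www."):]:
        host = PREFERRED_HOST
    path = parts.path or "/"
    # Treat a default document at the root as the root itself.
    if path.lstrip("/").lower() in DEFAULT_DOCUMENTS:
        path = "/"
    return f"{parts.scheme}://{host}{path}"


variants = [
    "www.example.com", "example.com", "www.example.com/", "example.com/",
    "www.example.com/index.html", "example.com/index.html",
    "www.example.com/Home.aspx", "example.com/Home.aspx",
]
unique = {normalize(v) for v in variants}
print(unique)  # {'http://www.example.com/'}
```

Eight addresses, one page. A crawler has no such table of assumptions to lean on, which is why each variant can end up being fetched and linked to separately.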
So how does a search engine decide which of those to use if they all resolve? It's hard to say for sure, but why let the engines decide in the first place? Got tracking parameters in the URL? Move them to a cookie. Got session IDs in a URL? Move them to a cookie. Both www and non-www versions of the site? You can 301 redirect one version to the other. It sounds easy when I write it in a few sentences like that, but the reality is that you can't always do those things. Content-management systems and e-commerce platforms are often big investments, and implementing certain changes is not easy, and sometimes not even possible from a fiscal or technological point of view.
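
For the www versus non-www case, the redirect decision itself is small. The Python sketch below shows only the logic, assuming the www host is the one you want to keep; the function name and the host settings are hypothetical, and in practice this rule usually lives in the Web server or platform configuration rather than in application code.

```python
from typing import Optional, Tuple

# Hypothetical preference for this sketch: the www host is the canonical one.
CANONICAL_HOST = "www.example.com"
ALTERNATE_HOSTS = {"example.com"}


def redirect_for(host: str, path_and_query: str) -> Optional[Tuple[int, str]]:
    """Return (301, target URL) if the request should be redirected, else None."""
    if host.lower() in ALTERNATE_HOSTS:
        # Keep the path and query string so deep links land on the same page.
        return 301, f"http://{CANONICAL_HOST}{path_and_query}"
    return None


print(redirect_for("example.com", "/product.asp?product=459"))
# (301, 'http://www.example.com/product.asp?product=459')
print(redirect_for("www.example.com", "/product.asp?product=459"))
# None
```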
Enter the shiny new rel="canonical" tag. Very simply put, this is a new HTML link element, placed in the head of a page, that tells the search engines what the canonical URL of that page is. For example, if your URL is www.example.com/product.asp?product=459&trackid=some_media_code, you can tell the engines that page is really just www.example.com/product.asp?product=459 by adding this code to the page: <link rel="canonical" href="http://www.example.com/product.asp?product=459" />.
Essentially, what you are creating is like a mini 301 redirect that will consolidate duplicate content and the associated inbound links. You don't have to get bogged down at the server level or try to track down every instance. You can just drop this in the main template and it will take care of all the tracking parameters and session IDs you can throw at it.
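
What "drop this in the main template" might look like is sketched below in Python: a helper that strips the parameters that don't change the content and emits the tag for whatever URL was requested. The helper name and the parameter list are assumptions; "trackid" comes from the example above, while "sessionid" is a made-up stand-in for whatever your platform uses.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of parameters that do not change the page content.
# "trackid" is from the article's example; "sessionid" is an assumed name.
NON_CONTENT_PARAMS = {"trackid", "sessionid"}


def canonical_link_tag(requested_url: str) -> str:
    """Build a rel="canonical" link element for the requested URL."""
    parts = urlsplit(requested_url)
    # Keep only the query parameters that actually select the content.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in NON_CONTENT_PARAMS
    ]
    canonical = urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), "")
    )
    # Escape the URL so an "&" between parameters is valid inside the attribute.
    return f'<link rel="canonical" href="{escape(canonical, quote=True)}" />'


tag = canonical_link_tag(
    "http://www.example.com/product.asp?product=459&trackid=some_media_code"
)
print(tag)
# <link rel="canonical" href="http://www.example.com/product.asp?product=459" />
```

Listing the parameters to drop, rather than the ones to keep, is just one way to do it. A list of parameters to keep is safer on a site where new tracking codes appear often.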
Before you black hat ... er ... more aggressive SEOs get too hot and bothered about the potential, you need to recognize a few rules. This only works on a single site. You cannot redirect across domains, only across subdomains and subfolders of the root domain. This tag can also only be used on identical or extremely similar content, so don't think you can use it to flow link juice from popular pages to less popular pages with impunity. Also, the engines always reserve the right to interpret this tag as they see fit. They treat it as a strong hint and will not blindly follow these redirects. Like anything else in SEO, abuse will bring consequences.