Solving the canonical problem
In February of this year, Microsoft, Yahoo and Google jointly announced their support for a new tag you can put in your page to tell the bots what the true canonical URL of that page is. What is a canonical URL, and why does it matter enough that the engines came up with a joint solution? Fair questions. Let's explore the answers.
First off, a canonical URL is the correct and original version of any particular Web document. Due to tracking parameters, session IDs, server configuration and other factors, a Web page can exist under several different URLs. Consider the following URLs. They are all different but can easily all point at the same content:
- www.example.com
- example.com
- www.example.com/
- example.com/
- www.example.com/index.html
- example.com/index.html
- www.example.com/Home.aspx
- example.com/Home.aspx
The list above can grow much bigger if you add in session IDs and other tracking parameters. Why is this a problem? We all know that search engines base their results very heavily on inbound links these days. Having multiple versions of a Web page can split the power of your inbound links, because people find and link to different versions. If all those links pointed to one version, it would have more ranking potential.
Secondly, the search engines have been very clear that they don't have infinite crawling resources, so they use various signals to determine how often and how deeply to crawl Web sites. To ensure optimal indexing, you want to be sure that the bots are crawling unique content and not the same content under slightly different URLs.
So how does a search engine decide which of those to use if they all resolve? It's hard to say for sure, but why let them decide in the first place? Got tracking parameters in the URL? Move them to a cookie. Got session IDs in a URL? Move them to a cookie. www and non-www versions of the site? You can 301 redirect one version to the other. It sounds easy when I write it in a few sentences like that, but the reality is that you can't always do those things. Content-management systems and e-commerce platforms are often big investments, and implementing certain changes is not easy, and sometimes not even possible from a fiscal or technological point of view.
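For sites that can make the change, the www/non-www and default-document cleanup described above might be sketched like this. It is only an illustration: the preferred host, the alternate host and the list of default documents are assumptions made for the example, and a real site would normally do this in the Web server or application configuration, answering with a 301 status and the returned URL.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumptions for illustration only: which host is preferred, which hosts
# are alternates, and which file names are "default documents" for a folder.
PREFERRED_HOST = "www.example.com"
ALTERNATE_HOSTS = {"example.com"}
DEFAULT_DOCUMENTS = {"index.html", "home.aspx"}


def redirect_target(url):
    """Return the URL to 301-redirect to, or None if url is already preferred."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    path = parts.path or "/"

    # Fold the alternate host into the preferred one.
    if host in ALTERNATE_HOSTS:
        host = PREFERRED_HOST

    # Collapse a default document such as /index.html to its bare folder.
    head, _, last = path.rpartition("/")
    if last.lower() in DEFAULT_DOCUMENTS:
        path = head + "/"

    target = urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))
    return None if target == url else target
```

With these assumed settings, every variant in the list earlier in the article ends up at a single address, and the preferred address itself needs no redirect.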
Enter the shiny new rel="canonical" tag. Very simply put, this is a new HTML link tag, placed in the head of the page, that tells the search engines what the canonical URL of that page is. For example, if your URL is www.example.com/product.asp?product=459&trackid=some_media_code, you can tell the engines that page is really just www.example.com/product.asp?product=459 by adding this code to the page: <link rel="canonical" href="http://www.example.com/product.asp?product=459" />.
Essentially what you are creating is like a mini 301 redirect that will consolidate duplicate content and the associated inbound links. You don't have to get bogged down at the server level, or try to track down all instances. You can just drop this in the main template, with the href filled in for each page, and it will take care of all the tracking parameters and session IDs you can throw at it.
Before you black hat ... er ... more aggressive SEOs get too hot and bothered about the potential, you need to recognize a few rules. This only works on a single site. You cannot redirect across domains - only across subdomains and subfolders of the root domain. This tag can also only be used on identical or extremely similar content, so don't think you can use it to flow link juice from popular pages to less popular pages with impunity. Also, the engines always reserve the right to implement this tag as they see fit. They will not blindly follow these redirects. Like anything else in SEO, abuse will bring consequences.
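To make the "drop it in the main template" idea concrete, here is a minimal sketch of how a template helper might build the tag from the requested URL. The parameter names in TRACKING_PARAMS are hypothetical, standing in for whatever your own tracking setup appends; the function names are likewise invented for this example.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names; substitute the ones your site actually uses.
TRACKING_PARAMS = {"trackid", "sessionid"}


def canonical_url(url):
    """Return url with tracking parameters and any fragment removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link_tag(url):
    """Build the link tag to place in the head of the page."""
    href = escape(canonical_url(url), quote=True)
    return '<link rel="canonical" href="%s" />' % href


print(canonical_link_tag(
    "http://www.example.com/product.asp?product=459&trackid=some_media_code"
))
```

The helper keeps every parameter it does not recognize as tracking, so product=459 survives while trackid is dropped. Stripping by an explicit list is a deliberate choice here: removing unknown parameters wholesale could point the engines at the wrong page.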
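If you generate these tags automatically, a sanity check for the same-site rule above could look like the sketch below. It rests on a loud assumption: the root domain is taken to be the last two labels of the host name, which holds for example.com but is wrong for suffixes such as .co.uk. Treat it as an illustration of the rule, not as how any engine evaluates the tag.

```python
from urllib.parse import urlsplit


def root_domain(url):
    """Naive root domain: the last two labels of the host name.

    Assumption for illustration only; this is wrong for multi-part
    suffixes such as .co.uk.
    """
    host = (urlsplit(url).hostname or "").lower()
    return ".".join(host.split(".")[-2:])


def same_site(page_url, canonical):
    """True if both absolute URLs share the same (naively computed) root domain."""
    page_root = root_domain(page_url)
    return page_root != "" and page_root == root_domain(canonical)
```

Under that assumption, a page on shop.example.com may name a canonical URL on www.example.com, while one pointing at a different domain is flagged.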