Commentary

Web U: Double Double Content Trouble

How not to be duped by the content bogeyman

The major search engines have done a great job of creating and crafting the concept of the Duplicate Content Bogeyman. After several years of participating in online forums, blogs, and live speaking engagements, I can safely say that duplicate content penalties are one of the top concerns of many Web site owners and developers (second only to "How can I be No. 1 on Google for Viagra? I hear those guys make lots of money").

The common belief is that Google and Yahoo have highly sophisticated duplicate filters that not only delete duplicates, but smack down ranking penalties on the offending Web sites. That could not be further from the truth. While it's possible to find instances where a search engine ranks a stolen copy of some content above the original version, that's not because of any penalty. It's simply because the copying site may have been spidered first, or has a better incoming link structure than the site hosting the original content.

At the end of the day, there is not much to be done about offsite duplication, which results either from content theft or legitimate content syndication. But don't worry: it's not the end of the world. In fact, it can work to a savvy Web site owner's advantage. Content theft is generally committed by Webmasters who are too lazy to create their own content or work for their paychecks. They run spiders or cut and paste content by hand from one site to another.

If the Web site being copied embeds URLs in all articles, resource pages, directory pages, and so forth, then when the content is republished elsewhere, the new site will include links pointing to the original site. It's a bit like stealing jeans from a department store and having the ink tag blow up and destroy them. The content thief's work will actually "blow up" in their face, as the original site gains the benefit and further outranks the duplicate.

Content syndication can work the same way. Never push content through feeds or other channels without including a link back to the source. Another trick with content syndication is to delay, whenever possible, the distribution of content until it's been verified that the search engine spiders have crawled the original version.
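
As a rough illustration of that last point, here is a minimal sketch in Python. The column doesn't name a platform, so the log location, article path, and crawler check below are all assumptions: it simply holds an article back from the syndication feed until a major search engine crawler shows up in the access log for the original URL.

# A minimal sketch, assuming standard Web server access logs at a hypothetical path.
# It checks whether a major search engine crawler (identified here by a simple
# user-agent substring match) has already fetched the original article URL.
CRAWLER_TOKENS = ("Googlebot", "Slurp")  # Yahoo's crawler identifies itself as Slurp

def crawled_by_search_engine(log_path, article_path):
    with open(log_path, encoding="utf-8", errors="ignore") as log:
        for line in log:
            if article_path in line and any(token in line for token in CRAWLER_TOKENS):
                return True
    return False

if crawled_by_search_engine("/var/log/apache2/access.log", "/articles/duplicate-content"):
    print("Original has been crawled: safe to push to the syndication feed.")
else:
    print("Hold the syndication feed for now.")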

Internal duplication can be a much scarier beast. Since a Web site is under the full control of its owner, there should, theoretically at least, be no excuse for duplication, right? One would think so. Enter the clever Web developer...

Consider the following set of URLs:

www.example.com/cat1/subcat2/prod3

www.example.com/subcat2/cat1/prod3

www.example.com/prod3/cat1/subcat2

www.example.com/subcat2/prod3/cat1

And so on.

It's easy to set up a server to return the same page for all variations of the URL structure. Most likely, the first version is the default structure, which makes sense based on hierarchy. But if someone were to set up a link to a featured product using one of the other versions, a search engine would suddenly have a whole second Web site to crawl: the alternate URL structure cascades through the site's navigation dynamically, generating a mirror of the site under the incorrect structure. Compound this error a few more times, and now the search engines have multiple versions of a Web site. Thanks a lot, Mr. Clever Web Developer.
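
To make the problem concrete, here is a minimal sketch of the anti-pattern using Python's Flask framework. The framework, route, and catalog data are hypothetical stand-ins, since the column describes the behavior rather than any particular code: a catch-all route that accepts the path segments in any order and happily serves the same product page for every permutation.

# A minimal sketch of the anti-pattern, using Python's Flask framework.
# The route below captures three path segments but never checks their order,
# so /cat1/subcat2/prod3, /subcat2/cat1/prod3, and every other permutation all
# return the same product page, and all get indexed as separate URLs.
from flask import Flask

app = Flask(__name__)

PRODUCTS = {"prod3": "Example Product 3"}  # hypothetical catalog data

@app.route("/<a>/<b>/<c>")
def product_page(a, b, c):
    # Grab whichever segment happens to be a known product ID; the category
    # and subcategory segments are decorative, so any ordering "works."
    product_id = next((seg for seg in (a, b, c) if seg in PRODUCTS), None)
    if product_id is None:
        return "Not found", 404
    return "<h1>%s</h1>" % PRODUCTS[product_id]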

How can this potential damage be reined in? Two words: Permanent Redirects. Choose one URL structure to serve as the default. Configure the server to force a Permanent Redirect (an HTTP 301) from each non-standard URL to the standard URL. This will do two very important things. First, it will begin to eliminate duplicate content. Second, it will capture and consolidate any inbound links pointing to those non-standard URLs.
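
Here is the same hypothetical Flask sketch with the fix applied: one canonical path is chosen per product, and every other ordering of the segments is answered with a 301 Permanent Redirect to it.

# A minimal sketch of the fix, continuing the hypothetical Flask example above:
# define one canonical path per product and 301-redirect every other segment
# ordering to it.
from flask import Flask, redirect, request

app = Flask(__name__)

# Canonical URL path for each product ID (hypothetical data).
CANONICAL_PATH = {"prod3": "/cat1/subcat2/prod3"}

@app.route("/<a>/<b>/<c>")
def product_page(a, b, c):
    product_id = next((seg for seg in (a, b, c) if seg in CANONICAL_PATH), None)
    if product_id is None:
        return "Not found", 404
    canonical = CANONICAL_PATH[product_id]
    if request.path != canonical:
        # A Permanent Redirect (301) collapses the duplicates onto one URL and
        # passes along any inbound links pointing at the non-standard URLs.
        return redirect(canonical, code=301)
    return "<h1>Product page for %s</h1>" % product_id

On most servers the same canonicalization can be done with rewrite rules rather than application code; the principle is identical either way: many URLs in, one URL out.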

Just in case that last bit sounds rather like gibberish to the less technically inclined, there is a simple solution. Find a geek and bribe him with beer. That should work out quite nicely for everyone involved. 
