Solving Duplicate Content Problems
October 23rd, 2008 by David Bradley
Recently, Google’s webmaster trends analyst Susan Moskwa waxed lyrical about how duplicate content on your website or blog is not actually the search engine optimization (SEO) no-no that many SEO managers think it is. There is no duplicate content penalty, apparently. There is, however, a penalty of sorts in what gets displayed in Google’s SERPs (search engine results pages) if you have duplicate content.
Now, it would seem obvious that you shouldn’t simply copy your own pages to different web addresses (URLs) stuffed with keywords. That was such a 90s blackhat SEO technique, and it was quickly spotted by the search engines and labeled as spammy. Indeed, only those stuck in the 90s and wearing a cranial stovepipe would do such a thing deliberately. In case you are worried that you have done this inadvertently with similar, but not necessarily identical, pages, try the duplicate content checker.
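As a rough illustration of how a checker might flag similar but not identical pages, here is a minimal Python sketch. It is not the method used by the duplicate content checker mentioned above, nor by any search engine. The function names, the five-word window and the choice of Jaccard overlap as the score are all assumptions made for illustration.

```python
import re


def shingles(text: str, size: int = 5) -> set:
    """Return the set of overlapping word n-grams ("shingles") found in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Text shorter than one window: treat the whole thing as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(a: str, b: str, size: int = 5) -> float:
    """Jaccard overlap of the two texts' shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 1.0  # two empty texts are trivially identical
    return len(sa & sb) / len(sa | sb)
```

Two pages that share most of their wording score close to 1.0 and unrelated pages score close to 0.0. Where to draw the line for "duplicate" is a judgment call that this sketch does not settle.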
However, almost everyone who runs a WordPress blog does this just by linking to their archives pages, for instance. Duplicated content abounds on a WordPress site as post excerpts are automatically rehashed and copied for archives, categories and tag pages.
Having assumed I had a nice sturdy installation with a clarifying robots.txt file in place, I was shocked to discover (having followed the duplicate content query advice here) that searching for a long, unique string of words from any old post on Sciencetext.com was not giving me the results I hoped for. Instead of showing the post containing that unique string of text in Google, the search engine was displaying the category page, or worse, just the homepage.
This is not good. If someone is searching for a particular keyphrase, they will see Sciencetext in the SERPs, but the page that Google takes them to will not be immediately relevant to their search; instead it will be a broad category or tag page or, like I say, the homepage. Those visitors will likely show up as five-second wonders who never return, and that’s not what you want. The problem boils down to Google not knowing which page to show: the unique single post or the category page.
So, what’s the solution? Well, armed with a little knowledge, I did a quick search for how to fix such WordPress duplicate content issues. An older post on how to make WordPress duplicate content safe is just as relevant today. In summary, it says:
To avoid the duplicate content issue in WordPress, you should:
- Add a “noindex, follow” meta tag to your monthly/weekly/daily archives, “next entries” pages and, if necessary, category pages
- Ensure that all your pages have unique meta-description tags
- Set up 301 redirects for your non-www links and links without trailing slashes
- Restrict search engine crawlers from indexing your feeds and trackbacks
- Use the more tag (<!--more-->) to show excerpts on your home page instead of full posts
- Restrict the number of posts displayed on your home page
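The first item in the list above comes down to a simple decision per page type, sketched here in Python. In a real WordPress theme this logic would sit in the header template as conditional code. The page-type labels and the function name below are hypothetical and are not WordPress API names.

```python
# Listing pages that mostly repeat post excerpts.
# These labels are made up for the sketch; they are not WordPress identifiers.
NOINDEX_PAGE_TYPES = {"date_archive", "paged", "category"}


def robots_meta(page_type: str) -> str:
    """Return a robots meta tag for listing pages, or "" for pages that should be indexed."""
    if page_type in NOINDEX_PAGE_TYPES:
        # "noindex" keeps the listing out of the results pages, while
        # "follow" still lets the crawler reach the posts it links to.
        return '<meta name="robots" content="noindex, follow" />'
    return ""
```

Single posts and static pages get no tag at all here, so they remain indexable by default. Whether category pages belong in the noindex set is the "if necessary" part of the advice and depends on the site.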
The details on how to apply these different hacks are in the post, but fixing canonical problems (www or not www), ensuring robots.txt is in place, and adding the conditional meta noindex code to your blog’s header are my essential three from their list. Indeed, the tweaks that I had overlooked on Sciencetext.com (although undertaken years ago on Sciencebase.com) were implemented on Nobel Physics day, October 7, 2008. By the time this post publishes, I am hoping that Google will have seen the light and unpenalized what it perceives as duplication, and so will provide you with a much better service when you hit Sciencetext via the search engine.
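The other two essentials, canonical URLs and robots.txt, can be checked with a few lines of Python. This is only a sketch under stated assumptions: www.example.com stands in for the real site, the www form is assumed to be the preferred one, and the robots.txt rules are a hypothetical example rather than the file used on Sciencetext.com. On a live site the 301 itself would normally be issued by the web server or by WordPress.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser

CANONICAL_HOST = "www.example.com"  # assumption: the www form is the preferred one

# Hypothetical rules keeping crawlers out of feeds and trackbacks.
ROBOTS_TXT = """\
User-agent: *
Disallow: /feed/
Disallow: /trackback/
"""


def canonical_url(url: str) -> str:
    """Return url with the canonical host and a trailing slash on directory-style paths."""
    parts = urlsplit(url)
    netloc = parts.netloc.lower()
    if "www." + netloc == CANONICAL_HOST:
        netloc = CANONICAL_HOST
    path = parts.path or "/"
    last_segment = path.rsplit("/", 1)[-1]
    # Leave paths alone if they already end in "/" or look like files (robots.txt).
    if not path.endswith("/") and "." not in last_segment:
        path += "/"
    return urlunsplit((parts.scheme, netloc, path, parts.query, parts.fragment))


def needs_redirect(url: str) -> bool:
    """True when a request for url should be answered with a 301 to canonical_url(url)."""
    return canonical_url(url) != url


def crawlable(url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Check url against robots.txt rules using the standard-library parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("*", url)
```

Running `needs_redirect` over the URLs in a sitemap or server log is one way to spot non-www or slashless addresses that still resolve without a redirect. `crawlable` gives a quick check that feed and trackback paths are blocked while ordinary posts are not.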
James S // Feb 2, 2009 at 6:29 pm
In regards to the duplicate content part of this blog post, I personally use the http://www.copygator.com website to find and stop duplicate content:
1. it’s automated and brings me results instead of me searching for duplicated content. All i had to do was submit my feed and it started monitoring my feed showing me who’s republished my articles on the web.
2. i get notified by email so it contacts me when it finds copies of my articles online.
3. i use their image badge feature to alert me directly on my website when my content is being lifted.
4. it’s a free service as opposed to the “per page” cost of copyscape/copysentry.
David Bradley // Feb 2, 2009 at 6:56 pm
I just did a quick test with copygator and it’s not seeing the correct feed for the site I tested but is picking up a legacy xml file that is 301 redirected. Is there a way to correct it?