WordPress is great, it really is. However, left unattended for too long, it can get your site into a whole heap of trouble with the Google Panda algorithm. Panda does not like auto-generated pages that offer users nothing but thin content. So what is thin content?
Thin content is basically a page or post with little or no content: an image, a line of text, a page with a single link. Add duplicate content to this and you will find that your site's rankings tumble, like Techieshelp.com's did. It has been a very steep learning curve, but in the post below I will go through the steps to tidy up your WordPress-driven site: remove thin and duplicate content by setting up WordPress correctly, then remove that content from the Google index.
Bear in mind that even when these steps have been taken, it may take a few months to fully recover. Currently the Google Panda algorithm runs on 10 of the 30 days in a month.
Prerequisites
We need two things in place before we continue: a Google Webmaster Tools account and Yoast's WordPress SEO plugin. You may already use the plugin, and if you don't, you should. If you use another SEO tool, adapt the settings I use to the plugin you have. Get your Google account or WordPress SEO from the links below. Sign up for the Google account (we use this to remove the URLs that we class as thin), then install the SEO plugin (we use this to manage tags, categories, attachment pages, etc.).
Tags, Categories and Attachments
OK, first a little test. Go to Google and enter the following, replacing the domain name with your own. I will use techieshelp.com as the example here.
site:techieshelp.com
You will see how many pages Google has indexed for your website. Scroll through them all and you will most likely find a lot more than you thought you had. Once you get to the end, you may find that Google gives the following message.
If you like, you can repeat the search with the omitted results included.
This is the first warning that Google has found duplicate content on your site.
First we need to make a decision: do we want Google to index tags or categories? When Google spiders a site and hits a category page, it indexes the content of that page. It does the same when it hits a tag page, so the posts listed there are indexed a second time and you have duplicate content. The best thing to do is allow only one of the two, tags or categories, to be indexed. Here at Techieshelp we do not index tags; we use them purely as navigation for users, not for Google. We allow categories to be indexed. Now that you have the WordPress SEO plugin installed, configure the following settings.
Browse to
SEO > Titles and Metas > Taxonomies.
Then set your desired choice. As you can see here we are indexing categories but not tags. Then save your settings.
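If you want to confirm the setting took effect, one option is to view the source of a tag page and look for a robots meta tag. Below is a minimal Python sketch of that check. It assumes the plugin marks non-indexed pages with something like `<meta name="robots" content="noindex,follow"/>`; the function names are my own and not part of any plugin.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        # content looks like "noindex,follow"; split it into single directives
        for part in attr.get("content", "").split(","):
            if part.strip():
                self.directives.add(part.strip().lower())


def robots_directives(html):
    """Return the set of robots directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


def is_followed_but_not_indexed(html):
    """True when the page asks not to be indexed but still lets links be followed."""
    found = robots_directives(html)
    return "noindex" in found and "nofollow" not in found


# Hypothetical tag-page source, pasted in by hand
sample = '<html><head><meta name="robots" content="noindex,follow"/></head></html>'
print(is_followed_but_not_indexed(sample))  # True
```

Paste the source of one of your own tag pages into `sample` to try it. A category page that you do want indexed should give `False`.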
Now that's set, when Google spiders a page it will follow the tags to get to other pages, but it will not index the tag pages themselves. While browsing through your results earlier you may have noticed a lot of attachment pages. These pages are generated automatically when you attach images to posts. On image sites this may be fine, but for most sites it will cause problems. An attachment page is simply the image on its own page: extremely thin content.
Luckily, within the WordPress SEO plugin we can 301 the attachment page to the main article page. So what is a 301 redirect? A 301 redirect tells Google that this page has moved permanently and that the information can now be found at the new address. A 301 also prompts Google to drop the old page from the index, so we get the double effect of passing all the PageRank to the main article while removing the thin content. To do so, go to the following section in the SEO plugin and apply the settings shown in the image.
Permalinks >
We now have our permalink settings set up correctly and have decided what to index, so we can look at trackbacks.
Removing WordPress Trackbacks From Google
Whenever anyone who uses WordPress links to one of your pages or posts, a trackback is created. Let's say you have a page on pizza; the trackback URL would look like this.
www.domain/pizza/trackback/
For this URL WordPress uses a 302 (temporary redirect), which means Google not only indexes the trackback URL but never removes it. We need to change this default behavior to a 301 redirect so that all the page juice flows nicely and the trackback pages are eventually removed from Google's index. To do so, follow the steps below using your editor of choice.
Find the WordPress file titled wp-trackback.php and open it.
wp-trackback.php
Within the file, locate the following text:
wp_redirect(get_permalink($tb_id));
Then replace it with this code so we tell WordPress and Google to 301 the page.
wp_redirect(get_permalink($tb_id),301);
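This will take a few weeks or more, but eventually Google will remove the trackback pages from its index: more thin content removed. Note that wp-trackback.php is a WordPress core file, so a WordPress update will overwrite this edit and you will need to reapply it.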
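To check that the change is live, you can request a trackback URL without following the redirect and look at the status code that comes back. Here is a rough Python sketch of that check using only the standard library. The function names are mine, the URL in the comment is a placeholder, and it assumes your server answers a plain HEAD request on the trackback URL in the same way it would answer Google.

```python
import http.client
from urllib.parse import urlsplit


def redirect_status(url, timeout=10):
    """Return (status code, Location header) for url without following redirects."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def describe_redirect(status):
    """Put an HTTP status code into the terms used in this post."""
    if status in (301, 308):
        return "permanent redirect"
    if status in (302, 303, 307):
        return "temporary redirect"
    return "not a redirect"


# Example call (placeholder URL, swap in one of your own trackback URLs):
# status, location = redirect_status("http://www.example.com/pizza/trackback/")
# print(status, describe_redirect(status), location)
```

After the edit to wp-trackback.php you would hope to see 301 and the permalink of the post as the location. If it still reports 302, the edit has not taken effect or a cache is in the way.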
Removing Pages From Google’s Index
To speed matters up, we will now remove some pages from the index. Earlier we decided to remove tags and keep categories, so let's remove the tags from the index. Log into Google Webmaster Tools and navigate to the following page.
Google Index > Remove URLS
Select the option to create a new removal request and enter your domain and tags folder as seen below.
www.domain.com/tags/
As seen here,
If you have the time, you can also remove the attachment pages that Google has indexed by entering their URLs here too. As mentioned, all of these changes may take a while, maybe 3-4 months or more. The site needs to be re-crawled, and pages take time to be removed from the index.
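If you have copied a long list of indexed URLs out of the site: search, sorting them by hand gets tedious. The Python sketch below groups a list of URLs into removal candidates. The URL patterns are assumptions on my part: a tag base of `tags` as in the example above, trackback URLs ending in `/trackback/`, and attachment pages using an `attachment_id` query parameter. Your permalink settings may produce different patterns, so adjust to match what you actually see in the results.

```python
from urllib.parse import parse_qs, urlsplit


def classify_url(url, tag_base="tags"):
    """Label a URL as 'tag', 'trackback', 'attachment' or 'keep'."""
    # Allow URLs written without a scheme, e.g. www.domain.com/tags/pizza/
    parts = urlsplit(url if "//" in url else "//" + url)
    segments = [segment for segment in parts.path.split("/") if segment]
    if "attachment_id" in parse_qs(parts.query):
        return "attachment"
    if segments and segments[-1] == "trackback":
        return "trackback"
    if segments and segments[0] == tag_base:
        return "tag"
    return "keep"


def removal_candidates(urls, tag_base="tags"):
    """Group URLs by the kind of thin content they appear to be."""
    groups = {"tag": [], "trackback": [], "attachment": []}
    for url in urls:
        kind = classify_url(url, tag_base)
        if kind in groups:
            groups[kind].append(url)
    return groups


# Hypothetical list of indexed URLs
indexed = [
    "www.domain.com/tags/pizza/",
    "www.domain.com/pizza/trackback/",
    "http://www.domain.com/?attachment_id=12",
    "www.domain.com/pizza/",
]
for kind, urls in removal_candidates(indexed).items():
    print(kind, urls)
```

Anything labelled `keep` is left out of the groups, so the output only lists pages you may want to submit for removal.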
WordPress Panda Recovery
As they say, the proof is in the pudding. The first yellow mark shows when we were hit; the second shows when recovery started.
If you have any stories or experience of recovering from the Google Panda algorithm, please let us know below.