Do you like extra traffic? Do you want visitors to find your blog via Google? If so, read on…
Google operates what is known as the ‘supplemental index’, which is where all the indexed pages not good enough for the primary index go. Google will tell you that supplemental means just that and that it’s no big deal. SEO (Search Engine Optimisation) types will say that it’s a Bad Thing to have more than a handful of pages in this supplemental index, that it can have a knock-on effect on your site in general, and that it will reduce PageRank and search engine traffic. Nathan at NotSoBoringLife.com said his search engine traffic increased by 20% when he escaped supplemental hell…
Are You In Supplemental Hell?
There’s an easy way to find out: go to Google and type in the following search query, with your own URL substituted in:
site:YOURURL *** -view
This will tell you which pages are in S.H. and how many (in the top right): Blog-Op currently has around 550 pages, most of which are feed, archive or category pages, as yours will probably be.
What Are All These Pages?
WordPress, along with some other blogging platforms, typically repeats itself. The text in a post can appear:
- On your front page: yourblog.com
- On its single post page: yourblog.com/post
- On a category page: yourblog.com/postcategory
- On an archive page: yourblog.com/11/2006
- In the feed: yourblog.com/feed/post
And can also appear elsewhere due to trackbacks etc.
That’s at least five pieces of duplicate content, and what does Google not like, and will ‘punish’ blogs for? Duplicate content.
Escape From S.H.
So how do you pull yourself out of S.H.? Upload a Robots.txt file to the root of your blog. This file is simply a set of instructions to the Googlebot and other web crawlers about what they can and can’t look at on your website. When they come looking at your site in order to index you, they will first look for instructions from your Robots.txt file, then read your sitemap and then index your site. Without a Robots.txt file, they will quite happily index everything in sight and phone it home, where the Google algorithms will then punish you for the duplicate content.
Unless your Robots.txt file tells them exactly where to go, and what they can index.
You can use a Robots.txt to tell the Googlebot to just read one selected source of information for your posts, and to then ignore the others, along with all of your WordPress files, and other stuff it just doesn’t need to see.
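To make this concrete, here is a minimal sketch in Python of what such a set of rules might look like and how to check them before uploading. It is not the file from NotSoBoringLife.com: the domain yourblog.com, the sitemap address and every Disallow path (/wp-admin/, /wp-includes/, /feed/, /category/, /2006/) are assumptions for illustration, and you would need to adapt them to your own permalink structure. The check uses Python’s standard-library urllib.robotparser, which matches rules as simple path prefixes, so it may not behave exactly as Googlebot does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for a WordPress blog at yourblog.com.
# Every path here is an assumption, not the author's actual file.
ROBOTS_TXT = """\
Sitemap: http://yourblog.com/sitemap.xml

User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /feed/
Disallow: /category/
Disallow: /2006/
"""


def crawlable(path: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if a crawler obeying robots_txt may fetch the given path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("Googlebot", "http://yourblog.com" + path)


if __name__ == "__main__":
    # The front page and single post pages stay open; the duplicate
    # copies (feed, category, archive) and engine files are blocked.
    for path in ["/", "/my-post/", "/feed/", "/category/seo/", "/2006/11/", "/wp-admin/"]:
        print(path, "crawlable" if crawlable(path) else "blocked")
```

The idea is that only one copy of each post (here, the single post page) remains open to crawlers, while the feed, category and archive copies are excluded along with the WordPress engine directories.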
Creating a Robots.txt file
Easy. Open a blank document. Look at my Robots.txt, which is the one I obtained from NotSoBoringLife.com, select all, copy it and paste it into your document.
Amend the top line, to give the address of your sitemap. If you don’t have one, see why sitemaps are essential, and create one.
Under the line “# Disallow all directories and files within ” edit the addresses as appropriate for your blog.
Amend the archive dates if you have posts in 2005 or further back. Feel free to amend anything else if you know what it means.
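Save the file as ‘robots.txt’ (all lowercase and plain text, not RTF or DOC etc.) and upload it to the root directory of your blog. Crawlers request the lowercase name, so ‘Robots.txt’ may not be found on a case-sensitive server.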
Check back in a day or so at Google Webmaster Tools and see if Google has picked it up. Give it a while and see what effect it has on your blog, and how many pages are in Supplemental hell.
The Challenge
I don’t know how effective this is going to be, so if you’re willing, let me know how many supplemental pages your blog currently has. Upload your Robots.txt, and let’s check back in a month and see what the figure is then.
Problogger has 6 pages in the Supplemental index. John Chow has over 1700. Let’s see if we can be more Darren Rowse than JC.
17 users commented in “Use Robots.txt To Escape Supplemental Hell”
I have loads of duplicate content
I even have so much duplicate content that Google decided that some of it warrants being given a grey box, oh but then the same happened on Marketing Pilgrim and loads of other sites.
That command actually reports something different using Aaron’s SEO for Firefox, even though I think it uses effectively the same query.
site:andybeard.eu *** -adghasdtrb
Supplemental Results and Toolbar PageRank are broken as hell, and removing duplicate content pages, whilst it might help a little with PageRank if your pages leak PR like a sieve, can also give you problems with indexing and relevance.
I have duplicate content pages that rank highly for competitive money terms.
“I even have so much duplicate content that Google decided that some of it warrants being given a grey box”
Thanks Andy. Damned if you do, damned if you don’t? One day I may understand all of this….
Chris, Andy,
What are your thoughts on the duplicate content plugin for WordPress?
http://www.seologs.com/wordpress-duplicate-content-cure/
Andy will probably give you the more helpful answer, but as I understand it, it adds nofollow to sections like your categories and archives, but doesn’t exclude all the other guff residing on your server the way a robots.txt will.
I haven’t used it though, it’s just what I’ve read.
Well that site in SEO for Firefox shows
668 Cached 392 Supplemental
They have a 2 year old domain, with 2 years of content
I have
6950 pages cached and earlier today 7 pages supplemental
That has just this moment changed to 3480 Supplemental
I actually haven’t worked out how many pages I should really have… maybe 20K with all language variations if Google indexed all translated tags, which might happen eventually, or not, or maybe they just won’t show it.
How many sites talk about WordPress htaccess? 100s?
I rank 3rd for the term wordpress htaccess, and I haven’t had a lot of links and only 2 Diggs.
Then again I rank well for dofollow and not for nofollow – I need to optimize that page better.
I have blogs set up both ways, even with some fairly extreme SEO, and ultimately I don’t think it makes such a huge difference, though whichever route you take, you should maximise your SEO efforts in that direction.
I strongly prefer noindex follow over blocking things like categories in robots.txt, but I have everything being indexed on andybeard.eu
I have 163 pages cached with 160 of them in supplemental. My math may be a little rusty, but that isn’t a very good percentage for my SEO is it?
Chris,
Thanks for the info. What I’d love to know is how, if at all, the guff makes a difference. It’s pretty interesting to read Andy’s take on the matter.
Andy,
You’ve built a strong case for me to leave my site ‘as is’ on this basis.
After just a short while experimenting with SEO for keywords I’m seeing results, so at least I’m heading in the right direction.
Altering the robots.txt file may very well be the way to go, I’m really not sure, but great to get a discussion on it here.
Wow that is a great way of finding your supplemental. I hadn’t heard of that until now.
I have 293 cached with 182 supplemental. Need to better that a little bit
However, I noticed one thing. I started using the WordPress plugin All in One SEO Pack about a week ago, and I couldn’t find any of the posts I’ve made since then in the supplemental results.
Link – http://wp.uberdose.com/2007/03/24/all-in-one-seo-pack/
To be fair it isn’t a large sample size, but the plugin does have an option to turn off indexing for categories and archives, so that seems to be working.
Any of you guys experience something similar?
[...] Supplemental Results Hell – Are your pages appearing on Google’s main search results? Or are they getting return as supplemental results that people won’t see unless they click “Similar Pages?” See how you can find out if you are in supplemental results hell, and what you can do about it. [...]
Doesn’t sound great Jason
Thanks Andy, I think I’ll just run this as an experiment, keep a close eye on it and see what happens. Expect a tearful post in a month’s time that Google doesn’t love me anymore.
I’ve only just kicked this off Scott, so I’ll let you know how Robots.txt works for me. Plugin sounds useful though.
You could well be right David. It’s not going to kill you for now to leave it as is and see how things run. It may benefit (or at least not hurt) to have a simple robots file that just excludes your WP engine files and other stuff on your server, which leaves the robots free to concentrate on the content.
I was just going through my cached, site, and link results both with and without the www prefix. A lot of the supplemental results are duplicates, where the www version of a page is the same page as the one without the www. I think it takes a while for a lot of them to disappear after you use the .htaccess 301 Permanent Redirect. 30 days doesn’t appear to be long enough. Maybe 6 months?
Whoops, I may have been mistaken about that plugin.
I had 182 supplemental results yesterday, now I’m up to 186. So it might not be doing all I had hoped. Let me know how the robots.txt file works; if it works better I’ll implement that.
I just edited my non-www site to 301 redirect to my www site, so hopefully that has a positive effect on my search engine rankings.
Hey RT, as long as this experiment doesn’t make me Pr0 I’ll keep it up for a few months
The 301 should work Scott: If you run the 2 versions of your address through a pagerank predictor, you may find two wildly different predictions: The 301 should cure this in time.
When I first found out about this, http://www.blog-op was PR2, while blog-op was PR4.
Chris, please put the word out. My website is gone until Host9 gets my domain back online. It expired at the old registrar and is gone. The transfer, that I ordered, paid for, and confirmed, did not take place as I had thought.
No problem RT, it’s up.
I hope someone will have a brainwave about what you can do-I know you don’t want to pay twice, but I see Dotster take Paypal-is that an option? (I seem to remember you saying you had access now?)
[...] been about a month since I posted about Robots.txt and how it can help you escape supplemental hell, so I thought it was time for a quick [...]
[...] to Schanul and Richard Knight , Benedict , Homemom , Goldy , Chris for all the comments on previous [...]