Rae Hoffman

Googlebot Gone Wild on WordPress Backend Files

by Rae Hoffman on September 17, 2007 | SEO

I’ve been noticing a problem lately with how Google is indexing several of the blogs I run. Granted, I’m not big on running WordPress, since most of my sites run on a custom-built CMS, but I’ve been running a few WordPress blogs for quite a while, and what I’m seeing lately in their indexing seems odd to me.

The problem is that Googlebot has begun indexing the blogs’ backend files. Some example URLs I have found indexed (and yes, I checked for inbound links and, according to Yahoo, they have none) are:

/wp-content/plugins/webprof-delicious/webprof-delicious2.php
/wp-content/cache/wp-cache-cdecddb6bcc0b95b96cdcb347.meta
/wp-content/cache/wp-cache-cdecddb6bcc0b9b4dcdcb3427.html
/wp-content/themes/default/

And all those URLs have some great title and description tags, like:

Fatal error: Call to undefined function: get_header() in /home …
Fatal error: Call to undefined function: get_header() in /home/username/html/wp-content/themes/default/index.php on line 1.

Warning: main(ABSPATHWPINC/rss-functions.php) [function.main ...
Warning: main(ABSPATHWPINC/rss-functions.php) [function.main]: failed to open stream: No such file or directory in …

The indexed cache pages, of course, have the same titles, descriptions *and* content as their non-cached counterparts. But hey, there is no duplicate content penalty, right? [sarcasm]If my pages can’t be found as a result of it, you can call it a penalty or a “spoonful of love”; it has the same meaning to me.[/sarcasm]

1. What the hell is Google doing crawling these pages and files?

2. Is anyone else seeing this (there are millions of pages indexed, so someone else must be)? If so, how long have you been seeing it?

3. If by some chance this is coming from toolbar data, wouldn’t they think to set up an exclusion for one of the largest blog platforms in existence, so these pages don’t get indexed and waste space?

4. In all honesty, I don’t see the toolbar as an explanation, because I’ve never visited some of these backend URLs. They’re not even “navigable” … so how the hell is Google “finding them” to begin with, considering that the whole point of “crawling” is to find URLs by following links?
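
If you want to check whether your own blog is at least telling crawlers to stay out of these directories, here is a minimal sketch in Python. It is only an illustration, not a fix I have confirmed: the robots.txt rules in `ROBOTS_TXT` are hypothetical (they are not from any of my sites), and `blocked_paths` is a helper name I made up. It uses the standard library's `urllib.robotparser` to test the backend paths listed earlier against those rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, assumed for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
"""

# The backend paths from the list of indexed URLs earlier in this post.
BACKEND_PATHS = [
    "/wp-content/plugins/webprof-delicious/webprof-delicious2.php",
    "/wp-content/cache/wp-cache-cdecddb6bcc0b95b96cdcb347.meta",
    "/wp-content/cache/wp-cache-cdecddb6bcc0b9b4dcdcb3427.html",
    "/wp-content/themes/default/",
]


def blocked_paths(robots_txt, paths, user_agent="Googlebot"):
    """Return the paths that the given robots.txt text disallows for user_agent."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return [path for path in paths if not parser.can_fetch(user_agent, path)]


if __name__ == "__main__":
    blocked = set(blocked_paths(ROBOTS_TXT, BACKEND_PATHS))
    for path in BACKEND_PATHS:
        status = "blocked" if path in blocked else "crawlable"
        print(f"{status}: {path}")
```

To test a live site, you would paste the contents of its real robots.txt into `ROBOTS_TXT`. Keep in mind that robots.txt only asks crawlers not to fetch a URL; it doesn't explain how Google found these URLs in the first place, which is still my real question.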
