How to See If a Page or Document on Your Website Has Been Indexed by Google

To find out whether a page or document on your website has been indexed by Google, use the “site:url” query modifier, the same one you would use to see how many pages on your site have been indexed (and which ones). Instead of searching for “site:mcbuzz.com”, for example, search for “site:mcbuzz.com/wordpress/what-is-wordpress”, or whatever the entire URL or file name is that you want to check.

Click here » to learn about Search Engine Optimization (SEO) Services offered by McBuzz.

In other words, say I have a PDF on my site called “mcbuzz-wordpress-tutorials.pdf” (which I don’t – this is just an example). I can do a search using Google for “site:mcbuzz.com/pdf/mcbuzz-wordpress-tutorials.pdf” and Google will tell me whether it has this file in its index or not. Remember to use the entire path or URL for the page or document. If you keep your PDFs in a directory on your site called “pdf”, then you need to include that in the URL as shown in this example. (If you have questions about this, send me a comment.)
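If you check pages often, you can build the search link with a few lines of code instead of typing it. This is only a rough sketch: the function name is made up for illustration, and it assumes Google's usual search address (https://www.google.com/search?q=...). It strips the http:// or https:// prefix because the examples here write the site: query without it.

```python
from urllib.parse import urlencode


def site_query_url(page_url: str) -> str:
    """Return a Google search link for the query site:<page_url>."""
    # Drop the scheme so the query matches the form used in this post.
    for prefix in ("https://", "http://"):
        if page_url.startswith(prefix):
            page_url = page_url[len(prefix):]
            break
    # urlencode percent-encodes the colon and slashes inside the query.
    return "https://www.google.com/search?" + urlencode({"q": "site:" + page_url})


print(site_query_url("http://mcbuzz.com/pdf/mcbuzz-wordpress-tutorials.pdf"))
```

Paste the printed link into your browser. If Google lists the file, it is in the index.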

This post follows along the same lines as an earlier one called “Are PDF Files Indexed by Google?” But I also wanted to talk about this topic for a couple of reasons related to Search Engine Optimization and WordPress.

1. In WordPress, it is possible to specify the URL of a page or post, independent of the title you give the page or post, using the “Page Slug” / “Post Slug” feature. If you don’t specify a slug, WordPress will make one automatically using “Permalinks”. I told WordPress to give this post the URL “mcbuzz.com/2008/document-webpage-indexed-by-google”. If I hadn’t done so, WordPress would have called it “mcbuzz.com/2008/how-to-see-if-a-page-or-document-on-your-website-has-been-indexed-by-google”. Shorter is better as long as the relevant keywords are included in a URL, so I made it shorter by tweaking it a bit and removing words I don’t think are as relevant for SEO as the ones I kept.

2. Google is indexing pages and posts very quickly these days, sometimes in under an hour. The post you are reading right now was indexed in less than 7 minutes. If you have a URL indexed by Google, you may not want to change it because if you change it, the link to the page that’s in Google’s index will be broken. Someone might find your page or post by doing a Google search, but when they click on the listing, they will get a “Page not found” error from your site.
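To make the slug idea in point 1 concrete, here is a simplified sketch of how a title can turn into a slug, and how dropping filler words shortens it. It is an approximation, not WordPress's own code, and the function names and the list of words to drop are assumptions chosen for illustration.

```python
import re


def simple_slug(title: str) -> str:
    """Lowercase the title and join its words with hyphens."""
    # Replace every run of characters that is not a letter or digit with one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")


def short_slug(title: str, drop: set) -> str:
    """Build a slug but leave out the words listed in `drop`."""
    kept = [word for word in simple_slug(title).split("-") if word not in drop]
    return "-".join(kept)


title = "How to See If a Page or Document on Your Website Has Been Indexed by Google"
# Hypothetical filler words; which words matter for SEO is your call.
filler = {"how", "to", "see", "if", "a", "or", "on", "your", "has", "been"}

print(simple_slug(title))
print(short_slug(title, filler))
```

The first line printed is the long automatic-style slug quoted in point 1. The second is a shorter candidate. I still picked my final slug by hand, so treat the output as a starting point.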

So, if you want to use the Page Slug / Post Slug feature in WordPress to customize your URLs, do so before or shortly after you publish a page or post. If you are thinking of changing a URL, you can check to see if your page has already been indexed before you change it.

If it has been indexed, you need to weigh the possible long-term SEO benefit of changing the URL (a better chance of showing up on page 1 or 2 of Google for your target keywords once Google re-indexes the page) against the short-term cost of a broken listing. If it has been indexed already and you want people to find it for some searches right away (in the next week or two, say), then you are probably better off leaving well enough alone.

Here’s an update to this post. News flash: That last paragraph applies only to WordPress.com-hosted websites and blogs. If your site is hosted by a third party rather than WordPress.com, and you are comfortable enough with WordPress to be able to download, install and activate a WordPress plugin (or you know someone who can help you do so), then you don’t need to worry about whether a post or page has already been indexed by Google or not. You can use a WordPress plugin called “Redirection” to redirect someone to the new URL when they request your page or post using the old URL.

In other words, say you create a post called My New Post with the URL http://www.example.com/my-new-post/. It gets indexed by Google in 30 minutes or whatever. Then you realize, Oops!, I should have named that post My New Post About WordPress, because it’s about WordPress! And you really should include “wordpress” in the URL to make the URL more search engine friendly, i.e., to let search engines know that the post really is about WordPress. One of the absolute best ways to do that is to put your keyword — in this case “wordpress” — in the URL. So go ahead, rename your post and either create a new post slug yourself or let WordPress do it for you.

Now your new URL can be http://www.example.com/my-new-post-about-wordpress/ (or whatever you want to make it using the Page Slug / Post Slug feature in the editing window). If someone finds your post using Google, and Google is still using the old URL, that person will click on the link and when their web browser asks your host’s server for the page at http://www.example.com/my-new-post/, the server will know that they really want the page at the new URL http://www.example.com/my-new-post-about-wordpress/ and it will redirect them there. The fact that you changed the post title and the URL will not keep people from being able to find the page. Pretty cool.
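Here is the idea behind a redirect as a tiny sketch. It is not how the Redirection plugin is written. It only illustrates the lookup a server performs, using a made-up table and function name. The status code 301 means "moved permanently", which is the usual choice when a page has been renamed for good.

```python
# Hypothetical redirect table: old path -> new path (paths from the example above).
REDIRECTS = {
    "/my-new-post/": "/my-new-post-about-wordpress/",
}


def respond(path: str) -> tuple:
    """Return (HTTP status, path to serve or redirect to) for a requested path."""
    if path in REDIRECTS:
        # 301 tells browsers and search engines the page has moved permanently.
        return 301, REDIRECTS[path]
    # Simplification: every other path is treated as an existing page.
    return 200, path


print(respond("/my-new-post/"))
print(respond("/my-new-post-about-wordpress/"))
```

A request for the old path gets sent on to the new one, while a request for the new path is served directly.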

Now for this to work, you have to know how to install the Redirection plugin, and how to set it so that it does what you want. And you also have to be using permalinks. (Read more about WordPress permalinks here.) I just installed the plugin on mcbuzz.com, and it’s one of the easier plugins to use. Just follow the directions in the readme.txt file that comes with the plugin. You can set it to create redirections automatically when a post slug changes, or you can do it yourself manually when a post or page slug changes.

Confused? Just send me a comment using the form below.


Are PDF Files Indexed by Google?

Yes. PDF files are indexed by Google and other search engines.

This follows up on a question from a reader (see Optimize PDF Files For Websites and Search Engines). As I note there, one way to see if a PDF on your website has been indexed by Google is to copy a long line of text from the PDF, and then put it into the Google search box with double quotes on either end. You can do this to find any particular document or page available on the Web, as long as it has been indexed (scanned or “spidered” and catalogued) by Google.

For example, if you Google “Enter an estimate of your 2008 nonwage income (such as dividends or interest)” with the double quotes on either end, Google offers you a link to a PDF of IRS Form W-4 for 2008. This shows that the 2008 W-4 PDF document has been indexed. (Incidentally, Google also offers you a link to the mcbuzz.com page you are reading right now since it contains the same string of text.)

Another way to see if a PDF has been indexed by Google is to use the “site:url” query modifier. This is a handy trick when you want to narrow your search to one domain. If I Google [site:mcbuzz.com] – without the brackets, Google lists every page in my site that has been indexed. If I Google [site:mcbuzz.com web] – without the brackets, Google lists every page in my site that contains the word “web”. And, as a helpful reader points out below, you can Google [site:mcbuzz.com filetype:pdf] – without the brackets, to see if there are any PDF files on the mcbuzz.com website that have been indexed by Google. (Be sure not to put a space between “filetype:” and “pdf”.)

I don’t have any PDFs on my site. Try it with another domain to see an actual positive result. To see if a particular PDF on my website has been indexed, I can Google [site:mcbuzz.com “some word or phrase in the PDF”] – without the brackets. Of course, you can also Google [site:mcbuzz.com myfilename.pdf] to do the same.

Returning to the Form W-4 example, Google [site:irs.gov “Enter an estimate of your 2008 nonwage income (such as dividends or interest)”] and Google lists one and only one result: the PDF on the IRS website.
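The query modifiers above combine in a predictable way, so the search strings can be assembled by a small helper. This is just a sketch. The function name and its parameters are hypothetical, and it only builds the text you would type into the Google search box.

```python
def google_query(domain, words=(), phrase=None, filetype=None):
    """Assemble a search string limited to one domain."""
    parts = ["site:" + domain]
    if filetype:
        # No space between "filetype:" and the extension.
        parts.append("filetype:" + filetype)
    parts.extend(words)
    if phrase:
        # Double quotes ask for the exact phrase.
        parts.append('"' + phrase + '"')
    return " ".join(parts)


print(google_query("mcbuzz.com"))
print(google_query("mcbuzz.com", words=["web"]))
print(google_query("mcbuzz.com", filetype="pdf"))
print(google_query(
    "irs.gov",
    phrase="Enter an estimate of your 2008 nonwage income (such as dividends or interest)",
))
```

Each printed line matches one of the bracketed searches in this post, without the brackets.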

Something fairly amazing: Google knows every phrase in that PDF and in any other document or web page it has indexed. That’s a lot of information.

The other question in the mcbuzz.com post mentioned above was whether a PDF would be indexed if it were encrypted or had other security settings applied to it. If you have Adobe Acrobat 8 Professional, you can answer this question yourself.

Open a PDF and then open the Document Properties dialog box (File > Properties…). Click on the Security tab, and you see the various security options. There are different security options depending on which version of Acrobat Reader you want your PDF to be compatible with. The dialog box tells you if search engines will be able to read all or only some parts of the PDF (e.g. metatags or attachments) when you select the various options. If the PDF can’t be read by search engines, it won’t be indexed.

For those interested, here is more information about Google query modifiers like “site:url”.

