Are PDF Files Indexed by Google?

Jan 30, 2008

Yes. PDF files are indexed by Google and other search engines.

This post follows up on a question from a reader (see Optimize PDF Files For Websites and Search Engines). As I note there, one way to see if a PDF on your website has been indexed by Google is to copy a long line of text from the PDF and put it into the Google search box, with double quotes on either end. You can do this to find any particular document or page available on the Web, as long as it has been indexed (scanned or “spidered” and catalogued) by Google.

For example, if you Google “Enter an estimate of your 2008 nonwage income (such as dividends or interest)” with the double quotes on either end, Google offers you a link to a PDF of IRS Form W-4 for 2008. This shows that the 2008 W-4 PDF document has been indexed. (Incidentally, Google also offers you a link to the mcbuzz.com page you are reading right now since it contains the same string of text.)
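
If you run this kind of check often, you can build the search URL with a few lines of Python. This is only a sketch: the function name exact_phrase_search_url is made up for illustration, and it assumes Google's familiar https://www.google.com/search?q=... address format. All it does is wrap the text in straight double quotes and URL-encode it.

```python
from urllib.parse import urlencode


def exact_phrase_search_url(phrase: str) -> str:
    """Return a Google search URL for an exact-phrase search (hypothetical helper)."""
    # Remove surrounding whitespace and any double quotes the caller already added,
    # then add exactly one pair of straight double quotes around the phrase.
    quoted = '"' + phrase.strip().strip('"') + '"'
    # urlencode percent-encodes the quotes and parentheses and turns spaces into "+".
    return "https://www.google.com/search?" + urlencode({"q": quoted})


if __name__ == "__main__":
    line_from_pdf = (
        "Enter an estimate of your 2008 nonwage income "
        "(such as dividends or interest)"
    )
    # Paste the printed URL into a browser to run the search.
    print(exact_phrase_search_url(line_from_pdf))
```

A longer line of text works better than a short one, because it is less likely to appear in other documents.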

Another way to see if a PDF has been indexed by Google is to use the “site:url” query modifier. This is a handy trick when you want to narrow your search to one domain. If I Google [site:mcbuzz.com] (without the brackets), Google lists every page on my site that has been indexed. If I Google [site:mcbuzz.com web], Google lists every indexed page on my site that contains the word “web”. And, as a helpful reader points out below, you can Google [site:mcbuzz.com filetype:pdf] to see if there are any PDF files on the mcbuzz.com website that have been indexed by Google. (Be sure not to put a space between “filetype:” and “pdf”.)

I don’t have any PDFs on my site. Try it with another domain to see an actual positive result. To see if a particular PDF on my website has been indexed, I can Google [site:mcbuzz.com "some word or phrase in the PDF"] – without the brackets. Of course, you can also Google [site:mcbuzz.com myfilename.pdf] to do the same.

Returning to the Form W-4 example, Google [site:irs.gov "Enter an estimate of your 2008 nonwage income (such as dividends or interest)"] and Google lists one and only one result: the PDF on the IRS website.
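
The bracketed queries above all follow one pattern, so they are easy to assemble in code. Here is a minimal sketch. The helper name site_query and its parameters are hypothetical, and it only builds the text you would paste into the search box. It does not contact Google.

```python
def site_query(domain, words=(), phrase=None, filetype=None):
    """Build a Google query restricted to one domain (hypothetical helper)."""
    # "site:" narrows the search to a single domain.
    parts = ["site:" + domain]
    if filetype:
        # No space between "filetype:" and the extension.
        parts.append("filetype:" + filetype)
    # Plain words (or a file name) are added as-is.
    parts.extend(words)
    if phrase:
        # Straight double quotes ask for the exact phrase.
        parts.append('"' + phrase + '"')
    return " ".join(parts)


if __name__ == "__main__":
    print(site_query("mcbuzz.com"))                  # all indexed pages
    print(site_query("mcbuzz.com", words=["web"]))   # pages containing a word
    print(site_query("mcbuzz.com", filetype="pdf"))  # indexed PDFs only
    print(site_query(
        "irs.gov",
        phrase="Enter an estimate of your 2008 nonwage income "
               "(such as dividends or interest)",
    ))
```

Swap in your own domain to try it. As with mcbuzz.com, a site with no indexed PDFs will simply return no results for the filetype:pdf query.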

Something fairly amazing: Google knows every phrase in that PDF and in any other document or web page it has indexed. That’s a lot of information.

The other question in the mcbuzz.com post mentioned above was whether a PDF would be indexed if it were encrypted or had other security settings applied to it. If you have Adobe Acrobat 8 Professional, you can answer this question yourself.

Open a PDF and then open the Document Properties dialog box (File > Properties…). Click on the Security tab, and you see the various security options. The options differ depending on which version of Acrobat you want your PDF to be compatible with. As you select the various options, the dialog box tells you whether search engines will be able to read all or only some parts of the PDF (e.g., metadata or attachments). If the PDF can’t be read by search engines, it won’t be indexed.

For those interested, here is more information about Google query modifiers like “site:url”.

Comments: 7

Hank

Feb 6, 2008

17:16:09

#1

Shucks! I learn something new here all the time.

Thanks!

Mark McLaren

Feb 6, 2008

18:23:41

#2

Hank,
It’s always great to hear from readers. And an exclamation like ‘Shucks!’ is just icing on the cake.

From the looks of your website, you do some web marketing yourself, but if there are any questions the McBuzz crew might be able to answer, give us a shout.

Thanks for reading!

Archie

Feb 25, 2008

06:08:02

#3

The easiest way to determine which PDFs Google has indexed is to use this search format:
site:mydomain.com filetype:pdf

Ed

Apr 26, 2008

16:26:40

#4

It’s not too surprising they index PDFs. My guess is that they simply convert them all to plain text/basic HTML and then use their standard Googlebrain to index those files.

kelly

Jun 23, 2008

00:19:06

#5

Good post!

kelly

Jun 23, 2008

00:22:05

#6

site:mydomain.com filetype:pdf in Google didn’t work... Or am I missing something?

Mark McLaren

Jun 23, 2008

01:04:40

#7

Kelly,
Thanks for commenting. I’m not sure what you mean here by the example “site:mydomain.com filetype:pdf”.

If you put something like “site:intel.com filetype:pdf” into Google (without the quotation marks) and click the Google Search button, you get a list of all the PDF documents on the intel.com site that have been indexed by Google.

If you try the same with “site:mcbuzz.com filetype:pdf” you get a page saying “Your search – site:mcbuzz.com filetype:pdf – did not match any documents.” – because there are no PDFs on my site.
