August 20th, 2009 - 3 Comments

PDFs on the web (the poll results are in…)

Firstly, thank you to everyone on Twitter who took the poll and tweeted about it. I wanted a quick poll, but one with at least 100 votes so that it would be somewhat indicative, and thanks to all the great people on Twitter the poll reached 140 votes in no time at all.

I am working on two sites that involve a large number of PDF documents. While thinking about how the PDF files would be disseminated, I examined my own tendency to avoid PDF files in search engine results. My reasons for avoiding them are:

  • PDFs are generally slower to open in my browser, and can be horrendously slow depending on their size.
  • A (small) percentage of PDFs crash my browser, and that small percentage has contributed greatly to my hesitance to open PDFs in my browser.
  • I am too lazy to download a PDF to my desktop just to discover whether it contains the content I am looking for.
  • The HTML view of a PDF tends to be fairly awful in its presentation, making it more difficult to find relevant content.

A PDF result in Google

A PDF search result also means the searcher may never actually visit the website, and that needs to be weighed up against the benefit of potentially having extra content spidered by the search engines.

Jakob Nielsen is a divisive usability expert whom you either love or hate. I happen to love him, and while I wouldn’t go so far as to say PDFs are unfit for human consumption, it turned out that he had described pretty much all my reservations quite succinctly:

  • Linear exposition. PDF files are typically converted from documents that were intended for print, so the authors wouldn’t have followed the guidelines for Web writing. The result? A long text that takes up many screens and is unpleasant and boring to read.
  • Jarring user experience. PDF lives in its own environment with different commands and menus. Even simple things like printing or saving documents are difficult because standard browser commands don’t work.
  • Crashes and software problems. While not as bad as in the past, you’re still more likely to crash users’ browsers or computers if you serve them a PDF file rather than an HTML page.
  • Breaks flow. You have to wait for the special reader to start before you can see the content. Also, PDF files often take longer time to download because they tend to be stuffed with more fluff than plain Web pages.
  • Orphaned location. Because the PDF file is not a Web page, it doesn’t show your standard navigation bars. Typically, users can’t even find a simple way to return to your site’s homepage.
  • Content blob. Most PDF files are immense content chunks with no internal navigation. They also lack a decent search, aside from the extremely primitive ability to jump to a text string’s next literal match. If the user’s question is answered on page 75, there’s close to zero probability that he or she will locate it.
  • Text fits the printed page, not a computer screen. PDF layouts are often optimized for a sheet of paper, which rarely matches the size of the user’s browser window. Bye-bye smooth scrolling. Hello tiny fonts.

But hold on, that was written in 2003. What if Nielsen’s observations are outdated, and my views along with them? I decided to run a quick poll, just as a very basic temperature gauge, as it were.

I set up a poll and asked the following question:

You do a search on Google and two results look equally good, one HTML one PDF. Which do you choose?

  • PDF
  • HTML
  • Either, I don’t have a preference

The answers appeared in random order, and the results were not visible to those taking the poll. The results were as follows:

HTML – 76% (107 votes)
PDF – 10% (14 votes)
Either, I don’t have a preference – 14% (19 votes)
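
As a quick sanity check on the figures above, each percentage is that option’s votes divided by the 140 total, rounded to a whole number. A minimal Python sketch (the `vote_shares` helper is purely illustrative):

```python
def vote_shares(votes: dict[str, int]) -> dict[str, int]:
    """Return each option's share of the total vote as a whole-number percentage."""
    total = sum(votes.values())
    return {option: round(100 * count / total) for option, count in votes.items()}


# Vote counts as reported in the poll results.
poll = {"HTML": 107, "PDF": 14, "Either": 19}

print(sum(poll.values()))  # 140
print(vote_shares(poll))   # {'HTML': 76, 'PDF': 10, 'Either': 14}
```

Rounded this way, the three shares happen to add up to exactly 100%; with other vote counts, rounding each share independently can give a total of 99% or 101%.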

I was somewhat relieved to learn that I am not a fuddy-duddy before my time; it would seem that my reluctance to dive straight into a PDF is shared by many.

All of this is not to say that I hate PDFs. I don’t; they’re wonderful, and if I am confident the content is valuable I will download them. The trick is to communicate that value in order to convince me to download, and a search result snippet usually can’t do that. Enter “gateway” pages. Jakob Nielsen has a great article on presenting PDFs through gateway pages if you’d like to read it.

For the client sites I am working on, the gateway pages are vital. We have ensured that the pages are optimised to be found for the most relevant searches, and that each provides a link to its PDF document with a clear indication that it is a PDF and a note about the file size.
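
A gateway-page link of the kind described above might be generated along these lines. This is a minimal Python sketch, not the markup used on the client sites: the `format_size` and `pdf_link` helpers, the URL and the byte count are hypothetical, and the KB/MB thresholds assume 1024-based units.

```python
import html


def format_size(num_bytes: int) -> str:
    """Human-readable file size using 1024-based units."""
    if num_bytes < 1024:
        return f"{num_bytes} bytes"
    kb = num_bytes / 1024
    if kb < 1024:
        return f"{kb:.0f} KB"
    return f"{kb / 1024:.1f} MB"


def pdf_link(href: str, title: str, num_bytes: int) -> str:
    """Build an <a> element that labels the target as a PDF and states its size."""
    label = f"{title} (PDF, {format_size(num_bytes)})"
    # type="application/pdf" is only an advisory hint to the browser about the target.
    return (
        f'<a href="{html.escape(href)}" type="application/pdf">'
        f"{html.escape(label)}</a>"
    )


print(pdf_link("/docs/annual-report.pdf", "Annual Report", 3_145_728))
# <a href="/docs/annual-report.pdf" type="application/pdf">Annual Report (PDF, 3.0 MB)</a>
```

Putting the file type and size in the visible link text, rather than only in an icon, means the searcher sees both before committing to the download.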

The question remained whether or not to allow the search engines to spider the PDFs. In one case the PDF documents were licensed from a third party, so I decided not to have them spidered by the search engines, as it would be vital to contextualise each PDF through its gateway page.

In the second case, we are still weighing the merits of having the content spidered. The content is proprietary, so it will have a brand awareness effect if downloaded, and because the PDFs will contain significantly more information than the gateway pages, they may be found for certain searches that the gateway pages would not.
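
For the licensed PDFs in the first case, one common way to keep well-behaved crawlers away is a robots.txt `Disallow` rule. The sketch below uses Python’s standard `urllib.robotparser` to check that such a rule blocks the PDFs while leaving gateway pages crawlable. It assumes the licensed PDFs sit in their own directory; the `/licensed-pdfs/` path, the example.com URLs and the `is_crawlable` helper are hypothetical. Note that robots.txt controls crawling rather than indexing, so a blocked URL may still show up as a bare link in results if other pages link to it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: assumes the licensed PDFs are kept in their own directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /licensed-pdfs/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(url: str, user_agent: str = "Googlebot") -> bool:
    """True if the robots.txt rules above allow `user_agent` to fetch `url`."""
    return parser.can_fetch(user_agent, url)


print(is_crawlable("https://www.example.com/licensed-pdfs/report.pdf"))  # False
print(is_crawlable("https://www.example.com/reports/annual-report"))     # True
```

Keeping the licensed files under one directory keeps the rule a simple path prefix, which every robots.txt-aware crawler understands, instead of relying on wildcard patterns that not all crawlers support.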

As always, any and all thoughts are more than welcome in the comments.

This entry was posted on Thursday, August 20th, 2009 at 11:14 pm and is filed under Search Engine Optimisation.


3 Responses to “PDFs on the web (the poll results are in…)”

Nice to see that the facts and figures resemble my own thoughts on PDFs as web pages.

From our IM Chat, did you look further into using Flash Paper?

August 21st, 2009 at 12:03 am by Cormac

I had a quick look at Flash Paper and suggested to the client that they have a look at it.

We’re still mulling over various options but on that project we’re also still defining what the ultimate aims are, and so we’re not sure yet whether Flash Paper solves a problem or simply adds a layer of complexity not needed.

Thanks again for that chat, helped clarify thoughts a lot :)

August 21st, 2009 at 12:11 am by Frank Prendergast

[...] from Website Design Cork has published the results of a poll about PDF files in Google Search results. Author: Gordon Murray Categories: murmurs Tags: fas, google ads, iphone, pdf, productivity, [...]

August 21st, 2009 at 9:36 am by murmurs 21/08/2009 | Murmurs
