How to keep PDF relevant? Flash and semantics.

These are my thoughts as submitted to the Adobe Reader Blog in response to the post "Take the Adobe Reader Survey". As a former Adobe employee who worked on Acrobat & PDF, I have a lot of personal interest in seeing the format grow and evolve.

The growing public perception is that PDF is too bulky and increasingly too opaque for the networked world. This is because PDFs have not kept up with the prevailing trends of transparency, findability, and collaboration. PDF is important as a container with certain rights & privileges (DigSig, Security, Markup, Forms), but the data inside a PDF is far more important. Currently, PDFs are way too opaque, too bloated, and do not clearly convey value to most users. This is especially true on mobile (why would I choose to view a PDF on mobile if not required to by an enterprise I need to engage with?). For most enterprises and customers, PDF is a cloud of data more than a display standard. Its value is no longer in the consistent display of fonts and formatting. It's in the data within the millions of PDFs that the IRS has, for example. Even as a Forms front-end, it's difficult to see why Reader/Acrobat is a better solution than a robust, customizable Flash interface. The Flash-based Portfolios feature is a step in this direction.

How can Reader add value to the massive volumes of archival PDF that already exist? Answer: 1) replace Reader with a robust, customizable Flash front-end, and 2) engineer semantic data* into new & existing PDFs so that cloud agents can sift through the documents and return meaningful results. Both of these strategies should focus heavily on supporting LiveCycle for both distilling and evaluating PDFs.

The static viewer model is dying. People need to be able to search, sort, find, annotate, and share. Reader is already too heavy to be of value in a browser, much less on a mobile device. Any mobile solution must disaggregate formatting from data and be able to dynamically reconfigure the display to present only the important data/form elements to the mobile user. At the very least, PDFs need some serious reformatting before they can be of any real value on the mobile platform. There's just not enough real estate. Furthermore, any PDF-mobile solution must begin with the realization that mobile = personal, collaborative, locative.

If Adobe doesn't do this, you can bet there will be lucrative opportunities for others who understand that the value of data is no longer in its formatting. It's in accessibility and structured reporting. Frankly, any business intelligence solution that doesn't address the growing heap of PDFs lying on an enterprise's servers will fail to really leverage that enterprise's own data effectively.

* I think I'm starting to use the term "semantic" a bit loosely. Essentially, I'm suggesting that Acrobat should actively create RDF structures inside the PDF COS and as header info. PDFLib should be extended to support both writing & reading of this framework. Likewise, top-down text analysis should spider both the doc text and the COS to construct relevant metadata (RDF & taxonomies) that gets written into the PDF file header. The point is to make PDFs as transparent & searchable as possible to those actors & agents with access rights.
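
To make the footnote a little more concrete, here is a minimal sketch of the kind of RDF structure I have in mind, built with nothing but Python's standard library. It only constructs and reads back an RDF/XML description using Dublin Core terms; it does not write anything into a PDF's COS objects or header. The function names, the example URI, and the choice of Dublin Core properties are all my own assumptions for illustration, not an existing Acrobat or PDFLib API.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for RDF and Dublin Core.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)


def build_rdf_description(doc_uri, title, keywords):
    """Return an RDF/XML string describing one document (hypothetical helper)."""
    rdf = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(
        rdf, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": doc_uri}
    )
    ET.SubElement(desc, f"{{{DC_NS}}}title").text = title
    # Keywords go into an unordered rdf:Bag under dc:subject.
    subject = ET.SubElement(desc, f"{{{DC_NS}}}subject")
    bag = ET.SubElement(subject, f"{{{RDF_NS}}}Bag")
    for keyword in keywords:
        ET.SubElement(bag, f"{{{RDF_NS}}}li").text = keyword
    return ET.tostring(rdf, encoding="unicode")


def read_keywords(rdf_xml):
    """What a search agent might do: pull the keywords back out (hypothetical helper)."""
    root = ET.fromstring(rdf_xml)
    return [li.text for li in root.iter(f"{{{RDF_NS}}}li")]


if __name__ == "__main__":
    # The URI and values below are made up for the example.
    packet = build_rdf_description(
        "urn:example:tax-form", "Sample tax form", ["tax", "form", "archive"]
    )
    print(packet)
    print(read_keywords(packet))
```

The interesting part is the second function: once the metadata is structured like this, an agent with access rights can pull out the subject terms without having to parse page layout at all.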