Open Information, Data Viz, & The New York Times

The New York Times will not go quietly into the dark knight of new media. Amidst constant rumors of the death of traditional news, the much-respected industry stalwart is moving quickly to build a compelling and forward-looking solution that redefines “the newspaper as platform”, as ReadWriteWeb’s Marshall Kirkpatrick notes.

Today the NYT announced that 2.8 million articles will be exposed to the digital world through the site’s API, allowing anyone to link, annotate, mash up, and crawl the data for meaning. The opportunity to construct data visualizations that abstract patterns & trends from within the articles is perhaps the most interesting element, since it immediately adds human value to what is otherwise an overwhelming amount of information (2.8 million articles).

The recent Twitter Superbowl visualization, as well as other visualization experiments, are indicators of how the company is gathering data and parsing it in meaningful ways. A list of Twitter posts related to the Superbowl is just a long index table. Even reading the Summize Search feed for such a huge event is dizzying. But a geo-located timeline mashup of tweets & key terms with a map of the US is immediately valuable to anyone trying to get a bead on trends. Their implementation is simple & entertaining, and you can derive substantial meaning at a glance.

These experiments are proofs of concept that point the way towards more advanced viz mashups, now further enabled by opening the NYT information archives and building a coherent API on top. Imagine, for example, sections dedicated to serving all outgoing comm from a particular region, say Gaza & Jerusalem. Imagine seeing real-time visualizations of the thoughts and feelings of average citizens, free from the carefully structured statements of the vested power interests hurling rockets and armor at each other. Or imagine crawling the news reports of the last 8 years looking for instances of the words “Bush”, “Abramoff”, and “Florida flight school”…
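To make that last idea a little more concrete, here is a minimal sketch in Python of the counting step behind such a crawl. It makes no calls to the NYT API; the `ARTICLES` records, their `year` and `text` fields, and the `count_terms_by_year` helper are all hypothetical stand-ins for whatever a real crawl would return, so treat it as an illustration of the idea rather than a working client.

```python
from collections import Counter, defaultdict

# Hypothetical sample records. A real crawl would fill this list from an
# article archive; the "year" and "text" field names are assumptions.
ARTICLES = [
    {"year": 2001, "text": "Bush visits Florida. A Florida flight school is questioned."},
    {"year": 2005, "text": "Abramoff inquiry widens; Bush declines comment."},
    {"year": 2005, "text": "Local sports roundup."},
]

TERMS = ["Bush", "Abramoff", "Florida flight school"]


def count_terms_by_year(articles, terms):
    """Return {year: {term: occurrences}} using case-insensitive substring matching.

    Substring matching is deliberately naive: "bush" would also match
    "ambush". A real tool would want word-boundary or entity matching.
    """
    counts = defaultdict(Counter)
    for article in articles:
        text = article["text"].lower()
        for term in terms:
            hits = text.count(term.lower())
            if hits:
                counts[article["year"]][term] += hits
    # Convert to plain dicts so the result is easy to print or serialize.
    return {year: dict(tally) for year, tally in counts.items()}


if __name__ == "__main__":
    for year, tally in sorted(count_terms_by_year(ARTICLES, TERMS).items()):
        print(year, tally)
```

Per-year tallies like these are the raw material for a timeline chart; swapping the sample list for real archive results would be the only change needed to scale the idea up.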

Of course, this is another big win for the information transparency movement – information wants to be free, after all – and you can expect many others to get the message and follow suit. But it also wraps the current events of our world, as reported by the NYT, in a searchable and re-configurable layer, establishing a protocol for interfacing with these vast data stores. This open approach certainly cries out for some sort of semantic layer, and I suspect the Reuters/Calais folks are paying very close attention to this announcement.

This is the prevailing trend of the current phase of digitizing culture and communication. Data is accumulating at an ever-increasing rate, requiring open standards for archiving, interrogating, & visualizing the meaning held within. The tools are evolving to sort the signal from a vast sea of noise. More and more information archives will be exposed, and more and more tech will be created to interface with them and draw out meaning from the morass. The global sharing of information and communication is feeding the pool of innovation that continues to radically alter the face of our world with each new discovery.

Whether or not information wants to be free… We certainly need it to be.


