This past Saturday I worked with Mike Liebhold, Gene Becker, Anselm Hook, and Damon Hernandez to present the West Coast Augmented Reality Development Camp at the Hacker Dojo in Mountain View, CA. By all accounts it was a stunning success with a huge turn-out of companies, engineers, designers, makers, artists, geo-hackers, scientists, techies and thinkers. The planning was mostly done virtually via email and phone meetings with only a couple visits to the venue. On Saturday, the virtual planning phase collapsed into reality and bloomed on site into AR Dev Camp.
As an un-conference, the event itself was a study in grassroots, crowd-sourced, participatory organization with everyone proposing sessions which were then voted on and placed into the schedule. To me, it was a wonderfully organic and emergent process that almost magically gave life and spirit to the skeleton we had constructed. So before I launch into my thoughts I want to give a hearty “Thank You!” to everyone that joined us and helped make AR DevCamp such a great experience. I also want to give a big shout-out to Tish Shute, Ori Inbar, and Sophia for coordinating the AR DevCamp in New York City, as well as Dave Mee & Julian Tate who ran the Manchester, UK event. And, of course, we couldn’t have done it without the help of our sponsors, Layar, Metaio, Qualcomm, Google, IFTF, Lightning Laboratories, Web3D Consortium, IDEAbuilder, MakerLab, and Waze (and URBEINGRECORDED with Cage Free Consulting contributed the flood of afternoon cookies).
So first, just what is Augmented Reality? There’s a tremendous amount of buzz around the term, weighing it down with connotations and expectations. Often, those investing in its future invoke the haunting specter of Virtual Reality, doomed by its inability to live up to the hype: ahead of its time, lost mostly to the realm of military budgets and skunkworks. Yet, the AR buzz has driven a marketing rush throwing gobs of money at haphazard and questionable advertising implementations that quickly reach millions and cement in their minds a narrow association with flashy magazine covers and car ads. Not to diminish these efforts, but there’s a lot more – and a lot less – going on here.
In its most distilled form, augmented reality is an interface layer between the cloud and the material world. The term describes a set of methods to superimpose and blend rendered digital interface elements with a camera stream, most commonly in the form of annotations such as text, links, and other 2- & 3-dimensional objects that appear to float over the camera view of the live world. Very importantly, AR includes at its core the concept of location mediated through GPS coordinates, orientation, physical markers, point-clouds, and, increasingly, image recognition. This combination of location and superimposition of annotations over a live camera feed is the foundation of AR. As we’re seeing with smart phones, the device knows where you are, what direction you’re facing, what you’re looking at, who & what is near you, and what data annotations & links are available in the view. In this definition, the cloud is the platform, the AR browser is the interface, and annotation layers are content that blends with the world.
So the augmented reality experience is mediated through a camera view that identifies a location-based anchor or marker and reveals any annotations present in the annotation layer (think of a layer as a channel). Currently, each of these components is uniquely bound to the AR browser in which they were authored, so you must use, for example, the Layar browser to experience Layar-authored annotation layers. While many AR browsers are grabbing common public data streams from sources like Flickr & Wikipedia, their display and function will vary from browser to browser as each renders this data uniquely. And just because you can see a Flickr annotation in one browser doesn’t mean you will see it in another. For now, content is mostly bound to the browser and authoring is mostly done by third-parties building canned info layers. There doesn’t seem to be much consideration for the durability and longevity of these core components, and there is a real risk that content experiences may become fractured and ephemeral.
Indeed, content wants to be an inclusive, social experience. One of the core propositions underlying our motivation for AR DevCamp is the idea that the platforms being built around augmented reality should be architected as openly as possible to encourage the greatest degree of interoperability and extensibility. In the nascent but massively-hyped AR domain, there’s a growing rush to plant flags and grab territory, as happens in all emergent opportunity spaces. The concern is that we might recapitulate the Browser Wars – not intentionally but by lack of concerted efforts to coordinate implementations. While I maintain that coordination & open standardization is of necessity, I question my own assumption that without it we’ll end up with a bunch of walled gardens. This may be under-estimating the impact of the web.
Yet, this cooperation and normalization is by no means a given. Just about every chunk of legacy code that the Information Age is built upon retains vestiges of the git-er-done, rush-to-market start-up mindset. Short-sighted but well-meaning implementations based upon limited resources, embryonic design, and first-pass architectures bog down the most advanced and expensive software suites. As these code bases swell to address the needs of a growing user base, the gap between core architecture and usability widens. Experience designers struggle against architectures that were not able to make such design considerations. Historically, code architecture has proceeded ahead of user experience design, though this is shifting to some degree in the era of Agile and hosted services. Nevertheless, the emerging platforms of AR have the opportunity – and, I’d argue, the requirement – to include user research, design, & usability as core components of implementation. The open, standardized web has fostered a continuous and known experience across its vast reaches. Artsy Flash sites aside, you always know how to navigate and interact with the content. The fundamentals of AR need to be identified and agreed upon before the mosaic of emerging code bases becomes too mature to adjust to the needs of a growing user base.
Given the highly social aspect of the web, place-based annotations and objects will suffer greatly if there’s not early coordination around a shared standard for anchors. This is where the Browser Wars may inadvertently re-emerge. The anchor is basically the address/location of an annotation layer. When you look through an augmented view, it’s the bit of data that says “I’m here, check out my annotations”. Currently there is no shared standard for this object, nor for annotations & layers. You need the Layar browser in order to see annotation layers made on its platform. If you only have a Junaio browser, you won’t see it. If you annotate a forest, tagging each tree with a marker linked to its own data registry, and then the browser app you used to author goes out of business, all those pointers are gone. The historical analog would be coding your website for IE so that anyone using Mosaic can’t see it. This is where early design and usability considerations are critical to ensure a reasonable commonality and longevity of content. Anchors, annotations, & layers are new territory that ought to be regarded as strongly as URLs and markup. Continuing to regard these as independent platform IP will balkanize the user experience of continuity across content layers. There must be standards in authoring and viewing. Content and services are where the business models should innovate.
So if we’re moving towards an augmented world of anchors and annotations and layers, what considerations should be given to the data structure underlying these objects? An anchor will have an addressable location but should it contain information about who authored it and when? Should an annotation contain similar data, time-stamped and signed with an RDF structure underlying the annotation content? How will layers describe their contents, set permissions, and ensure security? And what of the physical location of the data? An anchor should be a distributed and redundant object, not bound to the durability and security of any single server. A secure and resilient backbone of real-world anchor points is critical as the scaffolding of this new domain.
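To make those questions concrete, here is a minimal sketch of what an anchor and an annotation might carry. This is purely my own hypothetical schema for illustration – the field names, ID convention, and layer string are assumptions, not any existing AR standard:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass
class Anchor:
    """A hypothetical place-based anchor: an addressable real-world location."""
    anchor_id: str                   # globally unique, ideally resolvable like a URL
    lat: float                       # WGS84 latitude
    lon: float                       # WGS84 longitude
    alt: Optional[float]             # altitude in meters, if known
    author: str                      # who placed it
    created: datetime                # when it was placed
    signature: Optional[str] = None  # optional cryptographic signature for trust

@dataclass
class Annotation:
    """Content attached to an anchor, grouped into a named layer (channel)."""
    anchor_id: str  # which anchor this annotates
    layer: str      # the layer/channel it belongs to
    content: str    # text, a link, or a reference to a 2D/3D asset
    author: str
    created: datetime

# Example: tagging one tree in an instrumented grove
anchor = Anchor("anchor:grove/tree-042", 37.3894, -122.0819, 12.0,
                "researcher@example.org",
                datetime(2009, 12, 5, tzinfo=timezone.utc))
note = Annotation(anchor.anchor_id, "forest-sensors",
                  "CO2: 385 ppm", "researcher@example.org",
                  datetime(2009, 12, 5, tzinfo=timezone.utc))
print(note.anchor_id)
```

The key design choice is that the annotation points at the anchor by ID rather than embedding coordinates, so the anchor can be replicated across servers and survive any single platform going dark.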
Earthmine is a company I’ve been watching for a number of months since they presented at the IFTF. They joined us at AR DevCamp to present their platform. While many AR developers are using GPS & compass or markers to draw annotations over the real world, Earthmine is busy building a massive dataset that maps Lat/Long/Alt coordinates to hi-res images of cities. They have a small fleet of vehicles equipped with stereoscopic camera arrays that drive around cities, capturing images of every inch they see. But they’re also grabbing precise geolocation coordinates that, when combined with the image sets, yield a dense point cloud of addressable pixels. When you look at one of these point clouds on a screen it looks like a finely-rendered pointillistic painting of a downtown. They massage this data set, mash the images and location, and stream it through their API as a navigable street view. You can then place objects in the view with very high accuracy – like a proposed bus stop you’d like to prototype, or a virtual billboard. Earthmine even indicated that making annotations in their 2d map layer could add a link to the augmented real-world view. So you can see a convergence and emerging correlation between location & annotation in the real world, in an augmented overlay, on a flat digital map, and on a Google Earth or Virtual World interface. This is an unprecedented coherency of virtual and real space.
The Earthmine demo is cool and the Flash API offers interesting ways to customize the street view with 2d & 3d annotations but the really killer thing is their dataset. As alluded to, they’re building an address space for the real world. So if you’re in San Francisco and you have an AR browser that uses the Earthmine API (rumors that Metaio are working on something here…) you can add an annotation to every STOP sign in The Mission so that a flashing text of “WAR” appears underneath. With the current GPS location strategy this would be impossible due to its relatively poor resolution (~3-5 meters at best). You could use markers but you’d need to stick one on every STOP sign. With Earthmine you can know almost exactly where in the real world you’re anchoring the annotation… and they can know whenever you click on one. Sound familiar?
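A quick back-of-the-envelope calculation shows why the resolution gap matters. This is my own illustration, not anything from Earthmine’s API; the coordinates are made up, and the haversine formula below is the standard great-circle distance approximation:

```python
import math

def meters_between(lat1, lon1, lat2, lon2):
    """Approximate ground distance in meters via the haversine formula."""
    r = 6371000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def can_disambiguate(gap_m, anchor_error_m):
    """An anchor can only single out one of two signs if its positional
    error is smaller than half the gap between them."""
    return anchor_error_m < gap_m / 2

# Two hypothetical STOP signs on opposite corners of an intersection
gap = meters_between(37.75900, -122.41940, 37.75900, -122.41931)
print(round(gap, 1))                 # roughly 8 meters apart
print(can_disambiguate(gap, 4.0))    # False: ~4 m GPS error can't separate them
print(can_disambiguate(gap, 0.1))    # True: ~10 cm point-cloud anchoring can
```

With GPS-class error the two signs sit inside each other’s uncertainty circles, so your “WAR” annotation could land under either one; a centimeter-scale point-cloud address space removes that ambiguity.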
Augmented reality suggests the most significant shift in computation since the internet. As we craft our computers into smaller and smaller mobile devices, exponentially more powerful and connected, we’re now on the verge of beginning the visual and locational integration of the digital world with the analog world. We’ve digitized much of human culture, pasted it onto screens and given ourselves mirror identities to navigate, communicate, and share in this virtual space. Now we’re breaking open the box and drawing the cloud across the phenomenal world, teaching our machines to see what we see and inviting the world to be listed in the digital Yellow Pages.
So, yeah, now your AR experience of the world is covered in billboards, sloganeering, propaganda, and dancing dinosaurs all competing for your click-through AdSense rating. A big consideration, and a topic that came up again & again at AR DevCamp, is the overwhelming amount of data and the need to filter it to some meaningful subset, particularly with respect to spam and advertising. A glance across the current crop of iPhone AR apps reveals many design interface challenges, with piles of annotations occluding each other and your view of the world. Now imagine a world covered in layers each with any number of annotations. UI becomes very important. Andrea Mangini & Julie Meridian led a session on design & usability considerations in AR that could easily be a conference of its own. How do you manage occlusion & sorting? Level of detail? What does simple & effective authoring of annotations on a mobile device look like? How do you design a small but visible environmental cue that an annotation exists? If the URL convention is underlined text, what is the AR convention for gently indicating that the fire hydrant you’re looking at has available layers & annotations? Discoverability of the digital links within the augmented world will be at a tension with overwhelming the view of the world itself.
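One simple mitigation for the pile-of-annotations problem is distance-based culling plus a priority sort: drop anything beyond a radius, rank what remains, and cap how many get drawn. A minimal sketch, with hypothetical field names and thresholds of my own invention rather than any shipping browser’s behavior:

```python
from dataclasses import dataclass

@dataclass
class HudItem:
    label: str
    distance_m: float  # distance from the viewer
    priority: int      # e.g. user-pinned > friends' notes > public layers

def visible_items(items, max_distance_m=200.0, max_count=5):
    """Cull far-away annotations, then keep only the highest-priority,
    nearest few so the camera view stays legible."""
    nearby = [i for i in items if i.distance_m <= max_distance_m]
    nearby.sort(key=lambda i: (-i.priority, i.distance_m))
    return nearby[:max_count]

crowd = [
    HudItem("cafe review", 40.0, 1),
    HudItem("friend's note", 15.0, 3),
    HudItem("billboard", 500.0, 1),  # culled: beyond the visibility radius
    HudItem("bus stop", 90.0, 2),
]
for item in visible_items(crowd):
    print(item.label)
```

Real browsers would need far more than this – view-frustum tests, clustering, true 3D occlusion – but even a crude cap like this turns an unreadable swarm into a short, ranked list.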
When we consider the seemingly-inevitable development of eyewear with digital heads-up display, occlusion can quickly move from helpful to annoying to dangerous. No matter how compelling the augmented world is, you still need to see when that truck is coming down the street. Again, proper design for human usability is perhaps even more critical in the augmented interface than in a typical screen interface. Marketing and business plans aside, we have to assume that the emergence of truly compelling and valuable technologies is ultimately in line with the deep evolutionary needs of the human animal. We’re certainly augmenting for fun and art and engagement and communication but my sense is that, underneath all these, we’re building this new augmented reality because the power & adaptive advantage mediated through the digital domain is so great that we need it to integrate seamlessly with our mobile, multi-tasking lives. It’s been noted by others – Kevin Kelly comes to mind – that we’re teaching machines to do many of the things we do, but better. And in the process we’re making them smaller and more natural and bringing them closer and closer to our bodies. Ponderings of transhumanity and cyborgian futures aside, our lives are being increasingly augmented and mediated by many such smart machines.
DARPA wasn’t at AR Dev Camp. Or at least if they were, they didn’t say so. There was a guy from NASA showing a really cool air traffic control system that watched aircraft in the sky, tagged them with data annotations, and tracked their movements. We were shown the challenges of effectively registering the virtual layer – the annotation – with the real object – a helicopter – when it’s moving rapidly. In other words, the virtual layer, mediated through a camera & a software layer, tended to lag behind the 80+ mph heli. But in lieu of DARPA’s actual attendance, it’s worth considering their Urban Leader Tactical Response, Awareness & Visualization (ULTRA-Vis) program to develop a multimodal mobile computational system for coordinating tactical movements of patrol units. This program sees the near-future soldier as outfitted with a specialized AR comm system with a CPU worn on a belt, a HUD lens over one eye, a voice recognition mic, and a system to capture gestures. Military patrols rely heavily on intel coming from command and on coordinating movements through back-channel talk and line-of-sight gestures. AR HUDs offer simple wayfinding and identification of team mates. Voice commands can execute distributed programs and open or close comm channels. Gestures will be captured to communicate to units both in and out of line-of-sight and to initiate or capture datastreams. Cameras and GPS will track patrol movements and offer remote viewing through other soldiers’ cameras. But most importantly, this degree of interface will be simple, fluid, and effortless. It won’t get in your way. For better or for worse, maximizing pack hunting behaviors with technology will set the stage for the future of human-computer interaction.
After lunch provided by Qualcomm, Anselm Hook led an afternoon session at AR DevCamp titled simply “Hiking”. We convened in a dark and hot room, somewhat ironically called the “Sun Room” for its eastern exposure, to discuss nature and what, if any, role AR should play in our interface with the Great Outdoors. We quickly decided to move the meeting out into the parking lot where we shared our interests in both built and natural outdoor environments. A common theme that emerged in words and sentiment was the tension between experience & distraction. We all felt that the natural world is so rich and special in large part due to its increasing contrast to an urbanized and mechanized life. It’s remote and wild and utterly disconnected, inherently at peace in its unscripted and chaotic way. How is this value and uniqueness challenged by ubicomp and GPS and cellular networks? GPS & cellphone coverage can save lives but do we really need to Twitter from a mountain top? I make no judgement calls here and am plenty guilty myself, but it’s worth acknowledging that augmented reality may challenge the direct experience of nature in unexpected ways and bring the capacity to overwrite even the remote corners of the world with human digital graffiti.
But remember that grove of trees I mentioned before, tagged with data annotations? Imagine the researchers viewing those trees through AR lenses, able to see a glanceable color index for each one showing CO2, O2, heavy metals, turgidity, growth, and age. Sensors, mesh nets, and AR can give voice to ecosystems, cities, communities, vehicles, and objects. Imagine that grove is one of thousands in the Brazilian rainforest reporting on its status regularly, contributing data to policy debates and regulatory bodies. What types of augmented experiences can reinforce our connection to nature and our role as caretakers?
On the other hand, what happens when you and the people around you are each having very different experiences of “reality”? What happens to the commons when there are 500 different augmented versions? What happens to community and society when the common reference point for everything – the very environment in which we exist – is malleable and fluid and gated by permissions and access layers or overrun with annotations competing for our attention? What social gaps could arise? What psychological ailments? Or perhaps more realistically, what happens when a small class of wealthy westerners begin to redraw the world around them? Don’t want to see other people? No problem! Just turn on the obfuscation layer. Ugly tenements ruining your morning commute? Turn on some happy music and set your iGlasses to the favela paintshop filter! Augmentation and enhancement with technology will inevitably proceed along economic lines. What is the proper balance between enjoying our technological luxuries and responsibly curating the world for those less fortunate? Technology often makes the symptoms look different but doesn’t usually eradicate the cause. In the rush to colonize augmented reality, in the shadow of a wavering global economic system and deep revision of value and product, now is the best time and the most important time to put solutions ahead of products; to collaborate and cooperate on designing open, robust, and extensible systems; and, in the words of Tim O’Reilly, to “work on stuff that matters”.
At the end of the day, pizzas arrived (Thanks MakerLab!), beers were opened (Thanks Layar & Lightning Labs), and the buzzing brains of AR DevCamp mingled and shared their thoughts. Hearts alit, I’ll be forgiven some sentimentality to suggest that the Hacker Dojo had a soft, warm glow emanating from all the fine folks in attendance. Maybe it was like this around the Acid Tests in the 60’s (with more paisley). Or the heady days of Xerox PARC in the 80’s (with more ties). That growing inertia and sense of destiny at being at the right place at the right time just at the start of something exceptional…
Special thanks to Andrea Mangini for deep and ranging discussions about all this stuff, among many other things.