Augmented reality is a term that is increasingly bandied about in the technical press as the applicability and quality of this medium steadily increase. As is often the case, different sources and articles use different definitions for the term, and it’s easy for a reader to get confused about what is meant. This post attempts to demystify the term and to provide Sentireal’s particular definition, which aligns with that put forward by Alan B. Craig in his excellent book “Understanding Augmented Reality”.
In Craig’s book, augmented reality is defined as:
“Augmented Reality: A medium in which digital information is overlaid on the physical world that is in both spatial and temporal registration with the physical world and that is interactive in real time.”
Phew! That’s a lot of words. However, let’s break the statement down into its constituent parts and see if we can work out its overall meaning.
A medium …
“Hold it right there”, you may be saying. “You keep referring to augmented reality as a medium but isn’t it a technology? Doesn’t it involve lots of complex sensors, signal processing, graphics, video and the like?” Well, augmented reality relies on all those things for sure, but ultimately the purpose of augmented reality is to deliver an experience to the consumer. In that sense augmented reality is a medium or an art form, much in the same way that print, music or cinema are. For example, you don’t think of music as a technology, although modern music relies on some sophisticated technology for processing, recording, editing, mastering and reproduction. You may have heard the phrase “content is king” and this concept applies to augmented reality. If the artistic content delivered by augmented reality is not compelling for the consumer then the sophisticated technology generally won’t rescue the experience. It’s analogous to CGI effects in movies – they can make a good movie great but they can’t rescue a movie with a dire plot and poor acting.
… in which digital information …
“Digital information, David? Isn’t that a bit vague?” That’s a fair comment, but the generality of the term captures a very powerful aspect of augmented reality, in that it can add just about anything that can be represented as digital information to the physical world we perceive. Text, graphics, video and audio can be added easily and, with appropriate devices, even esoteric information such as touch, taste and smell can be conveyed to the consumer. Note that the digital information can be completely computer-generated, for example a 3D graphical model created by an animator, or it can be a digital copy of real-world information, for example a video showing a real-world location.
… is overlaid on the physical world …
So now we’re combining our digital information and the physical world that we naturally perceive. The key word in our definition phrase is overlaid. With augmented reality the consumer perceives the digital information and the physical world together. In a sense we want to realistically mix the physical world and the digital information. If you’re of a certain age (which I am!) you may recall the 1988 movie “Who Framed Roger Rabbit”, which realistically mixed cartoon animation and live action. That is the type of effect that augmented reality tries to achieve by overlaying digital information on top of the physical world. Of course, unlike the Roger Rabbit movie, augmented reality needs to do this on a variety of physical-world scenes and not just on a single carefully-created movie set!
You may be thinking “How is this overlay accomplished for augmented reality? If a person is standing at a location in the physical world how is the digital information added in a way that lets the person’s senses perceive this mixture of the real and the computer-generated?” There are two possibilities. The first approach is that the physical-world scene is converted into a digital representation by a variety of sensors, the digitised physical-world scene and the digital information are combined and the mixed digital content is then sent to displays to be perceived by the consumer. This is the approach typically adopted by augmented reality using smartphones and tablet computers – the physical-world scene is captured using the sensors (cameras, microphones, etc.) on the device, the augmenting digital content is added internally to the device and the resultant mixed digital content is sent to the device displays (screen, speakers/headphones, etc.). The second technique to perform the overlay is to project the augmenting digital content into the physical-world scene, for example by using a visual projector to project any graphical, image or video material. This is the approach typically adopted by wearable augmented reality glasses.
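For the technically curious, the first approach can be sketched in a few lines of Python. This is only an illustration, not how any particular device does it: it assumes the camera frame and the augmenting content are already same-sized grids of RGB pixels, and that each overlay pixel comes with an opacity value (“alpha”) between 0 and 1. The function names blend_pixel and composite are made up for this example.

```python
def blend_pixel(camera_rgb, overlay_rgb, alpha):
    """Mix one overlay pixel onto one camera pixel.

    alpha is the overlay opacity: 0.0 shows only the camera pixel,
    1.0 shows only the overlay pixel (an assumption of this sketch).
    """
    return tuple(
        round(alpha * o + (1.0 - alpha) * c)
        for c, o in zip(camera_rgb, overlay_rgb)
    )


def composite(camera_frame, overlay_frame, alpha_mask):
    """Blend two same-sized frames (lists of rows of RGB tuples) pixel by pixel."""
    return [
        [blend_pixel(c, o, a) for c, o, a in zip(cam_row, ov_row, a_row)]
        for cam_row, ov_row, a_row in zip(camera_frame, overlay_frame, alpha_mask)
    ]


# A tiny 1x2 "frame": the overlay is invisible on the left pixel
# and fully opaque on the right pixel.
camera = [[(100, 100, 100), (100, 100, 100)]]
overlay = [[(200, 0, 50), (200, 0, 50)]]
alpha = [[0.0, 1.0]]
mixed = composite(camera, overlay, alpha)
```

The mixed frame is what would then be sent to the screen. A weighted blend is just one simple way of mixing the two sources; real systems may combine them quite differently.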
… that is in both spatial and temporal registration with the physical world …
There was quite a lot of information in the previous point, so let’s review where we’ve got to. Augmented reality is a medium that allows digital information to be overlaid on the physical world so that the consumer perceives a mixture of the digital information and the physical-world scene. Right, so what’s the next part of our augmented reality definition saying? Uh-oh, this part sounds complicated! Actually it’s pretty reasonable if we break the phrase down a little further. Let’s consider the idea of spatial registration with the physical world first. This is a formal way of saying that computer-generated objects in the augmenting digital information should be overlaid onto the physical world with a proper spatial perspective. Craig’s book uses the example of a table in the physical world onto which we want to overlay the digital image of a vase. The concept of spatial registration means that the consumer perceives the vase to be sitting on the table, not floating above the table, sunk inside the table or floating in front of or behind the table. If the consumer changes their physical position, so that their perspective of the table changes, then their perspective of the augmenting vase changes in exactly the same way. If the consumer turns their back on the table so that they cannot see it then the vase disappears also. The augmenting digital content (the vase) has a proper spatial relationship (is in spatial registration) with the relevant physical-world object (the table).
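If you like to see ideas in code, here is a minimal Python sketch of the table and vase example. It is an illustration under stated assumptions rather than anything taken from Craig’s book: it assumes a simple pinhole camera model, a viewer who can move around and turn left or right (yaw only), and a made-up function name, project_point, along with made-up values for the focal length and screen centre.

```python
import math


def project_point(point_world, cam_pos, cam_yaw, focal_px, cx, cy):
    """Project a 3D world point to 2D screen pixels for a pinhole camera.

    Conventions assumed for this sketch: x is right, y is up, z is forward;
    cam_yaw is the viewer's turn (in radians) about the vertical axis.
    Returns None when the point is behind the viewer.
    """
    # Position of the point relative to the viewer.
    dx = point_world[0] - cam_pos[0]
    dy = point_world[1] - cam_pos[1]
    dz = point_world[2] - cam_pos[2]

    # Express that offset in the viewer's own right/forward axes.
    cos_y, sin_y = math.cos(cam_yaw), math.sin(cam_yaw)
    x_cam = cos_y * dx - sin_y * dz
    z_cam = sin_y * dx + cos_y * dz
    y_cam = dy

    if z_cam <= 0:
        return None  # Viewer has their back to the point: draw nothing.

    # Perspective divide: nearer objects move further from the screen centre.
    u = cx + focal_px * x_cam / z_cam
    v = cy - focal_px * y_cam / z_cam
    return (u, v)


vase = (1.0, 0.0, 2.0)  # Hypothetical spot on the table top, in metres.

facing_table = project_point(vase, (0.0, 0.0, 0.0), 0.0, 500.0, 320.0, 240.0)
stepped_closer = project_point(vase, (0.0, 0.0, 1.0), 0.0, 500.0, 320.0, 240.0)
back_turned = project_point(vase, (0.0, 0.0, 0.0), math.pi, 500.0, 320.0, 240.0)
```

The vase never moves in the world; only the viewer does. Recomputing its screen position from the viewer’s current position and orientation is what keeps it “sitting on the table”, and the None result corresponds to the vase disappearing when the consumer turns their back.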
Now let’s consider the idea of temporal registration with the physical world. This is a formal way of saying that computer-generated objects in the augmenting digital information should be updated and redisplayed at the same time as corresponding updates and changes in the physical-world scene. Let’s use our table and vase example again. If the consumer moves in the physical world so that their perspective of the table changes then the vase should be redisplayed to show the new perspective immediately (or at least quickly enough to appear like an immediate update to the consumer). For example, if the consumer circles to the opposite side of the table then the side and then the back of the vase should be displayed or projected promptly to maintain the proper effect. If the updated perspectives of the vase are displayed with too much time delay then the vase will not “keep up” with the physical world as the consumer moves and the effect will look very unrealistic.
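To get a rough feel for why delay matters, here is a back-of-the-envelope calculation in Python. It is a deliberately simplified model, and every number in it is an assumption: it supposes the consumer is turning their head at a steady rate, that the vase sits near the centre of the view, and that the display has the hypothetical focal length given. The function name registration_error_px is invented for this example.

```python
import math


def registration_error_px(turn_rate_deg_s, latency_s, focal_px):
    """Estimate how far (in pixels) an object near the screen centre is drawn
    from where it should be, if the viewer turns at turn_rate_deg_s and the
    display lags by latency_s seconds. Simplified: rotation only, no movement.
    """
    angle_turned = math.radians(turn_rate_deg_s * latency_s)
    return focal_px * math.tan(angle_turned)


# Assumed values: a gentle head turn of 30 degrees per second and a
# focal length of 500 pixels, with three different amounts of delay.
for latency in (0.01, 0.05, 0.20):
    error = registration_error_px(30.0, latency, 500.0)
    print(f"{latency * 1000:.0f} ms delay -> vase drawn about {error:.1f} px off")
```

The exact figures depend entirely on the assumed values, but the trend is the point: in this model the longer the delay, the further the vase lags behind the table it is supposed to be sitting on.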
… and that is interactive in real time.
Stay with me – we’re almost there! The last part of our definition of augmented reality deals with how the consumer can interact with the augmenting digital information that is added. Returning to our table and vase scenario, what happens if the consumer puts out their hand and tries to push the vase along the table? What happens if they pick up a physical-world object, like a ball, and throw it (hopefully accurately!) at the vase? For the first example we might expect the vase to be successively redisplayed in different positions on the table as the consumer “pushes” it, depending on where the system detects the consumer’s hand position to be. Due to the requirement for spatial registration we would expect the vase to be redisplayed to make its movement look similar to that of a real vase being pushed along a table. For the second example we might expect a shattering sound to be played as the ball enters the region where the vase is currently displayed, followed by the vase being redisplayed in a broken form! Due to the requirement for temporal registration we require the augmenting digital information to be very promptly updated as a result of the consumer interacting with it or manipulating it. In technical terms we say that we require the digital information to be updated in real time. Failure to update the augmenting digital information in real time can make interactions with the digital information feel unnatural to the consumer. For example, they throw the ball and see it “hit” the vase but the vase fails to shatter until several seconds later!
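The ball and vase example can be sketched as a small piece of Python that would run once for every displayed frame. This is a hedged illustration only: it assumes the system can already report the ball’s position, it treats the vase as a simple box-shaped region, and the names Vase, ball_hits_vase and update, as well as the event name "play_shatter_sound", are invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Vase:
    centre: tuple      # Centre of the vase's box-shaped region (x, y, z), metres.
    half_size: tuple   # Half the box's width, height and depth.
    broken: bool = False


def ball_hits_vase(ball_centre, ball_radius, vase):
    """Return True when the ball overlaps the box-shaped region of the vase."""
    dist_sq = 0.0
    for b, c, h in zip(ball_centre, vase.centre, vase.half_size):
        closest = min(max(b, c - h), c + h)  # Nearest point of the box on this axis.
        dist_sq += (b - closest) ** 2
    return dist_sq <= ball_radius ** 2


def update(vase, ball_centre, ball_radius):
    """Run once per frame; returns the events to trigger in this frame."""
    events = []
    if not vase.broken and ball_hits_vase(ball_centre, ball_radius, vase):
        vase.broken = True  # From now on the broken vase is displayed.
        events.append("play_shatter_sound")
    return events


vase = Vase(centre=(0.0, 0.15, 2.0), half_size=(0.1, 0.15, 0.1))
frame_1 = update(vase, (0.0, 0.15, 1.00), 0.05)  # Ball still in flight.
frame_2 = update(vase, (0.0, 0.15, 1.88), 0.05)  # Ball reaches the vase.
frame_3 = update(vase, (0.0, 0.15, 1.95), 0.05)  # Vase is already broken.
```

Because the check runs on every frame, the shatter is triggered in the same frame in which the ball arrives, which is the “real time” behaviour the definition asks for. If the check ran only occasionally, the vase would break noticeably late.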
So you made it all the way through? Well done! Hopefully you now have a clear understanding of our preferred definition of the term “augmented reality”. But, like Lieutenant Columbo or Steve Jobs, there’s just one more thing. Is it valid to distort or even break parts of our definition for artistic effect? For example, if we deliberately distort spatial and temporal registration to give an artistic impression of augmenting objects experiencing the gravitational pull on the Moon rather than the Earth, is that still “augmented reality”? My personal opinion is that it is, but you’ve got to know the rules before you can break them!