When you visit a consumer technology retailer such as MediaMarkt or Best Buy, you see the latest televisions and computer monitors promoted on their screen size, resolution and refresh rate. Although sight is a key sense in creating a rich immersive experience, we need to expand our view to include the other four senses to get a more accurate estimate of the level of immersion of a simulated reality. We will explore each sense in turn to estimate the minimum level of resolution required to achieve natural reality.
Sight
UHD, or Ultra High Definition (3840 × 2160), popularly known as 4K, appeared in 2014 on high-end monitors and is becoming the norm for consumer televisions in 2020. UHD 8K (7680 × 4320 pixels) televisions are commercially available for consumers with deeper pockets. Resolution, however, is not the only criterion that matters for an immersive viewing experience. Large outdoor media facades such as the one at the Dubai Aquarium & Underwater Zoo in Dubai Mall (UAE) can easily reach resolutions several times higher than 8K. What matters is pixel density, measured in pixels per inch. A typical 24″ 4K monitor has about 184 pixels per inch; the iPhone X smartphone has a Super Retina display of 458 pixels per inch. At typical viewing distances the human eye cannot resolve much more than 300 pixels per inch, so adding pixels beyond that has no practical use.
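Pixel density follows directly from a panel's resolution and diagonal size. Here is a minimal sketch in Python that reproduces the figures quoted above (the 5.85-inch iPhone X diagonal is taken from Apple's spec sheet):

```python
import math

def pixels_per_inch(width_px: int, height_px: int, diagonal_in: float) -> float:
    """Pixel density: pixels along the diagonal divided by the diagonal in inches."""
    diagonal_px = math.hypot(width_px, height_px)
    return diagonal_px / diagonal_in

print(f"{pixels_per_inch(3840, 2160, 24.0):.0f} PPI")   # ~184 PPI for a 24-inch 4K monitor
print(f"{pixels_per_inch(2436, 1125, 5.85):.0f} PPI")   # ~459 PPI for the iPhone X (Apple quotes 458)
```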
Michael Deering has modelled the perception limits of the human visual system. He estimates a maximum of approximately 15 million perceivable pixels per eye. This high resolution, however, applies only to the central 2 degrees of vision and therefore varies with where your eyes focus. Humans perceive a stable image without flicker artifacts at a maximum observable rate of 50-90 Hz. This rate is known as the critical flicker fusion rate and has been studied for both spatially uniform lights and spatio-temporal displays. A separate line of research has reported that the human eye can detect fast movements in the peripheral field of vision at rates of over 500 Hz. Assuming a 60 Hz stereo display with a depth complexity of 6, Deering estimated that a rendering rate of approximately ten billion triangles per second is sufficient to saturate the human visual system.
The Pimax Vision 8K virtual reality headset offers 4K resolution per eye at a refresh rate of 80 Hz. A 4K panel holds about 8.3 million pixels, roughly half of the estimated maximum of 15 million per eye. The 80 Hz refresh rate is above the critical flicker fusion rate but well below the 500 Hz at which peripheral movement can still be detected.
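As a back-of-the-envelope check on these numbers, assuming roughly one rendered triangle per pixel per depth layer per frame (our assumption, used here purely to reproduce the order of magnitude):

```python
PIXELS_PER_EYE = 15_000_000   # Deering's estimated perception limit per eye
EYES = 2                      # stereo display
REFRESH_HZ = 60               # assumed stereo refresh rate
DEPTH_COMPLEXITY = 6          # surfaces rendered per pixel on average

# Roughly one triangle per pixel per depth layer per frame:
triangles_per_second = PIXELS_PER_EYE * EYES * REFRESH_HZ * DEPTH_COMPLEXITY
print(f"{triangles_per_second:.2e} triangles/s")   # 1.08e10, i.e. ~10 billion

# Pimax Vision 8K: one 3840 x 2160 panel per eye at 80 Hz.
pimax_pixels_per_eye = 3840 * 2160                 # ~8.3 million
print(pimax_pixels_per_eye / PIXELS_PER_EYE)       # ~0.55, about half the limit
```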
In 2023, Apple introduced the Apple Vision Pro, a spatial computer with an ultra-high-resolution display system that packs 23 million pixels across two displays.
Next-generation game engines such as Unreal Engine 5 can deal with tens of billions of triangles per second by making use of the ultra-fast solid-state storage that is standard on next-generation consoles such as the PlayStation 5 and Xbox Series X. According to AMD’s Raja Koduri in 2016, reaching true immersion in the visual domain requires 16K resolution per eye at 240 Hz.
Based on the above, we estimate that true visual immersion will be reached before 2025.
Sound
The human ear perceives frequencies from 20 Hz (the lowest pitch) to 20 kHz (the highest pitch), with an intensity range from 0 dB (the hearing threshold) to 120-130 dB (where irreversible damage to the ears begins). Stereo and surround sound systems can match these frequencies and intensities with ease, but there is more to sound than loudness (dB), pitch (frequency) and timbre (the mix of frequencies that gives a sound its character). The directionality of sound played a crucial role in human evolution, letting us quickly estimate the direction of danger or of prey to hunt, and it is therefore key to true audio immersion.
The human auditory system uses just two ears to distinguish between front and back, up and down, and everything in between, so it should be possible to create a true 3D audio experience. Although surround sound systems give the illusion of a 3D audio experience while your position in the virtual world is fixed, that illusion is shattered as soon as you start to move: the speaker positions are fixed, and the audio does not move with you.
Head-related transfer functions (HRTFs) describe how acoustic waves propagate to a listener’s eardrums, being acoustically filtered by the listener’s ears and head along the way. Using digital signal processing software and head-position tracking, a standardized HRTF can be adjusted in real time during playback according to the listener’s head orientation and the original direction of the sound sources. The problems with HRTFs are that they are person-dependent, that you can move through a virtual world without moving your head, and that creating personalized HRTFs is time-consuming and expensive.
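A minimal sketch of this pipeline: per audio block, compute the source direction relative to the tracked head orientation, look up the nearest measured impulse-response pair, and filter the mono source through it. The impulse responses below are dummy placeholders; a real implementation would load measured HRIRs from an HRTF database (for example a SOFA file).

```python
import numpy as np

def relative_azimuth(source_az_deg: float, head_yaw_deg: float) -> float:
    """Direction of the source relative to the listener's current head orientation."""
    return (source_az_deg - head_yaw_deg) % 360.0

def render_binaural(mono: np.ndarray, hrir_left: np.ndarray, hrir_right: np.ndarray) -> np.ndarray:
    """Filter a mono signal through a left/right HRTF impulse-response pair."""
    return np.stack([np.convolve(mono, hrir_left), np.convolve(mono, hrir_right)])

# Dummy impulse responses for illustration; real HRIRs come from measurements.
hrir_l, hrir_r = np.array([1.0, 0.4, 0.1]), np.array([0.5, 0.3, 0.1])

az = relative_azimuth(source_az_deg=90.0, head_yaw_deg=30.0)   # source is now 60 degrees to the side
stereo = render_binaural(np.random.randn(480), hrir_l, hrir_r) # one 10 ms block at 48 kHz
```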
Instead of channel-based playback and formats, object-based recording encodes the sound field by tagging sound sources. In a classical music concert, for example, the positions, intensities and instrument types of all the musicians are tagged as objects, and smart playback devices then interpret the tags to recreate the concert experience. This approach is taken by Dolby Atmos and DTS:X.
Another approach is the scene-based format. Scene-based encoding (ambisonics) creates a spatial representation of the recorded sound field as heard from a specific position. The basic approach is to treat an audio scene as a full 360-degree sphere of sound coming from different directions around a center point, where the microphone is placed during recording. During playback the listener sits at that center point and experiences spatial sound all around. Facebook and Google use ambisonics for 360-degree VR movies.
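A minimal sketch of first-order ambisonic (B-format) encoding for a mono source placed at a given azimuth and elevation around the center point (traditional B-format conventions assumed; head tracking then amounts to counter-rotating the X and Y components before decoding):

```python
import numpy as np

def encode_first_order(mono: np.ndarray, azimuth: float, elevation: float) -> np.ndarray:
    """Encode a mono signal into first-order B-format (W, X, Y, Z).

    Angles are in radians. W is the omnidirectional component; X, Y and Z
    are figure-of-eight components along the front, left and up axes.
    """
    w = mono / np.sqrt(2.0)                            # traditional -3 dB weighting of W
    x = mono * np.cos(azimuth) * np.cos(elevation)
    y = mono * np.sin(azimuth) * np.cos(elevation)
    z = mono * np.sin(elevation)
    return np.stack([w, x, y, z])

# A 440 Hz tone placed 90 degrees to the listener's left, at ear height:
t = np.linspace(0.0, 1.0, 48000, endpoint=False)
bformat = encode_first_order(np.sin(2 * np.pi * 440 * t), np.pi / 2, 0.0)
```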
Mathias Johansson, CEO and cofounder of Dirac Research, argues that if we combine object- and scene-based encoding with HRTF processing, we should be able to render 3D audio over headphones for head-mounted VR and adjust it interactively as the listener moves through virtual worlds.
For interactive immersive spaces, developers can simplify the acoustic information. Rather than simulating the acoustic characteristics of an entire scene, they separate the sound into a set of directional sound sources plus a combined ambient sound field. The directional sounds can then be processed by HRTFs, while the ambient sound is assumed to arrive with equal intensity from all directions, as sketched below. For most people, this technique produces reasonably convincing 3D sound in many virtual environments. Eventually, more realistic acoustic simulations of virtual rooms will evolve, improving the authenticity of the audio experience in a wider range of challenging environments.
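A sketch of that simplification, reusing the binaural rendering idea from the HRTF example above (the function names are our own illustrative choices, not a particular engine's API): directional sources are HRTF-processed individually and summed, while the ambient bed goes unchanged to both ears.

```python
import numpy as np

def mix_scene(directional_binaural: list, ambient: np.ndarray) -> np.ndarray:
    """Combine HRTF-processed directional sources with a diffuse ambient bed.

    directional_binaural: stereo (2, N) arrays, already rendered through HRTFs.
    ambient: mono bed, assumed to arrive with equal intensity from all
             directions, so it is fed identically to both ears.
    """
    n = min(s.shape[1] for s in directional_binaural)
    out = sum(s[:, :n] for s in directional_binaural)
    return out + np.stack([ambient[:n], ambient[:n]])
```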
Mathias Johansson continues: “I expect that within a few years researchers will create convincing 3D audio experiences for VR streams of, say, a basketball game or a concert. Then the big challenge will be fine-tuning the HRTF algorithms to get the computational and memory requirements down to the point where they can run on portable, battery-operated devices. Once this final barrier is overcome, immersive 3D audio for virtual reality will be ready for mass adoption.”
Based on these predictions, we can expect to reach true acoustic immersion by the end of the decade.
Touch
The human somatosensory system enables us to experience pleasant feelings as well as sensations of pain or temperature change, and it covers our entire body. The stimuli are highly diverse: the pressure on our skin, the position of our muscles and joints, the temperature of our surroundings. When stimuli are too strong, pain receptors in, for example, our hand trigger the brain directly to pull the hand away from a hot object. The somatosensory system differs from other sensory systems in that its receptors are spread over the entire body and respond to a variety of stimulus types (touch, temperature, pain and body position). It has receptors both at the surface level and deeper inside our bodies, in our internal organs and even the cardiovascular system. If we concentrate on the surface level of the somatosensory system (assuming for now that true touch immersion does not require stimulating receptors deep in our internal organs), we need to simulate three main types of sensitivity for our mechanoreceptors: 1) tactile sensitivity (pressure and vibration), 2) temperature sensitivity, and 3) pain sensitivity.
Just as with vision, we need a certain resolution and refresh rate to create smooth sensations, and the required values differ depending on the part of the body. One useful measurement is the “two-point threshold”, which determines how far apart two separate touch points must be before they are perceived as two points instead of one. This threshold has been determined experimentally and varies from 1.1 mm for the tongue and 2.3 mm for the fingertips to 68 mm for the thighs and upper arms. Achieving this level of realism means we need on the order of a thousand times as many tactile actuators in a given fingertip area as on the same area of our torso. The refresh rate for tactile sensation is less demanding: 20 to 30 Hz is sufficient for a continuous sensation. Besides resolution and refresh rate, the displacement, or intensity, of an individual touch sensation is important; displacements of up to 2 cm on the torso are needed to create a strong sensation of touch.
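Treating the two-point threshold as the required actuator spacing, the actuator count for a patch of skin scales with the inverse square of that threshold. A quick sketch with the figures above:

```python
def actuators_per_cm2(two_point_threshold_mm: float) -> float:
    """Assume one actuator per two-point-threshold-sized square of skin."""
    spacing_cm = two_point_threshold_mm / 10.0
    return 1.0 / spacing_cm ** 2

fingertip = actuators_per_cm2(2.3)   # ~18.9 actuators per cm^2
thigh = actuators_per_cm2(68.0)      # ~0.02 actuators per cm^2
tongue = actuators_per_cm2(1.1)      # ~82.6 actuators per cm^2, the hardest case
print(f"fingertip vs. thigh: {fingertip / thigh:.0f}x")  # ~874x, on the order of 1000x
```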
There are three approaches to simulating the sense of touch: augmenting existing physical objects, full-body haptic feedback systems, and direct brain-computer interaction. The first approach, strictly speaking, does not truly simulate touch, but it is worth mentioning because it recreates a realistic feeling of touch. The Void is a virtual reality experience center where visitors wearing VR headsets move through a physical space complete with tangible surfaces such as walls, floors and ceilings, and objects such as chairs, tables, cups, weapons and shields. In the VR experience, people see textures projected onto the objects and surfaces, transforming a wooden shield and sword into the beautiful, high-quality, strong shield and sword of a king.
Examples of full-body haptic feedback systems for virtual and augmented reality are TeslaSuit and bHaptics. For remote surgery, haptic gloves let surgeons feel patients at a distance. Current solutions incorporate a mesh of actuators that can deliver a wide range of sensations. TeslaSuit, for example, combines different actuators to recreate the feeling of touch, heat and cold, and even force through mild electrical pulses. It can also collect data from the body for real-time motion tracking and various biometric parameters (we discuss true immersion of movement later, when we explore agency).
Although these haptic feedback systems are an enormous step forward from the force feedback of a smartphone or game console controller, achieving true touch immersion will require a great deal of engineering and innovation. While sufficient touch resolution has already been reached for areas such as our upper arms, chest and legs, it is a considerable engineering challenge to support the resolution needed for our fingertips (let alone for lips and tongue). Displacement is also a challenge. A tactile actuator must be thin enough to be worn comfortably on the skin (no more than 2-3 mm) yet offer about 2 cm of displacement. This displacement ratio of around 10 is over 100 times higher than that of the haptic motors commonly used in cell phones.
The third approach, brain-computer interaction (BCI), bypasses the mechanoreceptors and directly activates the somatosensory cortex, either invasively (through electronic implants) or non-invasively (through, for example, a headset). The BCIs that have succeeded in controlling robotic arms have used invasive brain implants. These implants require substantial medical and surgical expertise to install and operate correctly, not to mention cost and potential risks to subjects, and as such their use has been limited to a few clinical cases. In 2019, a team of researchers from Carnegie Mellon University, in collaboration with the University of Minnesota, developed a non-invasive mind-controlled robotic arm able to continuously follow a computer cursor. Neuralink, founded by Elon Musk in 2016, is working on a scalable, flexible solution consisting of flexible polymer probes, a robotic insertion system and custom low-power electronics. In 2020 it aims to release this system for two main purposes: as a research platform for use in rodents and as a prototype for future human clinical implants. If BCI could achieve true touch immersion, it would replace the need for full-body haptic suits and might even be able to stimulate touch on our lips and tongue.
Although true touch has traditionally received much less attention, advances in true visual and acoustic immersion have put more emphasis on developing better haptic solutions for virtual reality entertainment, remote surgery, rehabilitation and military training. We may therefore expect the rate of innovation in haptics to increase in the 2020s. This acceleration, however, must be weighed against the complex engineering challenges that remain: further miniaturization of touch actuators to achieve finer-grained touch, increasing the displacement ratio of touch actuators, and improving the usability of tactile suits that restrict your movement. Invasive brain-computer interfaces will require not only much more research and development but also medical approval, legislation and consumer adoption; especially for otherwise healthy people, implanting a neural link to simulate touch may be not one but many steps too far.
True touch immersion will very likely not be reached by the end of this decade, but it will arrive much sooner than true smell and taste.
Olfactory
The human sense of smell is the reception and perception of chemical substances, in the form of gas molecules, by the sensory olfactory system. It enables us to identify different foods, determine whether food is safe to eat, detect hazardous situations such as smoke, and detect pheromones that trigger social responses. The olfactory epithelium, the special tissue involved in smell, measures about 9 cm² in humans and lies on the roof of the nasal cavity, above and behind the nostrils. This tissue contains the olfactory receptor neurons that feed into the brain. Unlike sound, sight and touch, smell has a direct connection to the two brain areas that control emotions and memories: the amygdala and the hippocampus. This is why certain smells seem to recall memories and can trigger strong emotions. The senses of smell and taste are often referred to together as the chemosensory system because they give the brain information about chemical substances.
The recent history of digital olfactory devices consists of many startups and companies that have tried to create digital scent, some more successful than others. In 1999, DigiScents’ iSmell, a USB-connected scent synthesizer, received 20 million dollars in venture capital funding, but by 2001 the company had gone out of business. The Multi Aroma Shooter by Aromajoin, introduced in 2012, can be programmed to emit from three customizable scent cartridges to support scent marketing and virtual reality applications. Tokyo-based startup Vaqso is working to incorporate the sense of smell into virtual reality by adding a scent device onto head-mounted displays. We are unaware of any BCI research or technology that bypasses the olfactory receptors and directly activates the part of our cortex involved in the perception of smell.
Digital scent has to overcome many barriers and challenges before true olfactory immersion can be reached. First of all, we lack a fundamental understanding of human olfactory perception: it is unclear exactly how we perceive different scents as combinations of chemical substances, and it is therefore impossible to define a basic ‘palette’ of smell components that could be mixed to produce thousands of other odors. Secondly, a large proportion of people cannot detect certain scents, and others can smell them but are unable to name them, so digital scent must be tailored to the individual far more than the other senses. Beyond the physiological and psychological aspects, there are engineering and economic considerations. Current obstacles to mainstream adoption include the timing and distribution of scents: scents accumulate in closed environments, and removing a scent is not as easy as turning off a pixel on a digital screen. Synthetic odors may pose health risks, and some people cannot stand certain artificial smells, which is not exactly the immersion we are searching for. Finally, there is the economic cost of purchasing, replenishing and swapping cartridges in scent devices.
Compared with touch, far less attention, time and money has been invested in olfactory devices and VR smell research. Not only are the technological, psychological and economic barriers higher than for the other senses, the pace of progress is also much slower. For this reason, olfactory virtual reality is still in its infancy, and we cannot predict when we will reach true olfactory immersion.
Taste
Taste is the perception created when a substance in the mouth reacts chemically with taste receptor cells located on taste buds in the oral cavity, mostly on the tongue. The senses of taste, smell and touch (texture, pain and temperature registration) together determine the flavor of food. Taste receptors in the mouth sense the five taste modalities: sweetness, sourness, saltiness, bitterness and savoriness. The full experience of a flavor is produced only after the sensory cell profiles from the different parts of the tongue are combined. Assuming these 5 basic tastes and 10 levels of intensity each, 100,000 different flavors are possible. Taken together with the senses of touch, temperature and smell, an enormous number of different flavors is possible.
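The 100,000 figure is straightforward combinatorics: an independent intensity level for each of the five modalities gives 10^5 combinations.

```python
from itertools import product

TASTES = ["sweet", "sour", "salty", "bitter", "savory"]
INTENSITY_LEVELS = 10

# One intensity choice per modality: 10 ** 5 = 100,000 flavor profiles.
profiles = product(range(INTENSITY_LEVELS), repeat=len(TASTES))
print(sum(1 for _ in profiles))  # 100000
```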
There are two different approaches to simulating taste: electric taste stimulation and 3D printing of food and other substances.
One of the world’s leading researchers in electric taste stimulation is Nimesha Ranasinghe. For his PhD research at the National University of Singapore he studied and applied electrode-embedded chopsticks to stimulate different tastes. Lick his “Virtual Lollipop” and it might taste sour, sweet, salty or bitter. More recently, his team at the University of Maine’s Multisensory Interactive Media (MIM) Lab created the “Vocktail”: the drinker controls the sourness or saltiness of the drink through electrodes along the rim of the glass, can add scents such as chocolate, mint, strawberry or banana, and can change the color with an LED under the glass. Users could create a sour, green-colored mint mojito or a salty-sour, red-colored strawberry margarita. The most important application of this taste stimulation technology is in health: patients hooked on sugary or salty food can be helped to improve their diet by fooling their taste buds. The technology could also be used to test different flavors before mass production of food and beverages, and of course in virtual reality applications. Star Trek fans might finally experience the taste of Romulan ale in the bar on board the Starship Enterprise.
3D printing of food is still in its infancy. One example of a company that specializes in food printing is byFlow, which has created a portable, easy-to-maintain and easy-to-use food printer used in upscale restaurants to print unique decorations and garnishes or personalized chocolate bars and cookies. 3D food printing works with cartridges that must be replenished or swapped to print additional layers. A universal food replicator that can print any recipe from a palette of molecules and make it indistinguishable from the real thing is still science fiction.
As with smell, true taste simulation must overcome several research and engineering challenges, and the pace of progress is arguably even slower, so we cannot predict when true taste immersion will be reached.





