In the last couple of years, Virtual Reality has shifted from a high-tech experimental concept to a product accessible to almost everyone. Oculus Rift and HTC Vive have been commercialized, and while they come at a considerable cost and need a high-end PC to run, the launch of PlayStation VR has offered a considerably cheaper option. Manufacturers like Google, LG, Samsung and OnePlus have since offered even cheaper mobile-based VR, while new “all-in-one” products that don’t need any external hardware support, like the Sulon Q, are going to be launched in 2016.
In this scenario, the video game industry is moving towards experiences where players will be totally immersed in the game world. In this blog post I am going to talk about the challenges of creating a believable acoustic space in VR experiences, and what has been discovered so far to achieve this result. It isn’t meant to be an exhaustive article, and it couldn’t be: VR audio is an (almost) new field characterized by experimentation and “trial and error” approaches (Fuesslin et al., 2016; Gumbleton, 2016; Smurdon, 2015).
Figure 1: Back to Dinosaur Island Part 2, Crytek, 2016
A new dimension: 3D audio
Thanks to the combination of head tracking and Head-Related Transfer Functions (HRTFs), sounds in VR can be localized precisely, with the possibility of being placed on a new plane: the vertical one (the z axis). Sound placement needs new attention: while most of the audio related to NPCs, for example, is normally attached to their root, more precise positioning is now needed, because immersion can be broken pretty quickly if dialogue comes from the feet of a character.
Tom Smurdon (audio content lead at Oculus) has also found that the vertical placement of sounds can influence how players perceive height and their own weight (2015). Smurdon noticed that, in an underwater environment, placing water sounds above the players’ heads made them feel heavier, and vice versa. In an environment where players were on top of a skyscraper, placing traffic sounds below them helped convey the feeling of height, reinforcing the visuals.
The problem with HRTFs so far is that the curves are different for everyone, so some people can localize sounds more easily than others. To help players, it has been noticed that long and moving sounds are easier to localize (Hook, 2014; Kuzminsky, 2016).
Return to mono
In this scenario, where a lot of sound filtering follows the player’s movements, area loops are an obstacle to immersion, because they feel as if they were stuck to the player’s head (a problem also faced when dealing with music in VR experiences) (Fuesslin et al., 2016; Gumbleton, 2016; Hook, 2014; Smurdon, 2015).
Placing mono, preferably anechoic, sounds with procedural approaches (intended as playback systems with random combinations of positioning, pitch, layers and other elements of the sounds) is preferable (Gumbleton, 2016). These approaches are already present in video games, but VR seems to demand more of them to achieve a good level of immersion.
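To make the idea of a procedural playback system more concrete, here is a minimal sketch in Python. It is not taken from any of the talks cited here: the `OneShot` structure, the `next_one_shot` function, the sample names and all the numeric ranges are my own assumptions, chosen only to illustrate random combinations of positioning, pitch and level for mono one-shots.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class OneShot:
    sample: str       # name of a mono, dry (anechoic) sample (hypothetical)
    position: tuple   # (x, y, z) in metres relative to the listener, z is up
    pitch: float      # playback-rate multiplier
    gain_db: float    # level offset in decibels
    wait_s: float     # pause before the next one-shot is triggered


def next_one_shot(samples, rng, last_sample=None, min_dist=3.0, max_dist=12.0):
    """Pick randomized playback parameters for the next mono one-shot."""
    # Avoid playing the same sample twice in a row when there is a choice.
    candidates = [s for s in samples if s != last_sample] or list(samples)
    sample = rng.choice(candidates)

    # Random point around the listener, including the vertical axis.
    distance = rng.uniform(min_dist, max_dist)
    azimuth = rng.uniform(0.0, 2.0 * math.pi)
    elevation = rng.uniform(-math.pi / 6.0, math.pi / 3.0)
    position = (
        distance * math.cos(elevation) * math.cos(azimuth),
        distance * math.cos(elevation) * math.sin(azimuth),
        distance * math.sin(elevation),
    )

    # All ranges below are illustrative assumptions, not recommended values.
    return OneShot(
        sample=sample,
        position=position,
        pitch=rng.uniform(0.9, 1.1),
        gain_db=rng.uniform(-6.0, 0.0),
        wait_s=rng.uniform(2.0, 8.0),
    )


rng = random.Random(42)  # seeded only to make the sketch repeatable
shot = next_one_shot(["bird_a", "bird_b", "bird_c"], rng, last_sample="bird_a")
print(shot)
```

In a real project the resulting position would be handed to the spatializer of the audio engine, so each one-shot is filtered according to head movement instead of sitting in a static loop.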
Audio teams have found that a less compressed sound environment is more believable and involving for players (Fuesslin et al., 2016). Raising the volume of important sounds and using falloff curves that follow gameplay are preferable to ducking, but the latter can be applied more subtly across multiple sources, following gameplay, to emulate what’s called the “Cocktail Party Effect” (the ability of our brain to focus on the audio source we are interested in, filtering out other sounds) (Gumbleton, 2016). This approach to audio is similar to that of Overwatch, where the sound elements are constantly classified and mixed in order to give players just the sonic information they need to localize threats. Situations in VR can become really dense from a sonic point of view, so dynamic mixing of audio can help guide the player, but it has to be done in a believable way so as not to break immersion.
It should also be considered that closed or in-ear headphones can isolate the player from the external world, leading to a lower noise floor than the one present in the average living room or bedroom. In this way, a greater dynamic range can be used, with less need for ducking (Fuesslin et al., 2016).
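As a rough illustration of this kind of gentle, priority-driven ducking, here is a small Python sketch. It is only a toy under my own assumptions: the `Source` structure, the `focus_mix` function and the decibel values are hypothetical, and this is not how Overwatch or any of the cited teams actually implement their mix.

```python
from dataclasses import dataclass


@dataclass
class Source:
    name: str
    priority: int     # higher means more important to gameplay right now
    level_db: float   # level before dynamic mixing, in decibels


def focus_mix(sources, duck_per_step_db=1.5, max_duck_db=6.0):
    """Return {name: level_db} after a gentle priority-based ducking pass."""
    if not sources:
        return {}
    top = max(s.priority for s in sources)
    mixed = {}
    for s in sources:
        # The further a source is below the top priority, the more it is
        # attenuated, but never by more than max_duck_db so the duck stays subtle.
        duck = min((top - s.priority) * duck_per_step_db, max_duck_db)
        mixed[s.name] = s.level_db - duck
    return mixed


scene = [
    Source("enemy_footsteps", priority=3, level_db=-12.0),
    Source("rain", priority=1, level_db=-18.0),
    Source("crowd", priority=0, level_db=-20.0),
]
print(focus_mix(scene))
```

Capping the attenuation is the design choice that matters here: the most relevant source stays untouched, while the others are pushed back slightly instead of being removed from the scene.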
Spatialization and reverberation
Reverberation helps the player recognize the dimensions of the setting and also the placement of objects inside a room. Thanks to the time gap between the direct sound from a source and the early reflections of the reverb, the brain is able to judge the proximity of reflecting surfaces. For this reason, a fixed reverb won’t suit the virtual reality domain.
The Oculus SDK responded to this problem with the shoebox model: a “box” in which early reflections for each side can be specified, and that responds dynamically to the placement of the sound emitters.
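To give an idea of what a shoebox-style model has to compute, the sketch below uses the classic image-source method to estimate the first reflection from each of the six walls of a rectangular room. This is a generic textbook technique written in plain Python, not the Oculus SDK API: the function name, the single `reflectivity` value and the room layout (one corner at the origin, dimensions in metres) are all my own assumptions, and the source and listener are assumed to be at different points inside the room.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second, approximate value for air at 20 °C


def first_order_reflections(room, source, listener, reflectivity=0.7):
    """Estimate delay and gain of the first reflection from each wall.

    room is (width, depth, height); source and listener are (x, y, z)
    points inside it. Returns {wall: (delay_ms, gain)}, where delay_ms is
    measured from the arrival of the direct sound and gain is relative to it.
    """
    direct = math.dist(source, listener)
    reflections = {}
    for axis, label in enumerate("xyz"):
        for wall_pos, side in ((0.0, "min"), (room[axis], "max")):
            # Mirror the source across the wall to get its "image".
            image = list(source)
            image[axis] = 2.0 * wall_pos - source[axis]
            path = math.dist(image, listener)
            delay_ms = (path - direct) / SPEED_OF_SOUND * 1000.0
            # Inverse-distance loss relative to the direct path, scaled by
            # how much energy the wall is assumed to reflect.
            gain = reflectivity * direct / path
            reflections[f"{label}_{side}"] = (delay_ms, gain)
    return reflections


room = (4.0, 6.0, 3.0)
for wall, (delay_ms, gain) in first_order_reflections(
    room, source=(1.0, 1.0, 1.5), listener=(3.0, 1.0, 1.5)
).items():
    print(f"{wall}: {delay_ms:.2f} ms, gain {gain:.2f}")
```

Moving the source or the listener changes every delay and gain, which is exactly why the reflections need to be recomputed dynamically rather than baked into a fixed reverb.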
Two Big Ears’ 3DCeption offers a similar solution, generating binaural reflections based on geometry.
Nvidia, in turn, is offering an innovative approach to this problem in its VRWorks API, based on ray tracing.
Ray tracing is a common technique used to simulate the path of light in an environment. In VRWorks Audio, the GPU uses it to understand the environment surrounding sound emitters and the player (distance from walls, dimensions of rooms, materials) and to dynamically shape the ambience with appropriate echo and early reflections.
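The general idea of probing a space with rays can be sketched in a few lines of Python. This is only a toy running on the CPU in an empty rectangular room, and it has nothing to do with the actual VRWorks Audio API: every function name here is hypothetical, and materials and occluding objects are ignored entirely.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second, approximate


def distance_to_wall(room, origin, direction):
    """Distance along a ray from a point inside the room to the first wall.

    room is (width, depth, height) with one corner at the origin.
    """
    nearest = math.inf
    for axis in range(3):
        d = direction[axis]
        if d > 0.0:
            t = (room[axis] - origin[axis]) / d
        elif d < 0.0:
            t = (0.0 - origin[axis]) / d
        else:
            continue  # ray is parallel to this pair of walls
        nearest = min(nearest, t)
    return nearest


def sphere_directions(count):
    """Roughly evenly spread unit vectors (Fibonacci sphere)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    directions = []
    for i in range(count):
        z = 1.0 - 2.0 * (i + 0.5) / count
        radius = math.sqrt(1.0 - z * z)
        theta = golden_angle * i
        directions.append((radius * math.cos(theta), radius * math.sin(theta), z))
    return directions


def probe_room(room, listener, ray_count=256):
    """Cast rays in all directions and summarize what they hit."""
    hits = [distance_to_wall(room, listener, d) for d in sphere_directions(ray_count)]
    nearest = min(hits)
    return {
        "nearest_m": nearest,
        "mean_m": sum(hits) / len(hits),
        # Round trip to the closest surface sampled, as a crude first-echo estimate.
        "first_echo_ms": 2.0 * nearest / SPEED_OF_SOUND * 1000.0,
    }


print(probe_room((4.0, 6.0, 3.0), listener=(1.0, 1.0, 1.5)))
```

Even this crude probe yields numbers (nearest surface, average distance) that could drive echo and early-reflection settings, which hints at why a full ray-traced solution gets them almost for free.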
By using GPU power, the technology promises a cheaper way, in CPU terms, to handle 3D audio positioning, since “simulating 3D audio effects…[is a] secondary effects of modeling the full audio propagation of 3D space… so when the sound propagation is modeled correctly these effects are captured automatically” (Nvidia, 2016).
Immersion in VR requires a new consideration of the acoustic space inside video games, but this comes at a cost. Several engineers agree on giving 3D spatialization only to the most important sounds while leaving the others in 2D (Fuesslin et al., 2016; Gumbleton, 2016; Hook, 2014), and that the sound doesn’t need to be extremely realistic, but consistent, to achieve immersion (Gumbleton, 2016). Work is also being done to reduce this cost, like the new Nvidia technology or the development by Oculus of a dedicated DAC for its platform inside the SDK (Hook, 2014).
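One simple way to picture this trade-off is a fixed budget of spatialized voices. The Python sketch below is my own assumption of how such a budget could be applied (the `assign_spatialization` function, the tuple layout and the ranking rule are hypothetical and not taken from the cited talks): the most important and closest sounds get full 3D treatment, everything else stays in 2D.

```python
def assign_spatialization(voices, budget):
    """Split voices into (spatialized_3d, plain_2d) lists of names.

    voices is a list of (name, priority, distance_m) tuples. Higher
    priority wins; among equal priorities the closer sound wins.
    """
    ranked = sorted(voices, key=lambda v: (-v[1], v[2]))
    spatialized = [name for name, _, _ in ranked[:budget]]
    plain = [name for name, _, _ in ranked[budget:]]
    return spatialized, plain


voices = [
    ("dialogue", 3, 2.0),
    ("footsteps", 2, 1.0),
    ("wind", 0, 30.0),
    ("gunshot", 3, 15.0),
]
in_3d, in_2d = assign_spatialization(voices, budget=2)
print("3D:", in_3d)
print("2D:", in_2d)
```

With a budget of two, the dialogue and the gunshot are spatialized while footsteps and wind fall back to 2D; re-running the ranking as priorities and distances change keeps the cost bounded.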
More procedural technologies, like VRWorks Audio, are therefore a good answer to the increased amount of work that audio implementation in VR requires (Smurdon, 2015), and a good enhancement to the workflow in a field where there aren’t really standardized approaches and where sharing work and projects is still difficult (Beaudoin et al., 2016).
Beaudoin et al. (2016) Audio for cinematic VR. In: Virtual Reality Developers Conference 2016, March 14-18 2016, San Francisco, CA
Fuesslin et al. (2016) Virtual Reality and Real Audio. In: Virtual Reality Developers Conference 2016, March 14-18 2016, San Francisco, CA
Gumbleton S. (2016) Audio for AAA Virtual Reality Experiences. In: Game Developers Conference 2016, March 14-18 2016, San Francisco, CA
Hook B. (2014) Introduction to audio in VR. In: Oculus Connect. Available from <https://www.youtube.com/watch?v=kBBuuvEP5Z4>
Kuzminsky A. (2016) VR Audio: Trends and Challenges of Pioneering a New Field [Online]. Available from <http://designingsound.org/2016/08/vr-audio-trends-and-challenges-of-pioneering-a-new-field/> [Accessed October 26th 2016]
Nvidia (2016) NVIDIA VRWorks™ [Online]. Available from <https://developer.nvidia.com/vrworks> [Accessed October 26th 2016]
Nvidia (2016) NVIDIA VRWorks Audio [Online Video]. 9th May 2016. Available from <http://www.youtube.com> [Accessed October 26th 2016]
Two Big Ears (2013) 3DCeption [Software]. Available from <http://www.twobigears.com/about.php> [Accessed October 26th 2016]
Smurdon T. (2015) 3D Audio: Designing Sounds for VR. In: Oculus Connect 2. Available from <https://www.youtube.com/watch?v=IAwFN9sFcso>