Mastery through sounds: the challenge of Overwatch
With a Metacritic score of 90, more than 269 million dollars in sales in May 2016 alone (the game was released on the 24th) and more than 10 million active players, Blizzard’s Overwatch was one of the biggest releases of last summer.
Overwatch is a 6 vs. 6 arena, objective-based first-person shooter whose core gameplay is the variety of “Heroes” (21) that can be chosen, each characterized by different statistics and three different cool-down skills (two “basic” ones and a more powerful one, called the ultimate).
Building audio for a game like this means giving the right feedback to the player, providing all the sonic information needed to identify and localize the threats around them (Collins, 2008, p.130; Marks, 2002, p. 221). The audio team at Blizzard was conscious of this necessity, especially when Jeff Kaplan, the game director, told them “I want to be able to play the game practically with the monitor turned off” (Lawlor et al., 2016).
To fulfil this request, the team worked to help the player obtain the following information from sounds:
1. Know what is the biggest threat around you
2. Know from where it is coming
3. Know which hero is threatening you
Know what is the biggest threat
In Overwatch the situation can become difficult to handle very quickly: twelve players can be very close to each other while shooting, using different abilities and deploying other elements, like turrets, on the field.
Instead of being helpful, the audio in this situation could become blurred and disorienting. In order to provide efficient audio, the team worked on two points: the way the mix is handled, and an effective, informative hero voice over.
A clear mix: the importance system
One of the key aspects of Overwatch’s audio is the way sounds are mixed. Instead of following real-world physics, in which “the louder sound wins” (Lawlor et al., 2016), the mix is based on gameplay priorities, using what has been called an “importance system”.
The system creates “threat levels” by looking at things like the enemy's size on the player's screen, the player's size on other players' screens (also useful to know whether the player is being scoped by someone else) and who is shooting at whom.
Picture 1 illustrates the threat level system. The player is the blue arrow, while the green arrows are two enemies in a different section of the map, so they have a low threat level. The two yellow arrows are two close enemies looking at the player, while the red one is farther away but is aiming at and damaging the player, so it gets a higher threat level. By sending these values to Wwise, each sound effect can be modulated according to the threat level assigned.
The problem is that with 6 enemy players and 6 allies, plus turrets and other items that some heroes can deploy on the field, the situation becomes really confused if 4 or 5 of them reach high levels of threat at the same moment.
For this reason, threat levels are sorted into 4 groups (or buckets). The first contains the enemy with the highest threat, and the second the next two. The next 4 to 10 elements on the field, including allies, belong to the third group, and the rest go into the fourth.
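The bucket assignment described above can be sketched in a few lines of Python. This is only an illustration, not Blizzard's implementation: the function name and the numeric threat values are hypothetical, and the sketch assumes that "4 to 10" refers to the elements ranked 4th through 10th by threat.

```python
def assign_buckets(threats):
    """Map each entity name to a bucket index (1-4) by descending threat level.

    `threats` is a dict of entity name -> numeric threat level (hypothetical values).
    Assumption: bucket 3 holds the entities ranked 4th through 10th.
    """
    ranked = sorted(threats, key=threats.get, reverse=True)
    buckets = {}
    for rank, name in enumerate(ranked, start=1):
        if rank == 1:
            buckets[name] = 1      # the single highest threat
        elif rank <= 3:
            buckets[name] = 2      # the next two
        elif rank <= 10:
            buckets[name] = 3      # ranks 4 to 10
        else:
            buckets[name] = 4      # everything else
    return buckets
```

In a real game this ranking would be recomputed continuously as players move, aim and fire, and the resulting bucket index would be sent to the audio engine as a game parameter.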
In Wwise, a real-time parameter control (RTPC) curve is drawn for priority, volume, high-pass filter and also pitch, as a function of the bucket index. Picture 2 shows the RTPC curve for the make-up gain of the enemies' weapon fire sound. If the enemy belongs to the first bucket (red arrow), its weapon fire will be seven decibels louder than if it were in the third one. The same concept applies to pitch, high-pass filter, low-pass filter and other values.
Each sound, then, has its own specific RTPC curve. Ultimate abilities, for example, have a flatter line with a higher volume, because the designers wanted players always to be able to hear them coming.
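To make the idea of a per-sound curve concrete, here is a minimal Python sketch of a piecewise-linear curve evaluated at a bucket index. It does not use the Wwise API. All curve points are hypothetical, except that the weapon fire curve is chosen so that bucket 1 is 7 dB louder than bucket 3, as stated above.

```python
from bisect import bisect_right

def evaluate_curve(points, x):
    """Piecewise-linear interpolation through sorted (x, y) points, clamped at both ends."""
    xs = [p[0] for p in points]
    if x <= xs[0]:
        return points[0][1]
    if x >= xs[-1]:
        return points[-1][1]
    i = bisect_right(xs, x)
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

# Hypothetical make-up gain curves, as (bucket index, gain in dB) pairs.
WEAPON_FIRE_GAIN_DB = [(1, 7.0), (2, 3.0), (3, 0.0), (4, -6.0)]
# Ultimates get a flatter, louder curve so they stay audible in every bucket.
ULTIMATE_GAIN_DB = [(1, 6.0), (4, 5.0)]
```

The same function could drive pitch or filter cutoff simply by swapping in a different list of points, which mirrors the way each sound gets its own curve.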
In the following video, the difference between the same assets in bucket 1 and bucket 3 can be heard. In the first case the sounds of footsteps, foley and weapons are louder, have more low end and are pitched slightly higher. In the second portion of the video (where the character is an ally, and so is put in bucket three), the sounds not only drop in volume, but are also pitched down and have a high-pass filter applied. In this way, the mix stays clear even in crowded situations and when the same hero is chosen by both teams.
Informative Hero VO
To help players communicate and to give the Heroes more character, the team created an informative voice over that reacts to different stimuli in the game. The VO was built so that each player hears what is important for them, as shown in picture 3.
Each phrase is broadcast differently to the players. A jump sound, for example, will only be heard by the player who jumps, while Rocket Barrage, which is an ultimate ability, will be heard by everyone, enemies and allies alike. For ultimate calls, the team implemented two different versions, one for allies and one for enemies, so that players know whether they are in danger or not.
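One way to picture these broadcast rules is as a routing table keyed by the listener's relationship to the speaker. The Python sketch below is purely illustrative: the event names, asset names and player dictionaries are hypothetical, and it assumes the speaker hears the allied version of their own ultimate call.

```python
# Hypothetical routing table: event -> asset heard by each kind of listener.
# A missing role means that listener hears nothing for that event.
VO_ROUTING = {
    "jump": {"self": "vo_jump"},
    "rocket_barrage": {
        "self": "vo_rocket_barrage_ally",   # assumption: speaker hears the ally line
        "ally": "vo_rocket_barrage_ally",
        "enemy": "vo_rocket_barrage_enemy",
    },
}

def listener_role(speaker, listener):
    """Classify the listener as 'self', 'ally' or 'enemy' relative to the speaker."""
    if speaker["id"] == listener["id"]:
        return "self"
    return "ally" if speaker["team"] == listener["team"] else "enemy"

def select_vo(event, speaker, listener):
    """Return the asset this listener should hear, or None if the event is not broadcast to them."""
    return VO_ROUTING.get(event, {}).get(listener_role(speaker, listener))
```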
Know from where it is coming: Pinpoint accuracy
After sorting out the priority of sounds, an efficient way to localize them was needed, and, also in this case, the game features innovative ways to do it.
The first feature of the localization system is how sound obstruction and occlusion are handled: the dedicated Wwise tool wasn't giving good results. To solve this problem, the team took advantage of the AI system that the programmers had developed for the flying paths of enemies: in this way they could know which path a sound had to take to reach the listener. Picture 4 shows how the system works.
A raycast is emitted from the sound to the player. If it is blocked by an obstacle, the path the sound would have to take to reach the listener is calculated, and the difference between that path and the raycast is expressed as a percentage called "path diversion" (e.g. if the raycast is 20 meters long but the actual path is 26, path diversion is 30%, because the 6-meter difference is 30% of 20). An RTPC curve is then mapped to the path diversion parameter, modifying volume, filters, reverb and delay sends. This method not only allows greater control than occlusion and obstruction, but also custom curves for each type of sound.
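The path diversion arithmetic can be written as a small Python function. The function name and the clamping of negative values to zero are assumptions made for this sketch. In the game the two distances would come from the raycast and from the pathfinding system.

```python
def path_diversion(direct_m, path_m):
    """Return path diversion as a percentage of the direct (raycast) distance.

    direct_m: straight-line distance from sound to listener, in meters.
    path_m:   length of the path the sound actually has to travel, in meters.
    """
    if direct_m <= 0:
        raise ValueError("direct distance must be positive")
    # Assumption: a path can never be shorter than the raycast, so clamp at 0.
    return max(0.0, (path_m - direct_m) * 100.0 / direct_m)
```

With the figures from the example above, `path_diversion(20, 26)` gives 30.0, and an unobstructed sound (path equal to raycast) gives 0.0.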
In the following video the entire occlusion system can be heard when the player hides inside the building.
Notice how the phrase “It’s high noon” cuts through the mix: this is because it’s one of the ultimate abilities, and the sound designers wanted them always to sound clear for everyone. Also, it can be seen that the enemies have a sort of red or yellow "aura" around them. That is the importance system at work, continuously evaluating which enemy is more dangerous.
This system adds a further layer to mixing sounds in terms of danger, because it adds a sort of awareness to the obstruction/occlusion concept. If an enemy is near in terms of raycast, but would have to take a long path to reach the player, their sound will be quieter, because they are not a threat, and vice versa.
The quad-delay plugin
Another key element of the localization system is a custom-made quad-delay plugin for Wwise, built to imitate how sound is reflected by surfaces in the real world. As can be seen in picture 5, four lines are traced from the player along its diagonals, as if they were pointing to the speakers of a 5.1 system.
The distance to the first obstacle encountered by each line is sent to the quad delay. The plugin has 4 multi-tap delay channels (one for each of the four corner speakers of a 5.1 system) that correspond to the lines traced in the game and are constantly modulated by the values returned. The nearer the surface, the shorter and brighter the delay from that direction will be (each line has a dedicated parametric EQ); the opposite holds for far surfaces.
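The relationship between surface distance and delay character can be illustrated with a rough Python sketch. This is not the plugin's actual algorithm: it assumes the delay time is the round trip to the surface at the speed of sound, and it stands in for the parametric EQ with a hypothetical low-pass cutoff that falls linearly with distance. The channel names and the numeric ranges are likewise assumptions.

```python
SPEED_OF_SOUND_M_S = 343.0  # approximate speed of sound in air

def tap_settings(distance_m, max_distance_m=50.0):
    """Return (delay_ms, cutoff_hz) for one delay channel.

    Nearer surfaces give a shorter, brighter delay; farther ones a longer, darker delay.
    """
    d = min(max(distance_m, 0.0), max_distance_m)
    delay_ms = 2.0 * d / SPEED_OF_SOUND_M_S * 1000.0    # round trip to the surface
    t = d / max_distance_m                              # 0 = touching, 1 = far away
    cutoff_hz = 12000.0 + (2000.0 - 12000.0) * t        # hypothetical brightness range
    return delay_ms, cutoff_hz

def quad_delay_settings(distances_m):
    """Map the four traced distances to per-channel settings."""
    channels = ("front_left", "front_right", "rear_left", "rear_right")
    return {name: tap_settings(d) for name, d in zip(channels, distances_m)}
```

Calling `quad_delay_settings` every frame with fresh distances would modulate the four channels continuously as the player moves between enclosed and open spaces.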
In this video the delay system can be heard first without and then with the dry signal, while the player moves from more enclosed to more open spaces.
Know which hero is threatening you: characterization
The characterization of the 21 Heroes in the roster is a key aspect of the game ("Overwatch does a great many things well, but above all else, its success is built on the backs of its many excellent characters." Ingenito, 2016), and this has to be reflected in the sound design: after letting players know what the threat was and where it was coming from, the next step for the team was letting them know who the threat actually was.
Each hero is distinguished by its own set of sounds, from weapons to voice to footsteps. The following video shows how each character can be recognized just by the sound of its footsteps.
A dynamic path to mastery
The choices made by Overwatch's sound design team strongly influence players and their experience, and could have become a major pitfall for the game in terms of immersion and gameplay had they not been weighed carefully.
A mix which is dynamically dictated by a threat level could indeed become annoying in a fast game like this, leading to constant volume jumps due to the characters moving from one "bucket" to another.
But this is not the case: the audio system in Overwatch is built with great care and guides players, helping them to achieve mastery and expertise, without sounding detached from the visual experience and while also enhancing immersion (“the emotional involvement of what is happening”, Grau, quoted in Collins, 2008, p.133).
Finally, the multitude of videos (1 2 3), articles and posts (1 2) made by fans, explaining how to get better at the game by listening to its sounds, demonstrates how these aspects have been perceived and appreciated by the community.
Collins, K. (2008) Game Sound: an introduction to the history, theory, and practice of video game music and sound design. MIT Press.
Ingenito, V. (2016) Overwatch Review [Online]. Available from: <http://uk.ign.com/articles/2016/05/28/overwatch-review>
Grau, O. (2003) Virtual Art: from Illusion to immersion. MIT Press. Quoted in Collins, K. (2008) Game Sound: an introduction to the history, theory, and practice of video game music and sound design. MIT Press.
Lawlor et al. (2016) The Elusive Goal: Play by sound. In: Game Developers Conference 2016, March 14-18, 2016, San Francisco, CA (link)
Marks, A. (2009) The Complete Guide to Game Audio: For Composers, Musicians, Sound Designers, Game Developers. Burlington: Focal Press.
Pictures 1 to 5 from Lawlor et al. (2016) The Elusive Goal: Play by sound. In: Game Developers Conference 2016, March 14-18, 2016, San Francisco, CA (link)