SVG Sit-Down: AES Takes on Audio’s ‘New Realities’
New subgroup has as its mission to advance the technology of sound for VR, AR, and mixed reality
“New realities” covers the gamut of application areas: film, games, music, communications, medicine, forensics, simulations, education, virtual tourism. The technical requirements cover obvious areas, such as spatial audio and synthesis, but extend beyond them into audio networking, semantic audio, perception, broadcast, and online delivery. There will therefore be a need for good cooperation, with existing AESTC Technical Committees providing leadership in related disciplines.
The group will initially operate under the Audio for Games Technical Committee. SVG sat down with the leader of the new group, Dr. Gavin Kearney, researcher and lecturer at the University of York, and Michael Kelly, Vice-Chair, AESTC, to discuss the implications for broadcast audio in the VR/AR environment.
Microphones have followed cameras into racecar cockpits and onto athletes themselves. These are areas in which VR/AR can take viewers deeper. What’s the nature of the audio that will have to accompany them into a virtual version of those experiences?
Gavin Kearney: Being able to use VR to give yourself the viewing perspective from inside a sports car or from within a playing field is a paradigm shift in sports spectacle. But the sound design for VR requires more than just being realistic; it needs to be hyper-real. Perhaps the end user wants more crowd sounds or more music or to spatially place the commentary at a particular position around them. Object-based audio broadcast can give the end user the power to change the soundtrack to suit the desired experience.
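As a rough illustration of the listener control Kearney describes, the sketch below treats each soundtrack element as a separate object and applies the listener's own gains before summing. It is a minimal sketch, not a description of MPEG-H or any broadcaster's renderer: the function name `mix_objects`, the object names, and the sample values are all hypothetical.

```python
def mix_objects(objects, user_gains):
    """Sum per-object sample lists, scaling each by the listener's chosen gain.

    objects:    dict mapping an object name to a list of float samples
                (all lists are assumed to be the same length)
    user_gains: dict mapping an object name to a linear gain;
                objects without an entry are left at unity gain
    """
    length = len(next(iter(objects.values())))
    mix = [0.0] * length
    for name, samples in objects.items():
        gain = user_gains.get(name, 1.0)
        for i, sample in enumerate(samples):
            mix[i] += gain * sample
    return mix


# Hypothetical two-sample excerpts of three broadcast objects.
objects = {
    "crowd": [0.25, 0.5],
    "commentary": [0.5, 0.25],
    "music": [0.125, 0.125],
}

# This listener doubles the crowd level and mutes the music.
louder_crowd = mix_objects(objects, {"crowd": 2.0, "music": 0.0})
default_mix = mix_objects(objects, {})
```

Because the objects stay separate until the final sum, the same broadcast can yield a different mix for each listener.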
It comes as no surprise then that the [capture], sound design, and mixing of such events are more challenging than ever. The sound engineer has to skillfully work with and group combinations of dedicated ambient soundfield microphones, spot microphones, and traditional soundtrack elements (such as Foley and music) and transform them into an active VR audio mix in 3D space.
Since the majority of VR audio is presented over headphones, it has to be able to fully render the soundfield naturally in three dimensions and to be readily translated so that, when the viewer moves their head, the soundfield is counter-rotated to ensure stable sound-source placement around the viewer. There are three key aspects here: excellent head-tracking provided by the VR headset, good sound recording and sound design, and good sets of binaural filters that deliver spatially and tonally excellent audio quality.
Michael Kelly: VR for sports faces many of the same challenges as other live applications (capture, mixing, recording, delivery), audio or otherwise. The challenge with sports is that these are often at their extremes: for example, how would you deal with the potential nausea of placing someone in a live racing car when they’re … at home on the sofa?
But there are additional specific audio challenges: for example, miking a rally car to give creative control of the mix as the listener would expect to hear it is no easy task. It’s not a new challenge. Game-sound designers have a lot of experience in this, but that’s about re-creating a live experience rather than capturing a current one. A big question in this domain is how real does the listener expect it to be? Can we use dramatic license to create a better experience by mixing the real event with Foley?
Will Dolby Atmos and MPEG-H play a role in this?
GK: The AES is committed to seeing outstanding technical delivery of VR audio at all levels, and VR/AR for sports broadcast is no exception. MPEG-H has become an important technology for the delivery of object- and scene-based audio and will certainly permeate sports broadcast. Similarly, we’ll no doubt see further technical developments from companies like Dolby in supporting VR audio for a range of applications.
But what’s also interesting is the adoption of older technologies, such as Ambisonic surround sound. It’s easy to rotate, tilt, and tumble a binaurally rendered Ambisonic mix to counter head movements, resulting in stable source positions to match the VR visuals. As a technology that was first conceived as a competitor in the domestic Quadraphonic era, it is now fully established in VR and AR alongside object-based audio, through the support of MPEG-H.
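The counter-rotation mentioned here can be sketched for the simplest case: a first-order Ambisonic (B-format) signal and a head turn about the vertical axis only. This is a minimal sketch under stated assumptions, not any particular renderer's implementation. It assumes X points forward, Y points left, azimuth and yaw are measured counter-clockwise in radians, and W carries the traditional 1/sqrt(2) weighting. All function names are hypothetical.

```python
import math


def encode_first_order(sample, azimuth_rad):
    """Encode a mono sample as horizontal first-order B-format (W, X, Y, Z)."""
    w = sample / math.sqrt(2.0)          # assumed traditional W weighting
    x = sample * math.cos(azimuth_rad)   # front-back component
    y = sample * math.sin(azimuth_rad)   # left-right component
    z = 0.0                              # source lies in the horizontal plane
    return (w, x, y, z)


def counter_rotate_yaw(bformat, head_yaw_rad):
    """Rotate the soundfield by minus the head yaw so sources stay put."""
    w, x, y, z = bformat
    c = math.cos(head_yaw_rad)
    s = math.sin(head_yaw_rad)
    # W (omnidirectional) and Z (vertical) are unaffected by a yaw rotation.
    return (w, x * c + y * s, -x * s + y * c, z)


def apparent_azimuth(bformat):
    """Direction of the encoded source relative to the listener's nose."""
    _, x, y, _ = bformat
    return math.atan2(y, x)


# A source hard left; the listener then turns 90 degrees to the left,
# so the source should now sit directly in front of the head.
source_left = encode_first_order(1.0, math.pi / 2)
after_turn = counter_rotate_yaw(source_left, math.pi / 2)
```

The whole mix is rotated with one small matrix operation per sample, regardless of how many sources it contains, which is one reason the format suits head-tracked playback. Tilt and tumble would need the full three-axis rotation, which this sketch omits.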
MK: I think so. Atmos, DTS:X, and MPEG-H were conceived before VR, but the approach they take is highly applicable to VR, and, in many ways, this will totally unlock them. We are also aware of the role of Ambisonics as another highly applicable and much older format that has found its role in VR. Ultimately, these approaches all have individual drawbacks, and it will be interesting to see them evolve along with the way people use them. This is a core focus of the group.
What are the technical challenges AES researchers face in this pursuit?
GK: There are challenges at all levels. On the content-creation side, we are looking at new recording methods, workflows, and mixing environments in the VR space. In particular, how should sound designers be using immersive 360 audio for VR and AR not only to enhance the immersive sporting experience and give a feeling of “you are there” but also to augment it and give added value to using VR for sports?
On the reproduction side, achieving superb-quality binaural rendering is a high priority but can be difficult to achieve. Should content creators utilize their own personalized binaural filters, or generic filters when mixing? Are there standard binaural virtualization settings that we should use for the majority of listeners? How spatially accurate does the soundfield have to be in the presence of high-resolution immersive video?
The overall end-user experience is key. A core aspect is ensuring that the experience is not only immersive but also a positive one. A caveat is that a first-person point of view, such as from an athlete’s perspective, may work well when reproduced over 2D or even 3D static visual displays. But, when the only frame of reference is within the VR experience itself, we need to be careful that users don’t experience motion sickness, which can happen if they are not fully in control of the motion.
What are some of the processes AES will use in this effort?
MK: We set up the group with the fundamental question of “what are the right questions to ask”; there are so many perspectives. Part of the process will be engaging stakeholders in all the right areas, including sports. Then we can be clear about what each area shares with other areas, but also what makes them different. When we have the right stakeholders with the right questions, we can move towards the answers, though I’m sure the questions will change along the way.
GK: The new subgroup of the Audio for Games Technical Committee will engage not only leading industry figures in audio for new realities but also leading academics whose work crosses interdisciplinary boundaries of technical production and creative sound design. This working group will, for the first time, seek to define workflows in audio for VR and AR and produce recommendations for content creation across a wide variety of disciplines, including sports broadcast.