Immersive Audio: The Pieces Are Coming Together
In sports, the evidence of immersiveness is everywhere, from VR goggles for viewing content available only through a venue’s WiFi to massive 360-degree “halo” video displays, such as the one currently being fabricated for the Mercedes-Benz Stadium, home of the Atlanta Falcons and Atlanta United FC.
Immersiveness will be broadcast audio’s latest inflection point, after stereo in the 1990s and 5.1 surround a decade ago. Earlier this year, the ATSC’s 3.0 standards project announced both Dolby AC-4 and MPEG-H as candidate audio standards for ATSC 3.0. More recently, the North American Broadcasters Association (NABA) board approved the Dolby AC-4 system as its recommendation for the next-generation audio standard for NABA’s membership (in the U.S., Canada, and Mexico), making it the sonic counterpart to 4K/UHD, which will be the video format.
AC-4 can support up to 11 channels of audio, including two to four on a new axis: height. It also supports object-based audio, which represents sound as a set of individual assets along with metadata describing their relationships and associations. This radical new approach to sound not only means that audio elements can be placed in a 3D space but also offers the potential for allowing home viewers to rearrange the relationships of those elements, adding personalization to audio’s new frontier.
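To make the object-based idea concrete, here is a minimal sketch of a sound scene held as individual assets plus metadata, with one viewer-side adjustment. Every name in it (`AudioObject`, `personalize`, the position and gain fields) is a hypothetical illustration, not the actual AC-4 or MPEG-H metadata model.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class AudioObject:
    """One sound asset plus the metadata a renderer would need (hypothetical fields)."""
    name: str
    position: Tuple[float, float, float]  # (x, y, z); z is the height axis
    gain_db: float = 0.0


def personalize(scene: List[AudioObject], name: str, gain_offset_db: float) -> List[AudioObject]:
    """Return a new scene in which the named object's gain is shifted by the offset."""
    return [
        replace(obj, gain_db=obj.gain_db + gain_offset_db) if obj.name == name else obj
        for obj in scene
    ]


# An illustrative sports mix: each element stays separate until playback.
scene = [
    AudioObject("crowd", (0.0, 0.0, 1.0)),
    AudioObject("commentary", (0.0, 1.0, 0.0)),
    AudioObject("field_effects", (0.5, 0.5, 0.0)),
]

# A home viewer turns the commentary down by 6 dB; the other objects are untouched.
quieter_commentary = personalize(scene, "commentary", -6.0)
```

Because the elements are not baked into fixed channels, the same scene could in principle be rendered to whatever speaker layout the viewer has.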
Nuts and Bolts
But the details for making that work are still in progress. Take, for instance, monitoring the audio for an 11-channel production. It would be almost impossible to set up 11 discrete audio channels in the space usually reserved for audio on remote-production trucks, which have just about completed a years-long industry-wide upgrade to accommodate the six channels of 5.1 surround sound.
“There’s no space in the truck [for the monitoring array] that you would need beyond 5.1,” observes Chris Fichera, VP, Group One, whose product offerings include Blue Sky speakers and Calrec and DiGiCo audio consoles. “But we have seen it starting to show up in [broadcast] flypacks.”
He adds that Blue Sky is developing modular 7.1.4 monitoring arrays (the typical five-channel array of LCR and stereo rears, plus a stereo pair of side channels, four overhead channels, and an LFE channel) that could be scaled up or down, depending on the size and needs of the mixing environment. The expectation is that this will eventually evolve into a scalable product line, based on Blue Sky’s MediaDesk line, allowing someone with multichannel monitoring needs, in videogaming as well as conventional sports, to buy a basic system and expand it as demand in broadcast and other vertical markets increases.
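The layout shorthand follows a simple convention: the first number counts ear-level speakers, the second counts LFE channels, and an optional third counts overhead speakers. The sketch below tallies the speakers a label implies; `SpeakerLayout` and `parse_layout` are hypothetical helper names used only for illustration.

```python
from typing import NamedTuple


class SpeakerLayout(NamedTuple):
    ear_level: int  # speakers around the listener at ear height
    lfe: int        # low-frequency effects channels
    height: int     # overhead speakers

    @property
    def total(self) -> int:
        """Total number of speaker feeds, including the LFE."""
        return self.ear_level + self.lfe + self.height


def parse_layout(label: str) -> SpeakerLayout:
    """Parse a label such as '5.1' or '7.1.4' into its speaker counts."""
    parts = [int(p) for p in label.split(".")]
    if len(parts) == 2:
        parts.append(0)  # a two-part label has no height layer
    if len(parts) != 3:
        raise ValueError(f"unrecognized layout label: {label!r}")
    return SpeakerLayout(*parts)


surround = parse_layout("5.1")     # 5 ear-level + 1 LFE = 6 feeds
immersive = parse_layout("7.1.4")  # 7 ear-level + 1 LFE + 4 overhead = 12 feeds
```

By this count, a 7.1.4 room needs twice as many monitors as the six that a 5.1 truck accommodates.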
The real challenge will not necessarily be in the number of speakers in a system, says Fichera, but in managing them. Blue Sky’s AMC speaker-control system will scale to handle larger speaker arrays. Speaker management has been an area of development in recent years for a number of manufacturers, including JBL and Genelec, two other brands often found in remote and plant broadcast facilities. “With the number of speakers needed for immersive broadcast audio,” he says, “you’ll need systems that can handle room optimization and other processes automatically.”
Microphones have been following a similar track on the sound-capture side of the equation. Longtime entrants in the multichannel transducer sweepstakes, like Holophone and Soundfield, are being augmented by new products, such as Sennheiser’s Ambeo VR microphone and Schoeps’s ORTF-3D plug-and-play array.
Brian Glasscock, user experience researcher at Sennheiser’s Strategic Innovation division, describes the Ambeo multichannel capture system, which will be available for sale in the U.S. this month, as having been designed with input from members of Sennheiser’s Creators Program, which includes several national television networks that broadcast sports programming. He says he cannot reveal which ones they are, citing NDAs, but notes that they contributed to Ambeo’s design by emphasizing the need for a robust microphone that can stand up to outdoor applications and has a widely used interface to accommodate broadcast workflows. In this case, that meant fitting the Ambeo’s four mic elements with XLR-type output jacks.
“Users looking for immersive sound, for VR and next-generation broadcast applications, wanted both of those criteria met,” he explains. “A number of spherical-capture microphones are delicate or even flimsy. That won’t work for broadcast and especially not for sports.”
Glasscock stresses that immersive audio capture will have to work in concert with conventional audio techniques and tools. “The shotgun microphone isn’t going to go away, because the need to capture specific sounds isn’t going to go away,” he says, adding, “VR and other immersive applications will mark a major shift in broadcast sports. We’re in the equivalent of early days of film with this. We’re trying to figure out the techniques and tools we’re going to need for storytelling in the immersive age.”
Although Dolby has developed the Atmos and AC-4 formats, it will rely on a number of development partners for AC-4 implementation. Harmonic’s Electra X2 AC-4 encoder, for example, was used as part of the first live ATSC 3.0 broadcast of a major professional sports event, by Fox affiliate WJW-TV Cleveland during the opening game of the World Series. (The FCC granted an experimental license to operate a full-power Channel 31 transmitter as a laboratory for broadcasters and manufacturers creating the Next Gen TV service. Says Richard Friedel, 2016 chairman of the Advanced Television Systems Committee, “This is a defining moment for the future of television.”)
“We rely on a broad set of partners across the ecosystem,” says Jeffrey Riedmiller, VP, Sound Group, Dolby Labs. “Harmonic is a great example. We provide them with the core technology ingredients and know-how to make sure they can apply that consistently and with the ability to achieve scalability.”
He adds that broadcasters will embrace AC-4 in a staggered fashion, with networks applying it at different paces to 5.1 and then to larger speaker arrays.
He describes the broadcast of the World Series game from Cleveland as a very basic application of ATSC 3.0 audio. “In this first live ATSC 3.0 trial, Fox, Tribune, LG, Harmonic, and others brought all necessary pieces together and broadcasted in ATSC 1.0 and ATSC 3.0,” he says. “They set up the pieces like the encoders, the modulation equipment, and the antennas and made sure they all talked to each other.”
Riedmiller is also seeing manufacturers of related equipment, such as speakers, beginning to look to an immersive market in their new products, with the ATSC 3.0 testing on Next-Generation Audio as an impetus. “We’re starting to see momentum in that regard now,” he says. “We’re seeing products like loudness-measurement and -management products starting to figure out how to apply and, eventually, automate those functions in an object-based environment.”
And while ATSC 3.0 will be a more complex landscape, it may also see a faster uptake than 5.1 because more of broadcast audio’s infrastructure is working in concert. “This time around, we have the ATSC, the ITU, the academic community, and manufacturers all working together,” he explains. “We’re all trying to get ahead of the curve, before the chaos sets in.”
Fraunhofer USA Digital Media Technologies is the developer of the MPEG-H format, which has been named the format of choice for South Korea’s broadcast system and will be used for the 2018 Winter Olympics. Robert Bleidt, GM of its audio and multimedia division, emphasizes that the industry is at the very beginning stages of immersive sound.
“Not just of mixing but also of sound design,” he says. “Everything will have to be rethought, including things like microphone placement to take advantage of the height channels.”
He adds that automation built into the MPEG-H format design will allow it to properly downmix, a key function during the transition period for immersive audio. “There’s a lot of work ahead, but, with what immersive audio can do for sports in particular — filling in the missing audio cues that let viewers more easily suspend disbelief — it will definitely be worth it.”
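For readers unfamiliar with the term, downmixing folds a larger channel layout into a smaller one so that existing speakers still receive a sensible mix. The sketch below shows the idea for a single 5.1 sample folded to stereo with the widely used -3 dB (about 0.707) weighting on the center and surround channels. It is an illustration only and does not represent the MPEG-H renderer; the function name and the dictionary keys are assumptions.

```python
import math
from typing import Dict, Tuple

MINUS_3_DB = 1 / math.sqrt(2)  # about 0.707, a common downmix coefficient


def downmix_51_to_stereo(sample: Dict[str, float]) -> Tuple[float, float]:
    """Fold one 5.1 sample into a (left, right) stereo pair.

    The LFE channel is left out of the fold-down, a common simplification.
    """
    left = sample["L"] + MINUS_3_DB * sample["C"] + MINUS_3_DB * sample["Ls"]
    right = sample["R"] + MINUS_3_DB * sample["C"] + MINUS_3_DB * sample["Rs"]
    return left, right


# A sample with sound only in the center channel (plus some LFE, which is dropped).
frame = {"L": 0.0, "R": 0.0, "C": 1.0, "LFE": 0.5, "Ls": 0.0, "Rs": 0.0}
left, right = downmix_51_to_stereo(frame)  # center lands equally in both speakers
```

A production downmix would also guard against clipping and would have to decide where the height channels go, which is the kind of decision the format's built-in automation is meant to take off the mixer's hands.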