SVG Summit Recap: DTV Audio Group Workshop Explores Audio Production & Distribution in the Modern Age
Like much of the rest of this year, the annual DTV Audio Group event (Dec. 13) at The SVG Summit at the New York Hilton (Dec. 13-14) was smaller and briefer but by no means diminished. Backed by SVG audio sponsors Brainstorm, Calrec/DiGiCo, Dale Pro Audio, Dolby, Lawo, Sanken, Shure, and Telos Alliance, it was also more narrowly focused, this year on premium audio production and distribution, with an emphasis on Dolby's Atmos format (the premium part), the imminent arrival of 5G (the distribution part), and the looming presence of AI, and on how all three might shape television audio going forward.
Specifically, sessions included: The Premium Audio Challenge: Efficiently Scaling Up Atmos and Accessible Audio Production and Distribution; Streamlining Edit Workflows to Support Live Production in Atmos; Applying AI to Live Immersive Audio Production; and New Spectrum Challenges and Opportunities.
After an opening observation by DTVAG Executive Director Roger Charlesworth that "the theme today is simplifying Atmos productions," Tom Sahara, past chairman of both SVG and DTVAG industry working groups and now an industry consultant, set the context. He noted that consumers, working largely from home during much of the pandemic, had become aware of how poor audio quality could hurt their productivity and workflows and, conversely, of how good consumer audio had become with the arrival of premium products like soundbars and earbuds. Citing increased home-theater-system and smart-speaker sales in 2020, as well as consumers' embrace of smart TVs and more OTT content, Sahara, who was also inducted into the Sports Broadcasting Hall of Fame during the SVG Summit, asserted, "Audio has moved to the forefront as the pandemic put a premium on high-res sound and the products for it."
That shift, he continued, along with immersive audio formats such as Atmos, would have implications for broadcast and streamed live sports. He further noted that the rapid deployment of UHD video for major broadcast sports events had laid the foundation for both immersive audio formats and higher audio quality.
“The pandemic changed consumer expectations [for audio],” he told an audience of 50-plus attendees. “What that means to us as part of the audio community is that it’s up to us to find a way to deliver the best-possible audio experience to consumers. They’ve already made the investment in good sound. We have to push harder to deliver that.”
Elements of future premium sports-media audio will include Dolby’s Atmos, soundbars capable of reproducing immersive sound without the need for multiple speakers, and voice-activated smart speakers, which will bring a heightened level of interactivity to the experience.
A Look And Listen At The Olympics
Sahara's presentation was followed by an in-depth and technically detailed description of how Atmos was scaled up by NBC Sports for its Tokyo Olympics broadcast last summer, and of how that effort represented a test run for a similar one planned for the upcoming Winter Olympics. The presentation outlined the overarching goal: perfectly phase-aligned 10-channel immersive surround sound accompanied by 4K-UHD video sent to the NBC plant in Englewood Cliffs, NJ, as well as an 8-channel distribution of an HD network-to-TV-station feed. Also described was how the production team pioneered a metadata-stripped, high-bitrate distribution version of AC-4, configured as a "clean piece of wire."
It was also pointed out that there were enough crew and other team members in the event facilities that contingency plans to generate augmented crowd sounds were never needed in Tokyo, where fan attendance had been prohibited because of Covid. In addition, the OBS-supplied 5.1.4 ambience audio from the various venues was further enhanced with additional NBC-supplied microphones used to sweeten the .4 height channels.
Attribution was limited in this presentation, but one presenter's comment that "it can get technologically overwhelming" summed up a huge effort, one that presages an equally enormous undertaking scheduled for February in Beijing.
Channel Vs. Object
In the presentation entitled “Streamlining Edit Workflows to Support Live Production in Atmos,” Brian Rio, Director of Creative Services for Warner Media Studios, pointed out that “Creating quick-turn, HDR, Atmos content in post [production] doesn’t have to be that complicated,” thanks to the “beautiful simplicity of channel-based audio” versus object-based audio workflows. “We need to keep elevating the workflows” as the industry moves further into the UHD/Atmos ecosystem, he added.
Co-presenter Lane Crouse, Senior Sound Designer and Tech Workflow Lead at Warner Media Studios, added some interesting specifics: fades between the overhead and horizontal zones cannot currently be performed when mixing immersive audio in Avid's Pro Tools, a limitation he said Avid is working to resolve. Instead, fixed downmix coefficients are applied: a 6-dB attenuation of the height channels when folding 5.1.4 down to 5.1, and a further 3-dB reduction when folding 5.1 down to stereo.
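As a rough illustration of those coefficients, the fold-downs might look like the following. This is a minimal sketch only; the channel ordering, height-to-bed pairing, and function names are assumptions for illustration, not Warner Media's actual workflow or Pro Tools' internal math.

```python
# Linear gains for the stated downmix attenuations.
DB_MINUS_6 = 10 ** (-6 / 20)   # ~0.501: height channels into the 5.1 bed
DB_MINUS_3 = 10 ** (-3 / 20)   # ~0.708: center/surrounds into stereo

def downmix_514_to_51(bed, heights):
    """Fold four height channels (Ltf, Rtf, Ltr, Rtr) into a 5.1 bed
    (L, R, C, LFE, Ls, Rs), attenuating each height feed by 6 dB."""
    l, r, c, lfe, ls, rs = bed
    ltf, rtf, ltr, rtr = heights
    return [
        l + DB_MINUS_6 * ltf,   # front heights fold to front L/R
        r + DB_MINUS_6 * rtf,
        c,
        lfe,
        ls + DB_MINUS_6 * ltr,  # rear heights fold to surrounds
        rs + DB_MINUS_6 * rtr,
    ]

def downmix_51_to_stereo(bed):
    """Fold a 5.1 bed to Lo/Ro, attenuating center and surrounds by 3 dB."""
    l, r, c, lfe, ls, rs = bed
    lo = l + DB_MINUS_3 * (c + ls)
    ro = r + DB_MINUS_3 * (c + rs)
    return [lo, ro]
```

Applying the two stages in sequence yields the stereo fold-down of a 5.1.4 mix with the height content down a cumulative 9 dB relative to the front channels.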
A presentation by Rob Oldfield, co-founder and CEO of Salsa Sound, which provided the systems for augmented crowd sounds for several broadcast sports during the empty-venue months of the pandemic, looked at the company's AI-based automixing system for sports on television. The system, which uses machine learning to recognize key sports sound effects, is intended as submix support for a broadcast's main mix — a function, Oldfield emphasized, that will become increasingly necessary as broadcasts must generate multiple mixes to accommodate multiple audio formats. AI can, for instance, "hear" a tennis racket strike and tag that sound, using its metadata to know when and how much to apply it in a submix when it encounters it in real time during a match, with a 40-ms "look-ahead" delay applied to the overall broadcast to give the processor time to react.
“Microphones should be considered sensors rather than merely capture devices,” Oldfield said, adding that the processor can also be programmed to distinguish desired effects sound from interference such as wind noise. “Using AI, we can make that into data.”
Welcome to the future.