Live From Tokyo Olympics: Karl Malone on the Move to Immersive Audio and Keeping Remote Commentators Connected
Karl Malone, NBC Sports and NBC Olympics, director, sound design, says the 2020 Olympics is truly the audio Olympics. He jokes that he says that every Olympics, but it’s hard to argue with that this time around: next-generation immersive audio, with 5.1.4 (think regular surround sound with four additional channels overhead to give height to the audio), is the norm at every venue. And remote commentary is more important than ever. Simply put, the efforts are impressive. He spoke with SVG in the NBC Olympics listening room at the Tokyo IBC.
I wanted to start with your sense of the sonic landscape of these Games. Does anything stand out to you?
Going into this, the focus was a lack of crowd and how things would sound, as well as coming in with the newer technologies of immersive audio, along with OBS providing 5.1.4 audio and a paintbrush to do all our venues in immersive audio. But the lack of crowd has tended not to be an issue because we’ve had quite a lot of crowds, especially in swimming and gymnastics, where the other teams come in, and that has been great. That’s not to say that there are no empty-sounding venues, like badminton or weightlifting, but, certainly for the primary venues, it has become sort of a non-issue.
Michael DiCrescenzo, NBC senior A1 and audio design engineer, has been mixing the NBC Primetime show in Dolby Atmos and creating the immersive mixes to ensure consistency from sport to sport. Peter Puglisi has been mixing NBC golf coverage in Dolby Atmos but out of a truck in Stamford, CT. The technical complexity of this Games has been dizzying.
Sonically, it has been kind of the best of both worlds, where you have crowd and you can pick out the detail of the sport. This was always going to be the Games that was an audio Games, and you’re going to hear these Games like you’ve never heard before. We can hear footfalls in athletics, and that’s not in a silent stadium: there is the PA, and there are people cheering. Everything takes a day or two to tweak, but you’re hearing things that you’ve never heard before, and that has been fantastic.
Athletics, golf, gymnastics, basketball, volleyball mixed from Sky in London, swimming, and beach volleyball have all sounded fantastic. The specialty cameras have helped a lot. There’s a cable cam that goes over the swimming, and that really helps to pick up the sound, the “wet” sound of swimming.
I loved the skateboarding where you could hear the board riding down the rail.
Yeah, skateboard is good, but we’re sort of challenged a little bit by a couple of sports like BMX. The ramps are soft, and the rubber wheels are soft, so you’re trying to reach for those sorts of sounds. In 3×3 basketball, the court is different from regular basketball courts, where you hear squeaks and ball bounces, so you need to try to pull those [sounds] out as well. With those new sports, you want to promote that sport to people audibly as well as visually.
It has been challenging, but that’s the fun part: how can we get more out of this sport and bring it home to tell the story?
Any tips for someone who has not worked in immersive audio about what to expect and how to do it?
We’ve always talked about having a base layer of ambience for the heights. I think that has always been very successful with us, and that’s what OBS has provided us with: a nice base layer to build on.
We’ve gone 16 channels wide with our edits and audio coming from the truck, and we came into this knowing that’s our plan. Sixteen channels of audio gave us the first eight channels for our standard 5.1, plus we do dummy headsets and clean announcer tracks to help edits. The last eight channels are fully immersive: we take the four height channels, and we have a stereo mix for the last two pairs, which we’ve been able to place into certain areas of the stadiums to add to that base.
For example, we would use those pairs to isolate a certain section of fans in the crowd and put that into the heights with the base. That hasn’t been borne out at these Olympics, but, technically and engineering-wise, we’re getting that through the edits, and that has been very successful. Making that 16-channel workflow work and getting editors to edit across 16 channels is something we’ve never done before. And that has been very helpful to us.
That may be how everybody else will do it, but, for us, it is having the base layer and then building on that base layer by being able to add certain things to bring your ear up.
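The 16-channel split Malone describes can be pictured as a simple channel map. The following is a minimal sketch, not NBC's actual track sheet: the group boundaries (5.1 plus two edit-aid tracks in the first eight, four heights plus two stereo pairs in the last eight) come from the interview, but the order inside each group, the speaker labels, and every name in the code are assumptions made for illustration.

```python
# Hypothetical 16-channel layout. Group sizes follow the interview;
# the order within each group and all labels are assumed.
LAYOUT = (
    [("bed_5.1", name) for name in ("L", "R", "C", "LFE", "Ls", "Rs")]
    + [("edit_aid", name) for name in ("headset", "clean_announce")]
    + [("height", name) for name in ("TopFL", "TopFR", "TopRL", "TopRR")]
    + [("stereo_pair_a", name) for name in ("L", "R")]
    + [("stereo_pair_b", name) for name in ("L", "R")]
)


def channels_in(group):
    """Return the 1-based channel numbers that belong to a group."""
    return [number for number, (g, _) in enumerate(LAYOUT, start=1) if g == group]


def describe(layout=LAYOUT):
    """Return one human-readable line per channel, e.g. for a QC sheet."""
    return [f"ch{number:02d} {group}:{name}"
            for number, (group, name) in enumerate(layout, start=1)]


if __name__ == "__main__":
    for line in describe():
        print(line)
```

A map like this makes it easy to check that the heights and the two stereo pairs always land in the last eight channels, which is the part of the workflow that has to survive every edit and transmission hop.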
When someone is editing with 16 channels, are they hearing the height signals?
No, they are not hearing them in an immersive overhead configuration but rather as isolated tracks, which they can solo and QC. Ultimately, they’re passing them through from the trucks or the venues. The editors are doing their normal 5.1, and everything’s mixed live in the audio-control room for the final mix.
Final question on the immersive land. What does it mean to see spatial audio getting its due, even seeing commercials on TV from Apple about it?
I think anything that can add to the experiences is going to be worthwhile. I know I say this all the time, but immersive audio does make the picture look better. If you add spatial audio and the immersive audio to our production, it makes the production overall a much better and more enjoyable production. And it’s not that the pictures don’t look great; they look fantastic in HDR. But it’s all part of that package that you’re providing people: the best-quality audio and the best-quality video give them the best product possible.
Let’s talk commentary. There is a lot of remote commentary going on for NBC around the world. First, audio delay is always a concern, so where are you with that?
It’s probably about 120 ms. That’s really nothing in a voiceover booth. But we have announcers and analysts for tennis, soccer, athletics, and baseball at Telemundo, Sky in London, here, or in Stamford working with a co-commentator in Tokyo. We have two spy cams so we can have play-by-play and analyst see each other from two different continents, and nobody would ever know that they are separated by continents. That’s the kind of magic behind the screen.
And we have four remote venues for indoor volleyball, basketball, golf, and beach volleyball, and those production crews are either at Sky or Stamford while commentary is here. It’s built on the LANCE Dante system and the Calrec RP1, which is a virtual console that allows mix-minuses to be done locally so there’s no latency for announcers. Everything is Dante-connected and embedded via MADI and then controlled by the Calrec consoles that are in either Stamford or Sky. That’s a seamless workflow, and nobody can tell where it is being produced from, with, once again, the trucks on different continents.
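For readers unfamiliar with the term, a mix-minus is a return feed that contains everything except the listener's own voice, so announcers never hear themselves coming back delayed. The sketch below illustrates only that general idea on plain lists of samples. It is not how the Calrec RP1 or the Dante network implements it, and the function name, source names, and sample values are all hypothetical.

```python
def mix_minus(sources):
    """Build one return feed per source: the sum of all *other* sources.

    `sources` maps a source name to a list of samples. All lists must be
    the same length. Names and data here are purely illustrative.
    """
    lengths = {len(samples) for samples in sources.values()}
    if len(lengths) != 1:
        raise ValueError("all sources must have the same number of samples")
    n = lengths.pop()
    return {
        name: [
            sum(other[i] for other_name, other in sources.items()
                if other_name != name)
            for i in range(n)
        ]
        for name in sources
    }


if __name__ == "__main__":
    feeds = mix_minus({
        "play_by_play": [0.5, 0.0],
        "analyst": [0.0, 0.25],
        "program": [0.125, 0.125],
    })
    # The play-by-play announcer hears analyst + program, never themselves.
    print(feeds["play_by_play"])
```

Doing this sum close to the announcer, as the interview describes, means the announcer's own microphone never makes the round trip to the remote console and back.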
Looking forward to Beijing, is there anything you want to tweak?
We have definitely proved the success of the immersive-audio workflow between the editors and transmission, check-ins for the trucks, making sure the 16 channels pass. All we need is the material with the crowds so we can immerse people in a venue, where right now we’re focusing on the athletes. But the crowd is the key component in capturing the passion that goes with sport, and that’s what we really look forward to.
What else strikes you about these Games?
I think the Friends and Family efforts have been fantastic. We have cameras and watch parties in the U.S., where the families can interact with their sons and daughters. It brings the Games home to the athletes’ families like never before and gives that emotion and passion that you sort of miss by not having the crowd.