Live From Tokyo Olympics: Darryl Jefferson and Jim Miles on NBC Olympics’ File-Based Workflows, Storytelling
Every NBC Olympic effort sees massive changes and advances with respect to file-based workflows, editing, and more. Toss in UHD, HDR, and immersive audio, and those advances are even more challenging and, ultimately, impressive. Darryl Jefferson, NBC Olympics, VP, broadcast operations and technology, and Jim Miles, NBC Olympics, director, digital workflow systems, discussed the multiple-continent, -time zone, and -facility effort with SVG.
Can you describe the ecosystem here?
Miles: It’s Avid Media Composer for craft editing, and Avid Interplay MAM is the record apparatus for highlight-shot selection as well as our archive. The entire Olympic archive is Interplay and Media Central, and our playback turnaround is EVS. We do a lot with Telestream for transcode, flipping, and orchestration, and we use Signiant as our file mover and for transfers from the venues and to Stamford.
Jefferson: We also have a new ingest device from Telestream called Live Capture, which can capture 1080p HDR content as well as the older flavors of content. And our big monster storage is Dell EMC Isilon.
Miles: The interesting story on the editors is that we are still using physical workstations for the primary craft edits, but all our auxiliary edits are virtual machines. We used to have to bring 30 Avids to the IBC, but, this year, we had to bring only a dozen, and we have the VMs for the producers and those lighter-weight tasks. That has been huge for us in terms of the complexity of what we must build.
You have teams around the world diving into your file-based workflows. Are the workflows the same everywhere?
Miles: More or less. We try to put the high-resolution recording where it’s needed. If somebody is doing a turnaround at a venue, we’ll put it right there next to them in their local storage. If something is needed for primetime, we can move the content here to the IBC.
But our main record apparatus has moved back to Stamford, and the hundred feeds that are coming in from different venues, from the host, and from our own cameras all go back to Stamford and are recorded there, where the bulk of our ancillary users are. Then there are other business units, in Miami and in the news organization at 30 Rock, that are pulling from that recording wall in Stamford.
Jefferson: The other thing is the added wrinkle of HDR for the primetime show and for our venues. That adds a layer: you need to know which version of a recording you have, whether we're doing parallel recordings, and whether an SDR sport is contributing into an HDR primetime show or vice versa. We try to normalize the content for the end user so that complexity is obscured and people have to think about it less.
Miles: We have well over a hundred different paths, many of which have file conversion or interlace-to-progressive or progressive-to-interlace conversion in the middle. It has been an interesting challenge to build it all ahead of time and then get it running in a matter of days.
Jefferson: It has been an education for our legion of freelance editors and freelance operators in general. They need to wrap their heads around which recording is closest to the output of their show. And, if someone's delivering into a show that's different from the format they normally cut in, it takes a little while to get up to speed.
But we do have islands of 1080p HDR, like at the venues, and, once editors are at a venue, they have to worry much less about their environment. They only need to worry about the outliers coming in, like an ENG camera or an interview or reaction shot from another broadcaster that won't be in 1080p HDR.
Miles: Content+ is a great example, where we have that fire hose of content coming in from OBS and, depending on who's pulling it, some will take it natively, some take it with a LUT [look-up table] applied, and some take it with a transcode, depending on where it's going.
The other day, you mentioned having folders that would automate some processes. Can you elaborate on that?
Jefferson: The folders basically illustrate to us where the content is coming from, where it is going, and what process it undergoes. For instance, there’s a set that takes things from progressive to interlace, applies a LUT or up/downconversion, or normalizes audio and so forth. The folders are basically instructions of what is going to happen, and we can see at a glance what happened to the file and what direction it is flowing, etc.
And is there an undo in case someone gets it going the wrong direction?

Miles: There's not exactly an undo function for a LUT, for example, but we are able to go back and put it through the correct workflow. All the folders that Darryl mentioned are lashed into different production systems — like our EVS system, our Telestream Vantage system, our Avid system — and all the things coming from external sources, be it ENG or Content+ or drives. They all kind of flow together, and we don't just throw things away after we process them; we hang on to them. If some inbound source comes in incorrectly from a partner, for instance, we could go back and get the original file and put it through a different process to get it the way we need it.
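Conceptually, a watch-folder pipeline of the kind described above can be sketched as follows. The folder names, processing steps, and routing table here are illustrative assumptions for the sake of the sketch, not NBC's actual configuration:

```python
from pathlib import Path

# Illustrative routing table: each watch-folder name encodes the processing
# chain a dropped file will undergo (all folder and step names hypothetical).
ROUTES = {
    "to_1080i_sdr": ["apply_hdr_to_sdr_lut", "progressive_to_interlace"],
    "to_1080p_hdr": ["interlace_to_progressive", "apply_sdr_to_hdr_lut"],
    "normalize_audio": ["loudness_normalize"],
}

def plan_for(dropped_file: str) -> list[str]:
    """Read the processing instructions implied by the file's parent folder."""
    folder = Path(dropped_file).parent.name
    if folder not in ROUTES:
        raise ValueError(f"No workflow defined for watch folder '{folder}'")
    return ROUTES[folder]

def reprocess(original_file: str, new_folder: str) -> list[str]:
    """The 'undo' described above: because originals are retained, a file
    that went down the wrong path is simply re-run through a different
    workflow rather than having a LUT reversed."""
    return plan_for(str(Path(new_folder) / Path(original_file).name))
```

The key design point from the interview is the last function: nothing is discarded after processing, so correction means re-deriving from the original, not inverting a transform.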
Over the past year, there has been a lot of talk in the industry about how editors, for example, don’t need to be onsite. What do you see as the benefit of having editors at the A venues — swimming, gymnastics, athletics — and at the primetime set?
Jefferson: Having folks in the city and venue where the event is happening lets them feel the pulse of the Games, and they can interact with the players or the coaches. In some cases now, we bring those athletes and families together with our technology, and that’s a bonus.
But, for every editor that’s here, we have probably five or six in Stamford. Our onsite complement is down substantially from Games past, but you do need a handful of editors that can do the fast turnarounds, keep up with the Games, and cut up stories and inbound elements that are happening now.
Miles: To achieve that level of efficiency on tight turnarounds, especially in higher resolutions or for more complex pieces, you need people onsite. But the more casual editing, like just clipping a shot for a highlight, has been [done at] home for a decade.
I wanted to ask about OBS Content+, which is where OBS makes everything available: the live events, highlights, features, B roll, and more. How does your team use that, and what does it mean for your content-creation efforts?
Jefferson: It does give us raw material for interviews, profiles, their aerial shots, course animations, all those types of things. But, at the end of the day, the deeper dive on U.S. athletes still ultimately comes from us, following athletes all the way back to their hometowns and that type of thing. We have a zeroed-in approach on the U.S. team.
Content+ helps when we do deeper dives on non-American athletes. Normally, we send crews all over the world, but this has been an odd year for travel and for capturing people in their homes or training facilities. Leaning on OBS for that type of thing comes at a really good time.
Miles: And it gives us content that we didn’t plan for. We plan for the U.S. athletes, but, if there’s the breakout star from Argentina or somewhere else, we can find the deeper dive there.
I am assuming it is also great because it means you can’t miss anything.
Jefferson: Sometimes, you have a mechanical failure on something and realize that was the only place it was rolling, but now there are many other opportunities.
Miles: It’s a delicate balance, too. For example, we have splits at the venues and tons of our own cameras at each venue, and it’s not really feasible for us to save every frame of video from every camera for the duration. So we do go through our own process of melting things down to the best shots. Sometimes, you do miss something, but, generally, we do an incredible job of capturing not only what OBS is doing but our cameras as well.
Jefferson: Now it's less about missing coverage of an event and more likely about an iso shot that zeros in on a footfall or a nudge on the track. Our team is looking for a specific angle, and maybe, in that moment, we have 50 other angles but not the one they're looking for. There is a bit more nuance now as to what people are looking for.
We discussed HDR, but you are also working in 5.1.4 and Dolby Atmos, which has four channels of audio above the listener. How are you handling that for editors?
Jefferson: If you're doing a tight turnaround, you want to carry those height mics as an additional element on the timeline, and that has been an interesting process. What does the submix look like before it's delivered for final mixing and Atmos encoding on the backend? That has been an additional learning experience for all of us, figuring out how to mix in the environment we have. [It] is sometimes what we call a three-channel mix, which is essentially stereo plus the announce discretely, or this new thing, which is all that plus four more channels. Having a mixed audio environment is odd, but it has been interesting teaching folks about it.
Regarding HDR, what are your tips for others who will grapple with editing HDR, SDR, 1080p, etc.?
Miles: First, we put the content in the format that they need. If our editor back in Stamford is cutting an HDR piece, even though most of the facility is SDR, we try to put everything feeding that edit room into the format they need. We want to minimize what they have to do, so we give them the right recordings. We make sure that their graphics and archive bits and pieces are preconverted. We want them to just drop something on the timeline and use it.
Jefferson: We also set it up so the router heads will hesitate before giving you something that is not right: if you're in a 1080p control room, the router will hesitate before handing you a source that is not available in 1080p. So there are guardrails in place.
One of the big observations is that 1080p HDR makes the product, even if it is SDR, look better. When you capture that wide color gamut and you’re mindful of it from capture to record, you’re capturing more information from the beginning, and it makes the whole process better. But you do have to be mindful and have more checkpoints to verify you’re seeing what was intended.
We also give editors the ability to apply a “predictive” downconvert so they can look at the HDR image in one monitor, apply the LUT that we’re going to apply downstream, and see what the same video will look like in SDR. That gives them an idea of how they are impacting the picture that most people will see.
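The "predictive" downconvert described above is essentially a preview transform: decode the HDR signal to light, then map it into an SDR range the way the downstream LUT will. As a rough illustration of the math involved, here is the SMPTE ST 2084 (PQ) EOTF plus a deliberately crude stand-in for the preview LUT. The real LUT NBC applies downstream is far more sophisticated; only the PQ constants below come from the ST 2084 specification, and the clip-to-SDR-white preview is an assumption for illustration:

```python
# SMPTE ST 2084 (PQ) constants.
M1 = 2610 / 16384          # 0.1593017578125
M2 = 2523 / 4096 * 128     # 78.84375
C1 = 3424 / 4096           # 0.8359375
C2 = 2413 / 4096 * 32      # 18.8515625
C3 = 2392 / 4096 * 32      # 18.6875

def pq_eotf(code: float) -> float:
    """Decode a normalized PQ code value (0..1) to luminance in cd/m^2."""
    p = code ** (1 / M2)
    y = (max(p - C1, 0.0) / (C2 - C3 * p)) ** (1 / M1)
    return 10000.0 * y

def sdr_preview(code: float, sdr_white_nits: float = 203.0) -> float:
    """Crude stand-in for a downconvert preview LUT: clip HDR light at a
    nominal SDR white, then gamma-encode (1/2.4, BT.1886-style) for an
    SDR monitor. Real tone mapping rolls off highlights instead of clipping."""
    linear = min(pq_eotf(code) / sdr_white_nits, 1.0)
    return linear ** (1 / 2.4)
```

Showing an editor both the PQ image and a preview like this side by side is what lets them judge, shot by shot, how their grade will land for the SDR audience.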
For both you and vendors, a lot of this is still new. What is on your wish list from vendors and manufacturers?
Jefferson: Flags, flags, flags. We’re moving faster than the manufacturers, but we need flags that communicate what type of content something is, and many of the manufacturers’ flags are slightly off.
You mean things like, is this a 1080p or HDR file or not?
Miles: Yes. We can’t just trust the metadata in a file to steer us in the right direction because often it is wrong. And I think that’s because the default programming in all the systems is 709 color space. It doesn’t even recognize HDR, and some of the devices don’t even know they’re HDR-capable. In graphics, for example, the device may not have a spec to do HDR, but you can feed it a 10-bit file, and it works and looks great. But none of the tools wrap around it, and none of the flags are accurate.
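The unreliable-flag problem Miles describes can be made concrete with a small heuristic. The field names below follow ffprobe's conventions (`color_transfer` values such as `smpte2084` for PQ and `arib-std-b67` for HLG), but the fallback policy and the 10-bit "suspicious" rule are illustrative assumptions, not NBC's actual logic:

```python
# Containers frequently default to Rec. 709 even for HDR essence, so a
# sanity check combines the declared transfer with other evidence (here,
# bit depth) before trusting the flag.
HDR_TRANSFERS = {"smpte2084", "arib-std-b67"}  # PQ and HLG, as ffprobe names them

def classify(meta: dict) -> str:
    """Return 'hdr', 'sdr', or 'review' for a file's probed metadata."""
    declared = meta.get("color_transfer", "bt709")  # unflagged files default to 709
    bit_depth = meta.get("bit_depth", 8)
    if declared in HDR_TRANSFERS:
        return "hdr"
    # A 10-bit file flagged bt709 is suspicious: route it to human review
    # rather than silently sending it down the SDR path.
    if bit_depth >= 10:
        return "review"
    return "sdr"
```

The design choice mirrored here is the one from the interview: when flags cannot be trusted, the safe failure mode is an extra checkpoint, not a silent guess.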
Jefferson: And it also goes for baseband devices, where you plug a BNC into something and may get an incorrect auto-sense on inbound signals. The vendors are racing toward the solutions that are needed and getting to that type of minutiae, but we look forward to the day when it's all correct.
Last thing, I know you’ve been working on AI captioning with EEG. Can you tell me more about that?
Miles: When we looked at the scale of the Summer Games, it wasn’t even remotely feasible to do all the captioning we needed exclusively with humans. All of the “signature” NBC calls like you see on the network and cable dayparts are still done by traditional human captioners. But everything else, like when we do every single sport live, all those feeds are now done by a speech-to-text–driven system. And we’ve spent two long years training the specific sport models for captioning using data from OBS, like the athlete rosters and specific keywords, and each captioner has a sports-specific model. I’ve been pleasantly surprised at how accurate it has been.
On the accessibility front, we are also delivering captions for our VOD assets, which is great to see: we can share the Games with that many more people. Between the humans and AI innovation, we’re able to do a quick turnaround and get all those captions delivered in real time.
Jefferson: Because we were delayed by a year, Jim and the team were able to train the AI for an additional year. It would not have worked as well without the extra year. But it has been smooth, and it has been chugging along. I think we’re up to more than 3,000 hours of live content coming through that system.
And EEG has been a great partner to figure it out. I feel like we’ve gotten much closer than we thought we could.