ITU Recommendation Targets File Compatibility in Immersive-Audio Formats
We lived in an analog world for the better part of a century; we just didn’t know it. At least, not until digital technology came along and compelled us to find an adjective to describe the environment we had inhabited prior to zeros and ones. Similarly, the broadcast world we live in now will likely be referred to as “channel-based,” thanks to the imminent arrival of object- and scene-based audio, represented by formats such as Dolby’s AC-4 and the MPEG-H Audio Alliance’s eponymous format.
The questions now become: will these formats be able to talk to each other, and will the broadcast infrastructure, from field production to the plant, have a fully compatible workflow? The answer lies in ITU-R BS.2076-0, an international recommendation targeted at professional program interchange. It outlines a standardized model specifying how XML metadata can be generated to define the tracks within an audio file. This Audio Definition Model (ADM), as it’s called, supports descriptions for channel-based, object-based, HOA (Higher Order Ambisonics), binaural, and bitrate-compressed audio elements.
“This recommendation, along with its companion BWAV/RF64 recommendation, will mark a great step forward for enabling professional, file-based, and open interchange of next-generation immersive and personalized programming,” says Jeffrey Riedmiller, senior director, Sound Group/Office of the CTO, Dolby Labs. Dolby, working with ITU working group co-chair BBC and other experts, has been at the leading edge of the ADM project for 18 months.
The ITU’s recommendation on the model, recondite and dense at more than 17,000 words, seems to acknowledge its own complexity, offering a whimsical analogy that compares the recommendation to a baking recipe: ITU-R BS.2076-0 provides the ingredient list — the broadcast version of eggs, flour, butter, and sugar — that will produce a proper cake. “The ADM,” it says, “provides the instructions for combining ingredients but does not tell you how to do the mixing or baking; in the audio world, that is what the renderer does.”
Beyond the virtual kitchen, the recommendation describes the model as “divided into two sections, the content part and the format part. The content part describes what is contained in the audio, so will describe things like the language of any dialogue, the loudness and so on. The format part describes the technical nature of the audio so it can be decoded or rendered correctly. Some of the format elements may be defined before we have any audio signals, whereas the content parts can usually only be completed after the signals have been generated.”
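To make the content/format split concrete, here is a minimal sketch of how an ADM-style XML fragment might be assembled programmatically. The element and attribute names (audioContent, loudnessMetadata, audioChannelFormat, audioBlockFormat) follow the recommendation’s vocabulary, but the IDs and values are invented for illustration — this is not a conformant BS.2076 document.

```python
# Illustrative sketch of the ADM's content/format split.
# Element names follow ITU-R BS.2076 vocabulary; IDs/values are made up.
import xml.etree.ElementTree as ET

fmt = ET.Element("audioFormatExtended")

# "Content" side: describes WHAT the audio is (dialogue, loudness, etc.),
# typically filled in only after the signals exist.
content = ET.SubElement(fmt, "audioContent",
                        audioContentID="ACO_1001",
                        audioContentName="EnglishDialogue")
loudness = ET.SubElement(content, "loudnessMetadata")
ET.SubElement(loudness, "integratedLoudness").text = "-23.0"

# "Format" side: describes HOW the audio is to be decoded or rendered,
# and can be defined before any signals are generated.
channel = ET.SubElement(fmt, "audioChannelFormat",
                        audioChannelFormatID="AC_00010001",
                        audioChannelFormatName="FrontLeft",
                        typeDefinition="DirectSpeakers")
block = ET.SubElement(channel, "audioBlockFormat",
                      audioBlockFormatID="AB_00010001_00000001")
ET.SubElement(block, "speakerLabel").text = "M+30"

xml_text = ET.tostring(fmt, encoding="unicode")
print(xml_text)
```

Note how the two halves are independent: the format elements above could ship with a plant’s speaker configuration long before the content elements are written against an actual program.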
More practically, Riedmiller explains, “the recommendation will become the foundation for how immersive and personalized audio will be interchanged between the creatives who make it and the broadcasters who distribute it. It aims to foster open interchange for these new types of programs and their building blocks — including dynamic audio objects, traditional channels, or scene-based components — and how they get experienced [by consumers].”
The ITU-R ADM is based on the EBU Tech 3364 Audio Definition Model. According to Riedmiller, the ITU iteration represents a worldwide collaboration among industry experts that will enable manufacturers to begin updating software and hardware to enable the next-generation workflows that broadcasters will use on a global basis.
The industry is, in a very real sense, watching the construction of the infrastructure for what may become the new surround-audio norm for broadcast sound. “And,” he adds, “no one’s really noticed it yet.”