Salsa Sound Looks To Make Next-Gen Audio Easier To Manage

The hardware/software solution extracts individual sounds to create a stream of discrete objects

Next-generation audio (NGA) — Atmos, MPEG-H, Auro, and various broadcast iterations of the incipient 5.1+4 configuration — will not have nearly the history that helped chart the future of surround audio. By the time broadcast sound had encompassed surround, it had more than two decades of development in cinema and even home-theater applications. NGA will have far less of that sort of experience to build on in developing its own workflow standards and protocols.

Salsa co-founder Dr. Rob Oldfield: “The object-based paradigm requires new approaches to audio production.”

Historically, much of this kind of work has come from both academic and commercial interests; more recently, the two have produced a hybrid in UK-based Salsa Sound. The Salsa solution (the name was originally an acronym for Spatially Automated Live Sports Audio; the current product name is SF1) originated as hardware and software developed and patented at the University of Salford in Manchester to deliver “a dramatic improvement in the quality of sound for live sports broadcasts.” A year later, Salsa Sound is a private company created to commercialize the theoretical and proof-of-concept work accomplished within the university’s research structure.

A technical paper presented by Salsa co-founder Dr. Rob Oldfield at the IBC 2017 conference last month laid out the basics of NGA for live broadcast as well as what he perceives as the limits of the current audio-production paradigm.

“Traditionally, broadcast has been done in a channel-based, one-size-fits-all way, which offers very little in terms of interaction or personalization,” Oldfield says. “It requires a new production output for every target reproduction system, meaning that the more formats that come out — stereo, 5.1, 7.1, and so on — the more different mixes that have to be done by the sound engineer, so it’s not a scalable solution.

“Next-generation audio/object-based audio, on the other hand, is a completely different paradigm,” he continues. “The whole audio scene is captured and all of the individual components — objects — are retained all the way through the broadcast chain, and then the final mix is actually done at the viewer end.”

However, Oldfield maintains, the lack of automated tools for managing audio objects is slowing NGA’s development and its uptake.

“What we are doing is specifically addressing an ongoing — and, until now, unsolved — challenge faced by object-based formats, such as Atmos, MPEG-H, and DTS-X,” he explains. “The object-based paradigm requires new approaches to audio production, and, as yet, there are no automated tools to extract objects either in terms of the content or the location.”

Asked about Lawo’s ball/player-tracking KICK system, Oldfield says it’s on the right track but is aimed more at automated mixing than at audio-object extraction. “The Lawo system automates the manual method of mixing rather than addressing the new object-based paradigm.”

The Salsa system seeks to go further, he points out: rather than just creating an automated mix and using this as an object, it extracts the individual sounds to create a stream of discrete objects, which can be manipulated and processed individually. “The main challenge that we address is, by using triangulation techniques, separating out the key audio events — objects — and localizing them in space so we can extract the audio content and also the accompanying metadata, which provides meaning to the content and informs the rendering.
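Salsa has not published the details of its triangulation method, but the idea Oldfield describes — comparing when a sound arrives at several pitch-side microphones to pin down where it happened — can be illustrated with a time-difference-of-arrival (TDOA) sketch. The grid search, pitch dimensions, and microphone layout below are illustrative assumptions, not Salsa’s implementation:

```python
# Illustrative TDOA localization sketch (not Salsa's actual algorithm):
# a sound on the pitch reaches each microphone at a slightly different
# time; a grid search finds the point whose predicted time differences
# best match the measured ones.
import math

SPEED_OF_SOUND = 343.0  # m/s, at roughly 20 °C


def tdoa_locate(mics, arrival_times, pitch=(105.0, 68.0), step=0.5):
    """Return the (x, y) grid point best matching the measured TDOAs."""
    ref_mic, ref_t = mics[0], arrival_times[0]
    best, best_err = None, float("inf")
    x = 0.0
    while x <= pitch[0]:
        y = 0.0
        while y <= pitch[1]:
            d_ref = math.hypot(x - ref_mic[0], y - ref_mic[1])
            err = 0.0
            for (mx, my), t in zip(mics[1:], arrival_times[1:]):
                predicted = (math.hypot(x - mx, y - my) - d_ref) / SPEED_OF_SOUND
                err += (predicted - (t - ref_t)) ** 2
            if err < best_err:
                best, best_err = (x, y), err
            y += step
        x += step
    return best


# Example: a "kick" at (30, 20) heard by four corner microphones.
mics = [(0.0, 0.0), (105.0, 0.0), (0.0, 68.0), (105.0, 68.0)]
source = (30.0, 20.0)
times = [math.hypot(source[0] - mx, source[1] - my) / SPEED_OF_SOUND
         for mx, my in mics]
print(tdoa_locate(mics, times))  # → close to (30.0, 20.0)
```

The recovered coordinates become exactly the kind of positional metadata the quote refers to: they travel with the extracted audio object and inform the renderer at the viewer end.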

“Key to what we do,” he continues, “is an algorithm which ‘listens’ to the audio and compares the incoming signal with a set of predefined acoustic-feature templates, such as the sound of a ball-kick or whistle-blow, and, when a match is found, it makes a production choice, either adding the microphone signal into the mix or enhancing it with some preproduced content, for instance. Salsa also accurately identifies in real time the location and timing of on-field sounds that can help assist semantic analysis of the match or highlight creation and match analysis, removing the need for manual event logging. [SF1] is currently optimized for [soccer] but equally extends to other sports where there are specific on-field sounds that need to be captured and separated out as objects.”
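The template-comparison step Oldfield describes can be sketched with normalized cross-correlation: slide a short acoustic template over the incoming signal and flag offsets where the match score clears a threshold. This is a generic stand-in, not Salsa’s published method, and the toy “kick” waveform and threshold are assumptions for illustration:

```python
# Illustrative template-matching sketch (not Salsa's actual detector):
# score each position of the incoming signal against a predefined
# "kick" template; high normalized cross-correlation marks a match.
import math


def normalized_xcorr(signal, template):
    """Normalized cross-correlation score at each offset in signal."""
    n = len(template)
    t_mean = sum(template) / n
    t_dev = [t - t_mean for t in template]
    t_norm = math.sqrt(sum(d * d for d in t_dev))
    scores = []
    for i in range(len(signal) - n + 1):
        window = signal[i:i + n]
        w_mean = sum(window) / n
        w_dev = [w - w_mean for w in window]
        w_norm = math.sqrt(sum(d * d for d in w_dev))
        if w_norm == 0.0 or t_norm == 0.0:
            scores.append(0.0)  # silent window: no match
            continue
        dot = sum(a * b for a, b in zip(t_dev, w_dev))
        scores.append(dot / (t_norm * w_norm))
    return scores


def detect_events(signal, template, threshold=0.9):
    """Return sample offsets where the template match exceeds threshold."""
    return [i for i, s in enumerate(normalized_xcorr(signal, template))
            if s >= threshold]


# Example: a toy "kick" transient embedded in silence at offset 50.
kick = [0.0, 0.9, 1.0, 0.4, -0.3, -0.1]
signal = [0.0] * 50 + kick + [0.0] * 50
print(detect_events(signal, kick))  # → [50]
```

Each detection carries both a time and, combined with localization, a position — which is what makes the same events usable for mixing decisions and for the semantic match analysis Oldfield mentions.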

The company planned to conduct live field tests of the Salsa platform in conjunction with a major UK sports broadcaster (he declined to divulge which one) at a Premier League match at the end of September. Oldfield is enthusiastic about NGA’s potential for broadcast but understands that it’s creating its own narrative as it moves along.

“Object-based audio opens up new and exciting opportunities,” he says, “but, along with this, it also poses some challenges in achieving a practical, efficient production workflow, especially for live sports.”
