Salsa Sound Eyes U.S. Market for Automated Audio Mixing

College football is an early target sector

Salsa Sound gained attention when its vCrowd artificial-crowd-sound solution became the go-to for Manchester City FC games and other sports after the COVID pandemic drove actual crowds from the soccer stands. Before that, though, the company had already started down the path of AI-based audio mixing.

In 2019, the UK company launched an innovative automated system that creates live audio mixes for soccer matches. Based on Salsa Sound’s patented AI technology, the system takes audio from conventional broadcast microphones at a venue and algorithmically creates a real-time mix. It automatically mutes profanity from pitch-side mics and can be used onsite or through remote production, over IP or in the cloud.

MIXaiR, as the AI engine is dubbed, has been trained on “hours and hours” of content from Premier League and Championship games, with the mixes broadcast over Man City’s digital channel, says Salsa Sound Co-Founder/Director Rob Oldfield.

“[AI is] kind of the bedrock of our company,” he explains, “and all came out of a university research project into interactive broadcasting. To do this, you need to have an object-based paradigm for the audio, so we developed AI algorithms that would listen to the microphones and, when something interesting was detected, would automatically chop that up as an audio object and triangulate its location so you could have this object-based audio representation. In doing this, we realized it was an opportunity not just for future broadcast but also to automate current mixing processes.”

It’s a way of consistently keeping the most interesting field effects of a game front and center, he says. That has been done manually for decades by opening and closing individual microphone channels as the ball moves between them, he adds, but the task is well suited to automation, given the size of a soccer pitch and the constant crowd noise, especially as broadcast sports move closer to immersive mixes. The trick, he says, is a proprietary process, a “really clever way” of crossfading across different frequency bands with power (that is, gain) matching, so that fades from one microphone to the next are seamless.
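Salsa Sound has not disclosed how its process works, so the following is only a generic illustration of the underlying idea: a textbook equal-power crossfade between two microphone buffers, written in Python. The function names and the single-band design are assumptions made for this sketch. A multiband version would presumably split each signal into frequency bands and apply a fade like this one to each band separately.

```python
import math


def equal_power_gains(position):
    """Return (gain_a, gain_b) for a fade position in [0, 1].

    The gains follow a quarter cosine/sine curve, so
    gain_a**2 + gain_b**2 == 1 at every position. That keeps the
    summed power of two uncorrelated signals roughly constant.
    """
    position = min(max(position, 0.0), 1.0)  # clamp out-of-range input
    angle = position * math.pi / 2
    return math.cos(angle), math.sin(angle)


def crossfade(mic_a, mic_b):
    """Fade from mic_a to mic_b across two equal-length sample lists.

    Hypothetical helper for illustration only; not Salsa Sound's method.
    """
    if len(mic_a) != len(mic_b):
        raise ValueError("both microphone buffers must have the same length")
    n = len(mic_a)
    mixed = []
    for i, (a, b) in enumerate(zip(mic_a, mic_b)):
        # Fade position runs from 0.0 (all mic_a) to 1.0 (all mic_b).
        position = i / (n - 1) if n > 1 else 1.0
        gain_a, gain_b = equal_power_gains(position)
        mixed.append(gain_a * a + gain_b * b)
    return mixed
```

A plain linear fade would dip in perceived loudness at the midpoint when the two microphones carry largely uncorrelated sound, which is why the cosine/sine gain curve is the usual starting point for this kind of transition.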

Next Steps

The next step for MIXaiR in the UK is a project dubbed 5G Edge-XR, a 5G-network-based pilot backed by such entities as British Telecom; the government’s Department for Digital, Culture, Media and Sport; and the University of Bristol. The project is intended to enable people to immersively view sports events from all angles across a broad range of devices: smartphones, tablets, AR and VR headsets, as well as televisions. MIXaiR is to be applied first to boxing matches within the Edge-XR envelope, something Oldfield says is scheduled to take place by January.

Beyond that, Salsa is also looking at the U.S. broadcast-sports market, initially at collegiate football — particularly from the perspective of audio submixes: specifically, automating the parabolic microphones along the sidelines.

“Producing that parabolic submix is a labor-intensive, difficult task,” he notes. “We can automate that. AI provides a much more robust and much more intelligent way of doing automated mixing — automatically recognizing what the significant audio moments are in the game — because, with parabolic microphones, it needs to have some intelligence behind it.”

What’s standing between MIXaiR and its U.S. application is content: authentic audio from actual games that can be used to train the AI algorithms. Specifically, he says, individual audio stems rather than fully mixed sound tracks are needed to enable the AI to analyze discrete sonic elements. For that, Salsa Sound has approached NCAA football, which he says is a huge market that could benefit from the cost-effectiveness of automated mixing.

“We have all the infrastructure we want to move into American football; all we need is some content,” he says. “That applies to any sport. We’d just be creating a new AI module for it.”

So far, no automated approach to broadcast audio has made any swaths of the human workforce redundant, but it’s a natural concern for pretty much any worker these days, especially in budget-focused broadcast.

Oldfield says they shouldn’t worry: “We’re not making things that are trying to put people out of business; rather, we’re trying to create tools that will help people work better. There’s always a need for more and more content. Everybody’s trying to do more but on the same budget and the same personnel roster that they’ve always had. This is about enabling people to create more content of a higher quality and without incurring additional costs.”

And that might be music to a U.S. broadcaster’s ears.

 
