The AI Factor: How Artificial Intelligence Could Change the Asset-Management Game
A look at efforts by Microsoft, IBM, PGA TOUR Entertainment, NASCAR Productions
Artificial intelligence and deep learning are gaining momentum in many areas of business, but does AI have a future in asset management? Is it possible that next-gen software will transform and fully automate logging and asset-tagging? At SVG’s 11th-annual Sports Content Management & Storage Forum, representatives from Microsoft, IBM, PGA TOUR Entertainment, and NASCAR Productions took the stage to address the use of AI to dramatically streamline the media-asset–management (MAM) ecosystem.
“Imagine you could do things like identify faces, people, objects, and speech-to-text in your video clips all automatically. Then, imagine you could search all of that. Imagine AI could do a summation and automatically [generate] a highlight reel to get the best five minutes from an hour-long clip,” said Scott Bounds, media industry lead, Microsoft. “Those are the things that we are actually looking to do with our Cognitive Services. Think about vision, speech, language. Imagine you have a sporting event that you could automatically translate to a different language on the fly. Those are the things that we are all thinking about and building the platforms for.”
An AI Proof-of-Concept for Masters Highlights
IBM’s Watson cognitive system has been leading the AI charge across a variety of industries, including sports. Sports-media managers are in search of AI tools that can automatically understand visual content to help curate and search large video collections. As a result, IBM Research is creating Watson-based core technologies to uncover insights from this vast amount of video data.
At The Masters golf tournament in April, IBM created a first-ever multimodal system for summarizing golf video. Video of every shot by 90-plus golfers over four rounds from multiple camera angles can quickly add up to thousands of hours of footage. IBM created a proof-of-concept system for auto-curation of individual shot highlights from the tournament’s live video streams to accelerate creation of highlights packages.
The system extracted exciting moments from live Masters broadcast feeds based on multimodal (video, audio, and text) AI techniques. It was trained to “watch” and “hear” broadcast videos in real time, accurately identifying the start and end frames of key highlight elements: crowd cheering; action recognition (high fives or fist pumps); commentator excitement (tone of voice); commentary (exciting words or expressions obtained from the Watson Speech to Text API); shot-boundary detection; TV graphics (such as lower thirds); and optical character recognition (the ability to extract text from images to determine which player is in which shot).
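Of the elements above, shot-boundary detection is the most well-established. A common textbook approach, sketched below with synthetic data, compares color histograms of successive frames and flags a cut when the difference spikes; the threshold and frame representation here are illustrative assumptions, not details of IBM's system.

```python
# Minimal shot-boundary-detection sketch: flag a cut when the
# histogram difference between consecutive frames exceeds a threshold.
# Frame "histograms" here are synthetic; a real system would decode
# the broadcast feed and bin actual pixel values.

def histogram_distance(h1, h2):
    """Sum of absolute bin differences between two normalized histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def detect_shot_boundaries(histograms, threshold=0.5):
    """Return frame indices where a new shot is assumed to start."""
    boundaries = []
    for i in range(1, len(histograms)):
        if histogram_distance(histograms[i - 1], histograms[i]) > threshold:
            boundaries.append(i)
    return boundaries

# Three near-identical "fairway" frames, then a hard cut to the green.
fairway = [0.7, 0.2, 0.1]
green = [0.1, 0.2, 0.7]
print(detect_shot_boundaries([fairway, fairway, fairway, green, green]))  # [3]
```

Production systems typically add smarter distance measures and handle gradual transitions (dissolves, wipes), but the core idea is the same frame-to-frame comparison.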
“We essentially had Watson learn how to play golf,” explained Janet Snowdon, global director, media and entertainment industry solutions, IBM. “We took all the broadcast feeds … and came up with this idea called the ‘excitement factor.’ Watson analyzed the broadcast feed of each shot, commentator excitement level, the action excitement level, the crowd cheering level, and so on, and then came up with a score.”
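The “excitement factor” Snowdon describes can be pictured as a weighted blend of per-modality scores. The sketch below is a guess at the shape of such a model, not IBM's actual one; the modality names, weights, and score ranges are all illustrative assumptions.

```python
# Hedged sketch of an "excitement factor": combine per-modality scores
# (each assumed to be in [0, 1]) into a single number per golf shot.
# Weights are invented for illustration.

WEIGHTS = {
    "crowd_cheer": 0.35,       # crowd-cheering level
    "commentator_tone": 0.25,  # excitement in the commentator's voice
    "player_action": 0.25,     # e.g. fist pump / high five recognized
    "exciting_words": 0.15,    # e.g. from a speech-to-text transcript
}

def excitement_factor(scores):
    """Weighted sum of modality scores; missing modalities count as 0."""
    return sum(WEIGHTS[m] * scores.get(m, 0.0) for m in WEIGHTS)

shot = {"crowd_cheer": 0.9, "commentator_tone": 0.8,
        "player_action": 1.0, "exciting_words": 0.6}
print(round(excitement_factor(shot), 3))  # 0.855
```

In practice, each modality score would itself come from a trained classifier rather than a hand-set number.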
The selected segments were added to an interactive dashboard for quick review and retrieval by a video editor or broadcast producer. In addition, by leveraging TV graphics and optical character recognition, the system automatically gathered player/hole metadata, which was matched with the relevant highlight segments to enable personalized highlights based on a viewer’s favorite players.
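The personalization step described above amounts to filtering and ranking segments by their recovered metadata. A minimal sketch, assuming each segment already carries a player name and an excitement score (field names are illustrative, not IBM's schema):

```python
# Sketch of personalized-highlight assembly: keep only segments tagged
# with the viewer's favorite player, then rank by excitement score.

def personalized_reel(segments, favorite_player, top_n=3):
    """Return the favorite player's segments, highest excitement first."""
    mine = [s for s in segments if s["player"] == favorite_player]
    return sorted(mine, key=lambda s: s["excitement"], reverse=True)[:top_n]

segments = [
    {"player": "Player A", "hole": 16, "excitement": 0.91},
    {"player": "Player B", "hole": 12, "excitement": 0.88},
    {"player": "Player A", "hole": 2,  "excitement": 0.40},
]
for seg in personalized_reel(segments, "Player A"):
    print(seg["hole"], seg["excitement"])
```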
“We used Watson to create what I call ‘magical metadata’: understanding the sentiment of the video clip. It automatically generated rough-cut clips,” said Snowdon. “You can imagine as a broadcaster or as Augusta, how you can create a unique customer experience for that golf fan.”
The Search for Speech-to-Text AI Applications
PGA TOUR Entertainment has been exploring AI for four years, and its efforts have come a long way.
“About 3½ years ago, speech-to-text was one of the biggest things we wanted to do,” recounted Michael Raimondo, director, media asset management, PGA TOUR Entertainment. “We put in a post-round [interview with a] golfer from the UK with a heavy accent, and, the first time we looked at it afterwards, [the text] actually said ‘fruit salad’ in the middle of it. So we knew it wasn’t ready for primetime yet. It’s obviously progressed a lot since then.”
In early 2016, the company began officially working with AI for speech-to-text, as well as logo and facial recognition. The team extracted the player, hole, and shot info from the constant graphic in the PGA TOUR Live streaming feed and entered that metadata into its internal system.
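Pulling metadata from a constant on-screen graphic typically means running OCR on a fixed screen region and parsing the resulting text. A rough sketch, assuming an OCR step has already produced a string; the text format, regex, and field names are illustrative guesses, not the PGA TOUR Live graphic's actual layout.

```python
# Sketch: parse OCR'd text from an on-screen graphic into
# player/hole/shot metadata for a logging system.
import re

GRAPHIC_PATTERN = re.compile(
    r"(?P<player>[A-Z]\. [A-Z]+)\s+HOLE (?P<hole>\d+)\s+SHOT (?P<shot>\d+)"
)

def parse_graphic(ocr_text):
    """Extract player/hole/shot metadata, or None if no graphic is found."""
    m = GRAPHIC_PATTERN.search(ocr_text)
    if not m:
        return None  # graphic absent, or OCR too noisy on this frame
    return {"player": m.group("player"),
            "hole": int(m.group("hole")),
            "shot": int(m.group("shot"))}

print(parse_graphic("J. SMITH  HOLE 7  SHOT 2"))
```

Because OCR misreads happen frame by frame, a real pipeline would usually vote across several consecutive frames before committing a tag to the MAM.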
Since golf coverage doesn’t show every single shot on every single hole, logging is especially difficult. There is no all-encompassing time stamp (such as a game clock in hockey or basketball) to marry the play data with. As a result, PGA TOUR Entertainment is exploring AI-based tools to alleviate the extensive manual logging required.
In addition, the company is exploring automated highlights creation (similar to IBM’s tool at The Masters) to produce player-specific packages for its international partners, focused on players born in a particular country.
“It’s an evolution that we’ve been constantly working on for quite a while,” said Raimondo.
Removing the Human Factor in Logging and Tagging
As leagues look to leverage their vast video archives to create new revenue streams, AI has become a key tool in efforts to properly digitize and log this content. NASCAR Productions, for example, owns one of the largest sports archives in the world, with 500,000 hours of content and 3 million assets. However, that content carries only 9.5 million metadata tags – far short of what’s required to efficiently search, find, and monetize the assets. As a result, NASCAR is actively ramping up its AI efforts in hopes of improving on the time-consuming and inefficient human-powered tagging process.
“The reason we are looking at [AI] is that humans are highly inefficient,” said Chris Witmayer, director, broadcast, production and new media technology, NASCAR Productions. “We have found that humans are 4-to-1 on the efficiency scale. For every hour of footage, it takes a human about four hours to enter metadata. We need to find a way to do this because, although we have an entire archive that goes back to the 1930s, we can’t actually find anything efficiently. If you can’t find anything, you can’t sell it, and you can’t make money. So this is big for us.”
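The back-of-envelope arithmetic behind Witmayer's numbers shows why manual tagging doesn't scale:

```python
# At 4 hours of manual logging per hour of footage, hand-tagging the
# archive is a multimillion-hour job, and the current tag density per
# asset is thin.

archive_hours = 500_000
hours_per_footage_hour = 4   # the 4-to-1 ratio quoted above
metadata_tags = 9_500_000
assets = 3_000_000

manual_hours = archive_hours * hours_per_footage_hour
print(manual_hours)                       # 2,000,000 person-hours
print(round(metadata_tags / assets, 2))   # roughly 3 tags per asset
```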