NAB 2017

NAB Reflections: IBM Cloud Video’s David Clevinger on How Watson AI Can Unleash the Power of Video Libraries

Cloud-based service analyzes content and generates ‘conceptual’ metadata in near real time

IBM made waves at NAB 2017 last month by announcing a Watson artificial-intelligence–enabled cloud service that could help content creators unlock insights from their video libraries. Dubbed the IBM Content Enrichment Service, the platform combines Watson AI with the IBM Cloud to create a tool that can analyze a client’s video-content library to generate advanced real-time metadata, which can make video more relevant for consumers and streamline workflows for media managers.

Following the show, SVG sat down with David Clevinger, senior director, product and strategy, IBM Cloud Video, to discuss how the new service can benefit sports-content creators and how he sees the power of Watson AI changing the game for sports production.

How can IBM Content Enrichment Service benefit sports-content creators?
With the IBM Content Enrichment Service, we’re able to analyze a client’s video content, even large back catalogs, to generate metadata. This metadata isn’t just based on natural-language processing, where the service is listening and turning speech into text; it’s also creating conceptual metadata, such as noting an event that’s deemed to have a high level of excitement, like a great putt or a slam dunk, and temporal metadata, such as when in the video a piece of metadata is relevant vs. when it isn’t. Generating this metadata is useful in several ways: making video more relevant on the consumption end, making ad units more salient, and streamlining workflows for media managers. Your team needs to be able to find content more quickly, and this level of metadata is crucial.
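To make the distinction between conceptual and temporal metadata concrete, here is a minimal Python sketch of what such enriched tags might look like. This is purely illustrative — the field names, score ranges, and schema are hypothetical, not IBM’s actual data model:

```python
from dataclasses import dataclass

@dataclass
class MetadataTag:
    """One enriched tag: what it describes, and when in the video it applies."""
    label: str          # e.g. "great putt" or "slam dunk"
    kind: str           # "speech" (from NLP transcript) or "conceptual"
    excitement: float   # hypothetical 0.0-1.0 excitement score
    start_s: float      # temporal metadata: tag is relevant from here...
    end_s: float        # ...to here (seconds into the video)

def tags_at(tags, t):
    """Return only the tags relevant at timestamp t, per their validity windows."""
    return [tag for tag in tags if tag.start_s <= t <= tag.end_s]

clip_tags = [
    MetadataTag("great putt", "conceptual", 0.92, 41.0, 47.5),
    MetadataTag("crowd roar", "speech", 0.80, 44.0, 50.0),
]
print([t.label for t in tags_at(clip_tags, 45.0)])  # both windows cover 45s
```

The key idea is the validity window: a tag is not attached to the whole asset but to the span of time where it is actually relevant, which is what lets a media manager jump straight to the moment rather than scrubbing the full clip.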

David Clevinger: “Sports content is inherently fast-paced and throws off a lot of data, and media managers and producer/editors need every advantage they can get to keep up.”

The idea is that, by combining artificial intelligence with the IBM Cloud, companies will be able to make sense of unstructured data and extract new insights from video with a level of analysis not previously possible.

You can use this kind of service to meet a number of use cases, but it is particularly valuable for sports, since insights are provided in real time, which is essential for live events. Content Enrichment enables a sports network to more quickly identify and package content that meets specific metadata parameters, such as a player or team name, plus content that contains happy or exciting scenes based on language, sentiment, and images; then work with advertisers to promote clips of those moments to fans prior to, say, the playoffs. Previously, someone would have had to manually go through every piece of video to identify each piece of content and break it into scenes. Now each scene can be more quickly identified to attract viewers and advertisers for quick-turn campaigns.

Can you detail how you’re using Watson to allow sports-video-content creators to more easily find assets and create better content?
The Content Enrichment Service is just one example. We’re generating metadata, including structured metadata, perhaps where none or only partial metadata existed previously. Because it’s a service and modular, we can work with sports-content owners to generate that data and bring it into their workflow, so media managers can get access to it in near real time, potentially even while assets are still in production.

For sports-content creators specifically, IBM offers a wide array of solutions to help them unlock value from video by bringing together Watson and the cloud. Beyond our Content Enrichment service, IBM is leveraging data and analytics, powered by Watson, to help sports-content creators optimize video.

You’ve seen our closed-captioning capabilities. IBM worked with the US Open [tennis tournament] to provide intelligent-captioning technology powered by Watson. Watson was able to learn the difference, for example, between romantic love and love as a tennis score, as well as the rest of the vernacular around tennis in general, resulting in a more highly relevant, instant transcript of the event.

I think one of the most innovative things we did over the past year was to work with IBM Research to create the first-ever multimodal system for summarizing golf video and finding highlights during this year’s Masters tournament. This was a proof-of-concept system we trained to “watch” and “hear” broadcast videos in real time, then accurately identify the start and end frames of key event highlights using commentator tone, player celebrations, high fives, and other indicators. These scenes were semantically identified and categorized to allow both editors and users to find them in near real time. Watson scored each clip based on how exciting it thought each moment was — using, for example, the roar of the crowd.
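The multimodal scoring Clevinger describes — combining crowd roar, commentator tone, and on-screen celebrations into a single excitement score — can be sketched roughly as follows. The weights and clip names here are invented for illustration; a real system like the Masters proof of concept would learn these from training data rather than hard-code them:

```python
def highlight_score(crowd_roar, commentator_tone, celebration):
    """Fuse per-modality excitement cues (each scaled 0-1) into one score.
    The weights are hypothetical; a trained model would fit them."""
    return 0.5 * crowd_roar + 0.3 * commentator_tone + 0.2 * celebration

# Hypothetical per-clip cue values extracted by audio/visual analysis.
clips = {
    "hole 16 tee shot": highlight_score(0.95, 0.90, 0.80),
    "routine par putt": highlight_score(0.20, 0.30, 0.10),
}

# Rank candidate clips so editors see the most exciting moments first.
ranked = sorted(clips, key=clips.get, reverse=True)
print(ranked[0])
```

Ranking clips this way is what allows editors (and, downstream, viewers) to surface the best moments in near real time instead of screening footage linearly.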

In general, how do you believe AI tools can enhance video-production operations, especially for sports?
There’s a lot of room for AI to enhance production operations. The first benefit is improved workflows. Sports content is inherently fast-paced and throws off a lot of data, and media managers and producer/editors need every advantage they can get to keep up. Integrating the Content Enrichment service into your production workflow could do a couple of things to make production faster: generate metadata so the editor doesn’t have to key in or find every relevant keyword or taxonomy node, freeing that person to generate more clips in a given timeframe. It also can help them find relevant content more quickly. If an athlete makes a great play, the editor can instantly find all the other moments that match that play and, purely for example, auto-populate a playlist. Basically, smarter is faster.

A key component of this benefit is entity recognition, which is a broad term, but, for sports, think about using entity recognition to identify jersey numbers or sponsor logos in a sports video [to identify] types of sports, teams, and even players. We can use this recognition to power internal workflow as well as external products, like recommendation engines or fantasy-league clips, where an athlete’s clips are automatically added to a profile page.
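The entity-recognition-to-playlist pipeline described above — clips automatically added to an athlete’s profile page based on what was recognized in the frame — reduces to a simple grouping step once detection has run. A hypothetical sketch, with made-up clip IDs and entity labels standing in for real detector output:

```python
from collections import defaultdict

def build_playlists(detections):
    """Group clip IDs by every entity recognized in them
    (jersey numbers, team names, sponsor logos, play types, ...)."""
    playlists = defaultdict(list)
    for clip_id, entities in detections:
        for entity in entities:
            playlists[entity].append(clip_id)
    return playlists

# Hypothetical detector output: (clip ID, entities recognized in that clip).
detections = [
    ("clip-001", ["#23", "TeamA", "slam dunk"]),
    ("clip-002", ["#23", "TeamB"]),
]
playlists = build_playlists(detections)
# playlists["#23"] now holds every clip featuring that jersey number,
# ready to auto-populate a player profile or fantasy-league page.
```

The same index serves internal workflow (editors searching by player) and external products (recommendation engines, fan-facing profile pages) without re-analyzing the video.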

Speaking of entity recognition, there is a huge monetization opportunity for advertisers interested in live-streamed sports, and this body of work will inform sports business models in the future. AI-powered innovations in object recognition offer a solution to the weighty challenge of digital ad measurement. At sporting events, logos are plastered on every surface. Object recognition will allow content owners to record the length of time a brand’s logo appeared within the video frame or even what percentage of the frame was taken up by the image. These insights can create an entirely new and quantitative way to buy and sell sponsorships, opening up additional revenue streams.
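The ad-measurement math here is straightforward once object recognition yields per-frame logo detections: screen time is the count of frames containing the logo divided by the frame rate, and coverage is the average fraction of the frame the logo occupies while visible. A minimal sketch, assuming a hypothetical detector output format (brand → fraction of frame covered, per frame):

```python
def logo_exposure(frames, fps, brand):
    """frames: one dict per video frame mapping brand -> fraction of the
    frame its logo covers (absent = not detected in that frame).
    Returns (seconds on screen, average coverage while visible)."""
    hits = [f[brand] for f in frames if brand in f]
    if not hits:
        return 0.0, 0.0
    return len(hits) / fps, sum(hits) / len(hits)

# Hypothetical detections sampled at 1 frame per second.
frames = [{"AcmeCo": 0.05}, {"AcmeCo": 0.07}, {}, {"AcmeCo": 0.06}]
secs, coverage = logo_exposure(frames, fps=1, brand="AcmeCo")
# Logo appears in 3 of 4 sampled seconds: 3.0 s on screen, 0.06 avg coverage.
```

Metrics like these are what would let sponsorships be bought and sold quantitatively — per second of verified on-screen exposure — rather than on estimated impressions.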

Now that the transition from Ustream and the formation of the IBM Cloud Video unit are complete, how will the platform look to move forward?
You’re correct that the formation of the unit is complete, and we don’t plan to stop growing. Since the formation of the Cloud Video unit, we’ve supported media and entertainment companies, as well as enterprises, with reliable on-demand and live-streamed video services. Generally, the future of IBM Cloud Video is the integration of these Watson-powered cognitive solutions we’ve discussed, continuing to add to those services in unique ways and doing so in a modular fashion to meet the needs of any given client. We’re absolutely committed to AI-infused products as a core way to help M&E and sports clients unlock more value from their video.

The Content Enrichment service that we announced at the NAB Show is a great example of that commitment and really just the next step in a longer journey. We’ll bring our platform further into the future by empowering media companies to make even better decisions about content creation, content acquisition, content workflows, asset logistics, and personalization — all based on richer metadata.