IBM’s AI Technology Auto-Curates Golf Highlights at The Masters
The tremendous growth of video data in sports has created significant demand for artificial intelligence (AI) tools that can automatically understand visual content, making it easier to curate and search large video collections. IBM Research is building core AI technologies that power solutions for uncovering insights in this vast amount of video data. At the 2017 Masters Golf Tournament, IBM created a first-of-its-kind multimodal system for summarizing golf video.
Capturing all the action at the Masters and sharing it with fans in a timely fashion has traditionally been labor-intensive. With 90 golfers playing multiple rounds over four days, footage from every tee, every hole, and multiple camera angles quickly adds up to thousands of hours of video.
IBM Research worked with the IBM iX design team to create a proof-of-concept system for auto-curation of individual shot highlights from the tournament’s live video streams, with the goal of simplifying and accelerating the video production process to create golf play highlight packages.
The system extracts exciting moments from live video streams of the Masters tournament based on multimodal (video, audio, and text) AI techniques. More specifically, this system is trained to “watch” and “hear” broadcast videos in real-time, accurately identifying the start and end frames of key event highlights based on the following markers:
- Crowd cheering
- Action recognition, such as high fives or fist pumps
- Commentator excitement (tone of voice)
- Commentary (exciting words or expressions obtained from the Watson Speech to Text API)
- Shot-boundary detection
- TV graphics (such as lower third banners)
- Optical character recognition (the ability to extract text from images to determine which player is in which shot)
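The start/end identification described above can be sketched as a thresholding pass over a per-second excitement series (one already combining markers like those listed). This is a minimal illustration, not IBM's implementation; the threshold and minimum-length values are assumptions.

```python
# Sketch: locating highlight start and end points in a stream, given a
# per-second excitement score in [0, 1]. The threshold (0.6) and minimum
# segment length (3 seconds) are illustrative assumptions.

def find_highlights(scores, threshold=0.6, min_len=3):
    """Return (start, end) index pairs where the excitement score stays
    at or above threshold for at least min_len consecutive seconds."""
    segments, start = [], None
    for i, s in enumerate(scores):
        if s >= threshold and start is None:
            start = i                      # a candidate highlight begins
        elif s < threshold and start is not None:
            if i - start >= min_len:       # keep only sustained excitement
                segments.append((start, i))
            start = None
    if start is not None and len(scores) - start >= min_len:
        segments.append((start, len(scores)))  # highlight runs to end of stream
    return segments
```

In a real deployment the indices would map back to broadcast frame numbers rather than seconds.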
The selected segments are then added to an interactive dashboard for quick review and retrieval by a video editor or broadcast producer, speeding the delivery of highlights to fans eager to see the latest action. In addition, by leveraging TV graphics and optical character recognition, the system automatically gathers the player name and hole number for each segment. Matching this metadata to the relevant highlight segments could enable searches like "show me all highlights of player X during the tournament," or personalized highlight reels built around a viewer's favorite players.
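The metadata-driven search described here amounts to a simple index from player name to highlight segments. A minimal sketch, assuming illustrative field names (`segment_id`, `player`, `hole`) rather than IBM's actual schema:

```python
# Sketch: attaching OCR-derived metadata (player name, hole number) to
# highlight segments and querying by player. Field names are assumptions.

from collections import defaultdict

class HighlightIndex:
    def __init__(self):
        self._by_player = defaultdict(list)

    def add(self, segment_id, player, hole):
        """Register a highlight segment under its player's name."""
        self._by_player[player.lower()].append(
            {"id": segment_id, "player": player, "hole": hole}
        )

    def by_player(self, player):
        """All highlights of one player, e.g. for a personalized reel."""
        return list(self._by_player[player.lower()])
```

A producer-facing dashboard could then answer "all highlights of player X" with a single lookup.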
The solution created for the Masters extends our team’s recent work on the first Cognitive Movie Trailer. The technology builds on state-of-the-art deep learning models and provides effective methods for learning new classifiers from only a few manually annotated training examples via self-supervised and active learning techniques.
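The idea of learning from a few manual annotations can be illustrated with pool-based active learning using uncertainty sampling: the model repeatedly asks a human to label the example it is least sure about. The 1-D threshold "classifier" below is a deliberately toy assumption standing in for the deep models mentioned above.

```python
# Sketch of pool-based active learning with uncertainty sampling.
# The 1-D threshold classifier is a toy stand-in for a deep model.

def train_threshold(labeled):
    """Fit a 1-D classifier: the midpoint between the two class means."""
    pos = [x for x, y in labeled if y == 1]
    neg = [x for x, y in labeled if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def most_uncertain(pool, threshold):
    """Pick the unlabeled example closest to the decision boundary."""
    return min(pool, key=lambda x: abs(x - threshold))

def active_learn(labeled, pool, oracle, rounds=3):
    """Iteratively query the oracle (a human annotator) for the most
    uncertain example, then retrain."""
    labeled, pool = list(labeled), list(pool)
    for _ in range(rounds):
        t = train_threshold(labeled)
        x = most_uncertain(pool, t)
        pool.remove(x)
        labeled.append((x, oracle(x)))  # one new manual annotation
    return train_threshold(labeled)
```

Each round spends one annotation where it is most informative, which is how a classifier can become useful from only a handful of labels.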
To IBM’s knowledge, this multimodal highlight-ranking system is the first of its kind to be deployed at a sporting event. Integrating the multiple audio and visual components (such as detecting a player celebrating, or measuring commentator and crowd excitement levels) into an AI-driven ranking of golf’s most exciting moments was a challenge, but it opens up new applications both in and beyond the sports industry, according to IBM.
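The ranking step that integrates these components can be sketched as a weighted combination of per-segment component scores. The weights and component keys here are illustrative assumptions, not IBM's learned model:

```python
# Sketch: ranking candidate highlight segments by a weighted fusion of
# their component excitement scores. Weights and keys are assumptions.

def rank_highlights(segments, weights=None):
    """Sort segments (dicts of component scores in [0, 1]) by fused
    score, most exciting first."""
    weights = weights or {"cheer": 0.4, "gesture": 0.2,
                          "tone": 0.2, "words": 0.2}

    def fused(seg):
        # Missing components default to 0, so partial evidence still ranks.
        return sum(w * seg.get(k, 0.0) for k, w in weights.items())

    return sorted(segments, key=fused, reverse=True)
```

A learned ranker would replace the fixed weights, but the interface (component scores in, ordered highlights out) stays the same.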