SVG Sit-Down: Google Cloud’s Anshul Kapoor on the Future of ‘Generative Production’ in Live Sports

As the live-sports industry heads into 2026, AI will only become more deeply embedded in production and distribution. In live sports, production leaders know that the most meaningful moments aren’t defined solely by the final outcome, but by the build-up, emotion, and intensity that surround it. Google Cloud is at the forefront of this shift, using AI to analyze crowd energy, player behavior, and subtle on-field cues to identify pivotal moments as they unfold.

Anshul Kapoor, Head of Media Solutions, Google Cloud

As the new year approaches, SVG sat down with Anshul Kapoor, Head of Media Solutions at Google Cloud, to explore how context-aware AI is redefining highlights and transforming the live viewing experience by examining sports the way fans do – through momentum, anticipation, and reaction, not just box scores. He believes this context-aware viewing will not only make the live experience far more immersive for viewers but also enable a new era of “Generative Production” in live sports.

Can you give us one big prediction for how AI will impact coverage of live sports in 2026?
For live television, and especially sporting events, highlights are about the verbs, not just the nouns. In live sports, AI is moving past simply logging a goal. It will read the crowd noise, player tension, and micro-reactions to pinpoint the exact moment the play truly became significant — sometimes 30 seconds before the score. This context-aware viewing will make the live experience far more immersive.

The real transformation in sports media will be the shift from simply “logging” a game to truly “understanding” it using AI. While current game analysis focuses on simple metadata — the “nouns” like “goal” or “touchdown” — context-aware AI uses advanced multimodal models (analyzing video, audio, and text) to capture the story of the game, including the “verbs and adjectives” like the tension before a play or a player’s defensive intensity.

And how does this relate to the potential for generative AI in sports production?
In the next 12–18 months, this capability will enable Generative Production — moving away from just using AI to record game history and actually using it to create brand-new content and inventory, like tons of custom shorts, trailers, and hyper-personalized fan clips. 

This transforms the traditional broadcast from a single feed into an infinite generator of specific, high-value content, shifting the industry’s focus from merely saving money on basic clipping to generating entirely new revenue streams.

How do these AI tools distinguish between authentic emotional cues — like rising tension or crowd anticipation — and noise or false positives?
The key to distinguishing authentic emotional cues from mere noise lies in multimodality and triangulation. If an AI looks at only a single signal, such as audio levels, where “loud is loud,” a false positive is highly likely (for example, the AI flags a moment as a highlight based solely on a noise spike that is irrelevant to the game).

Our AI’s “superpower” is its ability to simultaneously analyze multiple data streams: crossing what it sees (computer vision of player activity), what it hears (the specific frequencies of a nervous versus a rowdy crowd), and what it reads (player telemetry and the game clock). This triangulation ensures high confidence. For example, a spike in volume without a corresponding spike in player activity is dismissed as simple noise. Only when multiple data points corroborate a context — such as the crowd getting loud while players are showing high defensive intensity — can the AI confirm and generate a context-rich moment that was previously impossible for human loggers to capture consistently.
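To make that corroboration rule concrete, here is a minimal Python sketch of the triangulation idea. The signal names, thresholds, and schema are illustrative assumptions for this article, not Google Cloud’s actual implementation.

```python
from dataclasses import dataclass

@dataclass
class WindowSignals:
    """Signals aggregated over one short window of the broadcast (hypothetical schema)."""
    crowd_db_delta: float    # audio: rise in crowd level versus a rolling baseline, in dB
    player_activity: float   # vision: normalized on-field movement/intensity score, 0..1
    late_game: bool          # telemetry: game-clock context, e.g. final minutes

def is_contextual_moment(w: WindowSignals,
                         audio_thresh: float = 6.0,
                         activity_thresh: float = 0.7) -> bool:
    """Flag a window only when independent signals corroborate each other.

    A volume spike alone ("loud is loud") is dismissed as noise; it must coincide
    with high player activity, and game-clock context is required for confirmation.
    """
    audio_spike = w.crowd_db_delta >= audio_thresh
    visual_spike = w.player_activity >= activity_thresh
    return audio_spike and visual_spike and w.late_game

# A loud crowd with no matching on-field activity is treated as noise.
print(is_contextual_moment(WindowSignals(8.0, 0.2, True)))   # False
print(is_contextual_moment(WindowSignals(8.0, 0.85, True)))  # True
```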

For those looking to experiment with using these AI tools, what does the workflow look like for integrating this context-aware analysis into a live broadcast (without adding latency if possible)?
Latency is the enemy in live broadcast, so we are not suggesting you rip out your existing infrastructure. To integrate this context-aware analysis without adding latency, the AI operates as an “Intelligent Sidecar,” a “set and forget” system that works in parallel with your existing setup. One copy of the video feed goes to the traditional switcher, and the other goes to the cloud to generate intelligence. We do not foresee AI taking over the video switcher, as this would introduce unacceptable delays and remove essential human creative input. Instead, the AI’s role is to act as the ultimate production assistant, constantly feeding the production team better options than a human logger.

AI agents will soon become the best “chief of staff” a director could have, automatically creating metadata, graphics, and commentary options and sending those streamlined assets back to the human team for on-air use.
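As a rough illustration of that “sidecar” pattern (not a Google Cloud API, and with placeholder function names), the sketch below keeps the on-air path untouched while a copy of each frame is queued for analysis on a parallel worker:

```python
import queue
import threading
import time

analysis_queue: "queue.Queue[bytes]" = queue.Queue(maxsize=100)

def send_to_switcher(frame: bytes) -> None:
    """Stand-in for the existing, untouched on-air path."""

def analyze_in_cloud(frame: bytes) -> dict:
    """Stand-in for the cloud analysis call."""
    return {"suggested_tag": "rising-tension"}

def on_frame(frame: bytes) -> None:
    send_to_switcher(frame)                # primary path: never waits on the AI
    try:
        analysis_queue.put_nowait(frame)   # sidecar path: best-effort copy
    except queue.Full:
        pass                               # drop a frame rather than stall the broadcast

def sidecar_worker() -> None:
    while True:
        frame = analysis_queue.get()
        suggestion = analyze_in_cloud(frame)
        print("Option for the production team:", suggestion)

threading.Thread(target=sidecar_worker, daemon=True).start()
on_frame(b"\x00" * 1024)   # toy frame
time.sleep(0.1)            # give the worker a moment in this toy example
```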

How customizable is this technology for different sports, especially when crowd noise, player reactions, and game pacing can vary dramatically?
Customization is critical because you cannot use a generic AI model for different sports; a soccer model is ineffective for golf, given the dramatic variations in rules, terminology, and crowd dynamics — the unique “dialect” of each sport. To ensure accuracy, we use agentic AI, allowing customers to build bespoke, sport-specific agents — such as a “tennis agent” that is fine-tuned to understand tennis rules and crowd dynamics. 

While the underlying foundation models are the same, this agent-based approach means the system can proactively work on behalf of the production team to complete complex, sport-specific goals. This high level of specialization not only ensures accurate context-aware analysis, but also paves the way for automating complex, multi-step production processes. For instance, in a basketball game, an agent must first detect the action on the screen, look up the relevant context, retrieve the specific statistic or video, and then generate a response. This process will fundamentally improve the economics of content creation.
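A hypothetical sketch of that multi-step flow (detect, look up, retrieve, generate) might look like the following; every function body and statistic is a placeholder rather than a real model or stats service:

```python
def detect_action(frame_window: list) -> str:
    """Stand-in for a vision model classifying the current play."""
    return "three_point_attempt"

def lookup_context(action: str, game_state: dict) -> dict:
    """Attach relevant game state to the detected action."""
    return {"player": game_state["ball_handler"], "clock": game_state["clock"]}

def retrieve_stat(context: dict) -> str:
    """Stand-in for a query against an official stats feed."""
    return f"{context['player']} is 5-for-7 from three tonight"

def generate_response(action: str, context: dict, stat: str) -> str:
    """Turn the retrieved context into an on-air-ready suggestion."""
    return f"{action.replace('_', ' ').title()} with {context['clock']} on the clock: {stat}."

def basketball_agent(frame_window: list, game_state: dict) -> str:
    action = detect_action(frame_window)
    context = lookup_context(action, game_state)
    stat = retrieve_stat(context)
    return generate_response(action, context, stat)

print(basketball_agent([], {"ball_handler": "Player 23", "clock": "0:42"}))
```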

How soon do you see this deeper contextual understanding being standard (or do you at all?) across major live sports broadcasts, not just experimental deployments?
We are already past the experiment phase with early adopters, and we expect this deeper contextual understanding to become standard very soon. The primary driver is audience demand; fans are now accustomed to hyper-personalized feeds on their second screens. Heading into the major global sporting events of the 2026-2027 cycle, this capability will become table stakes, moving from an “experimental” feature to an “essential” one. This technology is no longer a luxury; it is the only way for rights holders to maximize their content mileage and maintain viewer attention in a fragmented market. In short, viewer loyalty will shift toward the enhanced, personalized experience that only AI can deliver, making it a mandatory component of major live sports broadcasts.

From a workflow standpoint, are there specific signals or data streams that are most critical for capturing these contextual moments in real time?
When it comes to capturing those critical, real-time contextual moments, it all boils down to fusion. No single data stream is enough. The essential signals are the combination of video (specifically computer vision and player-tracking data), audio (like the nuances of crowd noise and commentary), and telemetry (meaning the official stats and the game clock). The hardest technical challenge is synchronization, not just collecting the data. If the audio cheer arrives 200 milliseconds before the video goal, the model fails to connect the event to the emotion. Our edge is the ability to perfectly align these massive, disparate streams in real time. This three-part combination is what allows the AI to move past simply logging the basic “nouns” of the game and start truly “understanding” the actual context, the emotional “verbs,” and the descriptive “adjectives” of the moment.
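To illustrate why that synchronization matters, here is a minimal sketch of time-window fusion; the 150 ms tolerance and the event schema are assumptions made for the example, not production values:

```python
from bisect import bisect_left

def nearest(timestamps: list, t: float) -> float:
    """Return the timestamp in a sorted list that is closest to t."""
    i = bisect_left(timestamps, t)
    candidates = timestamps[max(0, i - 1):i + 1]
    return min(candidates, key=lambda x: abs(x - t))

def fuse(video_events: dict, audio_events: dict, tolerance_s: float = 0.150) -> list:
    """Pair a video event with an audio event only if their timestamps truly line up."""
    audio_ts = sorted(audio_events)
    fused = []
    for vt, vlabel in video_events.items():
        at = nearest(audio_ts, vt)
        if abs(at - vt) <= tolerance_s:
            fused.append((vlabel, audio_events[at]))
    return fused

video = {120.40: "goal"}        # vision model marks the goal at t = 120.40 s
audio = {120.20: "crowd_roar"}  # the cheer arrives 200 ms early in an unaligned feed
print(fuse(video, audio))       # [] -- the emotion never gets attached to the event
```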

What is the most significant factor that the sports-media market is underestimating when it comes to the power of AI?
The sports-media market needs to start thinking much bigger about monetization. The biggest trend we’re seeing is Generative Production, as I mentioned earlier. Now, here’s the interesting part: the more content we pump out this way (i.e., custom shorts, trailers, and hyper-personalized fan clips), the more critical it becomes to have AI-powered hyper-segmentation. It’s the engine that allows us to create infinite “niche” feeds (for example, a “Fantasy Feed” or a “Tactical Feed”) for every segment of the audience.

This capability to synthesize massive amounts of siloed data is fundamentally changing the competitive landscape, letting studios spot content trends and make business decisions in months, not years. Ultimately, as the market gets saturated, the real commodity will be trust, which is only earned by consistently delivering a high-quality, personalized fan experience.

This interview has been edited for length and clarity. 
