IBC 2017

IBC Reflections: Dell EMC’s Tom Burns on End-to-End Solutions, the Importance of ‘Data Hygiene’

For sports broadcasters, archive and near-line/online storage is an increasingly important issue

IBC 2017 marked a first for the Dell EMC brand. Although Dell’s $67 billion deal to acquire EMC had closed just days before IBC 2016, this year’s show in Amsterdam was the first where the newly formed Dell EMC was leveraging both entities’ assets to create an end-to-end workflow and infrastructure solution. In addition, Dell EMC’s booth featured the new Gen 6 Isilon scale-out NAS and ECS object storage offerings, along with other new and upgraded technologies.

During the show, SVG sat down with Tom Burns, CTO, M&E, Dell EMC, to discuss the synergy and new products on display, the importance for M&E organizations to maintain “storage hygiene,” the blurring line between deep archive and near-line/online storage, the challenges posed as the worlds of IT and broadcast unite, and the promise of artificial intelligence for storage and MAM.

Tom Burns, CTO, M&E, at Dell EMC’s booth at IBC 2017

What is Dell EMC highlighting at IBC 2017?
We’ve got the all-in-one Precision workstation, which is super cool for content creation. And we’ve got the VxRail converged infrastructure from the legacy EMC side. This is the first show that we’ve done as truly Dell EMC.

In terms of media and entertainment, we’re able to offer several end-to-end solutions, from workstations to color-calibrated displays to servers to GPU-based render farms for some of the new GPU rendering software, like Redshift. The VFX and gaming communities are huge with Dell because they have the Alienware PCs and the HTC Vive VR headsets, which opens up a lot of new opportunities for us. We’re able to look at complete solutions, not just individual pieces of the puzzle.

You’ve often preached about the need for ‘storage hygiene’ in the M&E space. What do you mean by that, and why are you passionate about it?
I come from the old days of film, when directors yelled “Cut and print” for a reason: because it cost 12 cents a foot to process and, if it was a bad take, you wouldn’t even bother processing it. Now people just let the camera roll. We used to generate maybe 1 TB to 2 TB of new content a night. Now that number is probably 4 TB to 5 TB a night on a normal production, and, if you have high frame rate, that number can go to 40 TB a night. That’s an awful lot of data.

People need to have data hygiene because they have to understand that data has mass, and mass has a cost associated with it. Data is costly to store, but it’s even more costly to move; if you just move it arbitrarily, you’re wasting money. That is why Isilon takes the data-lake approach: leave the data in one place and bring the applications to the data, rather than the other way around.

I find high frame rate to be the single biggest driver of storage, and that obviously applies to sports. Everybody talks about larger resolution as a storage driver, but you’re really talking about its increasing only two or four times. But high frame rate is the real driver. That is being driven by gaming: 60 fps is the norm now for gaming in VR, but that is going towards 120 fps. So high frame rates are going to require even more data hygiene.
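To make the comparison concrete, here is a rough back-of-the-envelope sketch of how resolution and frame rate each scale raw data volume. It assumes uncompressed video at a hypothetical 3 bytes per pixel and ignores codecs, audio, and container overhead; the function name and the specific formats compared are illustrative choices, not figures from the interview.

```python
def uncompressed_bytes_per_second(width, height, fps, bytes_per_pixel=3):
    """Raw video data rate, ignoring compression, audio, and container overhead."""
    return width * height * bytes_per_pixel * fps


# Illustrative formats (assumptions, not numbers quoted by the speaker)
hd_24 = uncompressed_bytes_per_second(1920, 1080, 24)
uhd_24 = uncompressed_bytes_per_second(3840, 2160, 24)
hd_120 = uncompressed_bytes_per_second(1920, 1080, 120)
uhd_120 = uncompressed_bytes_per_second(3840, 2160, 120)

resolution_factor = uhd_24 / hd_24    # HD -> UHD at the same frame rate
frame_rate_factor = hd_120 / hd_24    # 24 fps -> 120 fps at the same resolution
combined_factor = uhd_120 / hd_24     # both changes together

print(f"resolution x{resolution_factor:.0f}, "
      f"frame rate x{frame_rate_factor:.0f}, "
      f"combined x{combined_factor:.0f}")
```

Under these assumptions the two factors multiply, so a move to both higher resolution and higher frame rate compounds the storage requirement rather than adding to it.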

What tools is Dell EMC developing to confront this data-hygiene issue?
Search for Isilon is how we’re responding to the changing nature of content creation. Instead of forcing everyone to be more rigorous and to implement old-school data hygiene, we’re just putting in Elasticsearch-based indexing. The first petabyte takes 24 hours; you have to crawl the file system to build that initial index. After that, we have the Common Event Enabler, which is a log: basically, any file-system event at all gets registered in the CEE.

Because Search for Isilon is an open-source Elasticsearch database with the Apache Tika content-aware parser, after the first disk crawl when you build the index, every little file event that goes into the CEE is automatically updated in the Elasticsearch database. That means that, if you mistyped a file name and some artist is calling the storage admin looking for it, you can quickly search and find it.
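The crawl-once, then update-from-events pattern described here can be sketched in a few lines. This is only an illustration of the general idea: a plain Python dictionary stands in for the Elasticsearch index, and the event dictionaries (`kind`, `path`, `new_path`, `size`) are a hypothetical shape, not the actual CEE or Search for Isilon format.

```python
import os
import tempfile


def crawl(root):
    """Initial full crawl: walk the tree once and index every file by path."""
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            index[path] = {"name": name, "size": os.path.getsize(path)}
    return index


def apply_event(index, event):
    """Incremental update from one file-system event (hypothetical event shape)."""
    kind, path = event["kind"], event["path"]
    if kind == "delete":
        index.pop(path, None)
    elif kind == "rename":
        entry = index.pop(path, {})
        new_path = event["new_path"]
        entry["name"] = os.path.basename(new_path)
        index[new_path] = entry
    else:  # "create" or "modify"
        index[path] = {"name": os.path.basename(path), "size": event.get("size", 0)}


def search(index, fragment):
    """Case-insensitive substring match on file names, returned in sorted order."""
    fragment = fragment.lower()
    return sorted(p for p, e in index.items() if fragment in e["name"].lower())


# Usage: one real file is found by the crawl; a misnamed one arrives as an event.
with tempfile.TemporaryDirectory() as root:
    shot = os.path.join(root, "shot_010_comp.exr")
    with open(shot, "wb") as f:
        f.write(b"\0" * 16)
    index = crawl(root)

    typo = os.path.join(root, "sht_010_final.exr")  # mistyped name
    apply_event(index, {"kind": "create", "path": typo, "size": 32})

    hits = search(index, "010")  # finds both, including the mistyped file
```

The expensive step is the initial walk; after that, each event touches only one index entry, which is why the index can stay current without re-crawling the file system.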

As the need to create and immediately distribute content continues to surge, the line is blurring between sports-content owners’ deep archive and near-line storage. Are you seeing this, and how is Dell EMC helping manage it?
If everybody had unlimited money, we would recommend keeping everything on Isilon. Instead, because people are looking for a replacement for an LTO tape tier, we offer the ECS as our globally distributed, geo-local, scalable object store. That gives us the ability to blur the lines between what is archive and what is near-line and what is online production-based storage – through what used to be called information-lifecycle management or hierarchical storage management. There are many ways to address this [issue]. In the old days, it used to be what was on the shelf and what was in the robot; those were the two tiers. Now there’s this kind of infinite tier. Where ECS fits in is, it provides that global overlay that is Isilon’s disaster recovery, plus the ability to support collaborative production.

If you’re reading from your tape library as much as you’re writing to your tape library, you’ve got a problem, because that’s an active archive and tape is just not designed for that. With an object store, it doesn’t have to be public cloud; you can bring on-prem object store for roughly 30% of the cost of the public cloud, and then you have an active tier. So, if someone hits a home run, you can instantly grab every single clip of every single home run that player has ever hit. You can do that today with tape, but it takes 24 hours. Now you can have it almost instantaneously so that you can get that highlight package out instantly as opposed to waiting for a Sunday wrap-up show.

How do you see the worlds of broadcast and IT blending, and how are you helping customers navigate this transition?
I mentioned before that we have the data lake, and that eliminates war on the progress bar; we don’t want to sit there and wait for files to copy from silo to silo. The vision that informs that is what you could call data gravity. When you have to calculate the financial model of [implementing] a global pipeline, what it costs to replace a passive archive with an active archive, and what kind of speed-up you’re going to get when you implement search, you’re basically saying the financial cost of having access to that clip is a matrix of how topical the clip is, how difficult it is to retrieve, and how much I can monetize it if I get it quickly.

In video, we were always very structured in our approach because it was a linear workflow. You didn’t have to make these financial calculations; it always took the same amount of time to copy because it was a real-time network. Now we have to move it faster than real time if it’s super important, but, if it’s just moderately near-line important, I can move it slower than real time and not spend as much money to get that asset there.
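The faster-than-real-time versus slower-than-real-time distinction comes down to simple arithmetic on clip bitrate and link bandwidth. The sketch below assumes a hypothetical 100 Mb/s clip and two hypothetical link speeds; none of these numbers come from the interview, and the function names are illustrative.

```python
def transfer_seconds(size_bytes, link_bits_per_second):
    """Time to move a file over a link, ignoring protocol overhead and latency."""
    return size_bytes * 8 / link_bits_per_second


def real_time_factor(video_bits_per_second, link_bits_per_second):
    """Above 1.0 the link moves the clip faster than it plays back; below, slower."""
    return link_bits_per_second / video_bits_per_second


# Hypothetical clip: 60 seconds at 100 Mb/s
clip_bitrate = 100_000_000
clip_seconds = 60
clip_bytes = clip_bitrate * clip_seconds // 8

fast_link = 10_000_000_000   # assumed 10 Gb/s path for an urgent highlight
slow_link = 50_000_000       # assumed 50 Mb/s path for a less urgent asset

fast_factor = real_time_factor(clip_bitrate, fast_link)
slow_factor = real_time_factor(clip_bitrate, slow_link)
fast_time = transfer_seconds(clip_bytes, fast_link)
slow_time = transfer_seconds(clip_bytes, slow_link)

print(f"fast link: {fast_factor:.0f}x real time, {fast_time:.1f} s")
print(f"slow link: {slow_factor:.1f}x real time, {slow_time:.0f} s")
```

In a real-time video network the factor was always 1.0, so there was nothing to decide; on an IP network the factor becomes a cost choice per asset.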

What’s happening now is, the M&E market is having to adopt the infrastructure and workflows of enterprise IT, [which] has already built up those financial models that take into account data gravity for a particular asset. You have to avoid overspeccing the infrastructure for some unimportant asset, while also making sure the really important assets [can be] put into a highlight reel immediately and are available right away. Active archives, search, the combination of file and object: all of these are enterprise-IT technologies that video people are just now starting to explore.

In addition to the technological challenges involved in blending IT and broadcast, how much of this is a cultural challenge?
There’s definitely a culture war there, but I think we’ve come a long way in a very short time. I’ll tell you one thing about the [broadcast-IT] culture war, though: it’s easier to take a broadcast engineer and teach him IT than it is to take an IT guy and teach him video. On the one hand, this industry [is] adopting enterprise-IT workflow and infrastructure, but the IT industry better recognize that they have to be as responsive as the broadcast manufacturers or else the broadcast and the sports people will just go somewhere else. They’ll end up making bespoke hardware again [to] ensure that responsiveness.

This IBC is probably a reflection of the fact that it’s starting to tip over to the enterprise-IT side. Now that [Isilon] is part of Dell, we’ve got a whole lot of end-to-end technology, but we understand the market because of our roots in sports production and M&E. We’re trying to get people to model their workflows along the lines of data hygiene and data gravity.

Honestly, I think video is going to become just another data type in the core network. It’s a hefty data type, of course, but, [if] we can get to the point where we have a software-defined control plane that is separated from the data plane, the switching won’t matter. Whether it’s metadata, audio, video, or high-frame-rate video, it’s just another data type. And the sooner we get to that level, the sooner a lot of things will standardize and shake out and we can go back to the business of producing good content instead of worrying about these issues.

How do you see artificial intelligence factoring into all this? How do you see broadcast’s use of AI-based tools evolving in the coming years?
It’s truly the biggest trend of this show, for sure. Once you start realizing that your archive or media library is actually active and that you can engage with it, it opens up a whole new world of possibilities. Now that these archives are active, obviously, everybody wants to monetize them. But no one has the money to pay interns to watch 300,000 hours of content and to tag it properly. A bunch of companies, including Valossa from Finland and Google and the other web giants, are working on their video-content–aware and moving-image-content–aware technologies to avoid having to do that.

Anybody can thumbnail Photoshop stills; that’s easy. But going through video and figuring out that’s a logo, that’s a person, that’s a dog, this guy might be important because of the body language of the other people in the scene, that’s where you really get revolutionary. So artificial intelligence as a way to watch your media library for you is what I’m interested in. No, you’re not going to get 100%-perfect metadata, but, if you get even 60% and it’s consistent and it’s tight, that is how you start monetizing your media library. By having a data lake where a separate process can chew through your media, it can start giving you taggable and actionable intel based on the metadata in the content.