Learning new things in media workflows

Posted by Tom Coughlin on November 15, 2017 08:45 am
Artificial intelligence (AI) is just about at the peak of the Gartner hype cycle, and disillusionment will follow. In the meantime, we can look at how in-depth analysis of unstructured data, such as media and entertainment files, can lead to increased use of content and more ways to monetize it. At the 2017 SMPTE Technical Conference, keynotes and sessions focused on how AI will improve media workflows.

AI in Media and Entertainment Workflows

Jeff Kember, from Google Cloud, gave a keynote talk on various types of AI technology applied in media workflows. In particular, he described how machine learning (ML) works and how it can be applied to M&E content. ML is a type of AI that solves problems without explicitly codifying the solution, and it can be used to create systems that improve themselves over time. These approaches have reached the point where machines can recognize images at least as well as humans. They can also do an excellent job of understanding human speech and translating from one language to another.

Machine learning can now be used to automatically find the optimal model to describe some data out of a large number of alternative models.

Google has made the open source TensorFlow framework available on its cloud services, along with processors created to run ML models on TensorFlow (the Tensor Processing Unit, or TPU). The TPU is a Google-designed custom ASIC. The first-generation TPU has been in use for 16 months, and the second-generation TPU now delivers 180 teraflops per TPU. The number of Google directories containing ML Brain Models has grown exponentially since 2015, and these models are used in many Google products. Google offers ML perception services for things like speech and vision recognition, as well as raw ML platform resources, tools, accelerators and libraries.

Jay Yogeshwar from Hitachi Vantara spoke about the digital transformation of media including the use of AI for media applications. These applications include video compression, non-linear editing and media asset management, as well as OTT ad-insertion and e-commerce. He referred to the use of ML tools in the media environment as "ML Orchestration." ML Orchestration leads from processing to insights.

AI approaches allow automated video analysis and tagging, which enables advanced search to improve media workflows, monetize archived content and reduce costs.

Greg Taieb from Deluxe spoke about machine translation of timed text. Using cloud-based infrastructure, Deluxe applies a bundle of AI features, including audio, image and video recognition, with the specialized capabilities shown below.

Konstantin Wilms from AWS gave a talk on integrating artificial intelligence and machine learning technologies into cloud-based media workflows. The image below shows the Amazon AI stack, platforms, frameworks and infrastructure.

AWS said that, using these services, AI can be applied to acquisition pre-processing and optimization; auto-characterization and metadata augmentation for digital asset management (DAM) and archive; and various stages of the overall digital supply chain, including extraction of identified content, celebrity detection, and filtering and quality control for content distribution. In addition, various analytic tools can be used for sentiment detection and other uses of social networking tools. These can be used for ad insertion and for directing content to the most appropriate viewers.

Machine Learning to Improve Content Delivery

Michelle Munson spoke about an ML approach to content request routing. She gave a thorough overview of how supervised and unsupervised machine learning work, including some insights into the underlying mathematics, along with a discussion of typical applications in M&E. Almost all ML techniques rely on matrix algebra and use a "cost function" that mathematically describes the relative error between the model's predictions and the training data.

Mathematical techniques are then used to find the model parameters that minimize the predictive error (the cost). With the rise of special-purpose processors (GPUs and TPUs) capable of doing these mathematical operations, and their availability through public cloud services, AI is being applied to a wide range of problems. ML can provide classification as well as prediction. Neural networks (NNs) imitate the human brain with layers of activation functions, each of which is derived from the prior layers.
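As a minimal sketch of the cost-function idea described above, the following Python fits a straight line by gradient descent on a mean squared error cost. The data, function names, learning rate and step count are illustrative assumptions and were not part of the talk; production ML systems use far larger models and optimized matrix libraries.

```python
def mse_cost(w, b, xs, ys):
    """Mean squared error between predictions w*x + b and the training targets."""
    n = len(xs)
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / n


def fit_linear(xs, ys, learning_rate=0.05, steps=2000):
    """Minimize the MSE cost with plain batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of the cost with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step against the gradient to reduce the cost.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy training data generated from y = 2x + 1 (illustrative values only).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = fit_linear(xs, ys)
final_cost = mse_cost(w, b, xs, ys)
```

Each pass nudges the parameters in the direction that lowers the cost, so on this noise-free toy data the fitted slope and intercept approach 2 and 1.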

The various types of AI can be used for different M&E functions. For instance, neural networks and deep learning networks can be used for automated video, audio and image recognition, as discussed earlier. They can also be used for speech-to-text translation, dynamic custom content composition and high-accuracy fingerprinting for security. Collaborative filtering can be used for personalized content recommendations and advanced optimization of resource selection for content distribution over unknown Internet connections. Classification and anomaly detection can help discover potential security breaches and identify faulty devices.
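To illustrate the collaborative filtering idea mentioned above, here is a minimal user-based sketch in Python. The viewers, titles and ratings are made up, and similarity-weighted averaging with cosine similarity is just one simple variant, not a method attributed to any speaker.

```python
import math

# Hypothetical viewer ratings on a 1-5 scale; all names are invented.
ratings = {
    "ana": {"drama_a": 5, "scifi_b": 4, "doc_c": 1},
    "ben": {"drama_a": 4, "scifi_b": 5, "comedy_d": 4},
    "cara": {"doc_c": 5, "comedy_d": 2, "drama_a": 1},
}


def cosine_similarity(a, b):
    """Cosine similarity of two rating dicts; unrated titles count as zero."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[t] * b[t] for t in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(user, all_ratings):
    """Predict ratings for unseen titles from similar users' ratings."""
    weighted_sum, weight_total = {}, {}
    for other, other_ratings in all_ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(all_ratings[user], other_ratings)
        if sim <= 0:
            continue
        for title, rating in other_ratings.items():
            if title in all_ratings[user]:
                continue  # only score titles the user has not rated
            weighted_sum[title] = weighted_sum.get(title, 0.0) + sim * rating
            weight_total[title] = weight_total.get(title, 0.0) + sim
    predictions = [(weighted_sum[t] / weight_total[t], t) for t in weighted_sum]
    return [(title, round(score, 2)) for score, title in sorted(predictions, reverse=True)]


recs = recommend("ana", ratings)
```

Here the only title "ana" has not rated is predicted from the two other viewers, with the more similar viewer's rating counting for more.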

She pointed out that even though there is a growing need for cost-effective, personalized, high-quality, real-time content delivery, the industry relies on edge storage and caching models that are 20 years old. She believes that algorithms can be created to optimize content delivery by keeping track of client request fulfillment, using it as a score for the overlay path's segments to train a model, and then using the trained model to identify the best network segments for routing subsequent requests.
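The scoring loop described above can be sketched in a deliberately simplified form. The Python below is not the algorithm presented in the talk: it swaps in an exponentially weighted running average per segment as a stand-in for a trained model, and the class name, node names and parameters are all hypothetical.

```python
from collections import defaultdict


class SegmentScorer:
    """Tracks a running fulfillment score for each overlay path segment."""

    def __init__(self, smoothing=0.2, prior=0.5):
        self.smoothing = smoothing
        # Segments with no history start at a neutral prior score.
        self.scores = defaultdict(lambda: prior)

    def record(self, path, fulfilled):
        """Update every segment on a path with the outcome of one request."""
        outcome = 1.0 if fulfilled else 0.0
        for segment in path:
            old = self.scores[segment]
            self.scores[segment] = (1 - self.smoothing) * old + self.smoothing * outcome

    def path_score(self, path):
        """Score a path as the product of its segment scores."""
        score = 1.0
        for segment in path:
            score *= self.scores[segment]
        return score

    def best_path(self, candidate_paths):
        """Pick the candidate path with the highest score."""
        return max(candidate_paths, key=self.path_score)


# Two hypothetical overlay paths, each a list of (from_node, to_node) segments.
path_a = [("origin", "relay1"), ("relay1", "client")]
path_b = [("origin", "relay2"), ("relay2", "client")]

scorer = SegmentScorer()
for _ in range(10):
    scorer.record(path_a, fulfilled=True)   # requests via relay1 succeed
    scorer.record(path_b, fulfilled=False)  # requests via relay2 fail

best = scorer.best_path([path_a, path_b])
```

After the recorded history, subsequent requests would be routed over the path through relay1. A real system would score richer outcomes (latency, throughput) and learn from many more features than a per-segment average.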

AI and Enhanced Compression

Thierry Fautier from Harmonic spoke on the use of ML for additional compression improvements for content delivery. He pointed out that we live in a network bandwidth constrained world even while content sizes are increasing with the growth in 4K and higher resolution, higher frame rates, 360-degree video and HDR content. This is driving the industry to develop even higher compression technologies than HEVC (H.265). 8K UHDTV2 (60p, 4:2:2, 10 bits) needs a data rate of almost 48 Gbps. Broadcast UHD-2 (8K) is estimated at 65.6 Mbps, with VR 6DoF at over 1 Gbps.
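For a sense of scale, the arithmetic behind figures like these can be sketched in a few lines of Python. The function name is hypothetical, and the calculation covers active picture samples only. It gives roughly 40 Gbps, somewhat below the almost 48 Gbps quoted above; the gap presumably reflects overhead such as blanking or interface framing, which is an assumption here and not something stated in the talk.

```python
def raw_bitrate_bps(width, height, fps, bit_depth, samples_per_pixel):
    """Uncompressed bitrate of the active picture, in bits per second."""
    return width * height * fps * bit_depth * samples_per_pixel


# 4:2:2 chroma subsampling averages two samples per pixel:
# one luma sample plus one chroma sample (Cb and Cr alternate).
SAMPLES_PER_PIXEL_422 = 2

active_8k_bps = raw_bitrate_bps(7680, 4320, 60, 10, SAMPLES_PER_PIXEL_422)
active_8k_gbps = active_8k_bps / 1e9

# Compression ratio implied by the two figures quoted in the article:
# almost 48 Gbps uncompressed versus 65.6 Mbps for broadcast.
implied_ratio = 48e9 / 65.6e6

print(f"Active picture rate: {active_8k_gbps:.1f} Gbps")
print(f"Implied compression ratio: about {implied_ratio:.0f}:1")
```

Taking the article's numbers at face value, broadcast 8K delivery implies compressing the signal by a factor of several hundred.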

Upcoming standards promise to increase compression efficiency further, as shown below. The Alliance for Open Media proposal would improve compression efficiency by 40% by 2018, and the JVET proposal would increase compression efficiency even further by 2020.

By taking new approaches to high-quality compression, it may be possible to go beyond even the current JVET compression proposals. These new steps include pre-processing the data, using higher-capability cloud ML services for content-aware encoding, and using the increasing capability expected in processors to enable decoding and post-processing.

Fautier believes that several approaches can be used to break the compression wall. These include elastic encoding, which allows increasing the CPU resources when content becomes more difficult; ML to teach the encoder how to encode; coupling post-processing to pre-processing via metadata; and content awareness, with bit distributions based upon a psycho-visual model. He felt that these approaches could lead to a boost in HEVC from 34% (in 2016) to 70% by 2020. Using JVET with similar improvements, he thought that an 83% boost in compression capability was possible by 2020. These improvements are outlined in the chart below. Of course, changing decoding technology in consumer products to JVET or these enhanced JVET approaches will take some time, as there are millions of HEVC chipsets in new consumer products. For these reasons, it will probably not be until 8K and VR content become more common in the 2020s that we will see these new compression technologies coming into common use.

AI, and in particular machine learning and the more adaptive deep learning, is moving into media and entertainment applications to serve many functions, several of which are related to digital storage and to content retrieval and use. We think that these trends will continue and lead to new services built into digital storage systems.


Several years ago, in-line deduplication became a product differentiator in storage systems, particularly those using flash memory. Eliminating extra copies of data helped save storage space and increased effective storage utilization. Eventually all the competing products included this capability. With the explosion of unstructured data, like video files, in-line ingest and data processing using machine learning to create actionable metadata and content awareness could become a storage product differentiator, and eventually all successful storage systems and architectures will include this capability.

About the Author

Tom Coughlin, President of Coughlin Associates, is a widely respected digital storage analyst as well as a business and technology consultant. He has over 37 years in the data storage industry, with multiple engineering and management positions at high-profile companies.

Dr. Coughlin has many publications and six patents to his credit. Tom is also the author of Digital Storage in Consumer Electronics:  The Essential Guide, which was published by Newnes Press. Coughlin Associates provides market and technology analysis as well as Data Storage Technical and Business Consulting services. Tom publishes the Digital Storage Technology Newsletter, the Media and Entertainment Storage Report, the Emerging Non-Volatile Memory Report and other industry reports. Tom is also a regular contributor on digital storage for Forbes.com and other blogs. 

Tom is active with SMPTE, SNIA, the IEEE (he is past Director for IEEE Region 6 and active in the Consumer Electronics Society where he is chairman of the Future Directions Committee) and other professional organizations. Tom is the founder and organizer of the Annual Storage Visions Conference (www.storagevisions.com), a partner to the International Consumer Electronics Show, as well as the Creative Storage Conference (www.creativestorage.org). He has been the general chairman of the annual Flash Memory Summit, the world's largest independent storage event. He is a Senior member of the IEEE and a member of the Consultants Network of Silicon Valley (CNSV). For more information on Tom Coughlin and his publications go to www.tomcoughlin.com.