Discovering the Video Future through Metadata
Piksel, an online video and software-as-a-service developer, is creating products and techniques in content discovery, search and recommendation that aim to improve competitiveness among service providers, broadcasters and other content owners. Believing that metadata-based systems and processes are the best way forward in this area, the company is showing how an advanced content metadata system can improve the user experience and help drive loyalty, differentiation and revenues for both broadcast and digital services.
First of all, because more video content is available to watch anytime, anywhere, through pay TV, online services, library content and familiar linear TV channels, it’s important to help viewers navigate through it all to find what they like, and then continue finding more of it.
Given that improving content discovery most likely leads to audience engagement and satisfaction, from there, customer retention and improved monetisation also become more likely. The task now for media companies is building better tools with which to explore their video libraries.
Kristan Bullett, Group Head of Architecture at Piksel, talked to Digital Media World about positive and very interesting changes his company sees happening in content discovery that are based on metadata improvements. Piksel says that this approach means compiling content catalogues that are better managed in the future, creating content descriptions and recommendations that are more accurate and useful, and making searching much more granular. The impact will be to make TV more personal, without giving up opportunities to find something new and unexpected.
Poor metadata and metadata management potentially damage media businesses, and become a burden on resources. They allow viewers to encounter the same piece of content ingested multiple times, each time with its own metadata. In contrast, metadata processes that work result in consistent presentation and accurate descriptions, and allow descriptions to expand with more detail over time.
Service providers and content owners generally want to pull together their television and multiscreen delivery, but inferior content management can still lead to inconsistencies in what a viewer sees on different devices. However, if owners first design a single, master metadata file, the convergence of set-top box and OTT workflows can actually eliminate function duplication and reduce running costs.
The master file contains the different-length synopses and the lower- and higher-resolution images needed to present content on different screen types. Each viewer sees one version of the content, which links them to the video file that is correct for their circumstances in terms of rights, device, location, language and so on.
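The selection described above can be sketched as a simple lookup against a master record. The field names, record layout and file names below are illustrative assumptions, not Piksel's actual schema:

```python
# Illustrative master metadata record; all field names and values are
# assumptions made for this sketch, not a real Piksel data model.
MASTER = {
    "title": "Example Film",
    "synopses": {"short": "A spy thriller.", "long": "A spy thriller set in postwar Europe."},
    "images": {"low": "poster_320.jpg", "high": "poster_1080.jpg"},
    "renditions": [
        {"device": "stb", "region": "UK", "language": "en", "url": "stb_uk_en.ts"},
        {"device": "mobile", "region": "UK", "language": "en", "url": "mob_uk_en.mp4"},
    ],
}

def pick_rendition(master, device, region, language):
    """Return the video file matching the viewer's device, region and language,
    or None if no rendition is available for those circumstances."""
    for r in master["renditions"]:
        if (r["device"], r["region"], r["language"]) == (device, region, language):
            return r["url"]
    return None
```

A real system would also evaluate rights windows and fall back gracefully, but the principle is the same: one metadata record, many resolvable presentations.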
Convergence aside, Kristan Bullett sees metadata handling evolving alongside the rest of the media industry. He said, “As the whole media processing pipeline evolves we are seeing faster migration to cloud-based systems, which bring operational efficiencies, speed and agility to development. This is what will transform the whole space. We believe the idea that comprehensive metadata will reduce the need for convergence will become irrelevant as companies start transforming their cloud platforms to take advantage of accurate, relevant, consistent metadata and management.”
As well as improving metadata quality, Piksel also believes it’s now possible for broadcasters and distributors to create new, unique metadata to improve discovery. For example, scene analysis, using visual identification and natural language processing of closed captions, can reveal who is present in a video file, and what they are doing, feeling and talking about. Companies can also develop their own sub-genres for searching and plot types to trigger recommendations.
Piksel identifies three main techniques for content owners to improve and take advantage of their metadata – consolidation, enrichment and augmentation.
Consolidation - Cleaning Up
Metadata consolidation, a process that aims to avoid creating different metadata files for the same content title, is another reason for relying on a master file. Due to a lack of industry standardisation and different levels of maturity in how metadata is handled, metadata files can look very different depending on which studio, production company or broadcaster supplied them.
“Common metadata specifications, such as TV-Anytime, ADI (Asset Distribution Interface specification) or EBU, are very important and enable a more cohesive approach to metadata distribution across the whole distribution chain,” Kristan said. “But so far this hasn’t been extended to thematic or descriptive data. Piksel feels opportunities exist to supply this data to downstream systems through standardisation, which will ultimately lead to more satisfying customer experiences.”
Therefore, metadata consolidation involves a clean-up operation during ingest that ensures the master file can identify titles in a consistent manner. When a media company ingests content, the video files are usually transcoded, encrypted and prepared in the formats needed for each target device. The metadata file accompanying the video asset is analysed and matched against the existing metadata catalogue.
The content metadata system checks the content ID and will continue to compare different fields from the new metadata file until it can confidently determine whether a match exists or not. If a match is found, the new metadata can be ignored, parts of it can be merged into the existing metadata or the new file can replace the old. After the ID, string searches can be made against the title and then against the synopsis. Throughout, the consistency of the data is crucial.
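The cascade of checks described here (ID first, then title, then synopsis) can be sketched in a few lines. The field names, thresholds and use of simple string similarity are assumptions for illustration; a production matcher would use more fields and more sophisticated comparison:

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Normalised string similarity in [0, 1], case-insensitive."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_title(incoming, catalogue, title_threshold=0.9, synopsis_threshold=0.8):
    """Return the catalogue record the incoming metadata file matches, or None.

    Field names ('content_id', 'title', 'synopsis') and the thresholds are
    illustrative assumptions, not Piksel's actual schema or tuning.
    """
    for record in catalogue:
        # 1. An exact content ID match is decisive.
        if incoming.get("content_id") and incoming["content_id"] == record.get("content_id"):
            return record
        # 2. Otherwise fall back to string comparison on the title...
        if similarity(incoming.get("title", ""), record.get("title", "")) >= title_threshold:
            # 3. ...confirmed against the synopsis before declaring a match.
            if similarity(incoming.get("synopsis", ""), record.get("synopsis", "")) >= synopsis_threshold:
                return record
    return None
```

Once a match is found, the downstream decision (discard, merge or replace the metadata) can be driven by per-supplier rules.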
The consolidation process also seeks to limit the amount of manual intervention that is required by human editors. As far as possible, it should become part of an automated workflow. Kristan said, “Automation gives an operator opportunities to not only reduce operating costs, but also greatly improve the timeliness and speed at which these tasks are carried out. Ultimately, due to the architecture that Piksel has in place, we can replace and update functionality without breaking any of the interfaces within the workflow engine.
“This means we can continue to improve the system without creating a negative impact on the consuming systems. Matching and other metadata cleansing and normalisation algorithms will be improved, both in terms of effectiveness and speed, and machine learning techniques are being introduced which enable a clearer understanding of the metadata. This will also introduce predictive analysis, further automating metadata generation to produce more exciting results.”
Making Metadata Work
The advantages of metadata consolidation for media companies are fairly straightforward. Catalogue management is easier, for example, and expensive manual handling can be minimised at the point of ingest. Replicating tasks in the broadcast linear and OTT environments happens less frequently.
By simplifying the metadata workflow, enterprises create the conditions for ingesting more titles and expanding their content catalogues.
If metadata consistency, accuracy and completeness are monitored with care across every title, viewers are more likely to rely on their television provider for programme-related information and watch more titles. Consistency also helps services pursue the goal of ‘one service, all screens’ instead of trying to deliver TV and multiscreen television separately.
Recognising that studios, production companies and broadcasters who supply content are not the only sources of metadata and programming information, providers are becoming interested in metadata enrichment, the process of importing data from third parties to improve what exists already. A basic application would be running a comparative quality assurance check on metadata from a content supplier to ensure the release date for a movie is correct. An editor can also manually compare the supplied synopsis with the synopsis a third party is offering.
Third parties like IMDb and Rotten Tomatoes have supplied the broadcast industry with data for many years, and now such companies sometimes license their data to video providers. It can be imported into the provider’s master metadata file and presented to a TV service’s customers. Other metadata fields can be edited using third-party services. Integrations between a metadata supplier and a video operator are fairly standard, established via APIs, and allow the data to be updated and refreshed over time.
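Imported third-party data has to be merged into the master file under explicit rules about which source wins for each field. The rules, field names and policies below are hypothetical, chosen to illustrate the idea rather than describe any real integration:

```python
# Illustrative enrichment rules: which source wins for each field.
# The field names and policies are assumptions for this sketch.
ENRICHMENT_RULES = {
    "release_date": "third_party",   # trust the licensed data source
    "synopsis": "longest",           # keep whichever description is fuller
    "rating": "third_party",
}

def enrich(master, third_party, rules=ENRICHMENT_RULES):
    """Merge third-party fields into a copy of the master record
    according to per-field policies; the master itself is not mutated."""
    enriched = dict(master)
    for field, policy in rules.items():
        if field not in third_party:
            continue
        if policy == "third_party":
            enriched[field] = third_party[field]
        elif policy == "longest":
            current = enriched.get(field, "")
            if len(third_party[field]) > len(current):
                enriched[field] = third_party[field]
    return enriched
```

Because the rules are data rather than code, they can be adjusted per supplier as the integration matures, and re-run whenever the third-party feed is refreshed.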
Enrichment is carried out during metadata ingest, alongside consolidation, and it can also be automated and employ rules.
Augmented metadata, Piksel’s third technique for improving and using video metadata, is about taking advantage of a better understanding of what happens inside a movie or television programme. For example, video service providers can define content categories more precisely. Instead of presenting a movie as a ‘thriller’ it can become a ‘black-and-white thriller from the 1940s’. Many different adjectives can be introduced to help viewers find content that matches their taste and mood.
Metadata augmentation relies on the ability to generate new metadata by analysing the video scenes using image recognition and the closed captions within content. Audio recognition could also be introduced, like identifying background noises such as crashing waves. Piksel is optimistic about the innovation occurring in this area, including machine learning, deep learning algorithms and semantic searching.
Augmented metadata can be created as a programme is broadcast to extract information about who is in a scene and what they are talking about, among other details. Linear television including news, financial news and sport can be chaptered based on augmented metadata, creating on-demand files that reference identifiable topics of interest, with start and finish time-stamps. The resulting metadata file will reference all the different themes that emerged from a programme while it was on-air.
An ideal outcome would allow someone who is interested in ‘river pollution’, for example, to search under this term and find the different shows where it was discussed. Then they could link to the moments in the programme when this subject was mentioned, and also request an alert whenever river pollution is mentioned in new programmes.
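The chaptered metadata the previous paragraphs describe is essentially a list of time-stamped topic segments that can be searched. As a minimal sketch (the data shape is an assumption, and plain substring matching stands in for the semantic search a real system would use):

```python
from dataclasses import dataclass

@dataclass
class Chapter:
    """One time-stamped topic segment extracted from a broadcast programme."""
    programme: str
    topic: str
    start: float   # seconds from programme start
    finish: float

def find_topic(chapters, query):
    """Return (programme, start, finish) for every chapter whose topic
    matches the query term, so a viewer can jump straight to the moment
    a subject was discussed."""
    q = query.lower()
    return [(c.programme, c.start, c.finish)
            for c in chapters if q in c.topic.lower()]
```

The same index that answers a search can drive alerts: run the query against each newly chaptered programme and notify the viewer when it returns results.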
Natural Language and Machine Learning
Closed-captions or subtitles can be found in most content and used as a transcript of conversations and descriptions of other sounds, which also becomes a source of information when analysing video scenes. Every video frame is also a still picture and a potential subject of image analysis techniques such as facial recognition and location analysis. A video service provider can decide to analyse every frame or analyse a frame every few seconds, depending on the level of accuracy needed for searching.
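The trade-off between analysing every frame and analysing a frame every few seconds can be made concrete with a small helper. This is an illustrative sketch of the sampling arithmetic only, not Piksel's pipeline:

```python
def sample_frame_indices(duration_s, fps, interval_s):
    """Return the frame indices to analyse when sampling one frame every
    interval_s seconds from a duration_s-long video at fps frames per
    second. A smaller interval gives more accurate search at higher
    processing cost; a step below one frame is clamped to every frame."""
    step = max(1, round(interval_s * fps))
    total_frames = int(duration_s * fps)
    return list(range(0, total_frames, step))
```

For a 10-second clip at 25 fps sampled every 2 seconds, this yields five frames instead of 250, a 50x reduction in image-analysis work.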
Scene analysis is the basis for deep-search capabilities, for example, looking for particular people or characters in specific locations, made possible by using a combination of facial and scene recognition. The use of subtitle analysis as well makes even more complex, precise search requests possible.
Piksel has been putting special effort into natural language processing, semantic search, image analysis and machine-learning systems in order to create proprietary metadata on a scene-by-scene basis. They find that machine learning, including deep learning, has grown more promising due to the computational power now available to run the process. This power makes it possible, for instance, to teach a recognition system how to identify an object, like a cat, in the same way that a child learns – by being exposed to countless examples of something that is a cat and something that is not. Relying on words alone is less reliable.
“A number of companies now are focused on selling and managing metadata,” noted Kristan. “We think it is logical for these companies to expand their scope to include augmented metadata. Having said that, there is also a market for replacing existing metadata with new approaches to metadata capture and augmentation. As a result, a battle is likely between existing providers as they evolve and new providers gaining market traction with very similar tools.”
Another of Piksel’s research projects involved applying machine learning to 4,000 scripted television series from the last 15 years in order to learn about subject clusters and commonly associated words, in order to improve contextual understanding of videos. Themes are also at the foundation of metadata augmentation within Piksel’s own Fuse Metadata product. Any piece of content and every segment within that content can be ranked according to how high it scores against the different themes that were defined.
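Ranking content and segments against defined themes can be illustrated with a toy scorer. The themes, keyword vocabularies and overlap scoring below are hypothetical stand-ins for the learned clusters and proprietary themes described above:

```python
# Hypothetical theme vocabularies; real themes would be learned clusters,
# not hand-written keyword sets.
THEMES = {
    "crime": {"detective", "murder", "police", "heist"},
    "romance": {"love", "wedding", "heart", "kiss"},
}

def score_segment(transcript, themes=THEMES):
    """Score a segment's transcript against each theme by the fraction of
    the theme vocabulary that appears in it. A toy stand-in for the
    machine-learned contextual scoring described in the article."""
    words = set(transcript.lower().split())
    return {name: len(words & vocab) / len(vocab) for name, vocab in themes.items()}

def rank_by_theme(segments, theme, themes=THEMES):
    """Rank segment transcripts by how strongly they match one theme."""
    return sorted(segments, key=lambda s: score_segment(s, themes)[theme], reverse=True)
```

Because every segment gets a score for every theme, the same index supports both whole-title recommendations and jumping to the most relevant segment within a title.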
Because new metadata that a provider generates using proprietary algorithms is not generally available, using it to develop new content categorisations and lists of options creates a competitive differentiation for that provider.
Augmented metadata also enables fine-grained searching, that is, looking for content based on several criteria at once – characters, plus location, plus subject of dialogue and so on. Also, recommendations that take the plot and mood of content into account may feel more compelling and personalised than those based solely on past viewing habits.
Transforming content from linear television into segmented, or chaptered, on-demand assets can be automated to a greater degree, and new advertising and monetisation opportunities may be found owing to a better understanding of the context within shows. www.piksel.com