The Rise of Metadata, Global Namespaces and Orchestrated Workflows
Jason Lohrey

Contemporary content production and visual effects are equal parts art and process. Without art, the product would be boring. Without process, there would be no product. As the complexity underpinning high-fidelity content production increases (more assets, more people), and as we bring together talented people from around the globe, making the process efficient, lightweight and reliable ensures the art remains the predominant artefact. Metadata, global namespaces and orchestrated workflows are the three key enablers of a seamless process.

Metadata is the hidden champion. It’s a bit like dark matter – unseen, but everywhere. It provides context and meaning to optimise the use and flows of data. 

There are several kinds of metadata: 

1. Technical Metadata is typically embedded within the data – for example, many images contain the aperture and ISO settings, GPS position and other helpful information about the camera settings at the time the image was captured. That information is important for quality control and our ability to recreate the same conditions used to capture an image. It can also be used to identify equipment or configuration faults. Whilst this form of metadata is interesting, it’s rarely used to control the flow of data. 

2. Contextual Metadata is automatically applied by the system or manually applied by people. It records the purpose, provenance and the intended usage of the data. For example, contextual metadata could identify the project, the producer and the stage in a processing pipeline, which in turn helps determine the next step in the process. It could also include other information such as terms of use, licences, dissemination caveats, access controls and a record of processing and transformations that have been applied to the data. Contextual metadata can be anything, can be added at any time, and will evolve — it should be sufficient to understand where data has come from, what has happened to it and where it is going. It is often unique to each business.

3. Analytical Metadata is generated by software that analyses the data. For example, the software might identify the broad composition, specific objects, or determine the sentiment of subjects, etc. This allows for higher-order discovery and grouping of data – for example, finding all images of beaches with waves being held up by offshore winds (good surfing locations).

While all these forms of metadata enrich and improve our ability to find and use data at scale, contextual metadata is typically the most used for orchestrating the flow of data.
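
To make the distinction concrete, here is a minimal sketch of how the three kinds of metadata might sit together on a single asset. The field names and values are illustrative only, not any particular product's schema:

```python
# Illustrative only: one possible shape for the metadata attached to a
# single asset. Field names and values are hypothetical.
asset_metadata = {
    "technical": {           # embedded by the capture device
        "aperture": "f/2.8",
        "iso": 800,
        "gps": (-33.8688, 151.2093),
    },
    "contextual": {          # applied by people or by the system
        "project": "Project P",
        "producer": "Jane Doe",
        "pipeline_stage": "colour-grade",
        "licence": "internal-use-only",
    },
    "analytical": {          # generated by analysis software
        "objects": ["beach", "waves"],
        "sentiment": "positive",
    },
}

# Higher-order discovery: find assets whose analysis detected beaches and waves.
def is_surf_candidate(md: dict) -> bool:
    objects = md.get("analytical", {}).get("objects", [])
    return "beach" in objects and "waves" in objects
```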

People often avoid adding metadata because they believe it will burden their processes. However, it is possible to vastly increase the level of useful metadata without changing any processes, as context can often be inferred. For example, all data generated from a known source (IP address) or landing in a specific file directory can be deemed to belong to "Project P". The data orchestration system could also integrate with the task management system to extract metadata from that system and automatically add it to a newly created or modified file.
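
As a sketch of that kind of inference (the rule table and field names below are hypothetical, and a real system would carry many more rules):

```python
import ipaddress
from pathlib import PurePosixPath

# Hypothetical inference rules: anything arriving from a known source
# address, or landing in a known directory, is tagged with its project.
INFERENCE_RULES = [
    {"subnet": "10.1.2.0/24", "directory": "/ingest/project-p", "project": "Project P"},
]

def infer_context(source_ip: str, path: str) -> dict:
    """Derive contextual metadata from where data came from,
    with no change to the user's own process."""
    for rule in INFERENCE_RULES:
        from_subnet = ipaddress.ip_address(source_ip) in ipaddress.ip_network(rule["subnet"])
        in_directory = PurePosixPath(path).is_relative_to(rule["directory"])
        if from_subnet or in_directory:
            return {"project": rule["project"]}
    return {}

# infer_context("10.1.2.40", "/ingest/project-p/shot-042.exr")
# -> {"project": "Project P"}
```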

A ‘global namespace’ is the combination of two or more geographically distributed locations such that data is equally accessible at every location. The namespace may be presented as a ‘global file system.’ Global namespaces ensure data is in the right place at the right time and that data is not being modified by more than one person at a time, no matter where those people are in the world. Global namespaces allow teams to work in the same or overlapping time zones, and they facilitate ‘follow-the-sun’ production.
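
As an illustration only, one way to picture a global namespace entry (the structure and site codes here are invented for this sketch):

```python
# Illustrative sketch: every path is visible at every site; replicas record
# where the bytes physically live, and a single write lock prevents
# concurrent modification across sites.
namespace = {
    "/projects/project-p/shot-042.exr": {
        "replicas": ["SYD", "LAX"],   # sites holding a local copy
        "write_lock": "SYD",          # at most one site may modify at a time
    },
}

def can_modify(path: str, site: str) -> bool:
    entry = namespace[path]
    return entry["write_lock"] in (None, site)
```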

Light has a finite speed: it takes longer for information to travel when locations are further apart. For example, the latency between Sydney and Los Angeles is typically around 160 to 170 milliseconds. That is large compared to high-performance local storage, which can respond in microseconds to single-digit milliseconds. As a result, it is not feasible to rely on a single metadata server in a geographically distributed system, and data will need to be transmitted to each site so that it can be accessed locally.

Transmission can be triggered when someone first accesses a file, or files can be transmitted ahead of time so that they are already local when accessed. Sequentially fetching one file at a time is not feasible when many files are required, as each file will be affected by the inter-site latency. It is far better to transmit many files in parallel. In all cases, metadata is critical to determining what is to be transmitted and when that transmission should occur. For example, if I open a remote project file in a nonlinear editor, all of the referenced content should be transmitted (in parallel) and cached locally. This requires metadata describing the relationship between the project file and the content, and that metadata should be automatically maintained without human intervention. If data is to be transmitted ahead of time, then metadata is similarly required to identify which files should be transmitted and when.
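
A rough sketch of why parallel transfer matters, using the latency figure above; fetch_file is a stand-in for a real transfer call:

```python
import asyncio

LATENCY_S = 0.17  # ~170 ms Sydney to Los Angeles, per the figure above

async def fetch_file(name: str) -> None:
    # Placeholder for a real transfer; here we only model the latency cost.
    await asyncio.sleep(LATENCY_S)

async def prefetch(referenced_files: list[str]) -> None:
    # Sequentially, 1,000 referenced files would pay the latency 1,000 times:
    # 1,000 x 0.17 s = 170 s. In parallel, the latency is paid roughly once,
    # and the transfer is bounded by bandwidth rather than round trips.
    await asyncio.gather(*(fetch_file(f) for f in referenced_files))

# asyncio.run(prefetch([f"shot-{i:03d}.exr" for i in range(1000)]))
```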

Metadata can also control whether data can be transmitted to another site at all. For example, files may have restrictions on when and where they can be accessed. Those controls can be encapsulated in metadata and enforced by policies applied by the data orchestration system.
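
One way such a policy check might look; the allowed_sites field is invented for illustration:

```python
# Hypothetical policy: contextual metadata names the sites a file may be sent to.
def may_transmit(metadata: dict, destination_site: str) -> bool:
    """Return True if policy allows this file to be sent to the given site."""
    allowed = metadata.get("contextual", {}).get("allowed_sites")
    if allowed is None:
        return True          # no restriction recorded
    return destination_site in allowed

# may_transmit({"contextual": {"allowed_sites": ["SYD", "LAX"]}}, "LHR") -> False
```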

Data orchestration systems will use the metadata to decide on the routing of data according to business rules that are repeatable, optimised, automatic and audited — allowing people to spend less time on the infrastructure and more time on the art.

Joint solutions from Arcitecta and Dell Technologies deliver data where it’s needed at the right time. Arcitecta’s pioneering metadata and data orchestration tools, and Dell Technologies’ powerful, industry-trusted infrastructure, enable a global distributed edge that stays simple and performant, no matter the complexity of your workflows. 

Jason Lohrey is the CTO of Arcitecta (www.arcitecta.com), which has created its own comprehensive data management platform called Mediaflux.