This article investigates options for realtime, high-quality remote collaboration, focusing on realtime review and approval, and on challenges and lessons learned from developing the ClearView product line at Sohonet.
Why remote collaboration?
Seeking out new filming locations to wow audiences is certainly not a new idea, but what has changed in recent times is that far more than sound and picture capture now happens away from traditional studio and facility infrastructure. Functions such as dailies creation, color grading and editorial now occur increasingly near or on-set. Visual effects creation increasingly occurs in a different location to both the financing studio and the filming locations, chasing tax incentives, breaking into new markets and taking advantage of the new breed of creative and technical talent worldwide.
This increasing geographical diversity brings new challenges in managing the process required to make a cohesive end product. As film, TV or commercial production progresses, remote collaboration technologies offer to help dispersed people communicate ideas and keep everyone working toward a common goal.
Remote review and approval answers questions that arise due to geographical separation. An executive from a financing studio might be asking how they can review progress with editorial and visual effects. A director might ask how they can sit in on an edit session to share ideas without flying 5,000 miles. Advertising agencies will be asking how they can review a spot with a colorist if they are in a different city.
What technologies exist?
The requirements of international enterprises in all industries have led to the development of established technologies such as screen sharing, video conferencing, messaging, telephony, and combinations of some or all of the above known as ‘unified communications’ (UC). Whilst standard enterprise technology has uses for team communication in M&E, the current technology does not satisfy the need to review and collaborate with high-quality picture and audio.
In broadcast, television backhaul and contribution feeds between studios and playout facilities utilize high-quality video, but at high cost. This transport increasingly occurs over standard Ethernet/IP networks, and JP2K contribution feeds are now common. JP2K is an interesting technology for productions as it offers the promise of high-quality, low-latency video. On the consumer side, play-out networks and technology have taken many leaps forward, but the implementations are too low quality for most production applications.
Now that content is almost always file-based, it can be sent like any other file given the right bandwidth. There is an obvious limitation here: interactivity would lag by hours, or more likely a day.
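To illustrate why file transfer cannot substitute for live review, a rough transfer-time estimate is helpful. The file size, link speed and utilization figures below are illustrative assumptions, not figures from the article:

```python
# Rough transfer-time estimate for file-based review.
# All figures here are illustrative assumptions.

def transfer_hours(size_gb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Hours to move size_gb of media over a link_mbps link at a given utilization."""
    bits = size_gb * 8 * 1e9
    return bits / (link_mbps * 1e6 * efficiency) / 3600

# e.g. roughly 100 GB of mezzanine-quality media over a 100 Mbps line:
print(round(transfer_hours(100, 100), 1))  # → 2.8 hours
```

Even before adding encode, upload-queue and review-scheduling time, the turnaround is measured in hours, which is the gap realtime systems aim to close.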
Dailies services such as Pix, and Dax from Prime Focus Technologies, are able to serve static content by having the dailies creator upload video files. They also provide collaboration tools for feedback, but not in realtime. Cinesync has a realtime element, by facilitating the synchronization of playback, but users are required to send a file in advance so the content is static and the service does not define or facilitate talkback.
Audio-only solutions are more advanced. Using ISDN to transport audio for remote collaboration in sound recording for productions has been around for years, and more recently, solutions that work over modern Ethernet/IP networks have become established. Notably, Source Connect can replace ISDN lines.
PCoIP, developed by Teradici, dynamically switches between a variety of standard video protocols and, with the right network parameters, can provide a high-quality stream to a remote display as well as remote keyboard, mouse and tablet I/O. The technology is commonplace in visual effects, particularly at companies with remote workstation infrastructure housed in other facilities or data centers. PCoIP can be used for remote collaboration by deploying thin or zero clients at collaborating locations. This gives a good solution for low-latency applications, but with limited ability to guarantee video quality.
High-quality, realtime review and approval
In terms of technology, ClearView integrates three elements. Its primary content feed comes from broadcast-specific systems: a high-quality JP2K transport stream encoded from and decoded to standard HD-SDI, able to work at production resolutions and frame rates. Enterprise video conferencing (VC) technology, H.264 1080p at 60fps, is used for person-to-person interaction. This all runs over custom low-loss, predictable-latency networks provided by Sohonet.
What are the challenges of remote collaboration systems?
Remote collaboration systems present challenges both for the technology and for delivering a usable product to the end user — the service itself.
In terms of the bandwidth required for high-quality video, a starting point of reference is single-link HD-SDI for 4:2:2 HD video. At about 1.5Gbps, uncompressed transport can’t be achieved over a 1Gbps Ethernet bearer; it requires a 10Gbps Ethernet bearer. Network price points and coverage for 1Gbps Ethernet, and fractions thereof, are far more obtainable. Other points of reference are dual-link or 3G HD-SDI for 4:4:4 HD video at 3Gbps, while 3D HD is 3Gbps (4:2:2) or 6Gbps (4:4:4) and 4K requires at least 6Gbps. It is therefore very clear that compression is beneficial.
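The compression requirement implied by these figures can be made explicit with simple arithmetic. The table below repeats the uncompressed rates from the text and works out the minimum compression ratio needed to fit each signal onto a 1Gbps Ethernet bearer; this is a sketch, not a statement about any particular codec:

```python
# Uncompressed bitrates cited in the text, and the minimum compression
# ratio needed to fit a 1 Gbps Ethernet bearer. Illustrative arithmetic only.

LINKS_GBPS = {
    "HD 4:2:2 (single-link HD-SDI)": 1.5,
    "HD 4:4:4 (dual-link / 3G HD-SDI)": 3.0,
    "3D HD 4:2:2": 3.0,
    "3D HD 4:4:4": 6.0,
    "4K (minimum)": 6.0,
}

BEARER_GBPS = 1.0  # 1 Gbps Ethernet bearer

for name, rate in LINKS_GBPS.items():
    ratio = rate / BEARER_GBPS
    print(f"{name}: {rate} Gbps -> needs at least {ratio:.1f}:1 compression")
```

In practice a stream must leave headroom for FEC overhead, talkback and other traffic on the same bearer, so the real ratios are higher still.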
End-to-end system latency needs to be kept low to maintain interactivity: well under a second, with a couple of hundred milliseconds or below causing minimal interference. One-way network latency within North America is typically under 30ms, and Los Angeles to London is around 80ms. Most modern compression codecs such as MPEG2 and H.264 compress across frames (temporal compression) to take advantage of static or traceable elements in a moving image. However, 24fps video produces a frame roughly every 42ms, so buffering even a few frames for temporal compression adds unacceptable delay. Spatial compression of each frame on its own is therefore the better fit for low-latency video systems.
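These figures can be assembled into a rough one-way latency budget for a transatlantic session. The per-stage numbers below, other than the 80ms network figure and the 24fps frame interval from the text, are illustrative assumptions:

```python
# Rough one-way latency budget for an interactive LA-London session.
# Only the network figure and frame rate come from the text; the
# per-stage allowances are illustrative assumptions.

FRAME_INTERVAL_MS = 1000 / 24            # ~42 ms per frame at 24fps

budget_ms = {
    "encode (spatial, ~1 frame)": FRAME_INTERVAL_MS,
    "FEC / packetization buffer": 20,    # assumed
    "network (LA to London)": 80,
    "decode (~1 frame)": FRAME_INTERVAL_MS,
    "display refresh": 17,               # ~1 refresh at 60 Hz, assumed
}

total = sum(budget_ms.values())
print(f"total one-way latency ~{total:.0f} ms")
# Right around the couple-of-hundred-millisecond target. Any stage that
# buffers several frames (e.g. temporal compression) would blow the budget.
```

The key observation is that a single frame of spatial-only encode and decode fits the budget, while a temporal codec holding even three or four frames at each end would not.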
Modern networks are generally imperfect: designed to be very low loss, but not zero loss. This is because correction in upper network layers is easier to implement than a perfect network, so transmitting a perfect image requires compensating for lost data. Using TCP rather than UDP for transport would allow this, but TCP’s time-out periods and re-transmits can add a second or more of latency. A better approach is FEC (as defined in SMPTE 2022), which adds redundant packets so that lost or corrupt data can be reconstructed, and succeeds in preventing dropped frames on a network with a low level of loss. To quantify the requirement for FEC, networks are typically sold with three, four, or five nines of packet delivery assurance. A four nines (99.99%) packet delivery SLA equates to one packet lost for every 10,000 sent. Single-link HD-SDI JP2K uses around 60Mbps, which generates around 10,000 packets per second. If a video frame must be thrown away every time a packet is lost, that means a frame lost every second. Five nines would mean a frame lost every 10 seconds on average. Whilst packet loss typically does not occur at regular intervals, providers quote packet delivery SLAs to cover small irregular losses. It is thus clear that FEC is required to maintain a high-quality video signal.
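The packet-loss arithmetic above generalizes neatly. The sketch below assumes, as the text does, that one lost packet discards one video frame when no FEC is applied, and uses the ~10,000 packets-per-second figure for a 60Mbps JP2K stream:

```python
# The packet-loss arithmetic from the text, made explicit. Assumes one
# lost packet discards one video frame when no FEC is applied.

PACKETS_PER_SECOND = 10_000  # ~60 Mbps JP2K stream, per the text

def frames_lost_per_hour(delivery_sla: float, pps: int = PACKETS_PER_SECOND) -> float:
    """Expected frames discarded per hour at a given packet-delivery SLA."""
    loss_rate = 1 - delivery_sla
    return loss_rate * pps * 3600

for nines, sla in [(3, 0.999), (4, 0.9999), (5, 0.99999)]:
    lost = frames_lost_per_hour(sla)
    print(f"{nines} nines: ~{lost:.0f} frames lost per hour without FEC")
```

Even at five nines, hundreds of visibly corrupted frames per hour is unacceptable for review of finished picture, which is the case for FEC reconstruction rather than ever-tighter SLAs alone.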
Unfortunately, FEC can’t protect against higher levels of packet loss. Networks such as the Internet do not provide an end-to-end SLA; they are designed to carry protocols that can handle links with higher loss, and ultimately to handle the predominant traffic streams such as Web browsing, file movements, and consumer video content that adapts to network conditions at a penalty in quality. Therefore, networks with high SLAs, such as an appropriate MPLS service or wavelength services, should be utilized for high-quality video collaboration. ClearView runs over the low-loss, predictable-latency Sohonet network.
So, the technical challenges can be overcome with the right network, but the difficulties of building a usable service remain.
There are several challenges for a technical department, integrator or service provider in providing a solution that consistently works to a high standard. To understand them, it’s useful to first consider the challenges that the actual users of such a system face in making a session productive. Firstly, visual and audio interaction is central to communicating ideas: imperfections quickly lead to frustration and a breakdown in the participants’ perception that they are in the same room. It’s also important for end users to know how the remote collaboration technology is affecting the content, such as through compression or differences in monitor calibration. Are artifacts a result of the remote collaboration system, or present in the source material? These challenges require support from a multi-disciplinary team, including a video engineer, a network engineer, and often a projectionist. Other challenges include sessions usually occurring across time zones and engineers being unable to see the remote side’s input/output directly. Having an experienced service provider yields many benefits.
Real world examples
We were approached by a leading post production company in the US, looking to better utilize their creative talent, in this case an in-demand colorist working in the advertising space. Interactive sessions with the client are an important aspect of this type of process; however, when working with agencies all over the US, they wanted to avoid as much of the downtime involved with flying the colorist to the agencies for review as possible. They had experimented with off-the-shelf video streaming technology over the Internet, paired with Skype for talkback, but its limitations meant it did not reduce the number of scenarios in which travel was required. The key areas we identified that a ClearView system could improve were, firstly, reducing the latency of the primary video feed to align with the video conferencing feed, allowing client and artist conversation to flow as they comment live on the content; and secondly, providing color-correct JP2K image transport coupled with a calibrated remote monitor to give confidence in the value of the sessions.
Another example was a leading US-based studio, where executives and post production supervisors were working on multiple simultaneous projects in far reaching locations around the US, Canada and Australia. Their aim was to review editorial content as interactively as possible. Again we saw similar motivations for looking to this type of technology, limiting travel downtime and having had limited success with other technology such as Web-based dailies services. The key elements of success with this project were providing the live element, dedicated reliable network bandwidth, and being able to utilize third-party installation and support resources.
Looking to the future
The same question applies as with all M&E technology: how far will consumer or prosumer technology catch up with that used in the latest productions? Additionally, as enterprise becomes increasingly global, how will advances in these remote collaboration technologies apply to production?
But ultimately, will all media after capture be ingested direct to the cloud, leaving everyone from artist to producer remote from the source? Document stores for enterprise now increasingly don’t exist at a head office with branch offices connecting in; in the cloud model, remote connections are inherent in the way the technology is designed, as everyone is remote. Building blocks are being developed: for example, ‘desktop as a service’ (DaaS) offerings such as Amazon Workspaces are becoming established, and technology already exists for artist workstations to be remote from the artist.
It’s clear that production will continue to be more global; studios have been increasing production and co-production substantially in Asia, and to a certain extent South America and Africa. Technologies that enable people to interact over distance, including review and approval of content whilst in production, are an important facilitator of this trend.
Martin Rushworth is the Director of Technology at Sohonet Inc. (www.sohonet.com) in Los Angeles.