Cloud: Framestore creates VFX with preemptible machines
Issue: June 1, 2018

Framestore is one of the world’s leading visual effects studios. Nominated for numerous Academy Awards, the facility has won the Best Visual Effects Oscar three times, for its pioneering work on The Golden Compass (2007), Gravity (2013) and Blade Runner 2049 (2017). Today, the company coordinates a pool of more than 2,400 artists and producers across the globe, hundreds of whom may work on a single project at one time.

“One of the key factors in our success is our ability to scope, predict and plan for multiple, simultaneous, large shows,” says Steve MacPherson, CTO, VFX at Framestore. “Estimates around storage and compute are carefully compiled and resources are allocated, then we remain vigilant for the inevitable changes that come with any large, creative endeavour.”

A highly organized, high-performance render farm is the engine at the heart of Framestore productions, delivering the processing power demanded by the company’s cutting-edge graphics. But as the company’s work became increasingly complex, MacPherson and his team saw the need for temporary spillover capacity to realize their ambitions and make projects possible.

Cost-effective capacity on demand with Preemptible VMs

For large visual effects companies, managing rendering costs is a key concern. High-performance servers deliver value for money only when they run on a near-constant basis. As a result, provisioning for peak demand is a delicate balancing act that weighs the cost of temporary capacity against the expense of idle servers.

“The purchase cycle for capital equipment can be a bit of a ball and chain,” says MacPherson. “On top of the initial cost of equipment, there is the ever-increasing demand for machine room space, as well as the ancillary costs of power and cooling. A trend toward tighter deadlines from film studios also lowered the tolerance for equipment failure.”

When the render farm reaches the limits of its capacity, Framestore faces a choice. “As a responsible company, there are two things we can do. Either we extend the deadline with the same resources, or we increase our resources. In the most direct way possible, Google Cloud enables us to extend our resources in a manner consistent with production ebb and flow.”

Framestore contacted Google to discuss how Google Cloud could meet their needs. “Often we go through a sales process with new technology partners, but right from the early conversations with Google we dug into networking, VPNs, security, and provisioning instances,” says MacPherson. “Google got the right people in the room, early on. To me, that’s the holy grail when working with a partner company. We established a high level of communication and trust directly with Google’s technical engineering community, and that sped up implementation dramatically.”

Thanks to that collaboration, it took Framestore less than a day to create its first Google render node, from reading the documentation to building and deploying a virtual image and performing a proof-of-concept render. It rapidly became clear that Preemptible VMs would provide the best combination of capacity and cost efficiency. Using custom instances with specific core counts and memory configurations, Framestore can spin up additional capacity at speed, and spin it down fast when no longer required.
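As a rough illustration of what a custom, preemptible render node looks like in configuration terms, the Python sketch below builds a request body of the kind the Compute Engine API accepts when creating an instance. It is not Framestore's actual setup: the instance name, zone, core count, memory size and image path are all hypothetical, and the validation covers only two of the documented custom machine type rules (one or an even number of vCPUs, memory in multiples of 256 MB).

```python
def custom_machine_type(zone: str, vcpus: int, memory_mb: int) -> str:
    """Return the machine type path for a custom core/memory combination."""
    # Custom machine types take 1 vCPU or an even number of vCPUs.
    if vcpus < 1 or (vcpus != 1 and vcpus % 2 != 0):
        raise ValueError("vcpus must be 1 or an even number")
    # Memory for custom machine types is specified in multiples of 256 MB.
    if memory_mb <= 0 or memory_mb % 256 != 0:
        raise ValueError("memory_mb must be a positive multiple of 256")
    return f"zones/{zone}/machineTypes/custom-{vcpus}-{memory_mb}"


def preemptible_render_node(name: str, zone: str, vcpus: int,
                            memory_mb: int, source_image: str) -> dict:
    """Build an instance request body for a preemptible render node.

    All argument values are supplied by the caller; nothing here is
    taken from Framestore's real configuration.
    """
    return {
        "name": name,
        "machineType": custom_machine_type(zone, vcpus, memory_mb),
        "scheduling": {
            # Preemptible instances cannot restart automatically and
            # must terminate on host maintenance.
            "preemptible": True,
            "automaticRestart": False,
            "onHostMaintenance": "TERMINATE",
        },
        "disks": [{
            "boot": True,
            "autoDelete": True,
            "initializeParams": {"sourceImage": source_image},
        }],
        "networkInterfaces": [{"network": "global/networks/default"}],
    }


# Hypothetical example values, for illustration only.
body = preemptible_render_node(
    name="render-node-001",
    zone="europe-west2-a",
    vcpus=16,
    memory_mb=65536,
    source_image="projects/example-project/global/images/render-image",
)
print(body["machineType"])
```

A body like this would be passed to an instance-creation call. Scaling up means submitting many such requests, and scaling down means deleting the instances once the render queue drains.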

“With Preemptible VMs we can specifically target loads and create capacity to handle them,” says MacPherson. “Now all our cost models have a component that is based on Preemptible VMs. In three years, there have only been two occasions when we had to look for Preemptible VMs outside of our usual space.”

Direct peering, disaster recovery and deadlines

Now Framestore is looking into direct peering through Cloud Interconnect, to reduce an unwieldy reliance on VPNs. “The network is one of the biggest restrictions to the scale we can operate on,” says MacPherson. “We have to build as many as 110 VPNs and make sure they stay up over multiple 10GB connections in order to maintain the bandwidth we need traversing a public network. With Cloud Interconnect, we can connect straight from our network to Google’s network, which would make our jobs much easier and further bulletproof our security model.”

A disaster recovery and business continuity (DR/BC) project based on Google Cloud is also in the pipeline, providing extra peace of mind and project security. “We want to be ready to build a fully functional and complete instance of Framestore in an instant; ready to scale according to network bandwidth,” says MacPherson. “Not only can that be part of our DR/BC strategy, it will also enable us to deploy for geographic expansion on short notice. It would improve our guarantees to clients when we have a deadline, and reassure the Framestore board of the company’s resilience.”

Striking a healthy balance

More than three years since they were first incorporated into the Framestore render farm, Preemptible VMs are an established element of the company’s business model. On a typical day during a peak delivery cycle, the company runs as many as 30,000 cores out of London alone, 12,000 of which are on Preemptible VMs.
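To put those peak-day figures in proportion, the short Python sketch below works out the share of London's cores that run on Preemptible VMs. The two core counts come from the paragraph above; the function name is illustrative only.

```python
def preemptible_share(total_cores: int, preemptible_cores: int) -> float:
    """Return the fraction of all cores that run on Preemptible VMs."""
    if total_cores <= 0 or not 0 <= preemptible_cores <= total_cores:
        raise ValueError("core counts must satisfy 0 <= preemptible <= total, total > 0")
    return preemptible_cores / total_cores


# Peak-day figures quoted for London.
total_cores = 30_000
preemptible_cores = 12_000

share = preemptible_share(total_cores, preemptible_cores)
other_cores = total_cores - preemptible_cores

print(f"{share:.0%} of cores on Preemptible VMs, {other_cores:,} on other capacity")
```

On these numbers, two in every five cores in use at peak are rented rather than owned.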

“We’re striking a healthy balance,” says MacPherson. “Not having to purchase hardware to satisfy peak demand has fundamentally changed how we approach our cap-ex budget and strategy planning. And having those tools at our disposal means we can be more confident in our ability to deliver to a constantly changing customer landscape.”