Heath Firestone
Issue: September 1, 2008


In my last article, "The Reality of Motion Capture," in the May issue of Post, I talked about what motion capture is and how it works. In this article, I'm getting right at the meat of the matter: which technologies are getting us closer to that goal, and what this really means for the future, not only of motion capture filmmaking, which Steve Perlman of Mova refers to as volumetric cinematography, but also of live-action films and mixed media films, like James Cameron's upcoming Avatar. Mocap is finally coming of age, but the future will be even more exciting.

In the simplest terms, the future of mocap will be a seamlessly integrated motion capture experience where the limitations imposed by current technologies are overcome and the mocap process blends into the background. This will allow us to capture full body and facial motion data, as well as capture the movement and texture of real clothing, skin, props, and environments, while permitting realtime compositing or superimposing with live-action elements.

The idea is to create a mocap environment with all of the advantages of traditional filmmaking, few of the drawbacks, and a flexibility in post not available in traditional cinematography. It's an ambitious, even daunting, order, but it may not be far off.


Traditionally, mocap has been viewed as the capture of body and facial motion, translated into animation that drives the motion of 3D characters. While this is still a very big part of mocap, in reality it goes far beyond character animation. Motion capture often starts with capturing body and facial data, but in many cases virtual cameras also simultaneously use motion capture to create a perspective and live preview of the actions being performed by the actors. In post production, additional camera motion tracking may be used to set up camera placement and movements. Since each element can be manipulated independently, replacing dialogue or doing reshoots may only require mocap for one character, since the existing mocap for all other characters likely does not need to be changed.


While directors like Robert Zemeckis have embraced mocap for every aspect of their filmmaking environment, others, like James Cameron, have used it in conjunction with traditional live-action environments and actors. This creates a whole new set of challenges, but also opens up a lot of new opportunities for interactively combining live-action characters with mocap characters. Glenn Derry, virtual production supervisor on James Cameron's Avatar (December 2009), explains that although the movie is being filmed with the stereoscopic Pace/Cameron Fusion camera system and works with live-action characters, 75 percent of the movie is virtual. In order to achieve their filming style, they have had to develop a number of tools that allow them to mix live-action actors with mocap virtual characters, as well as combine traditional sets with virtual environments. For example, if they were filming a scene in which an avatar is interacting with an actor, they might be working on location, but using a motion capture set-up that combines an active optical marker tracking system with inertial sensors and camera tracking.

Cameron called on Atlanta- and LA-based Giant Studios for its proprietary, realtime mocap technology. The Avatar stage is in Playa Vista, CA.
Derry describes their shooting workflow as very director-centric. They use a digital camera with a virtual eyepiece, which shows a live view of the live-action elements superimposed with the virtual characters, rendered and composited in realtime, along with greenscreen elements whose movements are matched to the camera based on realtime camera tracking. In other words, the director sees all of the elements of a shot combined in realtime, as a live preview. In this way, the director chooses his shots during filming, which has the advantage that the actor is acting for the camera. This can be helpful since an actor adjusts the size of his movements based on the framing of the camera. It is also necessary to use this approach when combining live-action characters, since the view of them is fixed by the position of the camera while filming the live-action sequences.

In many ways, they are already achieving many of the goals of future mocap in that they have broken free from filming in a mocap volume. They have done this by combining multiple technologies, and they have also succeeded in realtime preview of the superimposed images, making many aspects of the motion capture just part of the background process.
What's New?

There are several new technologies that promise greater accuracy and resolution. Some offer better realtime preview and data management, while others offer the flexibility to capture anywhere. Some even approach the capture from a completely different perspective, promising to record the motion and texture of cloth and skin with scalable resolution options.


When I visited ImageMovers Digital and the set of Robert Zemeckis's A Christmas Carol this past April, I got to see all of their new toys, including their new facial motion capture system, the HMC, or Helmet Mounted Camera set-up, which they developed with Vicon. The HMC consists of four small cameras attached to short booms, placed low on the face, just below the actor's normal field of view, and mounted to a lightweight skullcap helmet. It captures multiple-perspective video of black dots marked on the actors' faces, producing facial motion tracking data that drives the facial animation of an individual character. This can significantly speed up the post process, since fewer cameras are needed to capture body motion and the facial motion data for each character stays separate, keeping the amount of data to track, solve and retarget manageable. It also helps with realtime preview, since only the body motion has to be solved, and it is useful for early versions of scene renders, since facial animation can be postponed until camera positioning and movement have been determined.

The HMC is not dependent on a capture volume, since the cameras go with the actor and don't require any sort of fixed stage. This could ultimately be used to drive the facial movement of a character interacting on a normal set or location, assuming facial replacement is the only requirement.

The HMC isn't the only kid on the block, though. On Avatar, Glenn Derry has been using a single-camera facial capture rig they call Headrigs, which predates the HMC. "It also employs a skullcap helmet, but uses a single camera mounted from the side like a headset microphone," he says.

While it lacks the multiple tracking perspectives and resolution of a four-camera system, it makes up for that with realtime facial motion solving and playback, which Derry explains is what matters most for their application, providing the results they need in their streamlined, realtime workflow.


Mova's Steve Perlman has created a unique approach, using some new and existing technologies, which has some real potential to do things that haven't really been done before… at least not like this. The Contour Reality Capture system takes a bit of a step backwards, in that, for now, anyway, it places the actor in a very limited environment, similar to the earliest days of facial mocap where the actor had a very limited range of movements. From there, however, it takes a couple of big steps forward, and assuming they can increase the volume to a reasonable size, it could have a big impact on the mocap industry.

The system uses some tried and true concepts, but uses them in a unique way. Several high-resolution black-and-white cameras are set in sync with UV lights flickering 120 times a second to capture the phosphorescent glow of UV makeup applied to skin and UV dye applied to clothing. These multiple-perspective cameras act as the motion capture sources, but the points to be tracked aren't determined until later.
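To make the timing concrete, here is a minimal sketch of the interleaved lighting scheme described above. The function names, the two-camera-bank split, and the even/odd phase assignment are my own illustrative assumptions, not Mova's actual implementation; the only detail taken from the description is the 120-times-a-second flicker.

```python
# Hypothetical sketch of the interleaved capture schedule. Assumption:
# geometry (B&W) cameras fire while the UV lights are on; texture (color)
# cameras fire on the alternate phase, while the fluorescents are on.

CAPTURE_HZ = 120  # lights flicker 120 times a second

def light_phase(frame_index):
    """Even frames: UV lights on (phosphor glow visible to B&W cameras).
    Odd frames: 5600K fluorescents on (color cameras grab texture)."""
    return "uv" if frame_index % 2 == 0 else "fluorescent"

def route_frame(frame_index):
    """Decide which camera bank records a given frame."""
    return "geometry_bw" if light_phase(frame_index) == "uv" else "texture_color"

# One second of capture yields 60 geometry frames and 60 texture frames,
# interleaved so each texture frame sits between two geometry frames.
frames = [route_frame(i) for i in range(CAPTURE_HZ)]
print(frames[:4])  # ['geometry_bw', 'texture_color', 'geometry_bw', 'texture_color']
print(frames.count("geometry_bw"), frames.count("texture_color"))  # 60 60
```

The interleave is what lets one pass of the actor produce both a trackable glow pattern and a clean, visibly lit texture stream.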

Using a technique referred to as Retrospective Vertex Tracking, Mova's software performs pattern tracking on the random pattern made by the sponge application of the phosphorescent make-up on the skin. Since the points that are tracked aren't determined until post, solves can be made at different resolutions depending on the need, and greater resolution can be applied to areas of the face that need more tracked points. Because of the way it chooses tracked points, it has the potential for much higher resolution solves than traditional tracking-marker systems.
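The "choose your resolution in post" idea can be illustrated with a toy sketch: given feature points detected in the random makeup pattern, you pick tracked vertices at whatever density the solve requires. The grid-bucketing approach and every name below are hypothetical simplifications, not Mova's proprietary Retrospective Vertex Tracking.

```python
# Loose illustration of solve resolution chosen after the shoot.
import random

random.seed(7)
# Pretend these are pattern features detected across a 100x100 region of skin.
features = [(random.uniform(0, 100), random.uniform(0, 100)) for _ in range(5000)]

def select_vertices(points, spacing):
    """Keep at most one feature per spacing-sized grid cell.
    Smaller spacing -> more tracked vertices -> higher-resolution solve."""
    cells = {}
    for x, y in points:
        cell = (int(x // spacing), int(y // spacing))
        cells.setdefault(cell, (x, y))  # first feature claims the cell
    return list(cells.values())

coarse = select_vertices(features, spacing=10)  # quick, low-res solve
fine = select_vertices(features, spacing=2)     # detailed solve, same footage
print(len(coarse), len(fine))  # the same capture supports both densities
assert len(fine) > len(coarse)
```

Because the raw footage, not a fixed marker set, is the record, the same take can be re-solved coarsely for previews and densely for finals.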

At the same time, texture imagery is captured alongside the motion capture. Several regular color cameras are set directly out of sync with the UV lights, but in sync with 5600K fluorescent bulbs. These effectively capture the texture map images, which are projected back onto the 3D mesh, essentially translating the video into a 3D video of the person's face and adding some of the subtle details, like minor wrinkles and semi-transparent skin. These are things that don't register in mocap, but can be seen on video. The result is 3D mocap that looks very real. The system also works with fabrics and uses no visible markers.


While the ability to capture motion data outside of a mocap volume is critical for some applications, mocap volumes still have big advantages in terms of accuracy and the ability to view multiple subjects from any angle. This is what still makes them the tool of choice for non-mixed-media applications. With this in mind, those who use passive-optical-based mocap volumes are constantly improving and upping the ante with their stages. An example of this is Vicon's House of Moves, which has just completed construction on their new mocap volume.

Their new stage can be configured as 30-x-50 feet for full body capture only, or 30-x-30 feet for full body plus facial and finger capture. This stage differs from their old stage in that it is all white, with 270 near-infrared cameras, which makes it more comfortable to work in since it is easier on the eyes. More important, however, the volume was designed as a traditional soundstage: it is soundproofed and built to capture production-grade audio, eliminating the need for most ADR.

The cameras are configured in two separate systems so that finger and facial capture is separate from the body motion capture. This allows Autodesk Motion Builder to be used to drive the animation of a virtual character in realtime. They have streamlined their workflow to timecode sync video, animation and audio, and have integrated custom virtual camera rigs with realtime preview to allow the director a tactile, interactive perspective. Hand-held virtual cameras add to the functionality. In addition, Vicon has integrated greenscreens in their volumes for intermixing live-action characters with motion capture characters.
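Timecode sync of the kind described above can be sketched simply: streams stamped against a common timecode are lined up by converting each start time to an absolute frame count. The stream names and start times below are invented for illustration, and non-drop-frame SMPTE at an assumed 24fps keeps the math simple.

```python
# Minimal sketch of timecode-based alignment (non-drop-frame SMPTE assumed).

FPS = 24  # assumed project frame rate

def tc_to_frames(tc, fps=FPS):
    """Convert 'HH:MM:SS:FF' to an absolute frame count."""
    hh, mm, ss, ff = (int(p) for p in tc.split(":"))
    return ((hh * 60 + mm) * 60 + ss) * fps + ff

streams = {
    "video":     "01:00:00:00",  # each stream's first recorded frame
    "animation": "01:00:00:12",
    "audio":     "00:59:59:16",
}

# Sync everything to the latest-starting stream by trimming the others.
start = max(tc_to_frames(tc) for tc in streams.values())
offsets = {name: start - tc_to_frames(tc) for name, tc in streams.items()}
print(offsets)  # {'video': 12, 'animation': 0, 'audio': 20}
```

Once every stream shares a frame-accurate origin like this, video, animation and audio can be scrubbed together in realtime preview.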

ImageMovers Digital's mocap stage, which was built for A Christmas Carol, also uses near-infrared lighting, but had a separate truss system built for wirework so the mocap cameras wouldn't be affected by truss movement. Instead of upping the camera count, they relied on 100 cameras, using the HMCs for facial motion capture. Lightstorm Entertainment, in conjunction with Giant Studios, also relies heavily on mocap volumes, using a 70-x-36-foot volume for performance capture on Avatar, along with the single-camera Headrigs facial capture system.


One area of motion capture that is seeing a rapid increase in demand has nothing to do with character animation and everything to do with camera placement and movement. This is camera tracking, which has actually been around for a while now, originally used for greenscreen applications so virtual cameras would match the live-action camera movements for compositing live-action characters into 3D environments. Now, however, some of the same systems, like Intersense's IS900 studio camera tracking system, are being used by directors like Zemeckis to set up camera angles and movements in post.

When it comes time in the post process to bring in the camera, Zemeckis breaks out his IS900, which he has hooked up to three machines. Then, one at a time, he uses the virtual camera to frame his shots and create camera movement. He moves from one computer to another so he can optimize his time, allowing each 3D artist to tweak the movements while he sets up the next shot. In this way, he knows that the shot he is working with is a perfect take, so he can focus all of his attention on getting the right angle, framing and movement. It is a critical part of his post workflow.

Traditional passive optical mocap stages all use specialized virtual camera rigs that act as a physical representation of a virtual camera with some camera and navigation controls. These feed a viewfinder playing back the virtual camera's view. Some directors prefer to set up their shots as they record the motion capture, whereas others prefer to focus on the capture and deal with the camera set-up details in post. Each has its advantages, and represents different directing and workflow styles and preferences.


Inertial tracking is another mocap technique that is seeing greater implementation. Inertial trackers generally contain accelerometers and gyroscopes that detect motion in any direction along any axis, and their readings can be integrated into position and orientation data. These sensors can be used independently, with inertial trackers placed at joints and other logical locations on a motion capture suit, like those used by Moven. Or they can provide supplemental information when used with optical systems, supplying directional data to help solve tracking points that have been blocked from view. Intersense uses a combination of acoustic tracking and inertial trackers in their IS900 camera tracking system.

The advantage inertial sensors have over most other motion capture systems is that they can provide 3D movement data for each sensor without having to be observed by outside sources, like the cameras in optical systems or the acoustic sensors in acoustic systems. They can be used under clothing and in most shooting environments. The limitation is that they work well for body capture but have no solution for facial capture and aren't practical for finger capture. Used in conjunction with other technologies, however, they have a lot of potential for use in mocap production.
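As a hedged illustration of how inertial readings are commonly fused, here is a textbook complementary filter: the gyro's integrated angle is smooth but drifts, while the accelerometer's gravity-derived angle is noisy but drift-free, so the two are blended. Production systems like the IS900 use far more sophisticated fusion; every value in this simulation is made up.

```python
# Textbook complementary filter for one tilt axis (illustrative only).

def complementary_filter(angle, gyro_rate, accel_angle, dt, alpha=0.98):
    """Blend the gyro's integrated angle (smooth, but drifts) with the
    accelerometer's gravity-derived angle (noisy, but drift-free)."""
    return alpha * (angle + gyro_rate * dt) + (1 - alpha) * accel_angle

# Simulate a sensor held still at 30 degrees with a drifting gyro.
angle = 0.0
true_angle = 30.0
gyro_bias = 0.5          # degrees/sec of spurious rotation to reject
dt = 0.01                # 100 Hz update rate
for _ in range(2000):    # 20 seconds of samples
    gyro_rate = 0.0 + gyro_bias   # gyro reports motion that isn't there
    accel_angle = true_angle      # gravity reveals the true tilt
    angle = complementary_filter(angle, gyro_rate, accel_angle, dt)

print(round(angle, 1))  # converges near 30 despite the gyro bias
```

The same blending idea is why inertial data pairs so well with optical systems: each source covers the other's weakness.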


In many ways, the techniques used by James Cameron on Avatar, now being employed by Weta Digital for Steven Spielberg and Peter Jackson's latest mocap film, Tintin, are a model of what can be expected to come. On Avatar, the approach is to shoot everything in the same way as you would a normal movie. What they have done, however, is mix in the mocap technologies and feed the characters and environments created in 3D, which are driven by the camera tracker and performance mocap, back into the viewfinder in realtime.

The director knows what he is shooting and how it will all be composed when completed. In fact, as they are filming the movie, Cameron is cutting the film, so what is sent off to Weta has already been cut to the frame. This director-centric approach gives the director a very tangible view of what he is capturing. What is lost is the freedom that you have in a completely mocap production. In essence, each of these tools has been designed to best suit the production style of the directors and, more importantly, to work within the environment, restrictions, and requirements demanded.


As it stands, in the world of mocap there is always a tug of war between increased accuracy and realtime performance. The greater the resolution and number of cameras, the more points can be captured, the more accurate those points are, and the fewer incidents of occlusion occur. Unfortunately, this comes at the cost of realtime preview. Shooting anywhere, especially outside of a soundstage or volume, restricts the normal flexibility of 3D motion capture in that the mocap is often solved only from certain perspectives, which is fine when you are mixing with a traditional camera, since its perspective is determined at the time of recording.

As processing gets faster and tools that combine multiple technologies are developed, we can expect higher resolution while still providing realtime playback. There will also be a larger number of tools, and a larger pool of talent. The systems will be more flexible, quicker to set up, provide more accurate data, and offer more integrated workflows.


As technical as motion capture is, ultimately you will hear the same thing from virtually all camps, "It's all about story." The purpose of these technologies is to allow us to create characters and effects that we could not do before, and do it more realistically and efficiently than ever before. They allow us to create and interact with virtual worlds and blend the lines between reality and fantasy. The tools are becoming more flexible, more powerful and are changing the way a lot of films are being made.

Heath Firestone is a producer and director for Firestone Studios LLC, which specializes in mixed media 3D compositing and camera tracking.