Heath Firestone
Issue: May 1, 2008


Motion capture, or mocap as it is often referred to, is one of the great new frontiers in the world of movie making, and although it has met resistance in the film community, it is becoming a crucial tool in complex digital effects. And, as it develops, it has become a completely different medium in which to capture entire feature films.
Robert Zemeckis has produced three films entirely using mocap: Polar Express, Beowulf and A Christmas Carol. Peter Jackson, James Cameron, Steven Spielberg and George Lucas have also used the technology in their productions.

Although some early attempts at motion capture yielded sometimes plastic-looking facial movements and dead eyes, the technology is continually improving and opening the doors to a future way of making films, which Steve Perlman of San Francisco’s Mova refers to as “Volumetric Cinematography.”


To describe mocap simply, it is a conversion of live-action movement into 3D data, which is used to drive 3D animated characters. How this is accomplished is not as simple to describe, and spans the spectrum from single-perspective pattern-tracking interpolation, mechanical armature exoskeletons and radio-frequency triangulation to the more commonly used multicamera optical marker-based triangulation systems. There are even a couple of systems out there that use gyro-based inertial sensors, or sonic sensing with time-of-flight triangulation for positional information.

Mocap in the film industry, however, more commonly uses one of the passive optical-based systems, like those made by Vicon and Motion Analysis. This is starting to expand into more advanced systems, which use more than just tracked motion data, including capturing textures and cloth surfaces. Motion capture is also being used for realtime capture of virtual handheld camera movements, which Scott Gagain, executive producer at LA’s House of Moves, says gives the shots an organic feel, and can also be an invaluable tool for previsualization. (I will cover this topic in more detail in a later issue of Post, when I write about the future of motion capture.)


Motion capture has been around in various forms for over a century if you count Eadweard Muybridge, who in 1878 captured a series of 12 photographs showing a full stride of a horse to demonstrate that all four of the horse’s hooves leave the ground at one time, mid-stride, at the point where all four legs are tucked under the body. While this isn’t 3D motion capture as we know it, it was the beginning of the analysis of motion using high-speed photography, which is the basis for motion capture. Since then, motion has been captured and analyzed mostly for scientific and medical purposes, but more recently it has found its way into film production.

Since films first started using 3D models, it has been the goal to create realistic motion for those models and speed up the workflow so the motion doesn’t have to be hand animated. The solution is to capture the movements of an actor in three-dimensional space and apply that to the 3D models… in other words, mocap.


Motion capture has been used to drive computer animated characters in movies, such as Lord of the Rings, The Mummy and King Kong, or as a way of enhancing and matching partial animations on live-action characters, which can also be seen in The Mummy. It has also been used in movies that have relied on the technology for the entire production, starting with Sinbad: Beyond the Veil of Mists, Final Fantasy, Polar Express, Beowulf and the upcoming A Christmas Carol. Programs like Massive also draw from libraries of mocap data for the animations that power their autonomous AI creatures in movies like The Lord of the Rings trilogy.


Since all of the films that have relied heavily on motion capture have used passive optical motion capture set-ups like the ViconMX, which has been used on all of Robert Zemeckis’s mocap films to date, I’m using Vicon’s workflow to describe how the process works, although there are other solutions out there.

Motion capture takes place on a stage, called a volume, which has a number of monochromatic, high-speed, high-resolution cameras mounted all around it. These cameras are optimized for picking up a certain range of light, usually red or near infrared, and have a couple of hundred high-intensity LED lights packed around the camera lens. These LEDs surround the lens in order to act as the light source that illuminates the retroreflective markers affixed to the actor, which are the points being tracked in 3D. The actors generally wear tight-fitting body suits that have retroreflective markers (round balls covered in retroreflective material) velcroed to them, and/or retroreflective dots glued to the face. Retroreflective material reflects back to its source very brightly because it is coated with millions of microscopic glass beads, which, through refraction and reflection, bounce light off of the back “lens” of the bead, back toward the source of the light. This causes the material to “light up” brightly, even under low-intensity lights. But for this to work, the light has to come from nearly the same angle as the lens you want it reflected back to.

Robin Pengelly, senior VP/manager of LA’s Vicon Entertainment, explains that the cameras have improved to the point where they no longer need to capture in front of black backgrounds, because their custom cameras use filtering to eliminate non-IR wavelengths, so the camera sees a high-contrast image in which the retroreflective markers appear as white dots in a black environment.

(Near-infrared LEDs are usually used for close-up facial motion capture, primarily because they are barely visible and therefore easy on the eyes, whereas red LEDs are brighter from the camera’s perspective and can be used for longer-distance captures, as on a full stage.)

The motion capture cameras are placed around the volume at different perspectives, then a calibration tool is placed and moved around within the volume. This gives the software the information it needs to determine where each camera is in the volume. When multiple cameras are used at different perspectives, and the software knows their locations through calibration, it has all of the information it needs to cross-reference which dots it sees traveling in which direction from different camera angles and triangulate each marker’s three-dimensional location in the volume. This process is similar in all volumetric mocap techniques in that multiple perspectives are needed to determine three-dimensional positional information through triangulation.
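The triangulation step described above can be sketched in a few lines. This is a minimal illustration, not Vicon’s actual implementation: it assumes calibration has already produced a 3x4 projection matrix for each camera, and recovers one marker’s 3D position from its 2D dot positions in two views using the standard direct linear transform.

```python
import numpy as np

def triangulate(P1, P2, uv1, uv2):
    """Recover a marker's 3D position from its 2D image positions in two
    calibrated cameras. P1 and P2 are 3x4 camera projection matrices
    (obtained during calibration); uv1 and uv2 are the (x, y) pixel
    coordinates of the same marker as seen by each camera."""
    # Each 2D observation contributes two linear constraints on the
    # homogeneous 3D point X, of the form (u * P[2] - P[0]) . X = 0.
    A = np.array([
        uv1[0] * P1[2] - P1[0],
        uv1[1] * P1[2] - P1[1],
        uv2[0] * P2[2] - P2[0],
        uv2[1] * P2[2] - P2[1],
    ])
    # The least-squares solution is the right singular vector of A
    # with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]  # convert from homogeneous coordinates
```

With more than two cameras, each extra view simply appends two more rows to the matrix, which is why additional cameras improve accuracy as well as coverage.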

More cameras are used when more detail is needed, or when multiple subjects are being captured, in order to get better coverage and reduce the likelihood of occlusion of markers. Occlusion is when a marker is blocked from the view of the cameras; if too few cameras can see it, the marker becomes impossible to track.

After the motion data has been captured, it goes through a stage called cleaning, in which the software filters and resamples the marker data. Occlusion is handled in one of several ways. In live capture, the software estimates a marker’s position based on its trajectory and velocity. In post, it can look at the marker’s position before it was occluded and after it reappears, and create a path between the two. Or the data may be manipulated manually, with an artist modifying the motion curve to produce a more realistic movement or to correct data the software was unable to calculate correctly.
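The post-capture gap-filling described above, creating a path between the last position seen before the occlusion and the first one after it, can be illustrated with a simple sketch. Real cleaning tools use smoother curve fits and velocity constraints; this minimal version just interpolates linearly across the missing frames.

```python
import numpy as np

def fill_gaps(track):
    """Fill occluded frames in a single marker's track.
    `track` is an (n_frames, 3) array of x/y/z positions, with NaN
    rows wherever the marker was occluded. Each axis is filled by
    linearly interpolating between the positions surrounding the gap."""
    track = track.copy()
    frames = np.arange(len(track))
    for axis in range(track.shape[1]):
        col = track[:, axis]
        missing = np.isnan(col)
        # Interpolate missing samples from the visible ones.
        col[missing] = np.interp(frames[missing],
                                 frames[~missing], col[~missing])
    return track
```

An artist pass would then adjust the resulting curve wherever straight-line motion looks unnatural, which is exactly the manual correction step the text describes.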

Solving is the stage where all the information from the cameras is run through the software and converted into a 3D representation of the marker points. When several hundred or several thousand markers have been tracked, the result is a dense mass of points referred to as a marker cloud.

These tracked points then need to be attached to the corresponding positions on a 3D skeleton, which is, in turn, attached to a 3D character. This process is called retargeting. In some cases, this will have been pre-rigged so a live version can be displayed using a tool such as Autodesk’s MotionBuilder, software that converts the animated skeletal information into a live animation of a 3D character driven by the motion capture.
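The attachment step can be sketched at its simplest. The marker names and the centroid rule below are illustrative assumptions, not a real rig: production solvers use calibrated per-marker offsets and full skeletal constraints, but estimating each joint from the markers assigned to it conveys the idea of mapping a marker cloud onto a skeleton.

```python
import numpy as np

# Hypothetical assignment of marker names to skeleton joints.
JOINT_MARKERS = {
    "l_elbow": ["LELB_out", "LELB_in"],
    "l_wrist": ["LWRA", "LWRB"],
}

def solve_skeleton(frame, joint_markers=JOINT_MARKERS):
    """Estimate each joint position for one frame of capture.
    `frame` maps marker name -> (x, y, z); each joint is placed at
    the centroid of the markers assigned to it."""
    joints = {}
    for joint, names in joint_markers.items():
        points = np.array([frame[name] for name in names])
        joints[joint] = points.mean(axis=0)
    return joints
```

Run per frame over the cleaned marker data, this produces the animated skeletal information that a tool like MotionBuilder then binds to the 3D character.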


The world of motion capture has evolved a great deal over the past decade. It has gone from working on a dark, empty stage with dozens of bright red ring lights surrounding the stage — where the actor had to imagine the environment with a restricted number of simultaneous motion capture subjects — to a point where mocap-friendly sets are built with interactive props and facial motion data is being captured in the same take.

Set pieces are still usually just metal frames, but props are often actual objects that have been altered to be mocap friendly: made non-glossy. Ben Guthrie, motion capture shoot lead at San Rafael, CA’s ImageMovers Digital (www.imagemoversdigital.com), which is a Robert Zemeckis company, explains that although the intention of using mesh props was to be marker friendly, the mesh tended to break up the view of the reflective markers, making it difficult to track the center of the marker accurately. So for smaller props, they have shifted back to using more realistic, but marker-friendly, props. The addition of interactive props, built stages and 3D models to use as references helps in the envisioning process, as actors no longer have to imagine everything, though they do have to look past their mocap suits and concentrate on performance. Fortunately, actors seem to adapt quickly, focusing on playing off of the other actors and communicating with the director without the hindrance of normal production distractions.


Actors in any film production have several challenges in connecting with their character and being in the moment of the scene, despite the hot lights and dozens of people scurrying behind the scenes fixing makeup, adjusting lights, capturing audio, etc. On a mocap stage, or volume, they don’t have to deal with as many of these distractions but, at least for now, they generally don’t wear normal costumes and are instead on an open stage being bombarded with the light of several thousand near infrared lights, surrounded by hundreds of cameras, often while up to 170 retroreflective markers are affixed to their faces. Fortunately, the cameras and near infrared lights blend into the rigging, and LED lights only emit a dim red light, which can barely be seen. The stages have also evolved from dark Duvateen covered walls with bright red rings every few feet to bright white stages with nearly invisible light rings.

Because actors know they don’t have to hit a specific mark exactly, since this can easily be adjusted in post, they can concentrate on performance instead. Debbie Denise, VFX executive producer for Sony Pictures Imageworks, explains, “Theater actors generally have an easier time adapting than method actors, as they are often less reliant on costume and environment to connect with their character.” Theater actors tend to be more used to working on a limited stage, playing to large audiences while keeping the intimacy of the scene.

“They’ve already made that leap,” she explains. A lot of actors who haven’t yet worked with mocap are apprehensive at first. It might seem unnatural to act in spandex clothing, with semitransparent props, opposite actors who are also wearing spandex, but once they get used to it, they often feel the freedom of being able to focus on the craft, knowing they won’t have to reshoot an otherwise perfect take because of a change to lighting, makeup or costume. In the world of mocap, everything else is in post. So if you got the performance you wanted, then it is captured, and all other tweaks will be created in post.

In the case of movies that are shot entirely in mocap, as is the direction Robert Zemeckis has chosen, I was curious how the shoot schedules compared to traditional filmmaking. Denise explains, “Shoot schedules are one half to one third the length of traditional shoot schedules, often with only 25 to 30 days of shooting scheduled, and these aren’t long days.”


It all began on a little film called Polar Express. Robert Zemeckis, no stranger to technology and pushing the envelope, delved into his first fully 3D motion capture film. It wasn’t the first (Final Fantasy and Sinbad preceded it), but it was considered groundbreaking for other reasons. Imageworks’ Denise explains that when they did the initial tests for Polar Express, mocap was done in two passes, one for full body capture and the other for facial capture, and facial capture was very limited in movement. Not satisfied with that solution, Zemeckis wanted to capture facial and body motion simultaneously, and worked with Vicon to get a system going that would have that capability. On Polar Express, they had three different volumes. One was 10 feet wide by 10 feet deep and 16 feet tall. At first, they were only able to handle one or two actors at a time, but eventually were able to bump that up to three. They also had a 20-foot-by-20-foot stunt stage and a 30-foot-by-60-foot stage for doing crowd scenes.

Although Zemeckis didn’t direct Monster House, he was executive producer, and it built on what was learned from Polar Express. In Monster House, they were able to get away with less facial motion capture data because of the stylized nature of the film. They shot Monster House on a 20-foot-by-20-foot volume with 240 cameras.
When Beowulf came around, Sony Pictures Imageworks tested other RF-based tracking options, hoping to be able to use traditional costumes because they wouldn’t have to rely entirely on optical markers, but the technology wasn’t available in time for shooting. Instead, they bumped up to 240 cameras on a 25-foot-by-25-foot stage, capturing as many as 20 actors with up to 250 markers each, including facial motion markers on all 20 actors. That’s a ton of data.

On Zemeckis’s most recent film (A Christmas Carol, still in production), now under the ImageMovers banner and Disney, new technologies were used, which allowed them to break away from the difficulties associated with capturing that volume of markers and gave them the capability to capture each face individually. It also allowed them to scale their camera count back to 100 on their 30-foot-by-60-foot stage. This new HMC (Helmet Mounted Camera) system uses four tiny cameras mounted to booms, which attach to a simple, lightweight helmet, and uses ink dots on the face instead of markers. Since the cameras are mounted low on the face, almost out of the normal range of perception, they are easy to ignore unless the actor is trying to touch his face or drink, etc. This, along with other new technologies starting to be used in mocap feature film production, will be discussed in more detail in a follow-up article.


“The reality of motion capture is that there is no magic bullet; each system has limitations,” says Steve Boyd, co-producer at ImageMovers Digital, who has been involved with Polar Express, Beowulf and A Christmas Carol. “The Holy Grail would be if we were able to capture not only the body movements and facial animation, but also be able to use regular costumes with no markers on the face, and be able to do this outside, not restricted to a soundstage. Ideally, we will be able to capture all of this information and have this part of the filmmaking process be transparent. To accomplish this would probably require a combination of technologies.” These technologies have come a long way, and each film pushes the envelope and builds on the experiences of the last film.

Although Zemeckis took some pretty harsh arrows from several critics on Polar Express, he weathered the criticism and has embraced and advanced the technology to a point where the quality has become good enough that other prominent directors have felt comfortable taking the plunge. This includes James Cameron, who has reportedly made the leap to fully digital, motion captured characters after seeing the work being done and determining that it was finally at a point where he could use it on Avatar. He is also using a system with a camera attached to his actors that captures facial motion data, similar to Vicon and ImageMovers’ HMC.

The one caveat about using very realistic 3D faces is that you have to steer clear of the “uncanny valley,” the no man’s land where people respond very negatively to a face that is realistic enough that they almost believe it is human, but where certain missing subtleties of movement make parts of the character look somehow not quite right. This is sometimes described as dead eyes, or plastic-looking facial movement. Since the goal in filmmaking is to keep people involved in the story, distractions, even subconscious ones, need to be avoided. This means either getting it right or backing off to nonrealistic characters, which was the approach taken on Monster House.

One of the most exciting new developments in this area, which is being used on films (though which ones have not yet been announced), is Mova’s Contour Reality Capture, which Mova’s Steve Perlman describes as a capture system designed for a new form of filmmaking he refers to as Volumetric Cinematography. It is also a multicamera optical capture system, but rather than capturing points, it captures, from multiple angles, the random patterns in phosphorescent makeup sponged onto the actor’s face, and is synced to capture only when the face is illuminated by flickering UV lights, which are timed to be offset from the flickering of white Kino Flo lights. Offset from this, multiple color cameras capture the white-light exposure in order to get texture images, which are mapped back onto the 3D model for advanced realism.

What is truly different about this system is that it uses what Perlman refers to as Retrospective Vertex Tracking, which means that the tracked points are determined in post, and the number of tracked points can be scaled up or down — depending on need. Because it doesn’t use fixed dots, it allows for far greater potential point resolution than existing solutions, but for now stage size is still limited in scope.


Mocap has established a strong foothold in the industry and is gaining popularity and acceptance. It creates possibilities in filmmaking that did not exist before, and, with the use of virtual camera tracking and more advanced capture techniques, it will change the way that many films are made.

In a follow-up article, I will talk about the process of how this is really changing filmmaking for directors like Zemeckis and the future of motion capture. It’s an exciting time for mocap, with advances being made daily.

Heath Firestone, a producer/director based in Denver, has a strong background in advanced 3D digital effects and compositing. Reach him at