Ron DiCesare
Issue: March 1, 2008


At the heart of the audio industry is a community that shares its ideas, experiences and knowledge. All audio professionals have relied on the experience and guidance of others at some point during their careers, and this month, we spoke to four pros kind enough to share their experiences, tips and techniques with us.


Michael Kross, senior recording engineer at Universal City, CA’s Big Ear Audio, shares how he does ISDN VO recording to picture in the fast-paced world of television promos. Since standard audio ISDN lines do not carry images, the talent cannot see picture when recording remotely. Kross has developed a Digidesign Pro Tools template and a technique for recording VO remotely in sync to picture while listening to sync sound, music and effects all at the same time. His template enables speed and efficiency while greatly reducing the room for error, and his technique makes an ISDN session feel more like a mix session of the finished spot than a simple voice recording. Gone is the need to record wild (without picture) and place the VO after the fact; it can all be done at once, seamlessly.

The first step of this remarkable procedure is to understand his Pro Tools template, which is divided into two sections (See illustration on page 33). The first section is for studio monitoring and is made up of five primary tracks: one for VO recording, one for a scratch track, one for sync sound, one stereo track for M+E (music and effects) and one used as an audio cue track. The track layout is similar to how any VO record session might be done. The second section is the send section, which duplicates the five-track layout described above. Most importantly, all the audio on these secondary tracks must be offset (shifted) 14 frames earlier to compensate for the latency of ISDN during recording. The secondary tracks are sent to the ISDN lines for the talent while recording and, due to their output assignments, are not heard in the studio. The arrangement is similar to a cue mix or headphone feed, and because of the 14-frame offset, the talent’s read lands in sync with the primary tracks and picture.
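Kross’s 14-frame offset can be translated into concrete numbers. Assuming NTSC’s 29.97 fps frame rate and a 48 kHz session sample rate (neither is stated in the article), a rough sketch of the math in Python:

```python
# Sketch: converting the 14-frame ISDN latency offset into audio samples.
# The 29.97 fps (NTSC) frame rate and 48 kHz sample rate are assumptions.
FPS = 30000 / 1001      # NTSC frame rate, ~29.97 fps
SAMPLE_RATE = 48000     # session sample rate in Hz

def frames_to_samples(frames, fps=FPS, sample_rate=SAMPLE_RATE):
    """Return the number of audio samples covered by `frames` video frames."""
    return round(frames * sample_rate / fps)

# Shift the send tracks this many samples earlier than the monitor tracks.
offset = frames_to_samples(14)
print(offset)  # 22422 samples at these assumed rates
```

At other frame or sample rates the sample count changes, but the principle is the same: the send tracks lead the monitor tracks by exactly the round-trip latency.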

The template depends on a proper I/O setup in Pro Tools and a mixer, like the Yamaha O2R96 used by Kross. The studio hears the primary set of tracks on outputs 1, 2, 3 and 4, while the talent hears the secondary tracks on outputs 5, 6, 7 and 8, routed to the ISDN lines via a Telos Zephyr.

The workflow of a typical promo session starts with loading in the Digi Beta with all the spots and a log of all the timings. “I’ll get a Digi Beta with four audio tracks on it. It has what we call SOT [Sound On Tape which is most often on-camera dialogue], a scratch track done by the producer for timing and a stereo mix of music and effects,” Kross explains.

“After digitizing, I duplicate all the audio by option clicking everything from the top set of tracks and pulling it down to the bottom set of tracks. It should mirror the top tracks. Next, I will offset the bottom set of tracks [the send tracks] to 14 frames earlier — 14 frames tends to be the amount across the board to compensate for the latency of the ISDN, even in Europe.

“I use an inactive track as a visual spacer on the template just to see what I am sending and what I am receiving,” he continues. “I like to just visually see it, that’s why I have no output assigned so it’s grayed out. Everything on the top is what we are listening to in the studio. Everything on the bottom below the visual spacer is what I am sending out to the talent.”

Next, he plays everything with the scratch track in for the talent so they can get a feel for the spot. “Once the talent is ready, I mute the scratch track on both the top and bottom tracks. Otherwise, it might be muted in my studio, but still sending to the talent. So, both tracks need to be muted. Also, I have set up my Yamaha O2R so that the volume on my music is fairly low. That way, the music is not overpowering, but he hears it as a guide to get the feel or the pace of the spot. I’m sending him all those elements; he’s reading to all of that, timed out to picture.”

Kross’s template also includes pre-made audio regions that serve two functions. The regions start with a three-beep countdown and have a duration of 64, 34 or 24 seconds, etc., depending on the promo’s length. This allows for a three-second lead-in, the promo’s duration (60, 30, 20, etc.) and one second of post roll. The end cue is signified with a single beep. These regions are placed on the beep tracks of the template three seconds prior to the start of picture. So, if the start of picture is at 01:00:00:00, the region is placed at 00:59:57:00.
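The timing of these beep regions is simple arithmetic, sketched below in Python. The three-second lead-in and one-second post roll come from the article; treating timecode as whole seconds is a simplification for illustration:

```python
# Sketch of the beep-region timing: lead-in + promo length + post roll,
# with the region placed three seconds before first frame of picture.
LEAD_IN = 3    # seconds of three-beep countdown
POST_ROLL = 1  # seconds after the end beep

def region_length(promo_seconds):
    """Total region duration; a :60 promo yields a 64-second region."""
    return LEAD_IN + promo_seconds + POST_ROLL

def region_start(picture_start_seconds):
    """Regions sit LEAD_IN seconds before the first frame of picture."""
    return picture_start_seconds - LEAD_IN

# Picture starts at 01:00:00:00 (3,600 s) -> region starts at 00:59:57:00.
print(region_length(60), region_start(3600))  # 64 3597
```

Because the region’s boundaries are the record in/out points, highlighting it and pressing record is all the setup a take needs.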

There are two benefits to these regions. First, they include a pre-made three-beep audio lead-in, or countdown, to cue the talent to start, similar to how ADR is done. Second, they serve as quick and easy in and out markers for Pro Tools: by placing one of these pre-made regions properly, Kross can set the record in/out points simply by highlighting the region and pressing record.

“Now, I have these bars [regions] and I don’t have to set the in and out points; it’s automatic. I pop that bar up there, I highlight it, and that’s my in and my out perfectly. I just made that out of laziness,” he jokes. But the real benefit is having the speed and accuracy needed for the quick turnaround time.

“After that,” he says, “I hit record, making sure I have one second of pre-roll set on Pro Tools. Otherwise, I will not play the full cue because of the latency. Then, the talent will hear the three-beep cue and will come right in at the first frame of picture. Then the talent does his stuff. We often record it all the way through, right to the end beep. And I always record him onto the top VO track, input number one, which is where my input is coming from my ISDN.”

Because of the dual-track set-up, playback for the talent is not done off the primary track that was just recorded. Playback must be done from the secondary send tracks below, with the 14-frame offset, thus avoiding the delay.

“Playback for the talent is done by immediately grabbing the piece that was recorded on VO track one, and option clicking it [copying it] down to the ISDN VO track, which is why I have the VO track down there as well,” he says. “Then I quickly offset it 14 frames. If playback is needed only in the studio, this step isn’t necessary, but it’s rude not to include the talent in the playback.”

This workflow also allows them to punch in while viewing picture when doing a pick-up line. Since the process compensates for the latency and is so accurate, Kross can punch in on a syllable anywhere needed.


Geoffrey Rubay, supervising sound editor/sound designer/mixer at Soundelux, part of Ascent Media Creative Sound Services in Hollywood, shares his simple, yet innovative sound effects creation and recording techniques. His advice doesn’t require anything out of the ordinary or expensive equipment, and it can be put to use quickly. He explains, “When we record anything for the sound effects library or an effect for a film, we are armed to the teeth with a huge selection of microphones, but, when you have two channels to record to [stereo LR] when making a sound effect, stereo may not be the best way to record it.”

For example, he says, “If you are recording something being hit, rather than recording with two of the same mics in stereo, try using a condenser mic and a dynamic mic. Or, try recording with a condenser mic and a contact mic. For contact mics, I recommend Cold Gold contact microphones to get amazingly interesting results. So, if you are going to hit something, try putting the contact mic on the thing you are hitting, or switch it over to the thing you are hitting it with. The two mics combined provide something way more interesting than just regular stereo recording. It’s a lot of fun and it inspires a lot of creativity because you have now switched from an acoustic event to a vibration event.”

Rubay recommends recording any sound effect at a minimum of 96 kHz using a portable device such as the Zoom H4. “Try recording at half speed, quarter speed or double speed,” he says. “You will get amazing sounds, things that are ‘other worldly,’ and you’ll get there really fast. The key is using two different microphones going to the L+R channels. But, if all you have are two matched microphones, try pointing one at the object you’re recording and the other at the nearby wall. Or, put it underneath the table or do anything other than record two of the same thing, because what you get will be so much more interesting.”
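The pitch effect of the varispeed playback Rubay describes follows directly from the speed ratio; a small Python sketch of the relationship, using the ratios he mentions:

```python
# Sketch: pitch shift produced by varispeed playback.
# Halving the speed drops pitch one octave (12 semitones); doubling raises it one.
import math

def varispeed_semitones(speed_ratio):
    """Pitch shift in semitones for a given playback-speed ratio."""
    return 12 * math.log2(speed_ratio)

print(varispeed_semitones(0.5))   # -12.0 (half speed: one octave down)
print(varispeed_semitones(2.0))   #  12.0 (double speed: one octave up)
print(varispeed_semitones(0.25))  # -24.0 (quarter speed: two octaves down)
```

This is also why a high sample rate matters: material recorded at 96 kHz and played at quarter speed still retains usable high-frequency content.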

Rubay uses this technique on everything he does, including a recent documentary on a rock band from Toronto called Anvil! The Story of Anvil. “I needed to fix a piece of dialogue and match it from the production sound,” he says. “Rather than make a great recording of [the band member] in a studio, we recorded him at the director’s house. We used two mics, one pointed at his mouth and the other at the wall. Well, the one pointed at the wall matched the sound of the production dialogue without treating it in any way.”

Another technique Rubay uses is called “Worldizing.” This is the process of playing back a sound with a speaker in an acoustic space while using a mic to record it in that space. “People just don’t think of it,” he explains. “Anyone can do it and you don’t need a big budget or fancy recording equipment. Any playback device with any recording device can be used to get interesting results.

“There are two techniques you can do,” he continues. “The first is where the playback device is the important element, like a speakerphone, a walkie-talkie, a public address system, etc. Here, the transducer is the important factor used to alter the sound and the mic is just used to capture it.

“The second is where playback is done as high-fidelity as possible, and the important element is the space you are playing it in. On the movie Stranger Than Fiction, they wanted music to sound like it was playing inside a bakery, so I took the songs the director wanted and recorded them through speakers in a similar room with nice microphones. When I put that up in the mix, I didn’t have to do anything else to it to achieve the desired result and to get that reverberant quality of the space.”

Worldizing any sound, even dialogue recorded via ADR, can be done in a variety of ways. It can be done using a Digidesign Mbox running on a laptop, playing back through powered speakers while recording back to the laptop with any microphones. Or, it can be done using a boombox with a CD player to play back any sound anywhere while recording it with any portable recorder like the Samson Zoom H4. “It’s just using what you already have in a different way,” says Rubay. “You’ll have sounds that are really believable and drop right into a mix with very little, if any, treatment. It doesn’t involve any plug-ins or expensive reverbs.”

Rubay also shares an interesting way to break a creative block when doing sound design. For any sound effects search, he uses Sound Ideas Soundminer and suggests using a random search mode known as the roulette wheel. “Today, everything is organized and categorized on a drive, and a lot of decisions have been pre-made about notating and finding sound effects,” he explains. “If you remove all of that and just play sounds randomly, it opens up ideas you would never have otherwise. Just hit the roulette wheel and see what randomly pops up. It instantly adds clarity because it does one of two things: either you are going to hear something totally interesting that you would have never thought of otherwise, or you hear a sound that is completely wrong, thus getting you a little closer to knowing what is right. If you just come up with any way to randomize the sounds you hear, it can open up new ideas.”
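Rubay’s roulette wheel amounts to a category-blind random pick from the whole library. A minimal Python stand-in for the idea, with hypothetical file names:

```python
# A minimal stand-in for a "roulette wheel" search: pick one sound at
# random, ignoring all categories and metadata. File names are
# hypothetical placeholders, not real library entries.
import random

library = [
    "metal_hit_01.wav", "door_creak_03.wav", "crowd_walla_12.wav",
    "glass_debris_07.wav", "wind_desert_02.wav",
]

def roulette(sounds):
    """Return one random sound from the whole library, category-blind."""
    return random.choice(sounds)

print(roulette(library))
```

Any source of randomness works; the point is to bypass the pre-made organizational decisions Rubay mentions.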


Ever since the FCC mandated that commercial television stations broadcast DTV by 2009, demand for multichannel (5.1) mixes for television commercials has risen dramatically year after year.

“The rise has been so dramatic that almost all national television spots I see are finished in HD and are expecting 5.1 mixes,” says mixer Mitch Dorf of POP Sound in Santa Monica. “These spots are now commonly finished on D5 or HDCAM SR tape, and since D5 has eight available audio tracks and HDCAM SR has 12, each format is capable of handling the delivery of both multichannel and standard two-channel [stereo] mixes simultaneously.”

Dorf observes that many production companies want to deliver 5.1 mixes but have been a bit cautious about the potential extra time involved in creating them. However, if done properly, a 5.1 mix for a television commercial shouldn’t take much longer than a standard stereo mix; in fact, the two can easily be done simultaneously. “I take the 5.1 print master tracks and feed them through a surround sound encoder [Dolby DP563] or surround sound plug-in to create the stereo encoded Lt-Rt [Left total – Right total],” explains Dorf. “The Lt-Rt can then be monitored and compared as well. Most monitoring systems should have a way to switch between the 5.1 and stereo Lt-Rt. However, if you find this is not the case, it usually can be accomplished with a little creative patching.”
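A dedicated encoder like the DP563 produces a true Lt-Rt, including a 90-degree phase shift on the surround channels. As a rough illustration only, here is a simplified Lo/Ro-style fold-down in Python; the -3 dB coefficients and the dropped LFE are conventional assumptions, not Dorf’s settings:

```python
# Rough sketch of a 5.1 -> stereo fold-down (Lo/Ro style). A true Lt-Rt
# also phase-shifts the surrounds 90 degrees before folding them in;
# this simplified version only applies the common -3 dB coefficients.
G = 0.7071  # -3 dB gain for the center and surround contributions

def fold_down(l, r, c, lfe, ls, rs):
    """Mix six 5.1 channel samples down to a stereo pair.
    The LFE channel is conventionally dropped in fold-down, as here."""
    lo = l + G * c + G * ls
    ro = r + G * c + G * rs
    return lo, ro

print(fold_down(1.0, 0.0, 0.5, 0.2, 0.1, 0.0))
```

Even this crude version shows why the two mixes must be compared: center and surround content that sits comfortably in 5.1 sums into the stereo pair and can push its levels up.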

He advises being careful with the LFE track, or subwoofer channel. Using it too much can create a big difference between the 5.1 and stereo mixes and potentially send your stereo levels through the roof. Dorf explains the distinction: “I think of it for what LFE stands for, Low Frequency Effects. A subwoofer is a speaker. The LFE is a discrete audio track that is sent to that speaker. You can’t put your hand on an LFE, whereas I can definitely hold a subwoofer.

“When you use your LFE, it’s important not to send the entire mix to it. Sending all elements of the mix can really create a big mess. It’s important to note that in the home environment, many people have their surrounds at +10 and the subwoofer set too high because it sounds cool. So, if you send everything in your mix to the subwoofer, it just becomes mush. It’s very important to be aware of that and to use it as an effect when mixing. That’s why I have seen some 5.1 mixes that don’t translate well, because the LFE was used too much.”

What’s Dorf’s approach to a 5.1 mix? “The first thing in creating a 5.1 commercial mix is to realize that a commercial spot is basically a :30 mini-movie, most often driven by voice content. The addition of a three-dimensional sound design in all channels can possibly detract from the intent of the commercial. It is often best, and sometimes required by networks, to only have the dialogue/voiceover elements in the center channel.

“By dedicating the center channel to only dialogue and voiceover you not only have greater control and less audio ‘competition’ but you are effectively creating a ‘mix-minus’ track as well with the other five channels. You can see how this could work really well for retagging or international dubbing.”

In explaining the significance of proper levels for broadcast, Dorf says, “It is very important that the correct levels are delivered in your final mix. Just because there is 20 dB of headroom above 0 VU doesn’t mean it can all be used. DTV content eventually gets converted into an AC-3 bit stream that, along with the audio, also carries metadata.

“The one parameter that is of most concern when mixing is called Dialog Normalization, or ‘dialnorm,’” he continues. “Within the metadata, all digital broadcast content receives a dialnorm value. This value is derived from a Dolby LM100 meter and is encoded along with the AC-3 bit stream. The standard dialnorm value for television commercials is around -24 to -27. This level cannot be measured by a standard digital peak meter and can only come from the LM100. However, if you do not have an LM100 meter to check your dialnorm value, feed your simultaneously created Lt-Rt into a standard analog VU meter. If you see the dialogue bouncing around -3 to 0 VU, you can be pretty sure you are in the ballpark.”
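Without an LM100 there is no way to measure dialnorm exactly, but a crude level sanity check can be sketched. The Python below computes plain RMS level in dBFS; this is not the LM100’s weighted dialogue-loudness measurement, only a ballpark indicator in the spirit of Dorf’s VU-meter tip:

```python
# Crude stand-in for a loudness check: RMS level of a block of dialogue
# samples in dBFS. NOT the LM100's measurement, only a rough indicator.
import math

def rms_dbfs(samples):
    """RMS level of float samples (-1.0..1.0) expressed in dBFS."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms)

# Sanity check: a full-scale sine wave measures about -3.01 dBFS RMS.
sine = [math.sin(2 * math.pi * n / 100) for n in range(100)]
print(round(rms_dbfs(sine), 2))  # -3.01
```

For real delivery work, only a proper dialnorm measurement (or Dorf’s analog VU check on the Lt-Rt) should be trusted.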

Summing up, Dorf says, “Even at this point, when we are delivering a lot of 5.1 mixes, the stereo mix is heard by the majority of people, whether they have high-def TV or not. And if they have HD, chances are they are using the speakers on the HDTV most of the time and not a separate 5.1 set-up. That’s why I always shoot my 5.1 mixes folded down to a stereo TV in my studio to see how they translate and compare the two. I do not want to compromise my 5.1 mix and want to use its full impact, but I also want to balance the two so that they are companions of each other. That’s how I approach commercial mixing, to keep that balance.”


Bob Pomann, owner/creative director of NYC’s Pomann Sound (www.pomannsound.com), shares the techniques and tools of his latest animated project, Turok. The film’s production was located in LA while Pomann worked in New York, and the team relied on technology to get the job done, working under the studio-without-walls theory. One such technology was CineSync from Rising Sun Research, a remote reviewing tool that allows multiple users to review visual media on their computers simultaneously, anywhere in the world.

“It is a system that allows up to 10 different people to watch a video at the same time,” explains Pomann. “We all had the same QuickTime on our desktops and anyone could hit play or stop and we all could see it. So, we could have a meeting about the show and we did not have to be in the same place. It’s like a spotting session that takes place in 10 different places at once. If that didn’t exist, then I probably would not have been able to work on this job.”

Another technology they used was Source-Connect from Source Elements. Source-Connect enables audio connections between digital audio systems anywhere in the world. It allows for recording directly to the timeline in realtime using only T1, cable or DSL Internet connections. “It’s basically ISDN done through the Internet with a plug-in on Pro Tools,” he explains.

When it comes to altering and pitching voices for animated characters, Pomann prefers using an Eventide harmonizer versus plug-ins. “You can’t tell for a second that you are using a mono pitch shift. The pitch control on that is really amazing. Like for computer voices, there is this one setting called short wave. It has a kind of human element to it, but it’s effected.”

Pomann shares his practical approach to certain aspects of animation. “Footsteps are usually not done via Foley. It’s quicker to edit them. Since animation tends to use the same cycles over again, it’s quicker to have pre-made footstep cycles ahead of time, like slow, medium and fast. The same works for ambiences, which are easily picked up since backgrounds are often reused, too.”
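Pomann’s pre-made footstep cycles are essentially evenly spaced spot times at a few preset paces. A tiny Python sketch of the idea; the per-step intervals are assumed, illustrative values:

```python
# Sketch of pre-made footstep cycles: generate spot times for slow,
# medium and fast walks. The seconds-per-step values are assumptions.
CYCLES = {"slow": 0.8, "medium": 0.55, "fast": 0.35}  # seconds per step

def footstep_times(pace, steps):
    """Spot times (in seconds) for a run of footsteps at a given pace."""
    interval = CYCLES[pace]
    return [round(n * interval, 2) for n in range(steps)]

print(footstep_times("medium", 4))  # [0.0, 0.55, 1.1, 1.65]
```

Because animation reuses walk cycles, a handful of such pre-cut runs can be dropped onto the timeline far faster than recording Foley for each scene.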

Another way technology broke down studio walls was ServerSound from Msoft, a server pre-loaded with the user’s libraries and accessed via local network or the Internet. The system can be shared within a single facility or accessed remotely around the world. This allowed Pomann to access his 20 TB sound effects library in New York from LA, where he mixed Turok on a Pro Tools Icon.

Have tips and techniques to share? Email for consideration.