The Habit and "The Hobbit"
Here are a couple of questions to get you started: What is the image at left? And what is the sound of a telephone call?
I’ll offer some more information about the first one. It’s an “intertitle,” the sort of thing inserted into silent movies to help advance their plot.
This one happens to be from a pretty famous movie. Got any idea yet of which one? You’re likely to be familiar with it even if you never saw it. But the answer might be surprising.
Now, how about that telephone call? Bell Labs researcher and audio pioneer Harvey Fletcher wanted its sound to be unidentifiable as a reproduction, i.e., just as good as being there. Today, if you use a certain type of mobile phone, you might be able to identify certain negative artifacts, but, in general, with contemporary technology, Fletcher’s dream has been achieved: a telephone call sounds pretty much like any other reproduction of an electronic audio signal. And that’s a problem.
When the kidnapper calls to demand ransom in a movie or TV thriller, the camera might offer a close-up of the person taking the call, but the kidnapper’s voice shouldn’t sound like it’s coming from the same room. So a voice filter is used, typically restricting the bandwidth of the sound to a range from roughly 300 Hz to 3 kHz as shown at the right in the Cisco white paper “Wideband Audio and IP Telephony” <http://bit.ly/116U1Mn>.
If you’re familiar with sampling theory, you know that, to avoid spurious frequencies known as aliases, sampling must be done at a rate higher than twice the desired highest frequency, and the signal must be filtered to prevent anything higher than that highest desired frequency from entering the sampler. Filters are imperfect, so, if a telephone company wanted to sample 8,000 times per second, it would not be totally unreasonable for the system to pass little more than 3 kHz.
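To put numbers on that, here is a minimal sketch of the sampling arithmetic. The 8,000-samples-per-second rate and the roughly 3 kHz passband come from the text above; the 5 kHz test tone and the function names are my own illustrative assumptions, not anything a telephone company specified.

```python
def nyquist_limit_hz(sample_rate_hz: float) -> float:
    """Highest frequency a sampler can represent: half the sampling rate."""
    return sample_rate_hz / 2.0


def guard_band_hz(sample_rate_hz: float, passband_edge_hz: float) -> float:
    """Room left between the passband edge and the Nyquist limit,
    which an imperfect (non-brick-wall) filter needs in order to roll off."""
    return nyquist_limit_hz(sample_rate_hz) - passband_edge_hz


def alias_frequency_hz(tone_hz: float, sample_rate_hz: float) -> float:
    """Frequency at which an unfiltered tone shows up after sampling.
    The tone is folded back into the range 0 .. sample_rate/2."""
    folded = tone_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


if __name__ == "__main__":
    fs = 8000.0  # samples per second (from the text)
    print(nyquist_limit_hz(fs))          # theoretical ceiling
    print(guard_band_hz(fs, 3000.0))     # slack left for the filter
    print(alias_frequency_hz(5000.0, fs))  # illustrative out-of-band tone
```

With these assumptions, an unfiltered 5 kHz tone would fold back to 3 kHz, inside the speech band, which is why the filter has to be well on its way to silence before 4 kHz.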
Digital transmission systems don’t care about filtering low frequencies, however, so why the 300 Hz low-frequency cutoff? It dates back to analog transmission systems, wherein different frequencies would be attenuated by different amounts, and an equalizer would restore them. The attenuation might be described as a certain number of decibels per decade. A decade, in this case, is a tenfold increase in frequency, as from 300 Hz to 3 kHz. Going down to 30 Hz from 300 would add another decade, doubling the equalization needed.
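The decade arithmetic can be sketched in a few lines. This assumes the simple model described above, in which attenuation is a fixed number of decibels per decade; the 20 dB-per-decade slope is a hypothetical value chosen only for illustration, not a figure for any real telephone line.

```python
import math


def decades(f_low_hz: float, f_high_hz: float) -> float:
    """Number of tenfold frequency steps between two frequencies."""
    return math.log10(f_high_hz / f_low_hz)


def equalization_db(f_low_hz: float, f_high_hz: float,
                    slope_db_per_decade: float) -> float:
    """Equalization needed across the band under a constant
    dB-per-decade attenuation model (an assumed simplification)."""
    return decades(f_low_hz, f_high_hz) * slope_db_per_decade


if __name__ == "__main__":
    slope = 20.0  # hypothetical dB per decade, for illustration only
    print(decades(300.0, 3000.0), equalization_db(300.0, 3000.0, slope))
    print(decades(30.0, 3000.0), equalization_db(30.0, 3000.0, slope))
```

Whatever slope is chosen, extending the band from 300 Hz down to 30 Hz takes it from one decade to two, so the required equalization doubles.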
Today, in the era of digital transmission, going down to 30 or even 20 Hz would not be a problem, which is why people describe today’s real-world telephone calls in such terms as “sounding like you’re next to me.” But the sound of a telephone-call voice in a movie or on TV still harks back to an earlier era (just as a print ad might tell its viewer to “dial” a certain phone number in an era when it’s hard to find a dial-equipped phone outside a museum).
It’s not easy on a visual web page to provide examples of telephone call sounds, especially since I have no idea what your listening equipment is like. But here is another common example of a motion-image-media indicator that strays from reality: the binoculars mask.
If you use binoculars, you probably know you’re supposed to adjust their eye separation so that there’s one circular image, not the lazy eight shown at left. But, if there’s no binoculars mask effect, how is a viewer supposed to know that the scene is seen through binoculars?
Now, perhaps, we can consider frame rate. Though he wanted telephone calls to sound just like being there in person, Fletcher did the research that identified the 300 Hz-to-3 kHz range for speech intelligibility and identification. Are there physical parameters affecting choice of frame rate? There is more than one.
One is typically called the fusion frequency, the frequency at which a sequence of individual pictures appears to be a motion picture. You can find your own fusion frequency with a common flip book; an 1886 version called a Kineograph is shown at right.
Flip through the pages slowly, and they are individual still pictures. Flip through them quickly, and they are a single motion picture.
Unfortunately, there is no single fusion frequency. It varies from person to person and with illumination, color, angle, and type of presentation.
The type of presentation becomes significant in another frame-rate variable: what’s commonly called the flicker frequency, the rate at which sources of illumination appear to be steady, rather than flickering.
Some of the earliest motion-picture systems took advantage of a fusion frequency generally lower than the flicker frequency. They presented motion pictures, but they flickered, hence an early nickname for movies: flickers or flicks.
One “solution” to the flicker problem was the use of a two-bladed shutter in the projector. A film image would be moved into place, the shutter would turn, the image would appear on screen, the shutter would turn again, the image would disappear, it would turn again, it would reappear, and it would turn again while a new image moved into place. The result was an illumination-repetition rate twice that of the frame rate, perhaps enough to achieve the flicker frequency, depending, again, on a number of viewing factors.
While the two-bladed (or, in some cases, three-bladed) shutter helped ameliorate flicker, it introduced a new artifact into motion presentation. A moving object would appear to move from one frame to another but to stall in mid-motion from one shutter opening to another. Clearly, that was a step away from reality, but, like a limited-bandwidth telephone call and a binoculars mask, it tended to indicate the look of a movie.
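Here is a rough sketch of the multi-bladed shutter arithmetic. The 24-fps frame rate is just an illustrative assumption (early projection speeds varied), and the function names are my own.

```python
def flash_rate_hz(frame_rate_fps: float, blades: int) -> float:
    """Illumination-repetition rate: each frame is flashed once per blade."""
    return frame_rate_fps * blades


def flash_schedule(frame_rate_fps: float, blades: int, frames: int):
    """List (time_in_seconds, frame_index) for every flash on screen.
    The frame index repeats `blades` times, so motion stalls
    between flashes of the same frame."""
    interval_s = 1.0 / flash_rate_hz(frame_rate_fps, blades)
    return [(n * interval_s, n // blades) for n in range(frames * blades)]


if __name__ == "__main__":
    for blades in (1, 2, 3):
        print(blades, "blade(s):", flash_rate_hz(24, blades), "flashes per second")
    # Which frame is on screen at each flash of a two-bladed shutter:
    print([frame for _, frame in flash_schedule(24, 2, 3)])
```

The repeated frame indices in the schedule are the stall described above: the light flashes twice as often, but the picture content changes only once per frame.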
What rate is required? When Thomas Edison initially chose 46 frames per second (fps) for his Kinetoscope, he said it was because his research had shown that “the average human retina was capable of taking 45 or 46 photographs in a second and communicating them to the brain.” But the publication Electricity, in its June 6, 1891 issue, contrasted the Kinetoscope’s supposed 46 fps with Wordsworth Donisthorpe’s Kinesigraph’s six-to-eight: “Now, considering that the retina can retain an impression for 1/7 of a second, 8 photographs per second are sufficient for the purpose of reproduction and the remaining 38 are mere waste.”
Is there a “correct” frame rate? This week’s Super Bowl coverage made use of For-A’s FT-One cameras (above), which can shoot 4K images at up to 900 fps. But that was for replay analysis.
At the International Broadcasting Convention (IBC) in Amsterdam in 2008, the British Broadcasting Corporation (BBC) provided a demonstration in the European Broadcasting Union (EBU) “village” that showed how frame rates as high as 300 fps could be beneficial for real-time viewing. At left is a simulation of 50-fps (top) vs. 100-fps (bottom), showing a huge difference in dynamic resolution (detail in moving images).
Note that the stationary tracks and ties are equally sharp in both images. The moving train, however, is not. Other parts of the demonstration showed that high-definition resolution might appear no better than standard-definition for moving objects at common TV frame rates.
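One way to see why frame rate affects detail in moving objects is to estimate how far an object travels while a single frame is being exposed. This is only a back-of-the-envelope sketch, not the BBC's method: the object speed, the 1,920-pixel picture width, and the assumption that the shutter stays open for the whole frame interval are all mine.

```python
def blur_px(speed_px_per_s: float, frame_rate_fps: float,
            shutter_fraction: float = 1.0) -> float:
    """Approximate smear, in pixels, of an object moving at a constant
    speed during one exposure. shutter_fraction is the assumed portion
    of the frame interval during which the shutter is open."""
    return speed_px_per_s * shutter_fraction / frame_rate_fps


if __name__ == "__main__":
    # Hypothetical object crossing a 1,920-pixel-wide picture in 2 seconds.
    speed = 1920 / 2.0
    for fps in (24, 50, 100, 300):
        print(fps, "fps:", round(blur_px(speed, fps), 1), "pixels of smear")
```

Under these assumptions, doubling the frame rate halves the smear, while a stationary object has no smear at any rate, consistent with the tracks and ties being equally sharp in both images.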
A clear case seemed to be made for frame rates higher than those normally used in television. Again, that was in 2008. In 2001, however, Kodak, Laser-Pacific, and Sony each won an engineering Emmy award for making possible 24-fps video, that is, video at a lower frame rate than that normally used.
As the BBC/EBU demo at IBC clearly showed, 24-fps video has worse dynamic resolution than even normal TV frame rates, let alone higher ones. Yet 24-fps video has also been wildly successful. It provides a particular look, just as a binoculars mask does. In this case, the look contributes to a sensation that the sequence was shot on film. But why did movies end up at 24-fps? It’s not Edison’s 46 nor Donisthorpe’s 8.
The figure is based on research but not research into any form of visual perception. Go back to the intertitle at the top of this column. Have you guessed the movie yet? It’s The Jazz Singer, the one that ushered in the age of sound movies, even though, as the intertitle shows, it, itself, was not an all-singing, all-talking movie.
Some say 24-fps was chosen as the minimum frame rate that would provide sufficient sound quality. But The Jazz Singer, like many other sound movies, used a sound-reproduction system, Vitaphone, unrelated to the film rate: phonograph disks. In the 1926 demo photo above, engineer Edward B. Craft holds one of the 16-inch-diameter disks. Their size and rotational speed (33-1/3 rpm, the first time that speed had been used) were carefully chosen for sound quality and capacity, but they could have been synchronized to a projector running at any particular speed.
That was the key. Sound movies did not require 24-fps, but they required a single, standardized speed. The choice of that speed fell to Stanley Watkins, an employee of Western Electric, which developed the Vitaphone process. Watkins diligently undertook research. According to Scott Eyman’s book The Speed of Sound (Simon & Schuster 1997), he explained the process in 1961:
“What happened was that we got together with Warners’ chief projectionist and asked him how fast they ran the film in theaters. He told us it went at 80 to 90 feet per minute in the best first-run houses and in the small ones anything from 100 feet up, according to how many shows they wanted to get in during the day. After a little thought, we settled on 90 feet a minute [24-fps for 35 mm film] as a reasonable compromise.”
That’s it. That’s where 24-fps came from: no visual or acoustic testing, no scientific calculation, just a conversation between one projectionist, one engineer, and, according to Watkins’s daughter Barbara Witemeyer in a 2000 paper (“The Sound of Silents”), Sam Warner (of Warner Bros.) and Walter Rich, president of Vitaphone. After Vitaphone and Warner Bros., Fox adopted the speed, and soon it was ubiquitous.
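The conversion behind the bracketed “24-fps for 35 mm film” in Watkins’s account is easy to check. The sketch below relies on one figure that is not in the quote: standard 35 mm film, with four perforations per frame, carries 16 frames per foot.

```python
FRAMES_PER_FOOT_35MM = 16  # standard 4-perforation 35 mm film (not from the quote)


def feet_per_minute_to_fps(feet_per_minute: float,
                           frames_per_foot: int = FRAMES_PER_FOOT_35MM) -> float:
    """Convert a projection speed in feet per minute to frames per second."""
    return feet_per_minute * frames_per_foot / 60.0


if __name__ == "__main__":
    # Speeds mentioned in Watkins's account.
    for speed in (80, 90, 100):
        print(speed, "ft/min =", round(feet_per_minute_to_fps(speed), 2), "fps")
```

At 90 feet per minute the result is exactly 24 fps; the 80 and 100 feet-per-minute speeds the projectionist mentioned work out to roughly 21 and 27 fps.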
Fluke or not, 24 fps came to symbolize the look of film, which is why 24-fps video is so popular. We have a habit of associating that rate with movies.
The Hobbit broke that habit. It is available in a 48-fps, so-called “HFR” (high-frame-rate) version. And its look has received some unusual reviews.
Some have complained of nausea. It’s conceivable that there is some artifact of the way The Hobbit has been projected in some theaters (in stereoscopic 3D) that triggers a queasiness response in some viewers, but it seems (to me) more likely that those viewers might be reacting to some overhead, spinning shots in the same way that viewers have reacted to roller-coaster shots in slower-frame-rate movies.
Others have complained of a news-like or video-like look that made it more difficult for them to suspend disbelief and get into the story. That’s certainly possible. If 24-fps contributes to the look of what we are in the habit of thinking of as a movie, then 48-fps is different.
Of course, we no longer watch flickering silent black-&-white movies with intertitles, projected at a rate faster than they were shot, either. Times change.