Format Factor Fundamentals

Originally published in Videography magazine, May 2007

At last month’s National Association of Broadcasters (NAB) convention, a number of cameras drew interest. There were, for example, Grass Valley’s new version of Infinity, Hitachi’s HV-HD30, the i-movix SprintCam BC, Panasonic’s AG-HSC1U, the Red One, and Sony’s XDCAM EX. The AG-HSC1U uses 1/4-inch-format imagers, the HV-HD30 1/3-inch, XDCAM EX 1/2-inch, Infinity 2/3-inch, SprintCam about 1.5-inch, and the Red One a 35-mm movie-frame format, the biggest of them all. But does the format size make a difference in image quality?

Many factors contribute to image quality, and most of them have little or nothing to do with imager size. One of the longest waiting lines at NAB was to enter the Red Digital Cinema theater to see the results of a “camera test” shot in New Zealand two-and-a-half weeks before the show. It seems that every viewer found the results stunning, and many said it was the closest they had seen any electronically shot sequence come to traditional 35-mm movie quality, but that wasn’t necessarily because of the camera or its imager.

If it absolutely, positively has to look like film, why not shoot film?

Believe it or not, film is probably the only truly digital medium for capturing moving images. In film’s emulsion, photosensitive grains can be either exposed or not; there is no in-between condition. Shades of gray come from different sized grains that require different amounts of light for exposure. In all commercially available electronic cameras — even those said to be “all digital” — the photosensitive sensors generate an analog signal proportional to the amount of light hitting them. It is only later that the analog signal gets digitized.

Nevertheless, the individual sensors on an imaging chip are arrayed in a regular pattern; film’s grain is randomly distributed within a frame and from frame to frame. If a chip’s sensors have an uncompensated varying response to light, the result can be a pattern that appears in the picture and never changes, so-called “fixed-pattern noise”; film never has a fixed pattern.

When HDTV cameras are sent into orbit on the space shuttle, they usually return with imaging chips damaged by cosmic radiation. The damage can show up as dots in fixed positions in the picture. Some of the dots can be eliminated by using an average of surrounding dots, but it’s simply not an issue with film; in film, each frame brings with it an entirely new photosensitive area.

Whatever maximum resolution a video camera has when built is the resolution it will always have; as film stocks improve, so can film’s resolution. Visible film-grain patterns aren’t equivalent to electronic noise. And all of those differences between film and electronic imaging don’t even take into consideration the color gamuts or gray-scale transfer characteristics of different film stocks. But none of that might have had anything to do with why viewers of the camera test in the Red Digital Cinema theater thought the results were so movie-like.

There’s more to image acquisition than cameras and their imagers.

The New Zealand “test” turned out to be a short movie about a World War I battle, directed by Peter Jackson of Lord of the Rings fame, with dozens of actors in appropriate period costumes, trenches, an aerial dogfight, a vintage tank, explosions, smoke, dirt, makeup, music, and everything else one would expect in an epic production. Viewers would have probably appreciated it even had it been shot with a mobile phone, though probably not as much.

After writing, directing, acting, art direction, costumes, makeup, and the like, the next most important determinant of image quality is probably lighting. No camera — film or electronic — can introduce shadows that aren’t there. Then there are lens filters.

At NAB 2007, lens-filter manufacturer Tiffen introduced a software package said to closely simulate the effects of many of its filters. But when the company enticed showgoers to attend its presentations, it did so with a contest to win a physical UltraPol filter, the effects of which cannot properly be simulated in software. Without a polarizing filter like the UltraPol, a shot through a café window could show more exterior reflections than interior action; with it, the reflections can disappear.

Other filtration effects may be more successfully simulated in software, but the results still might not match what a videographer can do in the field. Some have smeared petroleum jelly on the fronts of lenses for certain effects; others have stretched panty-hose fabric across the rears of lenses for other effects.

Camera settings affect shooting.

One area where software can very successfully emulate in post what a videographer might do in the field is camera settings. There are dozens of adjustments that a videographer can apply to the signals being processed in the camera to affect anything from color balance to contrast to how much facial blemishes show.

As those settings control analog or digital electronic processors, as long as the complete signal from the camera imagers is made available in post, the same processing can be applied. In that way, electronic shooting can closely emulate the film tradition.

Cinematographers could deal with lighting, optical filtering, lens selection and settings, and film-stock selection (and sometimes pre-processing, like “flashing”) when shooting; all other image processing had to be done in post. Grass Valley’s Viper brought the same concept to video, and newer digital-cinematography cameras have followed with “RAW” outputs. Writing of Dalsa’s cameras, cinematographer David Leitner noted that they have “two buttons only: on/off and start/stop.”

Unfortunately, there are two difficulties with that fix-it-in-post process. One is that the imagers capture more information than a typical video camera emits. So recorders need more capacity and need to be able to deal with higher data rates.

The other issue is associated with traditional television workflow, which is different from that used when shooting film. Based on image-processing camera settings, a videographer might increase or reduce the lens aperture, change lighting, or even modify makeup or costumes. Some videographers favor a mixed environment, wherein camera-processing settings determine the shooting conditions, but unprocessed data is sent to post. And then there’s color imaging.

Available lenses are determined in part by the color-separation system.

In early Technicolor, multiple strips of black-&-white film were exposed through different color filters. The rear focal distance of the shooting lens had to include the distance light had to travel through the color-separation system before it reached the film. A similar technique is used in most professional color video cameras. Light from the lens enters a color-separation prism system before hitting the imagers. Video lenses have rear focal distances that account for the path through the prism, and the prism optics are generally the limiting factor when it comes to the maximum system aperture.

Today, color film is shot using the tri-pack technique, wherein three layers of film, each sensitive to a different color range, are stacked atop one another. Foveon’s electronic image sensor uses a similar technique, but it is not yet used in professional videography.

Lenses intended for color motion-picture film do not have the long rear focal distance of video lenses and, therefore, cannot be used on prism-based electronic cameras without some form of image adaptor (relay optics, aerial image converter, etc.). The alternative is to use tiny color filters on the individual sensors of a single imager.

In the days when news footage was captured on black-&-white film, a process called Abtography placed a color-stripe filter in cameras and a matching filter in telecine projectors to deliver color. Early low-cost color video cameras also used stripe filters; so does the current Panavision Genesis. Other cameras use a checkerboard-like pattern of color filters, one form of which is called a Bayer pattern, with each two-by-two square of sensors having two diagonally-adjacent green filters, one red, and one blue (some Sony cameras have a pattern with six green filters for every red and blue). Cameras with single, color-filtered imagers can use film lenses without optical adaptors (as long as they have the appropriate mechanical mounts).
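The checkerboard arrangement described above can be sketched in a few lines of Python. This is only an illustration: the tile orientation (green in the top-left corner) and the names `BAYER_TILE`, `bayer_mosaic`, and `filter_counts` are assumptions made for the example, not any manufacturer’s actual layout.

```python
from collections import Counter

# One possible two-by-two Bayer tile (orientation is an assumption):
# the two greens sit on a diagonal, with one red and one blue.
BAYER_TILE = (("G", "R"),
              ("B", "G"))


def bayer_mosaic(rows, cols):
    """Return a rows x cols grid of filter letters made by repeating BAYER_TILE."""
    return [[BAYER_TILE[r % 2][c % 2] for c in range(cols)] for r in range(rows)]


def filter_counts(mosaic):
    """Count how many sensors sit behind each filter color."""
    return Counter(letter for row in mosaic for letter in row)


mosaic = bayer_mosaic(4, 8)
for row in mosaic:
    print(" ".join(row))
print(filter_counts(mosaic))
```

In any grid with even dimensions, half the sensors end up behind green filters and a quarter each behind red and blue.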

Every two imager-size formats share a “format factor.”

Silicon Imaging’s cameras use single, color-filtered imagers and have the appropriate mounts to use film lenses directly. But, when a film lens is used on one of their cameras, it provides different shot framing and different depth of field from what the same lens would deliver on a film camera. The image quality might be different, too; in fact, with some lenses, it could actually be higher mounted on the electronic camera than on a film-based one.

The so-called Academy aperture of a 35-mm movie-film frame has an image diagonal of about 27 mm; a Super 35 frame has an image diagonal of about 31 mm. Silicon Imaging’s cameras use a so-called 2/3-inch image sensor (named for the outside diameter of some old imaging tubes), which has an image diagonal of just 11 mm. The format factor from Super 35 to 2/3-inch, therefore, would be roughly 31/11 if both sizes had the same aspect ratio (they don’t).

When using a lens intended for Super 35 shooting, the Silicon Imaging chip uses only a portion of the lens image area. Shots will appear to be much tighter and will offer greater depth of field (more of the depth of the shot will appear to be in focus). If the lens has poorer performance away from its central axis (as many lenses do), the reduced imager size will use only the higher-quality portion of the lens image.

When two otherwise-equal electronic cameras of different imager size are compared, they will also have different sensitivities, dynamic ranges (from noise floor to peak white), and sharpness characteristics. Of course, it’s rare that any two cameras can be considered “otherwise equal.” Panasonic’s brand new AG-HPX500 and AJ-HPX3000 are both high definition, and both use 2/3-inch imagers, but the AG-HPX500 has a suggested price of about $14,000 and the AJ-HPX3000 about $48,000.

For equivalent shot size, divide the larger-format focal length by the format factor.

Can small-format cameras offer wide shots? That’s not a simple question to answer.

Consider the difference between 2/3-inch and 1/3-inch imaging chips. The former have an 11-mm diagonal and the latter 6, so the format factor (if both have the same aspect ratio) is 11/6 or roughly 1.83. A lens with an 18.3-mm focal length on a 2/3-inch-format camera, therefore, would provide the same size shot as a 10-mm lens (18.3/1.83) on a 1/3-inch camera.

Given the format factor, the appropriate focal length to match any 2/3-inch-format camera’s shot on a 1/3-inch-format camera can be easily calculated. Unfortunately, there’s more to videography than math.

It’s common to capture wide shots on a 2/3-inch-format camera using a 4.5-mm focal length. According to the format factor, the same wide shot can be captured by a 1/3-inch-format camera using roughly a 2.5-mm (4.5/1.83) focal length. Unfortunately, no one (yet) offers a 2.5-mm lens for 1/3-inch video cameras.
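The shot-size arithmetic above can be written out as a short Python sketch. The function names are made up for the example, and the calculation assumes, as the text does, that both formats share the same aspect ratio.

```python
def format_factor(larger_diagonal_mm, smaller_diagonal_mm):
    """Ratio of the image diagonals of two formats with the same aspect ratio."""
    return larger_diagonal_mm / smaller_diagonal_mm


def equivalent_focal_length(larger_format_focal_mm, factor):
    """Focal length on the smaller format that gives the same shot size."""
    return larger_format_focal_mm / factor


# 2/3-inch (11-mm diagonal) to 1/3-inch (6-mm diagonal), rounded as in the text
factor = round(format_factor(11, 6), 2)

for focal_mm in (18.3, 4.5):
    match_mm = equivalent_focal_length(focal_mm, factor)
    print(f"{focal_mm} mm on 2/3-inch is matched by about {match_mm:.1f} mm on 1/3-inch")
```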

For equivalent exposure (ignoring certain issues), divide the larger-format f-stop by the format factor.

Although it’s an imperfect analogy, an individual picture element’s (pixel’s) sensor on an imaging chip may be considered a little photocell. If 3/4 of the area of the photocell is covered, its output will also be reduced by 3/4.

If all else is equal (and, again, it rarely is), the photosensitive area of a pixel on a 1/3-inch imaging chip is a bit more than a quarter of the area (1/1.83 squared, or about 0.3) of that of a pixel on a 2/3-inch chip. But a square function is already built into f-stops, so equivalent exposure for a 1/3-inch imager (ignoring certain issues) may be calculated by dividing the exposure of the 2/3-inch imager by the format factor.

Thus, an exposure of f/18.3 in a 2/3-inch-format camera should be equivalent to an exposure of f/10 in a 1/3-inch-format camera. The same wide-end problem associated with shot size returns here. A 2/3-inch aperture of f/1.83 should be matched by a 1/3-inch f/1, but no 1/3-inch camera-lens combo offers that wide an aperture.
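As a rough check on that reasoning, the following Python sketch treats the light reaching a pixel as proportional to its area divided by the square of the f-stop. That is a simplified model assumed for illustration (it leaves out the ignored issues), and the function names are hypothetical.

```python
import math


def pixel_area_ratio(factor):
    """Smaller-format pixel area as a fraction of the larger-format pixel area."""
    return 1 / factor ** 2


def equivalent_f_stop(larger_format_f_stop, factor):
    """F-stop on the smaller format giving equivalent exposure (simplified model)."""
    return larger_format_f_stop / factor


def relative_light_per_pixel(relative_pixel_area, f_stop):
    """Light per pixel, in arbitrary units: area divided by f-stop squared."""
    return relative_pixel_area / f_stop ** 2


factor = 1.83
large = relative_light_per_pixel(1.0, 18.3)
small = relative_light_per_pixel(pixel_area_ratio(factor), equivalent_f_stop(18.3, factor))
print(math.isclose(large, small))  # True when the two exposures match
```

Because the f-stop is squared in the model, dividing it by the format factor exactly offsets the smaller pixel area.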

Then there are the ignored issues. The microlenses that improve sensitivity by focusing light on photosensitive areas of sensors work best at narrower apertures. And non-photosensitive (signal-handling) areas of imaging chips don’t necessarily shrink with the format factor.

Dynamic range is also related to sensor area. But all things are rarely equal. Spatial-offset and rotated-sensor designs can increase sensitivity and dynamic range in smaller formats by trading off other characteristics.

Between the hyperfocal and macro regions, depth of field is format-factor related.

When a lens is focused at a sufficiently far distance, a huge range of distances — out to infinity — can appear to be in focus; that’s the hyperfocal region. When a lens is focused on objects extremely close to it, special rules apply; that’s the macro region.

Between the hyperfocal and macro regions is the range of most shooting. When a lens is focused at some distance in that range, some objects both closer and farther will also appear to be in focus. The range from the closest to the farthest of those objects is the depth of field.

Depth of field is determined by a number of factors (including the “circle of confusion,” a circle so small it is indistinguishable from a point), many of which are format-factor related. The format-related factors cancel out, leaving just aperture and focal-length for comparing equal-resolution different-sized imagers.

So, for equivalent types of cameras (e.g., both 1080-line HDTV), the depth of field of a 2/3-inch format camera shooting at an 18.3-mm focal length with an aperture of f/18.3 should be matched by a 1/3-inch format shooting the same scene from the same distance at a 10-mm focal length and an aperture of f/10.
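That match can be checked with the standard thin-lens depth-of-field approximations. The Python sketch below is illustrative only: the 0.011-mm circle of confusion for the 2/3-inch format and the 1-meter subject distance are assumptions chosen for the example, and the circle of confusion is assumed to shrink with the format factor.

```python
def hyperfocal_mm(focal_mm, f_stop, coc_mm):
    """Hyperfocal distance from the standard thin-lens approximation."""
    return focal_mm ** 2 / (f_stop * coc_mm) + focal_mm


def depth_of_field_limits(focal_mm, f_stop, coc_mm, subject_mm):
    """Return (near, far) limits of acceptable focus, in mm from the lens."""
    h = hyperfocal_mm(focal_mm, f_stop, coc_mm)
    near = subject_mm * (h - focal_mm) / (h + subject_mm - 2 * focal_mm)
    far = subject_mm * (h - focal_mm) / (h - subject_mm)  # valid while subject_mm < h
    return near, far


FACTOR = 1.83        # 2/3-inch to 1/3-inch, as in the text
COC_LARGE = 0.011    # assumed circle of confusion for 2/3-inch, in mm
SUBJECT = 1000.0     # assumed subject distance, in mm

large = depth_of_field_limits(18.3, 18.3, COC_LARGE, SUBJECT)
small = depth_of_field_limits(10.0, 10.0, COC_LARGE / FACTOR, SUBJECT)

print("2/3-inch near/far (mm):", [round(x) for x in large])
print("1/3-inch near/far (mm):", [round(x) for x in small])
```

Under these assumptions the two sets of limits agree to within a couple of percent; the small remainder comes from the focal-length terms that do not cancel exactly.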

As before, the 1/3-inch camera cannot match a 2/3-inch camera shooting at a 4.5-mm focal length at f/1.83. It would need a roughly 2.5-mm focal length and an aperture of f/1, and neither is available. That’s not necessarily bad. In news shooting, for example, the greater depth of field of small formats can be a bonus.

It’s not the resolution; it’s the sharpness.

The next set of format-factor calculations requires an understanding of the difference between resolution and sharpness. Resolution is a physical phenomenon. It can be measured by counting sensors or calculating filter bandwidth, neither of which is something done by viewers. Sharpness, on the other hand, is a psychological phenomenon or, perhaps more accurately, a psychophysical one. It is a human response to a physical stimulus.

It, too, can be measured, but measuring it requires asking human subjects their responses to different stimuli. When that testing was performed, the results showed that the sensation of sharpness appeared to be proportional to the square of the area under a curve mapping contrast ratio against resolution.

Such curves (often called modulation-transfer function or MTF curves) tend to look like the right side of a bell-shaped curve. Most contrast is at zero resolution (e.g., it’s day or night; the light is on or off), and it falls off to a thin “toe” extending to higher resolutions.

Unfortunately, human vision has almost the opposite characteristics. It is most sensitive to minimal contrast not at zero resolution but at a relatively low fineness of detail. As detail gets finer, it takes more contrast for it to be seen.

The MTF curve delivers minimal contrast where human vision needs the maximum. Delivering maximum sharpness, therefore, is a matter of increasing the area under the MTF curve.
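To make the area idea concrete, here is a small Python sketch that compares two hypothetical MTF curves shaped like the right side of a bell curve. The curve widths, the 200-lp/mm integration limit, and all of the function names are assumptions for illustration; real lens and camera curves would have to be measured.

```python
import math


def half_bell_mtf(lp_per_mm, width):
    """Hypothetical MTF: right half of a bell curve, 1.0 at zero resolution."""
    return math.exp(-0.5 * (lp_per_mm / width) ** 2)


def area_under_curve(curve, max_lp_per_mm, steps=1000):
    """Trapezoidal estimate of the area under curve from 0 to max_lp_per_mm."""
    dx = max_lp_per_mm / steps
    total = 0.5 * (curve(0.0) + curve(max_lp_per_mm))
    for i in range(1, steps):
        total += curve(i * dx)
    return total * dx


def sharpness_index(curve, max_lp_per_mm):
    """Relative sharpness, taken here as the square of the area under the curve."""
    return area_under_curve(curve, max_lp_per_mm) ** 2


narrow = sharpness_index(lambda x: half_bell_mtf(x, 30.0), 200.0)
wide = sharpness_index(lambda x: half_bell_mtf(x, 60.0), 200.0)
print(f"the wider curve scores about {wide / narrow:.1f} times the narrower one")
```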

There’s no avoiding diffraction.

A relatively simple formula describes the MTF of light being diffracted by passing through an optical system with an input hole, like a camera with a lens. It is 1 − (1.22 × λ × f × lp/mm), where λ is the wavelength of light in millimeters, f is the f-stop, and lp/mm is the resolution in line pairs per millimeter.

The most important part of the formula is the beginning. The one-minus indicates that contrast drops anytime all factors within the parentheses are above zero.

The reason that many videographers aren’t accustomed to dealing with diffraction is that it rarely affected standard-definition (SD) shooting. In a 2/3-inch, SD, American-standard camera shooting 4:3-aspect-ratio images, the lp/mm is approximately 37, and, even in the worst case of red light, shooting at even as narrow an aperture as f/8, the contrast ratio would be about 77% of the maximum possible at the camera’s highest resolution. At wider apertures, the percentage would only increase.

In HDTV, things are different. Keeping red light, a 2/3-inch camera, and an aperture of f/8 but moving to 1080-line HDTV in a 16:9 aspect ratio, the contrast-ratio percentage drops to just 38% at maximum resolution. In a 1/3-inch camera, it would be zero — no contrast at all. To achieve the 38% of the 2/3-inch camera, the aperture of the 1/3-inch camera needs to be widened by the format factor to 8/1.83 or about f/4.4.
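The diffraction figures above can be reproduced with a short Python sketch of the formula. The red wavelength of 0.00063 mm (630 nm) is an assumption chosen because it gives numbers close to those quoted; the 183-lp/mm figure for the 1/3-inch camera is the 2/3-inch HDTV figure of 100 lp/mm times the format factor. The function name is hypothetical.

```python
RED_WAVELENGTH_MM = 0.00063  # assumed wavelength for red light (630 nm)


def diffraction_mtf(wavelength_mm, f_stop, lp_per_mm):
    """Contrast left after diffraction; it cannot go below zero."""
    return max(0.0, 1 - 1.22 * wavelength_mm * f_stop * lp_per_mm)


cases = [
    ("2/3-inch SD at f/8", 8, 37),
    ("2/3-inch 1080-line HD at f/8", 8, 100),
    ("1/3-inch 1080-line HD at f/8", 8, 183),
    ("1/3-inch 1080-line HD at f/8 divided by 1.83", 8 / 1.83, 183),
]

for label, f_stop, lp_per_mm in cases:
    contrast = diffraction_mtf(RED_WAVELENGTH_MM, f_stop, lp_per_mm)
    print(f"{label}: {contrast:.0%}")
```

Widening the 1/3-inch camera’s aperture by the format factor brings the product of f-stop and lp/mm back to the 2/3-inch value, which is why the contrast matches.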

Smaller formats need better lenses.

An electronic imager has a fixed number of sensors. A lens does not have a fixed resolution. It simply has an MTF curve (or more than one, for different colors, different focal lengths, different apertures, and different focus settings). It provides greatest contrast at minimum resolution and reduced contrast as resolution increases.

So-called HD lenses may be better than their SD counterparts, but that doesn’t mean the SD lenses can’t pass HD resolutions. The HD lenses just offer more contrast at HD resolutions and, perhaps, reduced aberrations.

The horizontal axis of lens MTF curves is typically labeled in lp/mm. For full-resolution 1080-line HDTV, the figure in a 2/3-inch format is almost exactly 100 lp/mm. For older, one-inch-format HDTV cameras, it was just 69 lp/mm. That may be one reason why those who worked with HD at the time say the older camera-lens combinations made sharper-looking pictures than today’s. On an MTF curve, 69 lp/mm offers higher contrast than 100. And the Vision Research Phantom HD camera, with a 35-mm-sized imager, comes out at just 40 lp/mm at full 1080-line resolution.

On the other hand, in a half-inch format, full-resolution 1080-line HDTV is 138 lp/mm. For a 1/3-inch format, it’s 183 lp/mm (the 2/3-inch number times the format factor). For a 1/4-inch format, it’s 245, about the same as the demands made on lenses by the NHK (Japan Broadcasting Corporation) ultra-definition TV system with 16 times more pixels than 1080-line HDTV!
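These lp/mm figures follow from the imager width. The Python sketch below assumes 1920 pixels (960 line pairs) across a 16:9 image; the 11-mm and 6-mm diagonals appear earlier in the text, while the 16-mm, 8-mm, and 4.5-mm diagonals are assumed nominal values for the one-inch, half-inch, and 1/4-inch formats. The names are made up for the example.

```python
import math

HD_LINE_PAIRS_PER_WIDTH = 1920 / 2  # 1920 pixels across make 960 line pairs

# Nominal image diagonals in millimeters; only 11 and 6 are given in the text,
# the others are assumptions for illustration.
DIAGONALS_MM = {
    "one-inch": 16.0,
    "2/3-inch": 11.0,
    "half-inch": 8.0,
    "1/3-inch": 6.0,
    "1/4-inch": 4.5,
}


def image_width_mm(diagonal_mm, aspect_w=16, aspect_h=9):
    """Width of an image with the given diagonal and aspect ratio."""
    return diagonal_mm * aspect_w / math.hypot(aspect_w, aspect_h)


def hd_lp_per_mm(diagonal_mm):
    """Line pairs per millimeter for full-resolution 1080-line HDTV."""
    return HD_LINE_PAIRS_PER_WIDTH / image_width_mm(diagonal_mm)


for name, diagonal in DIAGONALS_MM.items():
    print(f"{name}: about {hd_lp_per_mm(diagonal):.0f} lp/mm")
```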

All else is never equal.

If the only criterion for choosing a camera were how much sharpness it could get from a lens in HDTV format, everyone would be shooting with the Phantom HD. Of course, there are others.

For the cameras of Vision Research, i-movix, NAC, Photron, Weinberger, and others, an important criterion is high frame rate (high as in up to 250,000 frames per second). For many cameras from Grass Valley, Hitachi, Ikegami, JVC, Panasonic, Sony, and others, an important criterion is how well they play together in multi-camera systems. For Iconix, it’s having the smallest three-chip HDTV camera. For NHK, it’s having a camera that can make pictures detailed enough to look at on a movie-sized screen from a few feet away.

For ARRI, Colorspace, Dalsa, Panavision, Red, and others, it’s having an electronic camera that might take the place of a 35-mm motion-picture film camera, right down to the lenses. For Colorspace, JDC/Mitchell, Kinetta, Silicon Imaging, and others, it’s having an electronic camera that might take the place of a 16-mm motion-picture film camera, right down to the lenses.

None of the above cameras lend themselves to being carried around in the pocket of a news correspondent. NAB showgoers spent time and money buying tiny, inexpensive camcorders right off the floor from companies they’d probably never heard of before. Sanyo’s VPC-HD2 HDTV camcorder is currently selling for $649.99, and Canon’s PowerShot TX1, also capable of recording HDTV, is even less expensive.

Independent videographers might have yet other criteria in mind. No single value, not even the seemingly all-powerful format factor, can tell the whole story. But it helps.

