New Year's Resolution


originally published in Videography January 2006

Numbers are peculiar. Most people would consider the question “Which is faster, a house or a statue?” unanswerable. “Which is heavier, an apple or an orange?” isn’t much better. Which is cooler, water or air? Which is smarter, green or purple? Which is tastier, Arizona or Iowa? None of those makes sense. But “Which is better, 1920 x 1080 or 1024 x 768?” elicits a stronger response — perhaps for no good reason.

Welcome to the world of resolution, a world of numbers and misperceptions. It’s a world where most people think they understand the meaning of certain numbers, but few do.

Pick up a brochure for a new TV, and under resolution you might see that 1920 x 1080. A brochure for an analog-to-digital converter might offer 14-bit. One for a lens might say 200 lp/mm (line pairs per millimeter). A manual for a waveform monitor might discuss resolution in IRE units and one for a vectorscope in degrees. Some computer-graphics systems use dpi (dots per inch). Resolution of videotape recorders was sometimes specified in TV lines (having nothing to do with scanning lines). Resolution of a motion-analysis system might be measured in frames per second. That of the human visual system might be described in arc minutes, percent, or milliseconds. That of a digital camera is typically announced in megapixels. Then there are Delta-E* figures in colorimetry.

Looking up the word resolution in a dictionary is unlikely to be of much help. The closest the “bible” of American English, Webster’s Third New International Dictionary, comes to what videographers usually refer to as resolution is resolving power.

Perhaps the best way to understand resolution is to consider a common specification that seems to have nothing to do with the term. FM radio tuners are said to have a certain selectivity: the ability to distinguish one station from another and tune it in.

Resolution, as it’s commonly used, is a form of selectivity. It’s the ability to distinguish one thing from another and treat it separately. But what is the thing?

Consider shades of gray. Anyone can distinguish white from black, and, except for certain optical illusions, can properly distinguish either one of those from gray.

There are light grays and dark grays and medium grays. How many grays can a human being distinguish? That depends on a lot of factors, but over much of the range of human vision, a person can distinguish a gray that is about 1% brighter or darker than another.

A digital system converts a continuous range, like that of human light sensitivity, to individual numbers. Each additional bit of an analog-to-digital converter doubles the number of values available. Thus, a one-bit system can offer only two numbers: zero and one. A two-bit system offers four: zero, one, two, and three. An eight-bit system offers 256. Each number can represent a level of brightness.
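The doubling is simply a power of two. A minimal Python sketch (the function name `levels` is an illustrative name, not part of any standard):

```python
def levels(bits: int) -> int:
    """Number of distinct values an ideal converter with `bits` bits can output."""
    # Each additional bit doubles the count, so n bits give 2**n values.
    return 2 ** bits


for bits in (1, 2, 8, 14):
    # Values run from 0 up to levels - 1.
    print(f"{bits:2d} bits: {levels(bits):6d} levels (0 to {levels(bits) - 1})")
```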

Suppose an imaging system translates light to numbers linearly in eight bits, with black at zero, white at 255, and grays in between. Around level 100, the system will closely match human visual resolution of grays: a person should just be able to tell level 100 from level 99 or level 101, each one percent away.

Around level 250, however, the bits seem to be wasted. A person can’t see the difference between level 250 and level 251; they’re only 0.4% apart.

Worse, around level 10, there’s a huge lack of bits. The difference between level 10 and level 11 is 10%, ten times the 1% step that vision can distinguish. The difference between level 1 and level 2 is an even worse 100%!
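The percentages in the last three paragraphs come from dividing one code step by the level it sits on. A small Python sketch, assuming an ideal linear eight-bit code in which level n represents a brightness proportional to n, and taking the 1% figure above as the visual threshold (the names are illustrative):

```python
VISUAL_THRESHOLD_PCT = 1.0  # approximate smallest visible brightness change, from the text


def step_percent(level: int) -> float:
    """Size of one code step, as a percentage of the brightness at `level` (linear code)."""
    # Moving from level n to n + 1 changes brightness by 1/n of its value.
    return 100.0 / level


for level in (250, 100, 10, 1):
    pct = step_percent(level)
    if pct < VISUAL_THRESHOLD_PCT:
        verdict = "finer than vision needs (bits wasted)"
    elif pct > VISUAL_THRESHOLD_PCT:
        verdict = f"{pct / VISUAL_THRESHOLD_PCT:.0f}x coarser than vision can see"
    else:
        verdict = "matches vision"
    print(f"level {level:3d} -> {level + 1:3d}: {pct:6.1f}% step, {verdict}")
```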

One solution is to use more bits. A 14-bit system offers 16,384 levels instead of just 256, dividing each eight-bit step into 64 finer ones. That is more than enough to resolve all the shades of gray between eight-bit levels 10 and 11 (about ten 1% steps) but not quite enough to distinguish all the levels of gray between eight-bit levels 1 and 2 (about seventy). An alternative is to use a form of analog-to-digital conversion that deals with the signal non-linearly, devoting more of the available levels to the dark areas and fewer to the light.
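The 14-bit arithmetic can be checked by counting how many 1% steps fit between two eight-bit levels (the logarithm of their ratio divided by the logarithm of 1.01) and comparing that with the 64 fourteen-bit codes per eight-bit step. The second half of the sketch illustrates the non-linear alternative with a simple power-law ("gamma") curve; the exponent of 2.2 is an assumption chosen for illustration, not something the text specifies, and code values are treated as continuous for simplicity.

```python
import math

CODES_PER_8BIT_STEP = 2 ** 14 // 2 ** 8  # 16,384 / 256 = 64


def one_percent_steps_needed(low: float, high: float) -> float:
    """How many just-visible (1%) brightness steps fit between two brightness levels."""
    return math.log(high / low) / math.log(1.01)


for low, high in ((10, 11), (1, 2)):
    needed = one_percent_steps_needed(low, high)
    enough = "enough" if CODES_PER_8BIT_STEP >= needed else "not enough"
    print(f"levels {low}-{high}: need ~{needed:.0f}, have {CODES_PER_8BIT_STEP} ({enough})")


GAMMA = 2.2  # assumed power-law exponent, for illustration only


def linear_step_pct(brightness: float, bits: int = 8) -> float:
    """Step size (%) of a linear code at a relative brightness (0 < brightness <= 1)."""
    code = brightness * (2 ** bits - 1)
    return 100.0 / code


def gamma_step_pct(brightness: float, bits: int = 8) -> float:
    """Step size (%) of a power-law code at the same relative brightness."""
    max_code = 2 ** bits - 1
    code = max_code * brightness ** (1 / GAMMA)  # encoding spends more codes on dark tones
    # Decoding raises the code to the power GAMMA, so one code step multiplies
    # brightness by ((code + 1) / code) ** GAMMA.
    return 100.0 * (((code + 1) / code) ** GAMMA - 1)


for level in (10, 250):
    b = level / 255
    print(f"brightness of linear level {level}: linear step {linear_step_pct(b):.1f}%, "
          f"gamma step {gamma_step_pct(b):.1f}%")
```

With these assumptions the power-law code has smaller steps than the linear one in the dark and larger steps near white, which is the trade the paragraph describes.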

Then there’s color. It’s easy for a human being with normal color vision to distinguish between a cyan (blue-green) and a yellow of equal intensity, but how about distinguishing a slightly bluer cyan from a slightly greener one or a slightly more saturated yellow from a slightly less saturated one? Delta-E* is one way of measuring the ability to resolve different colors of the same intensity.
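One common form of Delta-E* is the CIE 1976 color difference: the straight-line distance between two colors in CIELAB space, where L* is lightness and a* and b* are opponent color axes (later formulas such as CIEDE2000 refine it). The sketch assumes colors already expressed in CIELAB (converting from camera or display RGB takes additional steps not shown), and the sample values are made up. For two colors of equal lightness, only the a* and b* terms contribute.

```python
import math

Lab = tuple[float, float, float]  # (L*, a*, b*)


def delta_e_1976(lab1: Lab, lab2: Lab) -> float:
    """CIE 1976 color difference (Delta-E*ab): Euclidean distance in CIELAB."""
    dl = lab1[0] - lab2[0]  # difference in lightness L*
    da = lab1[1] - lab2[1]  # difference along the red-green (a*) axis
    db = lab1[2] - lab2[2]  # difference along the yellow-blue (b*) axis
    return math.sqrt(dl * dl + da * da + db * db)


# Hypothetical CIELAB values for a cyan and two slight variants of it,
# all at the same lightness L* = 70.
cyan = (70.0, -30.0, -10.0)
bluer_cyan = (70.0, -30.0, -14.0)    # more negative b* is bluer
greener_cyan = (70.0, -33.0, -10.0)  # more negative a* is greener

print(delta_e_1976(cyan, bluer_cyan))
print(delta_e_1976(bluer_cyan, greener_cyan))
```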

As videography deals with colors and shades of gray, both of those forms of resolution are important. So is temporal resolution, measured in frames per second. But, in our industry, the generic term resolution seems to apply most often to spatial resolution, the ability to distinguish fine points of detail in a picture.

Even here, there are many different types of resolution. The U.S. color encoding system NTSC (National Television System Committee) offers its greatest ability to distinguish fine spatial detail in colorless brightness, less along a reddish-cyanish axis of color, and least along a bluish-yellowish axis. That’s because humans are less selective of spatial detail in color than in brightness. The IRE units and degrees on waveform monitors and vectorscopes are ways of resolving how far a system is from the NTSC norm.
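NTSC derives its brightness signal and its two color signals from red, green, and blue through a fixed matrix. The sketch below uses the commonly published YIQ coefficients (Y for brightness, I for the wider-band color axis, Q for the narrower one), with R, G, and B normalized to 0 to 1; the bandwidth figures in the comment are the nominal values often quoted, included only for orientation. A neutral gray produces zero on both color axes, so a colorless picture rides entirely on the brightness channel.

```python
# Commonly published NTSC (FCC) RGB-to-YIQ coefficients. Strictly, NTSC applies
# them to gamma-corrected R', G', B'. Nominal bandwidths often quoted:
# Y about 4.2 MHz, I about 1.3 MHz, Q about 0.5 MHz.
YIQ_MATRIX = (
    (0.299, 0.587, 0.114),    # Y: brightness (luminance)
    (0.596, -0.274, -0.322),  # I: wider-band color axis
    (0.211, -0.523, 0.312),   # Q: narrower-band color axis
)


def rgb_to_yiq(r: float, g: float, b: float) -> tuple[float, float, float]:
    """Convert normalized (0-1) R, G, B to Y, I, Q."""
    y, i, q = (row[0] * r + row[1] * g + row[2] * b for row in YIQ_MATRIX)
    return y, i, q


for name, rgb in (("white", (1, 1, 1)), ("mid gray", (0.5, 0.5, 0.5)), ("red", (1, 0, 0))):
    y, i, q = rgb_to_yiq(*rgb)
    print(f"{name:8s}: Y={y:.3f}  I={i:+.3f}  Q={q:+.3f}")
```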

Then there’s dynamic spatial resolution, the amount of detail that can be distinguished in images that move across our visual fields. As with color spatial resolution vs. brightness spatial resolution, humans are less selective of dynamic spatial detail than they are of static. Unfortunately, eye tracking can turn an image that moves on the screen into one that is stationary on the retina, and vice versa. If a viewer follows a ball player’s image across a screen, that’s the part of the picture that needs the most spatial resolution.

There are also horizontal resolution (sometimes measured in TV lines per picture height, a TV line being half of a pair of white and black lines), vertical resolution, diagonal resolution, radial resolution, and vertical-temporal resolution. But a bigger issue with spatial resolution isn’t what it distinguishes but the act of distinguishing itself.
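Because a TV line is half a line pair, a digital system with N pixels across can at best show N TV lines across its width, and quoting the figure per picture height scales that by the aspect ratio. For an analog channel, each cycle of signal bandwidth during the active line is one black-and-white pair. The sketch assumes square pixels and, for the analog case, an NTSC active line time of about 52.6 microseconds; the function names are illustrative, and real systems fall short of these theoretical limits.

```python
def tvl_per_ph_digital(pixels_across: int, aspect_w: int, aspect_h: int) -> float:
    """Theoretical maximum horizontal resolution, in TV lines per picture height."""
    # One pixel can carry at most one TV line (half of a black-white pair), and
    # "per picture height" counts only a picture height's worth of the width.
    return pixels_across * aspect_h / aspect_w


def tvl_per_ph_analog(bandwidth_hz: float, active_line_s: float,
                      aspect_w: int = 4, aspect_h: int = 3) -> float:
    """Horizontal resolution of an analog channel, in TV lines per picture height."""
    cycles_across = bandwidth_hz * active_line_s  # line pairs across the full width
    tv_lines_across = 2 * cycles_across           # two TV lines per cycle
    return tv_lines_across * aspect_h / aspect_w


print(tvl_per_ph_digital(1920, 16, 9))            # 1920 x 1080 at 16:9
print(tvl_per_ph_digital(1024, 4, 3))             # 1024 x 768 at 4:3
print(round(tvl_per_ph_analog(4.2e6, 52.6e-6)))   # NTSC luminance, assumed values
```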

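As light passes through an aperture (like the iris of a lens), it is diffracted, and the diffraction creates interference patterns that affect how much contrast there can be at varying levels of detail. A simple approximation for the fraction of contrast that gets through is one minus the product of 1.22 times the wavelength of the light times the f-number of the aperture times the spatial resolution measured in line pairs per millimeter. For the units to cancel, the wavelength must be expressed in millimeters rather than nanometers: green light at 550 nanometers is 0.00055 mm, so at f/8 and 100 lp/mm the contrast is about 1 - (1.22 x 0.00055 x 8 x 100), or roughly 46%.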
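The formula can be tried numerically. This sketch takes the wavelength in millimeters so the units cancel (550 nm is 0.00055 mm), clamps the result at zero since negative contrast is meaningless, and, for comparison, also computes the standard diffraction-limited MTF of an ideal circular aperture in incoherent light, whose cutoff lies at 1/(wavelength x f-number). The wavelength and f-numbers are example values only.

```python
import math


def contrast_linear_approx(wavelength_mm: float, f_number: float, lp_per_mm: float) -> float:
    """Approximation: 1 - 1.22 * wavelength * f-number * spatial frequency, floored at 0."""
    return max(0.0, 1.0 - 1.22 * wavelength_mm * f_number * lp_per_mm)


def contrast_diffraction_limited(wavelength_mm: float, f_number: float, lp_per_mm: float) -> float:
    """MTF of an ideal (aberration-free) circular aperture in incoherent light."""
    cutoff = 1.0 / (wavelength_mm * f_number)  # no contrast at all beyond this frequency
    x = lp_per_mm / cutoff
    if x >= 1.0:
        return 0.0
    phi = math.acos(x)
    return (2.0 / math.pi) * (phi - math.cos(phi) * math.sin(phi))


GREEN_MM = 550e-6  # 550 nm expressed in millimeters

for f_number in (2.8, 8.0, 16.0):
    for lp in (0, 50, 100):
        approx = contrast_linear_approx(GREEN_MM, f_number, lp)
        exact = contrast_diffraction_limited(GREEN_MM, f_number, lp)
        print(f"f/{f_number:<4} {lp:3d} lp/mm: approx {approx:.2f}, ideal lens {exact:.2f}")
```

Both versions give full contrast only at zero detail and lose more contrast as the f-number or the fineness of detail rises.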

The “one-minus” nature of the formula means the only way for 100% of the contrast to get through is for one of the factors to be zero. The wavelength of light can’t be zero, and the f-number can’t be zero, so 100% contrast is possible only when the fineness of the detail is zero (i.e., the entire screen is black, white, gray, or a single color).

Other factors besides diffraction also reduce contrast as detail gets finer, such as microcracks in lens glass and optical and electronic filtering. The result is an MTF (modulation-transfer function) curve, which plots contrast from a maximum for the coarsest detail to a minimum for the finest.
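When each stage behaves like a linear filter, the MTF of a whole chain is the product of the stages' MTFs at each spatial frequency, so the losses of successive stages compound. The component curves below (a lens, an optical filter, and an electronic filter) are hypothetical Gaussian shapes with made-up parameters, chosen only to show the multiplication.

```python
import math
from typing import Callable, Sequence

Mtf = Callable[[float], float]  # maps spatial frequency (lp/mm) to contrast (0-1)


def gaussian_mtf(half_contrast_freq: float) -> Mtf:
    """Hypothetical stage whose contrast falls to 50% at `half_contrast_freq` lp/mm."""
    k = math.log(2.0) / half_contrast_freq ** 2
    return lambda f: math.exp(-k * f * f)


def cascade(stages: Sequence[Mtf], f: float) -> float:
    """System MTF at frequency f: product of the stage MTFs (linear-system assumption)."""
    result = 1.0
    for stage in stages:
        result *= stage(f)
    return result


# Made-up stage parameters, for illustration only.
lens = gaussian_mtf(120.0)
optical_filter = gaussian_mtf(90.0)
electronic_filter = gaussian_mtf(100.0)
chain = [lens, optical_filter, electronic_filter]

for f in (0, 25, 50, 75, 100):
    print(f"{f:3d} lp/mm: lens {lens(f):.2f}, whole chain {cascade(chain, f):.2f}")
```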

Another curve plotting contrast versus resolution is a contrast-sensitivity function (CSF). It indicates how much contrast a person needs in order to be able to distinguish a certain resolution, measured by retinal angle, usually in cycles per degree (cpd) or arc minutes (sixtieths of a degree). Unlike an MTF curve, a CSF curve reaches its maximum at around eight cpd, rather than zero, and falls off on either side.
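Two small calculations connect this to displays. First, the finest detail a display can present, in cycles per degree, depends on its line count and the viewing distance; the sketch uses the average angle per line across the picture height. Second, one widely cited analytic CSF model, from Mannos and Sakrison (1974), peaks near eight cpd. The three-picture-height viewing distance and the choice of this particular model are assumptions for illustration; measured CSFs vary with brightness, age, and test conditions.

```python
import math


def max_cycles_per_degree(lines: int, distance_in_picture_heights: float) -> float:
    """Finest detail a display can present to the eye, in cycles per degree (average)."""
    # Angle subtended by the picture height at the given viewing distance.
    height_deg = math.degrees(2.0 * math.atan(0.5 / distance_in_picture_heights))
    # Two lines (one black, one white) make one cycle.
    return (lines / height_deg) / 2.0


def csf_mannos_sakrison(cpd: float) -> float:
    """Mannos-Sakrison contrast-sensitivity model (normalized, peak close to 1)."""
    return 2.6 * (0.0192 + 0.114 * cpd) * math.exp(-((0.114 * cpd) ** 1.1))


# Locate the model's peak with a simple 0.01-cpd grid search from 0 to 60 cpd.
grid = [i / 100 for i in range(6001)]
peak_cpd = max(grid, key=csf_mannos_sakrison)
print(f"CSF peak near {peak_cpd:.1f} cpd")

for lines in (480, 768, 1080):
    cpd = max_cycles_per_degree(lines, 3.0)
    print(f"{lines} lines at 3 picture heights: up to {cpd:.1f} cpd, "
          f"relative sensitivity {csf_mannos_sakrison(cpd):.2f}")
```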

At first glance, other than the CSF curve’s drop-off on the left, the two curves may seem similar. Unfortunately, the CSF curve’s drop-off on the right means that humans need the most contrast for exactly the fine detail where the MTF curve’s drop-off on the right indicates that equipment just won’t supply it. That’s why distinguishing is such an important issue with regard to resolution.

Consider very fine but low-contrast detail in a picture. The MTF curve will give the detail even less contrast, but the CSF curve will say that humans need even more contrast to be able to see the detail. The finest detail might be there, and the resolution of the system might make it measurable with appropriate equipment, but it simply can’t be seen (see sidebar “One Person’s Noise”).
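The argument can be put in numbers. The contrast reaching the eye is the scene's contrast multiplied by the system MTF at that fineness of detail, and the detail is visible only if that product meets the viewer's contrast threshold at the same fineness. Every value below (MTF figures, thresholds, scene contrasts) is hypothetical, chosen only to illustrate the logic, and the function names are illustrative.

```python
def delivered_contrast(scene_contrast: float, system_mtf: float) -> float:
    """Contrast left after the imaging chain (0-1 scale)."""
    return scene_contrast * system_mtf


def is_visible(scene_contrast: float, system_mtf: float, viewer_threshold: float) -> bool:
    """True if the delivered contrast meets the viewer's threshold for this fineness."""
    return delivered_contrast(scene_contrast, system_mtf) >= viewer_threshold


# Hypothetical figures: (description, scene contrast, system MTF, viewer threshold).
# Finer detail gets a lower MTF and a higher threshold, as the two curves suggest.
cases = [
    ("coarse, high-contrast detail", 0.80, 0.90, 0.01),
    ("fine, high-contrast detail", 0.80, 0.30, 0.05),
    ("fine, low-contrast detail", 0.10, 0.30, 0.05),
]

for description, scene, mtf, threshold in cases:
    seen = is_visible(scene, mtf, threshold)
    print(f"{description}: delivers {delivered_contrast(scene, mtf):.3f}, "
          f"needs {threshold:.3f} -> {'visible' if seen else 'not visible'}")
```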

If resolution can be invisible, is there a different characteristic related to fineness of detail that’s more important as a predictor of image quality? There is. It’s called “sharpness,” a psychological sensation, and it has been shown to be proportional to the square of the area under an MTF curve.

That means it’s a function of both resolution and contrast. It also means that a higher-contrast system of, say, 1024 x 768 could conceivably deliver sharper-looking pictures than a lower-contrast system of 1920 x 1080.
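The comparison can be sketched with two hypothetical systems. Each MTF is modeled as a straight line falling from its zero-frequency contrast to zero at its limiting resolution (half the line count, in cycles per picture height); the shapes and contrast figures are invented for illustration. Sharpness is computed as the text describes it, the square of the area under the MTF curve, with the area found by trapezoidal integration.

```python
from typing import Callable

Mtf = Callable[[float], float]  # cycles per picture height -> contrast (0-1)


def linear_mtf(peak_contrast: float, limit_cycles: float) -> Mtf:
    """Hypothetical MTF: straight line from peak_contrast at 0 to zero at limit_cycles."""
    def mtf(f: float) -> float:
        return max(0.0, peak_contrast * (1.0 - f / limit_cycles))
    return mtf


def area_under(mtf: Mtf, f_max: float, steps: int = 10_000) -> float:
    """Trapezoidal integration of an MTF curve from 0 to f_max."""
    h = f_max / steps
    total = 0.5 * (mtf(0.0) + mtf(f_max))
    for i in range(1, steps):
        total += mtf(i * h)
    return total * h


def sharpness(mtf: Mtf, f_max: float) -> float:
    """Sharpness as described in the text: proportional to the squared MTF area."""
    return area_under(mtf, f_max) ** 2


# 768 lines -> 384 cycles per picture height; 1080 lines -> 540.
system_a = linear_mtf(peak_contrast=0.95, limit_cycles=384)  # "1024 x 768", high contrast
system_b = linear_mtf(peak_contrast=0.50, limit_cycles=540)  # "1920 x 1080", low contrast

for name, mtf in (("A (768 lines, high contrast)", system_a),
                  ("B (1080 lines, low contrast)", system_b)):
    print(f"{name}: relative sharpness {sharpness(mtf, 540):.0f}")
```

Under these invented curves, system A covers more area despite its lower limiting resolution, so it scores as sharper.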

Furthermore, the higher-resolution information, though it might contribute nothing to perceived sharpness, could be stressing the recording or distribution system, introducing visibly objectionable artifacts. In that case, even at the same contrast, 1920 x 1080 could actually be inferior to 1024 x 768.

In addition to all of its image-related meanings, resolution also refers to a self-pledge sometimes made on January 1. Those making a resolution might promise themselves to lose weight, stop smoking, or get more exercise.

Shortly after January 1 comes the annual Consumer Electronics Show (CES) in Las Vegas, featuring the latest high-definition displays using the most advanced technologies — such as plasma, LCD, DLP, and SED — each capable of displaying ultra-fine detail — at least to test equipment. In other words, that’s where videographers can go for their new gear resolutions.

###

One Person’s Noise

It could be useful to consider one form of content protection, watermarking, in understanding resolution. A watermark, in its original sense, is a nominally imperceptible mark on a piece of paper created by something that affects the way the water in the pulp drains during the creation of the paper. In the paper used to print money, a watermark may be a nominally invisible image seen only when the bill is held up to the light.

Similarly, a digital watermark is identifying information added to an image in such a way that it is invisible for normal viewing but can still be decoded to identify a source of unauthorized copying. But how can it be both invisible and decodable?

Different organizations use different forms of watermarking, but consider simply the noise in a picture. Noise (usually manifested as tiny dots in the picture) may be at very high resolution but — normally — at very low contrast, so it’s imperceptible. It’s also normally random. But if a signal could be coded in a pseudo-random fashion and applied to the picture at just above the nominal noise level, it could still be invisible yet carry valuable information.
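The idea of hiding a pseudo-random signal near the noise level can be sketched in a few lines. This is a generic spread-spectrum illustration, not any particular organization's scheme: a key seeds a pseudo-random pattern of +1 and -1 values, the pattern is added at low amplitude to a noisy flat-gray "picture," and a detector that knows the key correlates the picture with the same pattern. The picture size, noise level, amplitude, threshold, and key values are all made up.

```python
import random

WIDTH, HEIGHT = 320, 240  # hypothetical picture size
NOISE_SIGMA = 2.0         # hypothetical camera noise, in 8-bit code values
MARK_AMPLITUDE = 2.0      # watermark strength, comparable to the noise


def pattern(key: int, n: int) -> list[int]:
    """Key-seeded pseudo-random +1/-1 pattern; anyone with the key can regenerate it."""
    rng = random.Random(key)  # not cryptographically secure; fine for a sketch
    return [rng.choice((-1, 1)) for _ in range(n)]


def noisy_gray_picture(seed: int) -> list[float]:
    """A flat mid-gray picture with random noise, as a flat list of pixel values."""
    rng = random.Random(seed)
    return [128.0 + rng.gauss(0.0, NOISE_SIGMA) for _ in range(WIDTH * HEIGHT)]


def embed(picture: list[float], key: int) -> list[float]:
    """Add the keyed pattern at low amplitude."""
    return [p + MARK_AMPLITUDE * w for p, w in zip(picture, pattern(key, len(picture)))]


def detect(picture: list[float], key: int) -> bool:
    """Correlate the picture (mean removed) with the keyed pattern."""
    mean = sum(picture) / len(picture)
    w = pattern(key, len(picture))
    score = sum((p - mean) * wi for p, wi in zip(picture, w)) / len(picture)
    # With the right key the score lands near MARK_AMPLITUDE; otherwise near 0.
    return score > MARK_AMPLITUDE / 2


original = noisy_gray_picture(seed=1)
marked = embed(original, key=2006)

print("marked, right key:  ", detect(marked, key=2006))
print("marked, wrong key:  ", detect(marked, key=1234))
print("unmarked, right key:", detect(original, key=2006))
```

Averaging over tens of thousands of pixels is what lets a mark that looks like ordinary noise stand out clearly to a detector holding the key.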

As a matter of fact, in the early days of HDTV, such noise-level coding was considered as a means of transmitting the extra detail of high-definition imagery through standard-definition channels and recorders. A common image test sequence called “calendar and train,” consisting of a number of moving playthings panned by a camera, was once coded with this technique and then recovered.

It was another case of toys and their noise.

###