All You Can See
The equipment exhibitions at the annual convention of the National Association of Broadcasters (NAB) often seem to have themes. Two years ago, it was stereoscopic 3D. Before that, it was DSLRs. Long before HDTV became common, it was a theme at NAB conventions. And there was at least one convention at which the theme seemed to be teletext. At the 2012 NAB show, a theme seemed to be 4K.
What is 4K? That’s a good question without a simple answer. Nominally, 4K denotes a moving-image system with 4096 active (image-carrying) picture elements (pixels) per row. At one time, it was considered to have 2048 active rows; now 2160 — twice HDTV’s 1080 — is more common. But, if twice HDTV is appropriate vertically, why not horizontally, too? Sure enough, some call 3840 pixels across the screen 4K (others call it Quad HD, because twice the number horizontally and vertically results in four times the number of pixels of 1080-line HDTV).
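The pixel arithmetic behind these labels is easy to check. This Python sketch uses only the raster sizes named above. The labels in the dictionary are informal, not standard names.

```python
# Active raster sizes mentioned in the text (width x height, in pixels).
# The dictionary keys are informal labels chosen for this sketch.
RASTERS = {
    "1080-line HDTV": (1920, 1080),
    "4K (2048 rows)": (4096, 2048),
    "4K (2160 rows)": (4096, 2160),
    "Quad HD / 3840-across 4K": (3840, 2160),
}


def pixel_count(width: int, height: int) -> int:
    """Total active pixels in a width x height raster."""
    return width * height


hd = pixel_count(*RASTERS["1080-line HDTV"])
for name, (w, h) in RASTERS.items():
    total = pixel_count(w, h)
    # Ratio relative to 1080-line HDTV; Quad HD comes out at exactly 4.0.
    print(f"{name:26s} {w}x{h} = {total / 1e6:5.2f} Mpixels, {total / hd:.2f}x HD")
```

The 3840 × 2160 raster works out to exactly four times HD, about 8.3 million pixels. That matches the low end of the photosite counts cited below for "4K" cameras.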
Then there is color. There have been 4K cameras using a beam-splitting prism (right, diagram by Colin M. L. Burnett, http://en.wikipedia.org/wiki/File:Dichroic-prism.png) and three image-sensor chips, just like a typical studio or truck camera. Other 4K cameras have single chips overlaid with color filters (one version, the Bayer pattern, is shown below). There have also been four-chip cameras, with HD-resolution chips and an additional green one offset diagonally by half a pixel. Conceivably, as was done in HD cameras, a 4K camera could also use three HD-resolution chips with the green offset from the red and blue.
Some say a color-filtered chip with at least 4096 (or 3840) photosites per row is 4K; others say it is not. Consider optical low-pass filtering. In a three-chip camera, the optical low-pass can be designed to match any of the chips. In a filtered single-chip (left, also from Burnett, http://en.wikipedia.org/wiki/File:Bayer_pattern_on_sensor.svg) or four-chip camera, should it be optimized for the individual photosites (the “luma” or uncolored resolution), the green ones (which occur more frequently), or the other colors (which have filters spaced twice as far apart as the photosites)?
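To make the three candidate resolutions concrete, here is a small Python sketch of a Bayer mosaic. It assumes the RGGB phase of the pattern, although real sensors may start on a different color. It simply records which filter sits over each photosite. The function names are illustrative.

```python
from collections import Counter

# One common Bayer arrangement (RGGB); other phases such as GRBG also exist.
BAYER_TILE = [["R", "G"],
              ["G", "B"]]


def bayer_color(row: int, col: int) -> str:
    """Filter color over the photosite at (row, col) in an RGGB mosaic."""
    return BAYER_TILE[row % 2][col % 2]


def count_filters(width: int, height: int) -> Counter:
    """Number of photosites of each filter color on a width x height sensor."""
    return Counter(bayer_color(r, c) for r in range(height) for c in range(width))


def rows_and_cols_with(color: str, width: int, height: int) -> tuple:
    """How many rows and how many columns contain at least one filter of this color."""
    rows = {r for r in range(height) for c in range(width) if bayer_color(r, c) == color}
    cols = {c for r in range(height) for c in range(width) if bayer_color(r, c) == color}
    return len(rows), len(cols)


# Small patch; the proportions are the same for any even-sized sensor.
print(count_filters(8, 8))
for color in "RGB":
    print(color, rows_and_cols_with(color, 8, 8))
```

On an 8 × 8 patch this counts 32 green, 16 red, and 16 blue photosites. Red and blue each appear in only 4 of the 8 rows and 4 of the 8 columns, so their filters sit at twice the photosite pitch in both directions. Green appears in every row and every column. Scaled to a 4096 × 2160 sensor, that is about 4.4 million green photosites and about 2.2 million each of red and blue.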
Then there are those who think it’s not necessary to go all the way to 4K (e.g., the “3.5K” of the popular ARRI Alexa at right) and those who think 4K is insufficient (e.g., proponents of “8K”). Just counting photosites, there have been “4K” cameras with anything from roughly 8.3 to roughly 38.2 million, and there have been other beyond-HDTV-resolution cameras shown and discussed with as few as 3.3 million and as many as 100 million. There’s even a group working on camera systems with a thousand times more pixels than even that high end (100 gigapixels: http://www.disp.duke.edu/projects/AWARE/index.ptml).
There are also ways of increasing resolution without changing the number of photosites on an image sensor. One is compressive sampling, which Siegfried Foessel of Germany’s Fraunhofer Institut described at the HPA Tech Retreat in February in connection with a system that increases resolution by covering portions of sensor photosites. There are also various forms of “super-resolution” (one version, which can take advantage of aliases that slip through filters, is shown below, original at left, enhanced at right, in a portion of an image from the Almalence PhotoAcute Studio web site: http://photoacute.com/studio/examples/mac_hdd/index.html).
As I noted in a previous post (“Y4K?” http://www.schubincafe.com/2011/08/31/y4k/), there are benefits to using a beyond-HD-resolution camera even if the distribution will be only HD. These include the possibilities of reframing in post, image stabilization without loss of resolution, one form of stereoscopic 3D shooting, and the delivery of images with perceptually increased sharpness. They’re not just theoretical benefits. Zaxel, for example, announced on July 1 the delivery of their 720CUT, a system that allows a 720p high-definition window to be smoothly moved around a 4K moving image in real time.
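Reframing of this kind comes down to choosing where a smaller window sits inside the larger frame on each frame. The Python sketch below is not how Zaxel's 720CUT works, because its internals aren't described here. It assumes a 4096 × 2160 source and a 1280 × 720 window. It also assumes a straight-line, constant-speed pan and clamping so the window never leaves the picture. Copying (and possibly filtering) the actual pixels is left out. The function names are hypothetical.

```python
from typing import List, Tuple


def window_origin(cx: float, cy: float, frame_w: int, frame_h: int,
                  win_w: int, win_h: int) -> Tuple[int, int]:
    """Top-left corner of a win_w x win_h window centered as close as possible
    to (cx, cy) while staying entirely inside the frame."""
    x = round(cx - win_w / 2)
    y = round(cy - win_h / 2)
    # Clamp so the window never extends past any edge of the source frame.
    x = max(0, min(x, frame_w - win_w))
    y = max(0, min(y, frame_h - win_h))
    return x, y


def pan_path(start: Tuple[float, float], end: Tuple[float, float],
             frames: int, frame_w: int, frame_h: int,
             win_w: int = 1280, win_h: int = 720) -> List[Tuple[int, int]]:
    """Window origins for a straight, constant-speed pan of the window center
    from start to end over the given number of frames."""
    path = []
    for i in range(frames):
        t = i / (frames - 1) if frames > 1 else 0.0
        cx = start[0] + t * (end[0] - start[0])
        cy = start[1] + t * (end[1] - start[1])
        path.append(window_origin(cx, cy, frame_w, frame_h, win_w, win_h))
    return path


# Pan a 720p window across the middle of a 4096 x 2160 frame in five steps.
print(pan_path((0, 1080), (4096, 1080), 5, 4096, 2160))
```

For that five-step pan, the window origins run from (0, 720) to (2816, 720). The clamping keeps the first and last windows inside the frame even though the requested centers sit on its edges.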
Although such issues as cost and storage might still keep users away from higher-resolution cameras, they clearly seem like a good idea. But what about delivering more resolution (not just more sharpness) to the viewer? How many pixels are enough?
Unfortunately, there’s no simple answer. Look again at the pictures above. They could clearly benefit from more detail, even the one on the right. But what if the whole picture were of something the size of a building? In that case, zooming in that close (the pictures show the label of a hard drive) might demand more detail than even a 100-gigapixel image could provide. One benefit of delivering 4K to a home viewer, therefore, is the ability to zoom in to any desired HD frame within the larger 4K frame, as shown by the inner rectangle in the example at left, a trimmed original image from HighDefWallpapers.Info (http://www.highdefwallpapers.info/amazing-sea-resort-high-definition-wallpapers/; added 2015 June 26: that link no longer seems to work, but an HD version of the image is at http://www.coolwallpapers.org/photo/42925/amazing_sea_resort_high_definition_wallpapers.jpg). Systems for doing such extraction at home have been shown at NAB conventions for years.
How about complete images? Again, there’s no simple answer. At right is a diagram from ARRI’s “4K+ Systems Theory Basics for Motion Picture Imaging” (http://www.efilm.com/publish/2008/05/19/4K%20plus.pdf). Based on 20/20 (or 6/6) vision, it shows visual-acuity limitations for movie viewers in different seats. Even at the rear of this auditorium, a viewer with 20/20 vision could perceive more than 50% more detail than 1080-line HD can deliver in any direction. In the front of the main section of seating, such a viewer could perceive 8K resolution, and, in the very front row, far more than even that extraordinary resolution.
There are, however, some problems with the above. For one thing, almost no one has exactly 20/20 vision. The extra lines below the red line at the bottom of an eye chart (left) indicate that many people have visual acuity far better than 20/20. But the seven lines above the 20/20 line indicate that other people have poorer visual acuity.
Then there is the number 20; 20/20 means that the viewer can see at 20 feet what the “standard” viewer (one with 20/20 vision) can also see at 20 feet (in 6/6, the numbers are in meters). But why specify 20 feet? It’s because at that distance eye-lens focus plays almost no role, and aging viewers can have trouble with eye-lens focus.
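A Snellen fraction converts directly into an angle. This Python sketch assumes the conventional reading that a 20/20 eye resolves detail subtending 1 arcminute. It further assumes that 20/X corresponds to X/20 arcminutes, and that the 30-cycles-per-degree definition of 20/20 used below follows from that. The function names are illustrative.

```python
def min_angle_of_resolution(test_distance: float, letter_distance: float) -> float:
    """Minimum angle of resolution, in arcminutes, for a Snellen fraction
    test_distance/letter_distance (e.g. 20/40 or 6/12). 20/20 -> 1 arcminute."""
    return letter_distance / test_distance


def cycles_per_degree(test_distance: float, letter_distance: float) -> float:
    """Finest resolvable grating, taking 20/20 as 30 cycles per degree
    (one black-plus-white cycle spans 2 arcminutes at 20/20)."""
    return 30.0 / min_angle_of_resolution(test_distance, letter_distance)


for fraction in ((20, 10), (20, 20), (20, 40), (6, 6)):
    print(f"{fraction[0]}/{fraction[1]}: {cycles_per_degree(*fraction):.0f} cycles per degree")
```

Only the ratio matters, so 6/6 gives the same 30 cycles per degree as 20/20. A 20/10 eye would resolve 60 cycles per degree, and a 20/40 eye only 15.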
In a cinema auditorium, that’s not much of an issue; the screen is likely to be at least 20 feet away. At home-TV viewing distances, it is an issue. So is lighting. Movies are viewed in dark rooms; TV is often viewed with the lights on. A simple formula for contrast is the sum of the desired light and the undesired light, divided by the undesired light. Movie screens are typically much dimmer than TV screens, but cinema auditoriums are typically very much darker than TV-viewing rooms, so movies typically offer more contrast.
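The contrast formula is simple enough to state as code. In this Python sketch the luminance values are invented solely to show the direction of the effect described above, where a dimmer screen in a much darker room can still offer more contrast. They are not measurements.

```python
def contrast_ratio(desired: float, undesired: float) -> float:
    """(desired + undesired) / undesired, the simple contrast formula above.
    Both arguments must be in the same luminance units."""
    if undesired <= 0:
        raise ValueError("undesired light must be positive")
    return (desired + undesired) / undesired


# Hypothetical numbers, chosen only to illustrate the trend:
# a dim screen in a very dark room vs. a bright screen in a lit room.
cinema = contrast_ratio(desired=48.0, undesired=0.05)
living_room = contrast_ratio(desired=200.0, undesired=2.0)
print(f"cinema: {cinema:.0f}:1, living room: {living_room:.0f}:1")
```

With these made-up numbers the cinema case comes out near 960:1 and the living-room case near 100:1, even though the TV screen is the brighter of the two.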
The image above is called a contrast-resolution grating. Contrast increases from bottom to top; detail resolution increases from left to right. You probably see undifferentiated gray at the bottom left and right corners, but both between those corners and above them, you can probably make out vertical lines. The reason you can make out the lines between the corners is that the human visual system has a contrast-sensitivity function with a peak. So perception of resolution depends on contrast. And that’s not all.
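A chart of this type, similar to the Campbell-Robson chart used in vision research, is straightforward to generate. This Python sketch follows the orientation described above, with contrast rising from bottom to top and spatial frequency rising from left to right. It returns rows of gray levels. The exponential sweeps, the frequency and contrast limits, and the function name are arbitrary choices for illustration. Writing the result to an image file for viewing is left out.

```python
import math
from typing import List


def contrast_resolution_grating(width: int = 512, height: int = 256,
                                f_min: float = 0.002, f_max: float = 0.25,
                                c_min: float = 0.005, c_max: float = 1.0) -> List[List[float]]:
    """Gray levels in [0, 1] for a contrast-resolution grating.

    Spatial frequency (cycles per pixel) rises exponentially from f_min at the
    left edge to f_max at the right edge; contrast rises exponentially from
    c_min in the bottom row to c_max in the top row. Row 0 is the top.
    """
    k = math.log(f_max / f_min) / (width - 1)   # frequency growth rate per pixel
    # Phase is the running integral of the exponentially rising frequency,
    # so the local frequency at column x is f_min * exp(k * x).
    phase = [2 * math.pi * f_min * (math.exp(k * x) - 1) / k for x in range(width)]
    image = []
    for y in range(height):
        frac_up = (height - 1 - y) / (height - 1)   # 0 at bottom, 1 at top
        contrast = c_min * (c_max / c_min) ** frac_up
        image.append([0.5 + 0.5 * contrast * math.sin(p) for p in phase])
    return image


img = contrast_resolution_grating()
print(f"{len(img)} rows x {len(img[0])} columns")
```

Every row averages mid-gray. Only the swing around that gray changes, so whether the lines are visible depends on both the contrast of a row and the frequency at a given column, which is the point the chart makes.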
If there is an ideal resolution for viewing, it is a compromise. Too much, and the system becomes overly expensive. Too little, and, aside from any possibility that the viewer might find the pictures insufficiently detailed, the structure of the display becomes visible; in theory, the visible pixels keep the viewer from seeing the image, in effect not seeing the forest for the trees. At left and right above are two different pixel structures of two different display panels. Do they offer equivalent structure visibility for the same resolution?
Suppose everyone’s visual acuity is 20/20, and eye-lens focus (accommodation), contrast, color, and pixel structure don’t matter. Then, with 20/20 defined as 30 cycles per degree, and assuming a white pixel and a black pixel constitute a cycle, as shown at right, it’s possible to use high-school trigonometry to calculate optimum viewing distances. For U.S. standard-definition television, which has about 480 active rows of pixels, that distance would be 7.15 times the height of the picture (7.15H); for 1080-line HDTV, it would be 3.16H; for 2160-line 4K, 1.54H; and for 4320-line 8K, 0.69H. With a lot of rounding (of the same sort that allows 7680-across to be called 8K), these have been called 7, 3, 1.5, and 0.75 times the picture height.
The “9 feet” in the image above happens to be the result of the calculation for an old 25-inch 4×3-shaped TV set, but it has another significance. It is the Lechner Distance. Named for then-RCA Laboratories researcher Bernard Lechner, it is the result of a survey conducted to see how far people sit from their TV screens. Richard Jackson, a researcher at Philips Laboratories in Redhill, England, conducted his own survey and came up with a similar 3 meters. The distance is determined by room sizes and furniture. It is not affected by screen sizes or resolutions, although flat-panel TV sets, lacking the depth required by a long-necked picture tube, would, in theory at least, increase the distance somewhat.
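The 9-foot figure can be reproduced with the same calculation. This Python sketch assumes the 25-inch figure is the picture diagonal of a 4×3 screen. It applies the 480-row, 30-cycles-per-degree calculation from the previous paragraph. The helper names are hypothetical.

```python
import math


def picture_height(diagonal: float, aspect_w: float, aspect_h: float) -> float:
    """Picture height from diagonal size and aspect ratio (same units as diagonal)."""
    return diagonal * aspect_h / math.hypot(aspect_w, aspect_h)


def distance_for_rows(active_rows: int, height: float,
                      cycles_per_degree: float = 30.0) -> float:
    """Acuity-limited viewing distance, in the same units as height."""
    picture_angle = math.radians(active_rows / (2 * cycles_per_degree))
    return height / (2 * math.tan(picture_angle / 2))


h = picture_height(25, 4, 3)               # 15-inch-high picture
d_feet = distance_for_rows(480, h) / 12    # inches -> feet
print(f"height {h:.1f} in, distance {d_feet:.2f} ft")
```

It gives a 15-inch-high picture and a distance of about 8.94 feet, which rounds to 9.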
At right is a portion of Figure 3 of the paper “‘Super Hi-Vision’ Video Parameters for Next-Generation Television,” by Takayuki Yamashita, Kenichiro Masaoka, Kohei Ohmura, Masaki Emoto, Yukihiro Nishida, and Masayuki Sugawara of the NHK Science and Technology Research Laboratories. It shows that a viewer’s “sense of being there” increases as the viewing distance decreases, as might be expected; as the screen occupies more of the visual field, the viewer gets enveloped in the image. It also shows that “sense of realness” increases with greater viewing distance. That’s also as might be expected; from the top of a skyscraper, a viewer can’t tell the difference between a mannequin (fake) and a person (real) at street level.
Super Hi-Vision is being shown to the public at the 2012 Olympic Games in special, giant-screen viewing rooms, as has been the case when it was exhibited at such broadcast exhibitions as NAB and the International Broadcasting Convention. Viewers can see HD detail from just the segment of screen in front of them and glance elsewhere to see more HD-equivalent images forming the whole. I wrote previously of a system Canon has demonstrated with even more resolution (http://www.schubincafe.com/2010/09/07/whats-next/). In those special viewing venues, it’s easy to achieve a viewing distance of 0.75H; at home, at the Lechner Distance, it would require a TV image 12 feet high.
At the same London Games, however, the official host broadcaster is using the DVCPROHD codec, which substantially reduces the resolution of 1920-pixel-across, 1080-line HDTV. HDCAM does something similar. Both have been acceptable because, even though they greatly reduce resolution, they preserve most of the area under the modulation-transfer-function (MTF) curve shown at right, and with it most of the image sharpness.
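The idea that sharpness tracks the area under the MTF curve, rather than the highest frequency reproduced, can be illustrated numerically. In this Python sketch the straight-line MTF is purely hypothetical, since real lens, sensor, and codec responses are not straight lines. The band limit is set by HDCAM's 1440 luma samples per row against 1920 for full HD. DVCPROHD's sampling is not modeled.

```python
from typing import Callable


def mtf_area(mtf: Callable[[float], float], cutoff: float, steps: int = 10_000) -> float:
    """Trapezoidal-rule area under mtf(f) from f = 0 to f = cutoff."""
    h = cutoff / steps
    total = 0.5 * (mtf(0.0) + mtf(cutoff))
    total += sum(mtf(i * h) for i in range(1, steps))
    return total * h


def linear_mtf(f: float) -> float:
    """Hypothetical system MTF: 100% at zero frequency, falling linearly to
    0% at f = 1.0, the full-HD (1920-sample) limit."""
    return max(0.0, 1.0 - f)


full = mtf_area(linear_mtf, 1.0)
kept = mtf_area(linear_mtf, 1440 / 1920)   # band limit with 1440 samples per row
print(f"resolution kept: {1440 / 1920:.0%}, MTF area kept: {kept / full:.1%}")
```

Because the response is already low near the top of the band, cutting a quarter of the bandwidth in this model removes only about 6% of the area under the curve.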
Perhaps it would be better to say that DVCPROHD and HDCAM had been acceptable. Today, some viewers seem willing to comment on the difference between the reduced resolution of those systems and “full HD.” That might be because some forms of perception are learned.
After Thomas Edison switched from phonograph cylinders to disks, he came up with a plan to demonstrate their quality. He presented a series of “tone tests.” In small venues, as shown at left, listeners would be blindfolded. At larger ones, the lights would go out. In either case, the audience had to decide whether they’d heard the live singer or a pre-electronic phonograph disk.
These comments from a Pittsburgh Post reporter in 1919 were typical: “It did not seem difficult to determine in the dark when the singer sang and when she did not. The writer himself was pretty sure about it until the lights were turned on again and it was discovered that [the singer] was not on the stage at all and that the new Edison alone had been heard.” Today, we scoff at the idea that audiences couldn’t hear differences between those forms of sound, but we’ve had years of high fidelity to let us know what sounds bad.
As with hearing, so, too, with vision. At right is the apparatus used in an old experiment conducted to see whether animals would cross a visual gap. When the gap was covered with a visually transparent material, they would not. When the transparent material was covered with visible stripes, they would. But animals raised from birth in an environment devoid of lines oriented in a particular direction treated stripes oriented that way on the transparent material as though they weren’t there and wouldn’t cross.
So, can viewers actually avail themselves of beyond-HD resolution at home? If they’d simply sit closer to their screens, the answer would be a definite yes. If they continue to sit at the Lechner Distance, the answer is less obvious. On April 28, reporting on an 8K 145-inch television screen, PC World used the headline “Panasonic’s Newest TV Prototype Is Too Big for Your Living Room” (http://www.pcworld.com/article/254649/panasonics_newest_tv_prototype_is_too_big_for_your_living_room.html).
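A rough check of that headline in Python rests on several assumptions. The panel is taken to be 16:9. The viewer is taken to have exactly 20/20 acuity (30 cycles per degree) and to sit at the 9-foot Lechner Distance. As in the calculation earlier in this post, contrast, accommodation, and pixel structure are ignored. The helper names are hypothetical.

```python
import math


def picture_height(diagonal: float, aspect_w: float = 16, aspect_h: float = 9) -> float:
    """Picture height from diagonal and aspect ratio (same units as diagonal)."""
    return diagonal * aspect_h / math.hypot(aspect_w, aspect_h)


def resolvable_rows(distance: float, height: float, cycles_per_degree: float = 30.0) -> float:
    """Rows a viewer with the given acuity could resolve at this distance:
    the angle the picture height subtends, times two pixels per cycle."""
    picture_angle_deg = math.degrees(2 * math.atan(height / (2 * distance)))
    return picture_angle_deg * 2 * cycles_per_degree


h = picture_height(145)      # inches, assuming a 16:9 panel
lechner = 9 * 12             # the 9-foot Lechner Distance, in inches
print(f"picture height: {h / 12:.1f} ft")
print(f"distance at 9 ft: {lechner / h:.2f}H")
print(f"rows resolvable at 20/20: {resolvable_rows(lechner, h):.0f} of 4320")
```

Under those assumptions the picture is about 5.9 feet high, and the viewer sits about 1.5 picture heights away. At that distance, 20/20 acuity can resolve only roughly 2,200 of the panel's 4,320 rows, about what 2160-line 4K already delivers. Sitting at the 0.69H computed earlier, about 4 feet from the screen, would change that.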
Possibilities? Maybe we’ll sit closer. Maybe we’ll learn to see with greater acuity (NHK’s Super Hi-Vision research showed subjects already able to perceive differences in “realness” in detail more than five times finer than the 20/20 criterion). Maybe we’ll use virtual viewing systems unrestricted by rooms and furniture. Or maybe not.
Meanwhile, a little skepticism probably couldn’t hurt. Things aren’t always as they seem.
In a 1972 interview, Anna Case (left), one of the opera singers used in the Edison tone tests, admitted that she’d trained herself to sound like a phonograph recording. Oh, well.