Computer scientists and biologists at the Department of Energy's Lawrence Berkeley National Laboratory have developed software that can select tens of thousands of high-quality images of biological molecules from electron micrographs, rapidly and automatically, with accuracy approaching that of experienced human analysts.
The new algorithm, described as "particle picking by segmentation," promises to greatly increase the speed and power of methods for determining biological structures at high resolution, based on data from electron microscopy.
When what's needed is a high-resolution structure of a large and complicated biological molecule -- a ribosome, say, which combines protein and RNA, or a membrane protein that readily falls apart in water and is hard to crystallize -- biologists often turn to cryo-electron microscopy (cryo-EM) to perform single-particle reconstruction.
Understanding structure is often the key to devising antibiotics and other therapies that can interfere with unwanted biological activity -- for example, the ability of infectious bacteria to synthesize proteins can be wrecked by jamming their ribosomes, if the ribosome structure is known in detail. Single-particle reconstruction with cryo-EM holds the promise of providing many high-resolution structures which may be difficult or impossible to obtain otherwise.
Instead of trying to coax molecules to arrange themselves in a repeating crystalline structure, as is necessary for x-ray crystallography, cryo-EM uses individual molecules frozen in random orientations. Capturing two-dimensional images of the molecule from many different angles allows powerful computers to recreate the structure in three dimensions. Molecular biologist Robert Glaeser of Berkeley Lab's Physical Biosciences and Life Sciences Divisions, who is also a professor of biochemistry and molecular biology at the University of California at Berkeley, calls this process "crystallization in silico."
"In theory, you need twice as many particles as the molecular weight of what you want to image," explains Umesh Adiga, a member of Glaeser's laboratory and a staff scientist in the Physical Biosciences Division. Molecular weight roughly corresponds to the number of atoms in the molecule. "So for a molecule with half a million atoms, you need a million particle images -- thousands for each orientation."
These must be chosen from many millions of candidates, and each must show the whole particle and nothing but the particle. A typical micrograph may show fifteen hundred or more particles, but picking them out isn't easy. The microscope's electron beam has to be kept at low power to prevent radiation damage, so the signal-to-noise ratio is low and the particles are barely perceptible shapes in a field of gray.
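The scale of the task follows from simple arithmetic. The sketch below applies the rule of thumb Adiga quotes (about two particle images per atom) and then asks how many micrographs that implies. It assumes, optimistically, that every one of the roughly fifteen hundred particles on a micrograph is usable. The function names are illustrative only.

```python
import math


def images_needed(atom_count: int) -> int:
    """Rule of thumb quoted in the article: about two particle images per atom."""
    return 2 * atom_count


def micrographs_needed(images: int, usable_per_micrograph: int) -> int:
    """Micrographs required if each one yields a fixed number of usable picks."""
    return math.ceil(images / usable_per_micrograph)


images = images_needed(500_000)
print(images)                            # 1000000
print(micrographs_needed(images, 1500))  # 667
```

In practice the yield per micrograph is lower, since many candidates are rejected, so the real count of micrographs would be higher than this estimate.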
Automatic particle-picking methods have been devised to meet this challenge, but until now even the best yield more than 30 percent false positives -- either poor-quality images of particles or something else altogether, like debris or background noise. Therefore "a human still has to go through them and pick out the good ones," Adiga says.
Adiga and his colleagues decided that concentrating too much attention on the particle itself in the early stages of picking -- for example, approximating its shape and creating a template into which real images are forced to fit, a process common to all previous automatic methods -- simply added to the difficulty. "We decided that if there's noise, there's noise, so at first let's not deal with the particle but with the noise," he says. "If the particle is the foreground, we deal with the background."
By first establishing the average gray-scale range of the particles of interest, contrast can be maintained while the fine texture of the background is smoothed out. The smoothed-out background is then subtracted.
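Background subtraction of this general kind can be sketched in a few lines. The example below is not the Berkeley Lab code. It assumes a plain mean filter over a square window as the smoothing step, and it does not model the gray-scale range estimate described above. The names `box_mean`, `subtract_background`, and `radius` are hypothetical.

```python
def box_mean(image, radius):
    """Smooth a 2-D list of gray values with a (2*radius + 1)-wide mean filter.

    Windows are clipped at the image border.
    """
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            r0, r1 = max(0, r - radius), min(rows, r + radius + 1)
            c0, c1 = max(0, c - radius), min(cols, c + radius + 1)
            window = [image[i][j] for i in range(r0, r1) for j in range(c0, c1)]
            out[r][c] = sum(window) / len(window)
    return out


def subtract_background(image, radius):
    """Subtract the smoothed image (the background estimate) from the original."""
    background = box_mean(image, radius)
    return [
        [image[r][c] - background[r][c] for c in range(len(image[0]))]
        for r in range(len(image))
    ]


# A single bright pixel on a dark field stands out after subtraction.
spot = [[0] * 5 for _ in range(5)]
spot[2][2] = 25
print(subtract_background(spot, 2)[2][2])  # 24.0
```

The window should be wider than a particle, so that the smoothed image tracks slow changes in the background rather than the particles themselves.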
The next steps involve a procedure called segmentation, developed by Adiga and his colleagues. After the background is subtracted, the micrograph is rendered in high contrast. Only shapes of a certain size and brightness are retained; all the rest are thrown away in a step called binarization, or thresholding. "You need not know how the particle looks before you set out to pick good images of it, only how big it is," says Adiga.
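The idea of keeping only shapes of a certain size and brightness can be illustrated with a threshold followed by connected-component labeling. This is a single-pass sketch under simple assumptions (a fixed threshold and 4-connected pixels), not the iterative procedure the researchers describe. The names `threshold`, `label_blobs`, and `pick_by_size` are hypothetical.

```python
from collections import deque


def threshold(image, level):
    """Binarize: True where the gray value exceeds `level`."""
    return [[value > level for value in row] for row in image]


def label_blobs(mask):
    """Return the 4-connected blobs of a boolean mask as lists of (row, col)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, blob = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                blob.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            blobs.append(blob)
    return blobs


def pick_by_size(image, level, min_area, max_area):
    """Keep only blobs whose pixel count lies in [min_area, max_area]."""
    blobs = label_blobs(threshold(image, level))
    return [blob for blob in blobs if min_area <= len(blob) <= max_area]


image = [
    [0, 0, 0, 0, 0, 0],
    [0, 9, 9, 0, 0, 0],
    [0, 9, 9, 0, 9, 0],
    [0, 0, 0, 0, 0, 0],
]
picks = pick_by_size(image, level=5, min_area=3, max_area=6)
print(len(picks))  # 1
```

Note that the only prior knowledge used is size: the lone bright pixel is rejected because it is too small, not because of its shape.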
The thresholding procedure is iterative, but eventually the processed high-contrast particle images can be matched unambiguously with their originals in the more highly detailed, low-contrast micrograph. Some images may remain problematic: particles may be so close together that they appear to be touching. In these cases, an additional procedure called "pinch-off" separates candidates that aren't actually connected and discards those that are. Boxes are drawn around the final picks and their image quality is enhanced by an operation called "shrink-wrapping."
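The article does not say how "pinch-off" works internally. One common way to split blobs joined only by a thin neck is morphological erosion, and the sketch below uses that as a stand-in, which is an assumption rather than the researchers' method. The names `erode` and `blob_sizes` are hypothetical.

```python
def erode(mask):
    """Keep a pixel only if it and its four neighbours are all set.

    Pixels outside the image count as unset.
    """
    rows, cols = len(mask), len(mask[0])

    def on(r, c):
        return 0 <= r < rows and 0 <= c < cols and mask[r][c]

    return [
        [
            on(r, c) and on(r - 1, c) and on(r + 1, c)
            and on(r, c - 1) and on(r, c + 1)
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def blob_sizes(mask):
    """Pixel counts of the 4-connected blobs in a boolean mask (flood fill)."""
    rows, cols = len(mask), len(mask[0])
    seen, sizes = set(), []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or (r, c) in seen:
                continue
            stack, size = [(r, c)], 0
            seen.add((r, c))
            while stack:
                y, x = stack.pop()
                size += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] and (ny, nx) not in seen:
                        seen.add((ny, nx))
                        stack.append((ny, nx))
            sizes.append(size)
    return sizes


# Two 3x3 particles that appear to touch through a one-pixel bridge.
picture = [
    ".........",
    ".###.###.",
    ".#######.",
    ".###.###.",
    ".........",
]
touching = [[ch == "#" for ch in row] for row in picture]
print(blob_sizes(touching))         # [19]
print(blob_sizes(erode(touching)))  # [2, 2]
```

Under this stand-in, a blob that remains a single oversized piece after erosion would be treated as genuinely connected and could be dropped by a size check.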
If a portion of an adjacent particle protrudes into the box, that portion is automatically discarded and replaced with a pattern textured like the rest of the background. At this end stage of the procedure -- although not at the beginning -- it may be advantageous to use templates (which include shape information about the particle) to refine identifications.
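Replacing an intruding neighbor with background-like texture might look something like the following. This sketch assumes a label map is already available (0 for background, a positive integer per particle) and fills intruding pixels by sampling gray values from the box's own background. The real program's texture synthesis is not described in the article, and `mask_intruders` is a hypothetical name.

```python
import random


def mask_intruders(box, labels, keep_label, rng=None):
    """Return a copy of `box` with other particles' pixels overwritten.

    box:    2-D list of gray values for one boxed particle
    labels: same shape; 0 = background, positive int = particle id
    Pixels labelled neither 0 nor `keep_label` are replaced by gray values
    drawn at random from the background pixels of the same box.
    """
    rng = rng or random.Random(0)
    rows, cols = len(box), len(box[0])
    background = [
        box[r][c] for r in range(rows) for c in range(cols) if labels[r][c] == 0
    ]
    if not background:
        raise ValueError("box contains no background pixels to sample from")
    cleaned = [row[:] for row in box]
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] not in (0, keep_label):
                cleaned[r][c] = rng.choice(background)
    return cleaned


box = [[1, 2, 9], [2, 8, 9], [1, 2, 1]]
labels = [[0, 0, 2], [0, 1, 2], [0, 0, 0]]
cleaned = mask_intruders(box, labels, keep_label=1)
print(cleaned[1][1])  # 8, the kept particle is untouched
```

Sampling from the surrounding background keeps the gray-level statistics of the filled region similar to the rest of the box, which is the property the article emphasizes.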
Scores of micrographs are needed to supply the hundreds of thousands of particles in a typical large-molecule reconstruction, but a program user needs to set parameters like particle size and gray-scale range only once, on a single micrograph. Thereafter the program runs on its own, sorting through each micrograph in about ten minutes.
Adiga and his colleagues tested the new algorithm by using it to pick images from among over 130,000 ribosome particles in 55 micrographs provided by the Wadsworth Center of the New York State Department of Health in Albany. Adiga separately inspected the 55 micrographs by eye and "manually" selected particles, well over 80 percent of which turned out to be the same as those picked by the program. Fewer than 10 percent of the images chosen by the program were false positives.
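A comparison like this one can be scored with a short routine. The sketch below treats the manual picks as ground truth and counts an automatic pick as a match if its center lies within a few pixels of an unmatched manual pick. Both choices are assumptions, since the article does not say how matches or false positives were determined. The coordinates are made up for illustration, and `score_picks` is a hypothetical name.

```python
def score_picks(auto, manual, tolerance):
    """Greedy one-to-one matching of automatic picks to manual picks.

    Returns (fraction of manual picks recovered,
             fraction of automatic picks with no manual counterpart).
    """
    if not auto or not manual:
        raise ValueError("both pick lists must be non-empty")
    unmatched = list(manual)
    hits = 0
    for ax, ay in auto:
        for i, (mx, my) in enumerate(unmatched):
            if (ax - mx) ** 2 + (ay - my) ** 2 <= tolerance ** 2:
                hits += 1
                del unmatched[i]  # each manual pick can be matched only once
                break
    return hits / len(manual), (len(auto) - hits) / len(auto)


auto = [(10, 10), (50, 50), (90, 90), (200, 200)]
manual = [(11, 10), (52, 49), (90, 91), (150, 150), (300, 300)]
recovered, false_positive = score_picks(auto, manual, tolerance=5)
print(recovered, false_positive)  # 0.6 0.25
```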