Structural variation goes extreme

From person to person, the human genome varies in a number of important ways. Some of the variation takes the form of genetic misspellings: single nucleotide polymorphisms, or SNPs. Other variation takes the form of so-called “structural variation”: genetic rearrangements, or missing or extra segments of DNA, known as copy number variation (CNV). Scientists at the Broad Institute and elsewhere are working to locate and characterize many different types of variation and to look for connections between the variants and human traits and disease.

Until recently, researchers had only been able to measure modest copy number variation, such as deletions or small-scale duplications of segments of DNA that result in zero to four copies of some genes. Although scientists lacked the tools to analyze them, it was clear that some regions of the genome harbor high structural complexity, or “extreme” copy number variation.

Speaking to a roomful of genetics researchers at the Broad Institute two years ago, scientist Steve McCarroll, director of genetics for the Broad’s Stanley Center for Psychiatric Research, explained the difficulty — and importance — of studying extreme structural variation. “Our goal is to develop molecular and statistical methods to analyze structurally complex loci at a population level,” said McCarroll.

McCarroll described an effort by his lab to use whole genome sequencing data to identify a region of the genome that varied widely in copy number. “This study is a proof of concept that over the next few years, we can start making this possible,” McCarroll said. “We can take these regions from the dark corners of the genome to describing the set of structures in the population, how they formed from one another, and the mutational history of the locus. Then we can begin to write the playbook for bringing this data into studies of complex disease.”

McCarroll made these comments at a primer lecture of the Broad’s Program in Medical and Population Genetics (MPG), part of a series of informal weekly discussions covering topics relating to human population genetics and disease. He explained that structural variation leaves several footprints in whole genome sequencing data that can be uncovered computationally. Bob Handsaker, a research scientist in McCarroll’s lab, had developed a set of analytical tools, known as Genome STRiP, for capturing these footprints and identifying structural variants of all sizes. McCarroll said these tools, along with the availability of more whole genome sequence data, would one day allow for rigorous, well-controlled association analysis at genome scale.

“There’s still not a lot of whole genome sequence data in the field, but this is preparing for a future when we’ll be drenched in these kinds of datasets,” he said in 2013. “It will make it possible to utilize structural variation of all shapes and sizes to learn about the genetic basis of disease risk.”

Two years later, McCarroll, Handsaker, and colleagues are sharing some of the first fruits of that effort. Speaking at an MPG primer lecture last month, Handsaker described the team’s recent work, published in Nature Genetics, uncovering extreme forms of copy number variation by analyzing whole-genome sequence data from thousands of genomes at once. The team identified hundreds of sites of extreme CNV in the human genome, at which genes vary in copy number from zero to 15 among individuals.

As whole-genome sequencing becomes ever cheaper, researchers will be able to apply this approach to many more datasets and glean insights into a broad spectrum of structural variation.

The researchers have made their software toolkit — Genome STRiP 2.0 — and data resource available online to the research community. Handsaker said, “We’re hoping this will unlock a lot more opportunities to find associations between structural variation and phenotype.”

Don’t miss out on another great resource available to both the research community and the general public. You can watch videos of any primer talk to hear experts from across the Broad community give in-depth introductions to the basic principles of complex trait genetics, including human genetic variation, genotyping, DNA sequencing methods, statistics, data analysis, and more. Visit the video library from the MPG primers for more.

Related stories:

February 4, 2011. Large-scale sequencing study reveals common variation in DNA structure.

February 16, 2011. Power in numbers.

Handsaker RE, Korn JM, Nemesh J, McCarroll SA. Discovery and genotyping of genome structural polymorphism by sequencing on a population scale. Nature Genetics 43: 269-76, 2011.

July 11, 2012. Unpacking a complex genetic suitcase.

Boettger LM, Handsaker RE, Zody MC, McCarroll SA. Structural haplotypes and recent evolution of the human 17q21.31 locus. Nature Genetics 44: 881-5, 2012.

January 31, 2015. Variety Show.

Handsaker RE, et al. Large multiallelic copy number variations in humans. Nature Genetics. January 26, 2015.

For more on methods of measuring and analyzing structural variation: Le Scouarnec S, Gribble SM. Characterising chromosome rearrangements: recent technical advances in molecular cytogenetics. Heredity. 2012.