- | The number of sequenced personal genomes is expected to increase exponentially over the next few years. Soon, sequencing one’s own genome may become as routine in medicine as an X-ray. Moreover, individuals will increasingly view biological science through the lens of their own genomes. In light of these trends, my laboratory focuses on integrating personal genomes with other biological data and on developing tools and methods to assist in their interpretation. This work proceeds on several fronts, as outlined below.
| |
- | First, we work extensively on identifying the variants that distinguish one personal genome from another. In particular, we focus on structural variation, a type of variant that results from rearrangements of blocks within the genome. It is believed that structural variants involve as many nucleotides in the genome as the better-known single-nucleotide polymorphisms, or SNPs (Mills et al., 2011; Korbel et al., 2008). We have developed a number of approaches for identifying structural variants in genomes. These include evaluating the consistency of read coverage across the genome (read depth), searching for reads that span breakpoints (split reads), and analyzing unusual separations between the two ends of paired-end reads (Abyzov et al., 2011a,b; Korbel et al., 2009; Lam et al., 2010). Much of this work has been performed as part of our participation in large international consortia, such as The 1000 Genomes Project, as well as in disease-focused programs, such as those on prostate cancer.
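As a rough illustration of two of these signals, the sketch below flags genomic windows whose read depth departs from the genome-wide median and read pairs whose separation departs from the expected insert size. It is a toy example under stated assumptions, not the laboratory's published tools: the function names, the ratio thresholds (0.5 and 1.5), and the sample numbers are all hypothetical, and real callers additionally correct for biases such as GC content and mappability.

```python
from statistics import median


def flag_depth_outliers(window_depths, low_ratio=0.5, high_ratio=1.5):
    """Flag windows whose read depth departs from the median depth.

    The ratio thresholds are illustrative assumptions, not published values.
    Returns a list of (window_index, label) tuples.
    """
    baseline = median(window_depths)
    if baseline == 0:
        return []  # no usable coverage to compare against
    calls = []
    for i, depth in enumerate(window_depths):
        ratio = depth / baseline
        if ratio <= low_ratio:
            calls.append((i, "candidate deletion"))
        elif ratio >= high_ratio:
            calls.append((i, "candidate duplication"))
    return calls


def flag_discordant_pairs(insert_sizes, expected, tolerance):
    """Return indices of read pairs whose separation is unusually far
    from the expected insert size (hypothetical fixed tolerance)."""
    return [
        i for i, size in enumerate(insert_sizes)
        if abs(size - expected) > tolerance
    ]


# Hypothetical per-window mean depths and paired-end separations.
depths = [30, 31, 29, 14, 15, 30, 62, 30]
pairs = [300, 310, 295, 900, 305]

depth_calls = flag_depth_outliers(depths)
discordant = flag_discordant_pairs(pairs, expected=300, tolerance=50)
```

With these inputs, windows 3 and 4 fall to about half the median depth, window 6 rises to about twice the median, and the fourth read pair (index 3) is far more widely separated than expected. In practice, the different signals are typically combined, since each is sensitive to a different size range and type of rearrangement.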
| |
- | Once all the variants in a personal genome are identified, we work to understand their consequences and implications. This is generally the objective of genome annotation, which provides biochemical and evolutionary context for each base. To this end, we are very active participants in the international genome annotation efforts carried out by the ENCODE Consortium. We focus on annotating a number of genomic elements, principally transcription-factor binding sites, non-coding RNAs, and pseudogenes.
| |
- | Along these lines, we have developed numerous methods for identifying pseudogenes (Zhang et al., 2006). We consider pseudogenes to be genomic fossils that provide a rich window into human molecular history; human pseudogenes provide much more detail than protein-coding genes, particularly when they are compared to pseudogenes in other organisms (Gerstein & Zheng, 2006). We were one of the first groups to perform comprehensive surveys of pseudogenes on a genome-wide scale in terms of protein families, thus illustrating the very different pseudogene complements in different organisms (Zhang et al., 2002a,b, 2003, 2004; Harrison et al., 2001, 2002a,c, 2003a,b; Zhang & Gerstein, 2003c,e; Liu et al., 2004a; Lam et al., 2008; Pseudogene.org). Moreover, we have uncovered hints that some pseudogenes, which are supposedly "dead", may actually confer biochemical functionality (Zheng et al., 2005, 2007a,b; Harrison et al., 2005; Pei et al., 2012; Sasidharan & Gerstein, 2008).
| |