Mbg-research
Research Summary: Protein Bioinformatics
The biological sciences are being transformed by the advent of large-scale data; the sequencing of the human genome is the most dramatic example. Alongside this increase in biological data, computers and computation have transformed the way information is handled, stored, and mined, in biology as in many other facets of life. The goal of my lab is to connect these two developments, harnessing computational advances for the analysis of large-scale data, principally by carrying out integrative surveys and systematic data mining.
Specifically, we are focused on protein bioinformatics: understanding the structure, function, and evolution of proteins by analyzing populations of them in databases and in whole-genome experiments. Overall we have four research foci, summarized below.
1 Genomics: Mining Intergenic Regions, especially in relation to Pseudogenes
We are involved in a number of large-scale collaborations to probe the activity of intergenic regions with tiling array technology. The overall conclusion from this work has been that many of the intergenic regions of the human genome appear to be active, both transcriptionally and in terms of protein binding. In connection with tiling-array experiments, we have annotated intergenic regions extensively, with a particular focus on mining them for pseudogenes (protein fossils). Collectively, our studies enable us to determine the common "pseudofamilies" in various genomes and address important evolutionary questions about the proteins that were present in an organism's past.
2 Proteomics: Using Networks to Understand Protein Function
Once the main elements of the human genome are identified, their function must be characterized. We are trying to characterize gene function through molecular networks. We work on systematically integrating many weak functional genomic features with data mining techniques to predict protein networks (comprising protein interactions and other functional linkages). In addition, we have studied the structure of protein networks, both on a large scale in terms of global statistics (e.g. the diameter) and on a small scale in terms of local network features (e.g. motifs and hubs).
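The two network statistics mentioned above can be illustrated with a minimal sketch. It is not the lab's code: the toy edge list, the function names, and the degree cutoff used to call a node a hub are all assumptions made for illustration. The diameter is taken here as the longest shortest path between any two connected nodes, and a hub as any node with at least a chosen number of interaction partners.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map from (protein_a, protein_b) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def bfs_distances(graph, source):
    """Shortest path length (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def diameter(graph):
    """Longest shortest path over all pairs of mutually reachable nodes."""
    return max(max(bfs_distances(graph, node).values()) for node in graph)


def hubs(graph, min_degree):
    """Nodes with at least min_degree partners (the cutoff is an assumption)."""
    return sorted(n for n, partners in graph.items() if len(partners) >= min_degree)


# Hypothetical toy network; the labels are placeholders, not real proteins.
EDGES = [("A", "B"), ("A", "C"), ("A", "D"), ("D", "E")]
NETWORK = build_graph(EDGES)

print("diameter:", diameter(NETWORK))
print("hubs:", hubs(NETWORK, min_degree=3))
```

Running one breadth-first search per node is adequate for small examples; networks with many thousands of nodes would usually call for a dedicated graph library.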
3 Structural Genomics: Analysis of Folds, Families and Functions on a Large Scale
Another area of research in our lab is structural genomics. Here, we conceptualize proteins not purely as character sequences or abstract network nodes, but more in terms of their molecular structure. We have examined the large-scale relationships between sequence, structure and function in order to understand the extent to which structural and functional annotation can reliably be transferred between similar sequences, particularly when similarity is expressed in modern probabilistic language. We have also related the occurrence of protein folds and families to phylogeny and deep evolutionary history.
4 Computational Biophysics: Relating Motions & Packing
The final area of focus in the lab is analyzing small populations of structures in terms of their detailed 3D geometry and physical properties. Here, we try to interpret macromolecular motions in terms of packing. We have set up a database of macromolecular motions and coupled it with simulation tools to interpolate between structural conformations; the database also has tools to predict likely motions based on simple models, such as normal modes and localized hinges connecting rigid domains.
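The idea of interpolating between two structural conformations can be shown with a deliberately simple sketch. It assumes straight-line interpolation of atomic coordinates, which is only the most basic scheme and is not presented as the method used by the database's simulation tools; the function name and the toy coordinates are hypothetical. Straight-line paths do not in general preserve bond lengths or avoid steric clashes, so a realistic pathway would need some form of refinement of the intermediate frames.

```python
def interpolate_conformations(start, end, n_frames):
    """Return n_frames coordinate sets moving linearly from start to end.

    start, end: lists of (x, y, z) tuples, one per atom, in the same atom order.
    The first frame equals start and the last frame equals end.
    """
    if len(start) != len(end):
        raise ValueError("both conformations must have the same number of atoms")
    if n_frames < 2:
        raise ValueError("need at least two frames (the two end states)")

    frames = []
    for k in range(n_frames):
        t = k / (n_frames - 1)  # fraction of the way from start to end
        frame = [
            tuple(a + t * (b - a) for a, b in zip(atom_start, atom_end))
            for atom_start, atom_end in zip(start, end)
        ]
        frames.append(frame)
    return frames


# Hypothetical two-atom example: the second atom shifts along the y axis.
OPEN_STATE = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
CLOSED_STATE = [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0)]

for frame in interpolate_conformations(OPEN_STATE, CLOSED_STATE, 3):
    print(frame)
```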
References
[1] Relating three-dimensional structures to protein networks provides evolutionary insights. PM Kim, LJ Lu, Y Xia, MB Gerstein (2006) Science 314: 1938-41.
[2] The real life of pseudogenes. M Gerstein, D Zheng (2006) Sci Am 295: 48-55.
[3] Genomic analysis of regulatory network dynamics reveals large topological changes. NM Luscombe, MM Babu, H Yu, M Snyder, SA Teichmann, M Gerstein (2004) Nature 431: 308-12.
[4] Genomics. Defining genes in the genomics era. M Snyder, M Gerstein (2003) Science 300: 258-60.
[5] Simulating water and the molecules of life. M Gerstein, M Levitt (1998) Sci Am 279: 100-5.
