We do research in biomedical data science, applying computational approaches to problems in molecular biology and genetics. We are interested in large-scale analyses of genome sequences and macromolecular structures, and we also analyze images, large-scale text, and bio-sensor data. Our research draws on several quantitative techniques, including database design, systematic data mining and deep learning, visualization of high-dimensional data, and molecular simulation. Our main focus is annotating the human genome sequence, especially characterizing the vast intergenic regions and interpreting disease-associated variants. Doing this at scale requires tackling issues of genomic privacy (to enable data sharing) and better representing the disease phenotypes associated with the variants. We are also trying to determine the function of all the genes encoded by the genome using molecular networks. Finally, for the group of protein-coding genes with known 3D structures, we are trying to understand how their function is carried out through motion.

