From GersteinInfo

Revision as of 01:13, 6 January 2011 by Mbg (Talk | contribs)


CBB 752

Course Information

Course Description

Bioinformatics encompasses the analysis of gene sequences, macromolecular structures, and functional genomics data on a large scale. It represents a major practical application for modern techniques in data mining and simulation. Specific topics to be covered include sequence alignment, large-scale processing, next-generation sequencing data, comparative genomics, phylogenetics, biological database design, geometric analysis of protein structure, molecular-dynamics simulation, biological networks, normalization of microarray data, mining of functional genomics data sets, and machine learning approaches for data integration.

Quizzes and Final Project

There will be approximately four short quizzes during the semester and a take-home final project. For CBB and CS sections, the final project will be a programming assignment. For MB&B, the final project will be a paper. Further details will be announced at a later date.

Literature discussion section

One session of 60 minutes per week, time to be arranged. Student presentations of recent research papers relevant to the topics of the course. Led by Pedro Alves (Bass, Rm 437; 432-5405; pedro.alves@yale.edu) and Jia Kang (?; jia.kang@yale.edu).

Programming Projects/Problem Sets

Students taking this course listed under Computational Biology and Bioinformatics or Computer Science will be required to complete several short programming assignments. Further details will be discussed in the literature discussion section and during class.

Grade Categories

CBB and CPSC Sections:

Quizzes - 33%
Final Project - 33%
Discussion Section - 8.25%
Programming Assignments - 24.75%

MBB and MCDB Sections:

Quizzes - 33%
Final Project - 33%
Discussion Section - 16.5%
Problem Sets - 16.5%

Differences Between Class Sections

In general, the graduate-level CPSC/CBB course differs from the MBB/MCDB course (graduate and undergraduate) in several ways. Although the lectures are the same for each section, the graduate-level CPSC/CBB course has programming assignments that the MBB sections do not. Homework for the MBB section centers on the completion of several problem sets without a programming component. The CPSC/CBB section forgoes these problem sets and instead requires that students implement several of the algorithms discussed in class. Also, the final project for CPSC/CBB MUST be a programming assignment rather than the final paper required for the MBB section. Due to the distinct course requirements, category weightings for final grades are also different.

Timing & location

Class: Meeting from 1:00-2:15 pm on Monday and Wednesday, in 305 BASS.

Discussion section: TBA



Mark Gerstein

432A BASS, Phone 203 432-6105, e-mail mark.gerstein(at)yale.edu


Corey O'Hern

203 Mason Laboratory, e-mail corey.ohern(at)yale.edu, Office Hours: M 2:15-3:15 PM

Teaching Fellows

Pedro Alves, Bass Rm 437, 203 432-5405

Jia Kang,?


Class Schedule

Discussion Sections

Session 1

Metzker ML. "Sequencing technologies - the next generation," Nature Reviews Genetics. 11 (2010) PDF

Wheeler DA et al. "The complete genome of an individual by massively parallel DNA sequencing," Nature. 452:872-876 (2008) PDF

Session 2


Session 3

T.F. Smith and M.S. Waterman. (1981) Identification of common molecular subsequences. Journal of Molecular Biology, 147(1): 195-7. PMID: 7265238. PDF

Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. (1990) Basic local alignment search tool. Journal of Molecular Biology, 215(3):403-10. PMID: 2231712. PDF

Session 4


Garnier J, Gibrat JF, Robson B. (1996) GOR method for predicting protein secondary structure from amino acid sequence. Methods in Enzymology, 266: 540-53. PMID: 8743705. PDF

Session 5

Chuang, HY, Lee, E, Liu, YT, Lee, D, and Ideker, T. (2007) Network-based classification of breast cancer metastasis. Mol Syst Biol. 3:140 PMID:17940530 PDF

Session 6

Antezana E, Egaña M, Blondé W, Illarramendi A, Bilbao I, De Baets B, Stevens R, Mironov V, Kuiper M. (2009) The Cell Cycle Ontology: an application ontology for the representation and integrated analysis of the cell cycle process. Genome Biol. 2009;10(5):R58. Epub 2009 May 29. PDF

Session 7

Perelson AS. Modelling viral and immune system dynamics. Nat Rev Immunol. 2002 Jan;2(1):28-36. PDF

Session 8

ML Connolly. (1983) Solvent-accessible surfaces of proteins and nucleic acids. Science, 221(4612): 709-13. PMID: 6879170. PDF

Martin Karplus and J. Andrew McCammon. (2002) Molecular dynamics simulations of biomolecules. Nature Structural Biology, 9: 646-52. PMID: 12198485. PDF

Session 9

Dill KA, Ozkan SB, Shell MS, Weikl TR. (2008) The Protein Folding Problem. Annu Rev Biophys, 37: 289-316. PMCID: PMC2443096. PDF

Bowman GR, Beauchamp KA, Boxer G, Pande VS. “Progress and challenges in the automated construction of Markov state models for full protein systems,” J. Chem. Phys. 131 (2009) 124101 PDF

Papers for Dr. O'Hern's Lectures:

J. D. Honeycutt and D. Thirumalai, “The nature of folded states of globular proteins,” Biopolymers 32 (1992) 695 PDF

W. C. Swope and J. W. Pitera, “Describing protein folding kinetics by molecular dynamics simulations. 1. Theory,” J. Phys. Chem. B 108 (2004) 6571 PDF

W. C. Swope, J. W. Pitera, et al., "Describing protein folding kinetics by Molecular Dynamics Simulations. 2. Example applications to Alanine Dipeptide and beta-hairpin peptide," J. Phys. Chem. B 108 (2004) 6582 PDF

D. Bratko, T. Cellmer, J. M. Prausnitz, and H. W. Blanch, “Molecular Simulation of protein aggregation,” Biotechnology and Bioengineering 96 (2007) 1 PDF

Final Project


Class Requirements

Discussion Section / Readings

Papers will be assigned throughout the course. These papers will be presented and discussed in weekly sections with the TAs. A brief summary (a half-page per article) should be submitted at the beginning of the discussion session.

Bioinformatics quizzes

There will be approximately three short quizzes (25 minutes) in class comprising SIMPLE questions that you should be able to answer from the lectures plus the main readings.

Programming Assignments (CBB and CS)

There will be several short programming assignments required for CBB and CS students taking this course. Acceptable languages and submission requirements will be discussed prior to the first assignment. These assignments are NOT required for students not taking the CBB or CS sections of the course.


The course is geared toward CBB graduate students as well as advanced MB&B undergraduates and graduate students who wish to learn about the types of large-scale quantitative analyses that whole-genome sequencing will make possible. It would also be suitable for students from other fields, such as computer science or physics, who want to learn about an important new biological application for computation.

Students should have:

A basic knowledge of biochemistry and molecular biology.

A knowledge of basic quantitative concepts, such as single-variable calculus, some probability and statistics, and basic programming skills.

These requirements can be fulfilled by the following prerequisites statement: "Prerequisites: MBB 200 and Mathematics 115 or permission of the instructor."

Pages from previous years

2010, 2009 and earlier

Research Opportunities

If you're really motivated, take a look at http://bioinfo.mbb.yale.edu/jobs/.
