Cbb752b12

From GersteinInfo

(Difference between revisions)
 
(40 intermediate revisions not shown)
Line 1: Line 1:
 +
This wiki page is for the Spring 2012 Class. Please go to http://www.gersteinlab.org/courses/452 for the current class.
 +
 +
 +
__TOC__
 +
=CBB 752=
=CBB 752=
Line 26: Line 31:
===Different headings for this class===
===Different headings for this class===
-
<pre>
+
'''MB&B452/MCDB452'''
-
MBB452
+
This version of the course consists of lectures, written problem sets, and a final written project.
-
MB&B 452 01 (21914)
+
'''MB&B752/MCDB752'''
-
MCDB45
+
-
everything + written proj + written homeworks
+
This version of the course consists of lectures, written problem sets, and a final, graduate level written project.
-
MB&B752
+
'''CB&B752/CPSC752'''
-
MCDB752
+
-
everything + grad level written proj + homeworks
+
This version of the course consists of lectures, programming assignments, and a final programming project.
-
CB&B752
 
-
CPSC752
 
-
everything + programming proj
+
For ''graduate students'' the course is broken up into two "modules" (each counting 0.5 credit towards MB&B course requirement):
-
+ prog assignment
+
-
For graduate students the course is broken up into two "modules" (each counting 0.5 credit towards MB&B course requirement.):
+
'''MB&B 753b3''', Bioinformatics: Practical Application of Data Mining (1st half of term)
-
MB&B 753b3, Bioinformatics: Practical Application of Data Mining (1st half of term)
+
'''MB&B 754b4''', Bioinformatics: Practical Application of Simulation (2nd half of term)
-
MB&B 754b4, Bioinformatics: Practical Application of Simulation (2nd half of term)
+
Each module consists of lectures, written problem sets, and a final, graduate level written project that is half the length of the full course's final project.
-
same as mbb752
 
-
incl the final proj. at half length
 
-
</pre>
+
For the grade weighting schemes of each course version, see Class Requirements section.
-
===Differences Between Class Sections===
+
===Prerequisites===
 +
The course is keyed towards CBB graduate students as well as advanced MB&B undergraduates and graduate students wishing to learn about types of large-scale quantitative analyses that whole-genome sequencing will make possible. It would also be suitable for students from other fields such as computer science or physics wanting to learn about an important new biological application for computation.
-
In general, the graduate level CS/CBB course is significantly different than MBB/MCDB (graduate and undergraduate) in several ways. Although the lectures are the same for each section, the graduate level CPSC/CBB course has additional programming assignments in addition to the work being completed by the MBB students. Homework for the MBB section centers on the completion of several problem sets without a programming component. The CPSC/CBB section forgoes these problem sets and instead requires that students implement several of the algorithms discussed in class. Also, the final project for CPSC/CBB MUST be a programming assignment rather than the final paper required for the MBB section. Due to the distinct course requirements, category weightings for final grades are also different.
+
Students should have:
 +
A basic knowledge of biochemistry and molecular biology.
 +
A knowledge of basic quantitative concepts, such as single variable calculus, some probability and statistics, and basic programming skills.
 +
These can be fulfilled by the following prerequisites statement: "Prerequisites: MBB 200 and Mathematics 115 or permission of the instructor."
==Timing & location==
==Timing & location==
-
Class: Meeting from 1:00-2:15 pm on Monday and Wednesday, in Bass 305. (First meeting will be on 9 Jan.)
+
'''Class:''' Meeting from 1:00-2:15 pm on Monday and Wednesday, in Bass 305. (First meeting will be on 9 Jan.)
-
Discussion section: TBA
+
'''Discussion section:''' Wed 6-7 pm in Gibbs 263 (starting Jan 18)
==Instructors==
==Instructors==
Line 76: Line 78:
! Name
! Name
! Office
! Office
-
! Office Phone
 
! Email
! Email
|-
|-
| Mark Gerstein
| Mark Gerstein
| Bass 432A
| Bass 432A
-
| (203) 432-6105
 
| mark.gerstein(at)yale.edu
| mark.gerstein(at)yale.edu
|}
|}
-
===Instructors===
+
===Guest Instructors===
{| border="1"
{| border="1"
Line 92: Line 92:
! Office
! Office
! Email
! Email
-
! Office Hours
 
|-
|-
| Corey O'Hern
| Corey O'Hern
| Mason Laboratory
| Mason Laboratory
| corey.ohern(at)yale.edu
| corey.ohern(at)yale.edu
-
| TBA
 
|-
|-
| Jesse Rinehart
| Jesse Rinehart
| 300 George St
| 300 George St
| jesse.rinehart(at)yale.edu
| jesse.rinehart(at)yale.edu
-
| TBA
 
|-
|-
| James Noonan
| James Noonan
| 333 Cedar St
| 333 Cedar St
| james.noonan(at)yale.edu
| james.noonan(at)yale.edu
-
| TBA
 
|-
|-
| Kei Cheung
| Kei Cheung
| 300 George St
| 300 George St
| kei.cheung(at)yale.edu
| kei.cheung(at)yale.edu
-
| TBA
 
|-
|-
| Steven Kleinstein
| Steven Kleinstein
| 300 George St
| 300 George St
| steven.kleinstein(at)yale.edu
| steven.kleinstein(at)yale.edu
-
| TBA
 
|}
|}
Line 126: Line 120:
! Name
! Name
! Office
! Office
-
! Office Phone
 
! Email
! Email
|-
|-
| Lucas Lochovsky
| Lucas Lochovsky
| Bass 437
| Bass 437
-
| (203) 432-5405
 
| lucas.lochovsky(at)yale.edu
| lucas.lochovsky(at)yale.edu
|-
|-
| Jane Leng
| Jane Leng
| Bass 437
| Bass 437
-
| (203) 432-5405
 
| jing.leng(at)yale.edu
| jing.leng(at)yale.edu
 +
|-
 +
| Alice Zhou (select lectures)
 +
| Mason Laboratory
 +
| alice.zhou(at)yale.edu
|}
|}
==Topics/Class Schedule==
==Topics/Class Schedule==
-
[https://docs.google.com/spreadsheet/ccc?key=0Ar2M8lIkl9T5dDN4d1Q0aDBGcThXdjV2c21ESWVxSGc Class Schedule] (including a list of topics and quiz dates)
+
'''[https://docs.google.com/spreadsheet/ccc?key=0Ar2M8lIkl9T5dDN4d1Q0aDBGcThXdjV2c21ESWVxSGc Class Schedule]''' (including a list of topics and quiz dates)
-
==Discussion Sections==
+
==Polls==
 +
 
 +
'''[https://docs.google.com/spreadsheet/viewform?formkey=dFpoWF92YlVQQTFFOWY0cWt6RUdueWc6MQ Poll]''' for students to indicate good times for the weekly discussion section
 +
 
 +
'''[https://docs.google.com/spreadsheet/viewform?formkey=dGprVkplZng1WUxZOXJTdWppN19yUHc6MQ Poll]''' with 3 Qs on difficulty of classes 1 to 4 (to be filled out by 25-Jan.)  [http://archive.gersteinlab.org/docs/2012/02.20/lecture_survey_1to4_response_summary.pdf Result]
 +
 
 +
'''[https://docs.google.com/spreadsheet/viewform?hl=en_US&formkey=dENDNEctVkhkb0tSZ0s1YWRkU2RuMkE6MQ#gid=0 Poll]''' on the first two discussion sections
 +
 
 +
'''[https://docs.google.com/spreadsheet/viewform?formkey=dEN4cWNRVVY5eU02aWFCcHBjd3NlVXc6MQ Poll]''' on classes 5 to 9  [http://archive.gersteinlab.org/docs/2012/02.20/lecture5to9.pdf Result]
 +
 
 +
'''[https://docs.google.com/spreadsheet/viewform?formkey=dFRiZnd2ZEVPb0VfMy05SGxhZFVkX0E6MQ#gid=0 Poll]''' on classes 10 to 17
 +
 
 +
==Discussion Section Readings==
===Session 1===
===Session 1===
Metzker ML. "Sequencing technologies - the next generation” Nature Reviews Genetics. 11 (2010) [http://www.gersteinlab.org/courses/452/10-spring/pdf/ngs.pdf PDF]
Metzker ML. "Sequencing technologies - the next generation” Nature Reviews Genetics. 11 (2010) [http://www.gersteinlab.org/courses/452/10-spring/pdf/ngs.pdf PDF]
-
Wheeler DA et al. "The complete genome of an individual by massively parallel DNA sequencing,” Nature. 452:872-876 (208) [http://www.gersteinlab.org/courses/452/10-spring/pdf/WatsonGenome.pdf PDF]
+
Wheeler DA et al. "The complete genome of an individual by massively parallel DNA sequencing,” Nature. 452:872-876 (2008) [http://www.gersteinlab.org/courses/452/10-spring/pdf/WatsonGenome.pdf PDF]
===Session 2===
===Session 2===
Line 167: Line 174:
===Session 5===
===Session 5===
-
Laura J. van 't Veer et al. Gene expression profiling predicts clinical outcome of breast cancer Nature 415, 530-536 (31 January 2002) | doi:10.1038/415530a; Received 24 August 2001; Accepted 22 November 2001 [http://www.nature.com/nature/journal/v415/n6871/full/415530a.html TEXT]
+
Sotiriou et al. (2006) Gene Expression Profiling in Breast Cancer: Understanding the Molecular Basis of Histologic Grade To Improve Prognosis. JNCI J Natl Cancer Inst (15 February 2006) 98 (4):262-272.doi: 10.1093/jnci/djj052 [http://jnci.oxfordjournals.org/content/98/4/262.full.pdf+html PDF]
-
Kwang-Il Goh, Michael E. Cusick, David Vall, Barton Child, Marc Vidal, and Albert-La ́szlo ́ Barabasi (2007) The human disease network Proc Natl Acad Sci U S A. 2007 May 22;104(21):8685-90. Epub 2007 May 14. [http://www.pnas.org/content/104/21/8685.full.pdf+html PDF]
+
Ekman D, Light S, Björklund AK, Elofsson A. (2006) What properties characterize the hub proteins of the protein-protein interaction network of Saccharomyces cerevisiae? Genome Biol. 2006;7(6):R45. [http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1779539/pdf/gb-2006-7-6-r45.pdf PDF]
===Session 6===
===Session 6===
Line 190: Line 197:
===Discussion Section / Readings===
===Discussion Section / Readings===
-
Papers will be assigned throughout the course. These papers will be presented and discussed in weekly sections with the TAs. A brief summary (a half-page per article) should be submitted at the beginning of the discussion session.
+
Papers will be assigned throughout the course. These papers will be presented and discussed in weekly 60-minute sections with the TFs. A brief summary (a half-page per article) should be submitted at the beginning of the discussion session.
===Bioinformatics quizzes===
===Bioinformatics quizzes===
Line 198: Line 205:
There will be several short programming assignments required for CBB and CS students taking this course. Acceptable languages and submission requirements will be discussed prior to the first assignment. These assignments are NOT required for students not taking the CBB or CS sections of the course.
There will be several short programming assignments required for CBB and CS students taking this course. Acceptable languages and submission requirements will be discussed prior to the first assignment. These assignments are NOT required for students not taking the CBB or CS sections of the course.
-
==Non-CBB Final Project==
+
===Assignment postings===
 +
Homework1 (Problem set for MBB&MCDB, programming assignment for CBB&CPSC) [http://info.gersteinlab.org/images/2/2b/Cbb752_hw1.doc HW1]
-
[tbd]
+
Homework2 [http://archive.gersteinlab.org/docs/2012/04.02/cbb752_hw2_2012.pdf HW2] [http://archive.gersteinlab.org/docs/2012/04.03/positions.dat positions.dat] [http://archive.gersteinlab.org/docs/2012/04.03/dihedral.dat dihedral.dat]
-
==CBB Final Project==
+
===Final Project===
-
[tbd]
+
'''[http://archive.gersteinlab.org/course/cbb752b12/cbb752_finalproject.pdf Project Description]'''
===Grade Categories===
===Grade Categories===
Line 216: Line 224:
|-
|-
| Quizzes
| Quizzes
-
| 33.25%
+
| 33%
|-
|-
| Final Project
| Final Project
-
| 33.25%
+
| 33%
|-
|-
| Discussion Section
| Discussion Section
-
| 8.50%
+
| 9%
|-
|-
| Programming Assignments
| Programming Assignments
-
| 25.00%
+
| 25%
|}
|}
Line 236: Line 244:
|-
|-
| Quizzes
| Quizzes
-
| 33.25%
+
| 33%
|-
|-
| Final Project
| Final Project
-
| 33.25%
+
| 33%
|-
|-
| Discussion Section
| Discussion Section
-
| 16.75%
+
| 17%
|-
|-
| Problem Sets
| Problem Sets
-
| 16.75%
+
| 17%
|}
|}
-
 
-
===Prerequisites===
 
-
The course is keyed towards CBB graduate students as well as advanced MB&B undergraduates and graduate students wishing to learn about types of large-scale quantitative analyses that whole-genome sequencing will make possible. It would also be suitable for students from other fields such as computer science or physics wanting to learn about an important new biological application for computation.
 
-
 
-
Students should have:
 
-
 
-
A basic knowledge of biochemistry and molecular biology.
 
-
A knowledge of basic quantitative concepts, such as single variable calculus, some probability and statistics, and basic programming skills.
 
-
These can be fulfilled by the following prerequisites statement: "Prerequisites: MBB 200 and Mathematics 115 or permission of the instructor."
 
===Relevant Yale College Regulations===
===Relevant Yale College Regulations===
Line 264: Line 263:
http://yalecollege.yale.edu/content/completion-course-work
http://yalecollege.yale.edu/content/completion-course-work
 +
 +
Brief presentation on how to cite correctly : http://archive.gersteinlab.org/mark/out/log/2012/06.12/cbb752b12/cbb752_cite.ppt
==Misc==
==Misc==
Line 276: Line 277:
[http://www.gersteinlab.org/courses/452/10-spring/ 2010],
[http://www.gersteinlab.org/courses/452/10-spring/ 2010],
[http://www.gersteinlab.org/courses/452/10-spring/previous.html 2009 and earlier]
[http://www.gersteinlab.org/courses/452/10-spring/previous.html 2009 and earlier]
 +
([[Pointers on finding things on old class pages]])

Latest revision as of 03:17, 30 September 2012

This wiki page is for the Spring 2012 Class. Please go to http://www.gersteinlab.org/courses/452 for the current class.




CBB 752

Course Information

Course Description

Bioinformatics encompasses the analysis of gene sequences, macromolecular structures, and functional genomics data on a large scale. It represents a major practical application for modern techniques in data mining and simulation. Specific topics to be covered include sequence alignment, large-scale processing, next-generation sequencing data, comparative genomics, phylogenetics, biological database design, geometric analysis of protein structure, molecular-dynamics simulation, biological networks, normalization of microarray data, mining of functional genomics data sets, and machine learning approaches for data integration.

Concise undergraduate course description

Techniques in data mining and simulation applied to bioinformatics, the computational analysis of gene sequences, macromolecular structures, and functional genomics data on a large scale. Sequence alignment, comparative genomics and phylogenetics, biological databases, geometric analysis of protein structure, molecular-dynamics simulation, biological networks, microarray normalization, and machine-learning approaches to data integration.

See entry from undergraduate catalog: http://students.yale.edu/oci/resultDetail.jsp?course=21914&term=201201 , viz:

MB&B 452 01 (21914) /MCDB452/MB&B752/MB&B753/MB&B754/CB&B752/MCDB752/CPSC752
Bioinformatics: Practical Application of Simulation and Data Mining 
Mark Gerstein
MW 1.00-2.15 BASS 305
Spring 2012 
No regular final examination
Areas Sc
Prerequisites: MB&B 301b and MATH 115a or b, or permission of instructor.
MCDB 120a or 200b is a prerequisite for courses numbered MCDB 202 and above.

Different headings for this class

MB&B452/MCDB452

This version of the course consists of lectures, written problem sets, and a final written project.

MB&B752/MCDB752

This version of the course consists of lectures, written problem sets, and a final, graduate-level written project.

CB&B752/CPSC752

This version of the course consists of lectures, programming assignments, and a final programming project.


For graduate students the course is broken up into two "modules" (each counting 0.5 credit towards MB&B course requirement):

MB&B 753b3, Bioinformatics: Practical Application of Data Mining (1st half of term)

MB&B 754b4, Bioinformatics: Practical Application of Simulation (2nd half of term)

Each module consists of lectures, written problem sets, and a final, graduate-level written project that is half the length of the full course's final project.


For the grade weighting schemes of each course version, see the Class Requirements section.

Prerequisites

The course is geared towards CBB graduate students as well as advanced MB&B undergraduates and graduate students who wish to learn about the types of large-scale quantitative analyses that whole-genome sequencing will make possible. It is also suitable for students from other fields, such as computer science or physics, who want to learn about an important new biological application of computation.

Students should have:

A basic knowledge of biochemistry and molecular biology.

A knowledge of basic quantitative concepts, such as single variable calculus, some probability and statistics, and basic programming skills.

These requirements are summarized by the following prerequisites statement: "Prerequisites: MBB 200 and Mathematics 115 or permission of the instructor."

Timing & location

Class: Meeting from 1:00-2:15 pm on Monday and Wednesday, in Bass 305. (First meeting will be on 9 Jan.)

Discussion section: Wed 6-7 pm in Gibbs 263 (starting Jan 18)

Instructors

Instructor-in-Charge

Name            Office      Email
Mark Gerstein   Bass 432A   mark.gerstein(at)yale.edu

Guest Instructors

Name                Office             Email
Corey O'Hern        Mason Laboratory   corey.ohern(at)yale.edu
Jesse Rinehart      300 George St      jesse.rinehart(at)yale.edu
James Noonan        333 Cedar St       james.noonan(at)yale.edu
Kei Cheung          300 George St      kei.cheung(at)yale.edu
Steven Kleinstein   300 George St      steven.kleinstein(at)yale.edu

Teaching Fellows

Name                           Office             Email
Lucas Lochovsky                Bass 437           lucas.lochovsky(at)yale.edu
Jane Leng                      Bass 437           jing.leng(at)yale.edu
Alice Zhou (select lectures)   Mason Laboratory   alice.zhou(at)yale.edu

Topics/Class Schedule

Class Schedule (including a list of topics and quiz dates)

Polls

Poll for students to indicate good times for the weekly discussion section

Poll with 3 Qs on difficulty of classes 1 to 4 (to be filled out by 25-Jan.) Result

Poll on the first two discussion sections

Poll on classes 5 to 9 Result

Poll on classes 10 to 17

Discussion Section Readings

Session 1

Metzker ML. "Sequencing technologies - the next generation" Nature Reviews Genetics. 11 (2010) PDF

Wheeler DA et al. "The complete genome of an individual by massively parallel DNA sequencing," Nature. 452:872-876 (2008) PDF

Session 2

Olsen JV, Blagoev B, Gnad F, Macek B, Kumar C, Mortensen P, Mann M. (2006) Global, in vivo, and site-specific phosphorylation dynamics in signaling networks. Cell. 2006 Nov 3;127(3):635-48. PDF

Nevan J. Krogan et al (2006) Global landscape of protein complexes in the yeast Saccharomyces cerevisiae Nature 440, 637-643 (30 March 2006) PDF

Session 3

T.F. Smith and M.S. Waterman. (1981) Identification of common molecular subsequences. Journal of Molecular Biology, 147(1): 195-7. PMID: 7265238. PDF

Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. (1990) Basic local alignment search tool. Journal of Molecular Biology, 215(3):403-10. PMID: 2231712. PDF

Session 4

Bailey TL, Williams N, Misleh C, Li WW. (2006) MEME: discovering and analyzing DNA and protein sequence motifs. Nucl Acids Res. 34:W369-373 PDF

Garnier J, Gibrat JF, Robson B. (1996) GOR method for predicting protein secondary structure from amino acid sequence. Methods in Enzymology, 266: 540-53. PMID: 8743705. PDF

Session 5

Sotiriou et al. (2006) Gene Expression Profiling in Breast Cancer: Understanding the Molecular Basis of Histologic Grade To Improve Prognosis. JNCI J Natl Cancer Inst (15 February 2006) 98 (4):262-272.doi: 10.1093/jnci/djj052 PDF

Ekman D, Light S, Björklund AK, Elofsson A. (2006) What properties characterize the hub proteins of the protein-protein interaction network of Saccharomyces cerevisiae? Genome Biol. 2006;7(6):R45. PDF

Session 6

Antezana E, Egaña M, Blondé W, Illarramendi A, Bilbao I, De Baets B, Stevens R, Mironov V, Kuiper M. (2009) The Cell Cycle Ontology: an application ontology for the representation and integrated analysis of the cell cycle process. Genome Biol. 2009;10(5):R58. Epub 2009 May 29. PDF

Session 7

Perelson AS. Modelling viral and immune system dynamics. Nat Rev Immunol. 2002 Jan;2(1):28-36. PDF

Session 8

ML Connolly. (1983) Solvent-accessible surfaces of proteins and nucleic acids. Science, 221(4612): 709-13. PMID: 6879170. PDF

Martin Karplus and J. Andrew McCammon. (2002) Molecular dynamics simulations of biomolecules. Nature Structural Biology, 9, 646-52. PMID: 12198485. PDF

Session 9

Dill KA, Ozkan SB, Shell MS, Weikl TR. (2008) The Protein Folding Problem. Annu Rev Biophys, 37:289-316. PMCID: PMC2443096. PDF

Bowman GR, Beauchamp KA, Boxer G, Pande VS. “Progress and challenges in the automated construction of Markov state models for full protein systems,” J. Chem. Phys. 131 (2009) 124101 PDF

Class Requirements

Discussion Section / Readings

Papers will be assigned throughout the course. These papers will be presented and discussed in weekly 60-minute sections with the TFs. A brief summary (a half-page per article) should be submitted at the beginning of the discussion session.

Bioinformatics quizzes

There will be four short quizzes (25 minutes) in class comprising SIMPLE questions that you should be able to answer from the lectures plus the main readings.

Programming Assignments (CBB and CS)

There will be several short programming assignments required for CBB and CS students taking this course. Acceptable languages and submission requirements will be discussed prior to the first assignment. These assignments are required only for students taking the CBB or CS sections of the course.

Assignment postings

Homework1 (Problem set for MBB&MCDB, programming assignment for CBB&CPSC) HW1

Homework2 HW2 positions.dat dihedral.dat

Final Project

Project Description

Grade Categories

CBB and CPSC Sections:

Category                  % of Total Grade
Quizzes                   33%
Final Project             33%
Discussion Section        9%
Programming Assignments   25%

MBB and MCDB Sections:

Category             % of Total Grade
Quizzes              33%
Final Project        33%
Discussion Section   17%
Problem Sets         17%
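
As an illustration of how the category weights above combine into a single course grade, here is a minimal Python sketch. The function name weighted_grade, the section labels "CBB/CPSC" and "MBB/MCDB", the 0-100 scale for category scores, and the example scores are all assumptions made for this example; only the percentages come from the tables above. The actual grading procedure may differ.

```python
# Category weights (percent of total grade), copied from the two tables above.
# The dictionary keys used for the sections are hypothetical labels.
WEIGHTS = {
    "CBB/CPSC": {
        "Quizzes": 33,
        "Final Project": 33,
        "Discussion Section": 9,
        "Programming Assignments": 25,
    },
    "MBB/MCDB": {
        "Quizzes": 33,
        "Final Project": 33,
        "Discussion Section": 17,
        "Problem Sets": 17,
    },
}


def weighted_grade(section, scores):
    """Return the weighted average of per-category scores.

    `scores` maps each category name to a score on an assumed 0-100 scale.
    """
    weights = WEIGHTS[section]
    if set(scores) != set(weights):
        raise ValueError("scores must cover exactly the categories of the section")
    total_weight = sum(weights.values())  # 100 for both sections
    return sum(weights[c] * scores[c] for c in weights) / total_weight


# Hypothetical example scores for a student in the CBB/CPSC section.
example = {
    "Quizzes": 80,
    "Final Project": 90,
    "Discussion Section": 100,
    "Programming Assignments": 70,
}
print(weighted_grade("CBB/CPSC", example))  # 82.6
```

Dividing by the sum of the weights rather than by a hard-coded 100 keeps the result a proper weighted average even if the percentages are later adjusted.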

Relevant Yale College Regulations

Students may have questions concerning end-of-term matters. Links to further information about these regulations can be found below:

http://yalecollege.yale.edu/content/reading-period-and-final-examination-period

http://yalecollege.yale.edu/content/completion-course-work

Brief presentation on how to cite correctly: http://archive.gersteinlab.org/mark/out/log/2012/06.12/cbb752b12/cbb752_cite.ppt

Misc

Permissions on using website material

Graphic for course homepage

If you're really motivated, take a look at http://gersteinlab.org/jobs for further research opportunities.

Pages from previous years

2011, 2010, 2009 and earlier (Pointers on finding things on old class pages)
