Cbb752b14

From GersteinInfo

(Difference between revisions)
(Created page with ''''Bioinformatics: Practical Application of Simulation and Data Mining''' Course will next be offered in Spring 2014. Below is website for course in Fall 2012. ---- __TOC__ =C…')
(Assignment postings)
 
(115 intermediate revisions not shown)
Line 1: Line 1:
-
'''Bioinformatics: Practical Application of Simulation and Data Mining'''
+
=Bioinformatics: Practical Application of Data Mining & Simulation=
 +
17th iteration at Yale,  with material [http://info.gersteinlab.org/Cbb752b14#Pages_from_previous_years from all previous years] available! ([http://GersteinLab.org/courses/452 GersteinLab.org/courses/452])
-
Course will next be offered in Spring 2014. Below is website for course in Fall 2012.
+
=News=
-
----
+
In-class poll on 3 March: which of these lectures did you like most:
-
__TOC__
+
[INTRODUCTION]
 +
[ALIGNMENT]
 +
[UNSUPERVISED MINING]
 +
[SUPERVISED MINING]
 +
[NETWORK TOPOLOGY]
 +
[FUNSEQ APPLICATION]
 +
[NETWORK PREDICTION]
-
=CBB 752=
+
Quiz 2 is on Wednesday, 26 Feb, and will cover all of the material up through Monday, 24 Feb.
-
==UPDATES==
+
Quiz 1 is on Wednesday, 12 Feb, and will cover all of the material up through slide 31 of lecture 7 (3 Feb).
-
-'''3 Jan 2013''' <br/>
+
-
Answer keys to Quizzes cbb752a12: found [http://info.gersteinlab.org/Cbb752a12#Bioinformatics_quizzes here] <br/>
+
-
Answer keys to Quizzes cbb752b12: found [http://info.gersteinlab.org/Cbb752a12#Pages_from_previous_years here]
+
-
- '''5 Dec. 2012''' <br/>
+
Discussion sections start this week (week of 27 Jan)!  Both sections will be held in Bass 405 (directly above our lecture classroom).  One will be Wed 2:30-3:30 pm, and the other will be Fri from 4:30-5:30pm. See [http://info.gersteinlab.org/Cbb752b14#Session_1:_Next_Gen_Sequencing_.28Experimental.29 readings].  Please write a 1-2 paragraph summary of each paper, to be turned in before section.
-
Assignment 3 posted.
+
-
- '''10 Nov. 2012''' <br/>
+
If you are still not receiving class emails, please contact Michael at michael.rutenbergschoenberg (at) yale.edu.
-
Assignment 2 posted.  
+
-
- '''10 Oct 2012''' <br/>
+
=Schedule=
-
The instructor requested a swap for assigned readings in genomics and privacy (session 7). So the previous 2 have been swapped out for one paper.
+
'''[https://docs.google.com/spreadsheet/ccc?key=0Av2J5_8MyluFdF90LTg3S0RsSXU0YjdHVEx1WFljZHc&usp=sharing Class Schedule]''' (including a list of topics and quiz dates)
 +
<HR>
 +
__TOC__
 +
=Course Information=
-
Please take note; we would appreciate volunteers to present it on 8th Nov.
+
==Course Description==
-
 
+
-
- '''9 Oct 2012''' <br/>
+
-
[quiz 2]
+
-
Please be reminded of the quiz tomorrow. You can skip the lecture on “Mining your own personal genome” and focus your attention on the other lecture slides, those after quiz 1.
+
-
 
+
-
[assignment 1]
+
-
Please be reminded also that your first assignment is due tomorrow 1159pm.
+
-
 
+
-
[Jonathan Rothberg’s talk and written homework due next section]
+
-
The results are out! With the exception of a small number of you who did not vote, 100% of the people who voted voted for the JR talk. So please make your way to Luce Hall auditorium this Thursday from 4-5pm. http://www.yale.edu/mcdb/seminars/index.html
+
-
Note also that you are to do a writeup on the talk in place of the 2 papers this week and hand that in the next section.
+
-
 
+
-
[Section]
+
-
Please note that the next section is on '''1st Nov 2012'''.
+
-
 
+
-
Also, note that the papers to be presented in the upcoming section on 1st Nov have been swapped: the Connolly paper on solvent-accessible areas and the Karplus and McCammon paper on molecular simulations will be presented by Michael RS and Nathan P., respectively. Please TAKE NOTE of this switch and prepare your presentations.
+
-
 
+
-
At the same time, we would like to appeal to people who haven’t presented to volunteer for the 4 papers that have not been taken up.
+
-
 
+
-
-'''5 Oct 2012''' <br/>
+
-
1) Input and output files for testing your assignment 1 SW algorithm have been put up on the class wiki. Please note that this is just a test case separate from the assignment, for you to test your algorithm, THIS IS NOT THE ANSWER TO THE ASSIGNMENT. The output file also contains a sample of how you can present your scoring matrix, best local alignment(s) and their scores. Note that this is not the only way to present your output.
+
-
http://info.gersteinlab.org/Cbb752a12#Assignment_postings
+
-
 
+
-
-'''1 Oct 2012''' <br />
+
-
1) Ruijie S (Tim) will be presenting the ontology paper this Thursday.<br/>
+
-
 
+
-
-'''30 Sept 2012''' <br />
+
-
1) Michael RS will be presenting the paper on surface area by Connolly<br />
+
-
 
+
-
-'''26 Sept 2012''' <br />
+
-
1) Assignment 1 has been posted. It is due on 10th October 2012 11.59pm.
+
-
 
+
-
http://info.gersteinlab.org/Cbb752a12#Assignment_postings
+
-
 
+
-
2) An audio recording of today's class can be found here:  http://info.gersteinlab.org/Cbb752a12#Audio_recordings_for_selected_lectures.
+
-
 
+
-
 
+
-
-'''23 Sept 2012''' <br />
+
-
[QUIZ 1]
+
-
The first quiz will be on Monday (tomorrow 24 Sept 2012) and it will happen at the start of class, at 1pm, so please try to be on time for class. It will be a 20-minute quiz. Lecture will follow the quiz.
+
-
 
+
-
[ASSIGNMENT 1]
+
-
The first assignment will also be posted this week on Wed. Please keep a lookout for it on Wed. There will be a separate programming and non-programming assignment for the 2 groups. You only need to do one of 2 depending on your course code, not both. More details will be on class wiki on Wed.
+
-
 
+
-
 
+
-
-'''20 Sept 2012''' <br />
+
-
[Past Quizzes]
+
-
I have put up last semester’s quizzes on class wiki under the category below, and you can also look at previous quizzes under “2009 and earlier”. The format is pretty much the same.
+
-
http://info.gersteinlab.org/Cbb752a12#Pages_from_previous_years.
+
-
 
+
-
-'''18 Sept 2012''' <br />
+
-
[Poll results]
+
-
Poll results are on class wiki.
+
-
 
+
-
[Section]
+
-
The section is confirmed to be on Thursdays 4-5pm in BASS room 405 (same room as before). These are the section dates:
+
-
20th Sept, 27th Sept, 4th Oct, 11th Oct, 1st Nov, 8th Nov, 15th Nov, 29th Nov (edited)
+
-
 
+
-
So this Thursday will be the next section 4-5pm BASS 405.
+
-
 
+
-
[Quiz]
+
-
Quiz 1 is next week, 24th Sept (Monday). Please note that all materials (including discussion papers) up till and including tomorrow’s lecture will be covered. It will be a 20-minute quiz. You can refer to quizzes from previous years for practice.
+
-
 
+
-
[Cluster account]
+
-
We have signed 12 people up for cluster accounts on Bulldog L. These are the people who included their netIDs in the poll, although only 9 are officially enrolled under a cbb/cpsc course code. You should have received an email from the systems admin.
+
-
 
+
-
 
+
-
-'''16 Sept 2012''' <br />
+
-
[LECTURE]
+
-
Note that there will be NO LECTURE tomorrow (Monday 17th SEPT 2012). The next lecture is on Wednesday 19th SEPT 2012.
+
-
 
+
-
[SECTION]
+
-
According to the new poll results, these people on the mailing list will be your fellow course mates and the section timing is still every Thursday 4-5pm. So the next section is this Thursday 20th SEPT 2012. I will post the venue soon.
+
-
 
+
-
 
+
-
-'''14 Sept 2012''' <br />
+
-
[POLL2]
+
-
A reminder: PLEASE fill out Poll2 ASAP, so that Yao and I can expedite the admin process (e.g. reserve the room and get cluster accounts for the people doing the computational assignments and final project) and, most importantly, let you know when and where the next section will be. We would greatly appreciate it if you filled this out by the end of TODAY.
+
-
 
+
-
EVERYBODY intending to take/audit this course is to fill out this poll. Subsequent mailing list and discussion timing will depend ENTIRELY on this new poll.
+
-
 
+
-
 
+
-
[PRESENTATIONS]
+
-
We have marked the names of those who have volunteered for respective papers on the class wiki, right next to the paper itself. So let us know if you intend to take a stab at any of the other available papers.
+
-
 
+
-
1. Kyle M. will be presenting the yeast PPI paper (Ekman et al.) <br />
+
-
2. Nathan P. will be presenting the biomolecular MD paper (Karplus & McCammon).
+
-
 
+
-
-'''13 Sept 2012''' <br />
+
-
1. A huge reminder: PLEASE DO THE POLL 2 BY TOMORROW!! thanks.<br />
+
-
2. Shantao L. is presenting the yeast protein landscape paper.<br />
+
-
 
+
-
-'''12 Sept 2012''' <br />
+
-
1. Poll 2 is up. Please fill it out by Friday (14 Sept 2012).<br />
+
-
2. Next section is 20 SEPT 2012 (NOT tomorrow).<br />
+
-
 
+
-
-'''08 Sept 2012''' <br />
+
-
1. Sebastian K. will be presenting breast cancer gene expression profile paper.<br />
+
-
2. Please note that the next write-up is only due on 20th Sept 2012.
+
-
 
+
-
-'''06 Sept 2012''' <br />
+
-
1. Added papers for section session 6. <br />
+
-
2. Please note that the NEXT SECTION is 20th Sept 2012. <br />
+
-
3. Rob A. is presenting the NGS paper (Metzker et al.). <br />
+
-
4. Jimi M. is presenting the MEME paper. <br />
+
-
5. Paul B. is presenting the Watson genome paper (Wheeler et al.) for the 4th section. <br />
+
-
 
+
-
-'''01 Sept 2012''' <br />
+
-
1. http://info.gersteinlab.org/Cbb752a12#Timing_.26_location <br />
+
-
2. http://info.gersteinlab.org/Cbb752a12#Programming_Assignments_.28CBB_and_CS.29_and_Programming_issues <br />
+
-
3. http://info.gersteinlab.org/Cbb752a12#Instructors <br />
+
-
4. Nicole T. is presenting the BLAST paper for the first section. <br />
+
-
5. Steven B. is presenting the SW paper for the first section. A huge thank you to the 2 volunteers for the quick response!
+
-
 
+
-
==Course Information==
+
-
 
+
-
===Course Description===
+
Bioinformatics encompasses the analysis of gene sequences, macromolecular structures, and functional genomics data on a large scale. It represents a major practical application for modern techniques in data mining and simulation. Specific topics to be covered include sequence alignment, large-scale processing, next-generation sequencing data, comparative genomics, phylogenetics, biological database design, geometric analysis of protein structure, molecular-dynamics simulation, biological networks, normalization of microarray data, mining of functional genomics data sets, and machine learning approaches for data integration.
-
===Concise undergraduate course description===
+
==Concise undergraduate course description==
Techniques in data mining and simulation applied to bioinformatics, the computational analysis of gene sequences, macromolecular structures, and functional genomics data on a large scale. Sequence alignment, comparative genomics and phylogenetics, biological databases, geometric analysis of protein structure, molecular-dynamics simulation, biological networks, microarray normalization, and machine-learning approaches to data integration.
See entry from undergraduate catalog:
-
http://students.yale.edu/oci/resultDetail.jsp?course=11937&term=201203 , viz:
+
http://students.yale.edu/oci/resultDetail.jsp?course=23441&term=201401, viz:
-
  MB&B 452 01 (11937) /MCDB452/MB&B752/MB&B753/MB&B754/CB&B752/MCDB752/CPSC752
+
  MB&B 452 01 (23441) /MCDB452/CB&B752/MCDB752/CPSC752/MB&B452
-
  Bioinformatics: Practical Application of Simulation and Data Mining  
+
  Bioinformatics:Mining&Simulatn
  Mark Gerstein
-
  MW 1.00-2.15 BASS 305
+
  MW 1.00-2.15
-
  Fall 2012
+
BASS 305
 +
 
 +
  Fall 2014
  No regular final examination
  Areas Sc
Line 159: Line 49:
  MCDB 120a or 200b is a prerequisite for courses numbered MCDB 202 and above.
-
===Different headings for this class===
+
==Different headings for this class==
'''MB&B452/MCDB452'''
Line 174: Line 64:
-
For ''graduate students'' the course is broken up into two "modules" (each counting 0.5 credit towards MB&B course requirement):
+
For ''graduate students'' the course can be broken up into two "modules" (each counting 0.5 credit towards MB&B course requirement):
'''MB&B 753a3''', Bioinformatics: Practical Application of Data Mining (1st half of term)
Line 185: Line 75:
For the grade weighting schemes of each course version, see Class Requirements section.
-
===Prerequisites===
+
==Prerequisites==
The course is keyed towards CBB graduate students as well as advanced MB&B undergraduates and graduate students wishing to learn about types of large-scale quantitative analyses that whole-genome sequencing will make possible. It would also be suitable for students from other fields such as computer science or physics wanting to learn about an important new biological application for computation.
Line 195: Line 85:
-
==Timing & location==
+
=Timing & location=
-
'''Class:''' Meeting from 1:00-2:15 pm on Monday and Wednesday, in Bass 305. (First meeting will be on 29 Aug 2012 (Wed) and the next meeting will be 31 Aug 2012 (Fri).)
+
'''Class:''' Meeting from 1:00-2:15 pm on Monday and Wednesday, in Bass 305. (First meeting will be on 13 Jan 2014 (Mon).  The third meeting will be 17 Jan 2014 (Fri), as part of Yale's compensation for canceling classes on 20 Jan 2014 (Mon.), in observance of MLK day.  See '''[https://docs.google.com/spreadsheet/ccc?key=0Av2J5_8MyluFdF90LTg3S0RsSXU0YjdHVEx1WFljZHc&usp=sharing Course Schedule]''' for details.)
'''Discussion section:'''  
-
Every Thursday 4-5pm
 
-
First section: 6th Sept 2012 (Thurs) @ Bass 405 4-5pm
 
-
Each section will include discussion of papers assigned (below) and each paper will be presented by a student. Each presentation should be approx. 10 min. Powerpoint slides are optional. Please note the presentation is separate from the write-up, i.e. you still need to do the write-up.
+
Section 1 (Michael): Wednesdays 2:30-3:30pm, starting 29 Jan 2014.
 +
Section 2 (Cong): Fridays 4:30-5:30pm, starting week of 31 Jan 2014.
-
==Instructors==
+
=Instructors=
-
Consultation is available UPON REQUEST or according to times stipulated by the individual instructors.
+
-
===Instructor-in-Charge===
+
Consultation is available UPON REQUEST or according to times stipulated by the individual instructors. Email '''cbb752(at)gersteinlab.org''' to reach the instructor and the TFs.
 +
 
 +
==Instructor-in-Charge==
{| border="1"
{| border="1"
Line 216: Line 106:
! Email
! Email
|-
|-
-
| Mark Gerstein
+
| [http://www.gersteinlab.org/about Mark Gerstein]
-
| Bass 432A
+
| [http://info.gersteinlab.org/Lab_Address Bass 432A]
-
| mark.gerstein(at)yale.edu
+
| [http://contact.gerstein.info mark.gerstein *at* yale.edu]
|}
|}
-
===Guest Instructors===
+
==Guest Instructors==
{| border="1"
{| border="1"
Line 249: Line 139:
| steven.kleinstein(at)yale.edu
| steven.kleinstein(at)yale.edu
|-
|-
-
| Dov Greenbaum
 
-
| --
 
-
| dov.greenbaum(at)aya.yale.edu
 
|}
|}
-
===Teaching Fellows===
+
==Teaching Fellows==
{| border="1"
{| border="1"
Line 262: Line 149:
! Email
! Email
|-
|-
-
| Jieming Chen
+
| Michael Rutenberg Schoenberg
-
| Bass 437, Bass 323
+
-
| jieming.chen(at)yale.edu
+
-
|-
+
-
| Yao Fu
+
| Bass 437
| Bass 437
-
| yao.fu(at)yale.edu
+
| michael.rutenbergschoenberg(at)yale.edu
|-
|-
-
| Wendell Smith (selected lectures)
+
| Cong Li
-
| Mason Laboratory 313
+
| 300 George, Suite 503
-
| wendell.smith(at)yale.edu
+
| cong.li(at)yale.edu
|}
|}
-
==Topics/Class Schedule==
+
=Discussion Section=
-
'''[https://docs.google.com/spreadsheet/ccc?key=0AkioDDFQaIjNdHJ1T0lHUnRibzl4R2V0MkItQmR3Tmc#gid=0 Class Schedule]''' (including a list of topics and quiz dates)
+
Section 1 (Michael): Wednesdays 3-4pm, starting 29 Jan 2014.
-
==Polls==
+
Section 2 (Cong): TBD, starting week of 27 Jan 2014.  Email Cong at cong.li (at) yale.edu if you want to attend this section to help him with scheduling.
-
'''[https://docs.google.com/spreadsheet/viewform?fromEmail=true&formkey=dFFkN0ZVWnhoUDlVZ2lRX0tqdjZSR0E6MQ Poll]''' for students' sign up and good times for the weekly discussion section   '''[http://archive.gersteinlab.org/cbb752a/poll1-cbb752-sign-up.pdf Result]'''
+
Each section will include discussion of the assigned papers (below). Students are expected to submit 1-2 paragraph summaries of each paper before the section. In Section 1 (Wed 3-4pm), students will give 15-20 min presentations of the papers.  The second section will likely be much smaller, and will have a discussion format.  The written assignment will be the same, and students will be graded on a combination of the written assignments and their participation in discussions.
-
'''[https://docs.google.com/spreadsheet/viewform?formkey=dFk0amhnbGZhbGUxZTRRWkY2c2tjQVE6MQ#gid=0 Poll2] '''[http://archive.gersteinlab.org/cbb752a/poll2results.pdf Result2]''' <br/>
+
=Discussion Section Readings=
-
'''[https://docs.google.com/spreadsheet/viewform?formkey=dHk3RFJXVHQzZEd4MWNudFhkZ2hwQ3c6MQ#gid=0 "Section or JR's talk" Poll] '''[http://archive.gersteinlab.org/cbb752a/talkpoll.jpg Results]''' <br/>
+
-
'''[https://docs.google.com/spreadsheet/viewform?formkey=dE1HcmpTZ0pWUmI4S0dBMVNmTTZQMWc6MQ#gid=0 Poll3]'''
+
-
'''[https://docs.google.com/spreadsheet/viewform?formkey=dFhXOU40ZkpuOFdxNm9HNFIzZ3h6cnc6MQ#gid=0 Make-up lecture from Prof. O'Hern]'''
+
==Session 1: Next Gen Sequencing (Experimental) ==
-
==Discussion Section Readings==
+
Metzker ML. "Sequencing technologies - the next generation" Nature Reviews Genetics. 11 (2010) [http://www.gersteinlab.org/courses/452/10-spring/pdf/ngs.pdf PDF]
-
===Session 1===
+
Wheeler DA et al. "The complete genome of an individual by massively parallel DNA sequencing," Nature. 452:872-876 (2008) [http://www.gersteinlab.org/courses/452/10-spring/pdf/WatsonGenome.pdf PDF]
-
T.F. Smith and M.S. Waterman. (1981) Identification of common molecular subsequences. Journal of Molecular Biology,147(1): 195-7. PMID: 7265238. [http://www.gersteinlab.org/courses/452/10-spring/pdf/sw.pdf PDF] '''[Nicole T.]'''
+
-
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. (1990) Basic local alignment search tool. Journal of Molecular Biology, 215(3):403-10. PMID: 2231712. [http://www.gersteinlab.org/courses/452/10-spring/pdf/Altschul.pdf PDF] '''[Steven B.]'''
+
==Session 2: Proteomics/Sequence Alignment ==
-
===Session 2===
 
-
Metzker ML. "Sequencing technologies - the next generation" Nature Reviews Genetics. 11 (2010) [http://www.gersteinlab.org/courses/452/10-spring/pdf/ngs.pdf PDF] '''[Rob A.]'''
 
-
Bailey TL, Williams N, Misleh C, Li WW. (2006) MEME: discovering and analyzing DNA and protein sequence motifs, Nucl Acids Res.34:W369-373 [http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1538909/?tool=pubmed PDF] '''[Jimi M.]'''
 
-
===Session 3===
+
T.F. Smith and M.S. Waterman. (1981) Identification of common molecular subsequences. Journal of Molecular Biology,147(1): 195-7. PMID: 7265238. [http://www.gersteinlab.org/courses/452/10-spring/pdf/sw.pdf PDF]  
-
Olsen JV, Blagoev B, Gnad F, Macek B, Kumar C, Mortensen P, Mann M. (2006) Global, in vivo, and site-specific phosphorylation dynamics in signaling networks.Cell. 2006 Nov 3;127(3):635-48. [http://www.pil.sdu.dk/1/MSQuant/Cell_GlobalPhosphorylationSignalingDynamics.pdf PDF] '''[Qian W.]'''
+
-
Nevan J. Krogan et al (2006) Global landscape of protein complexes in the yeast Saccharomyces cerevisiae Nature 440, 637-643 (30 March 2006) [http://www.nature.com/nature/journal/v440/n7084/pdf/nature04670.pdf PDF] '''[Shantao L.]'''
+
Nevan J. Krogan et al (2006) Global landscape of protein complexes in the yeast Saccharomyces cerevisiae Nature 440, 637-643 (30 March 2006) [http://www.nature.com/nature/journal/v440/n7084/pdf/nature04670.pdf PDF]  
-
===Session 4===
+
[http://archive.gersteinlab.org/proj/cbb752b14/Rinehart_suggested_reading_2014.docx Additional readings suggested by Professor Rinehart]
-
Wheeler DA et al. "The complete genome of an individual by massively parallel DNA sequencing," Nature. 452:872-876 (2008) [http://www.gersteinlab.org/courses/452/10-spring/pdf/WatsonGenome.pdf PDF] '''[Paul B. ]'''
+
-
Antezana E, Egaña M, Blondé W, Illarramendi A, Bilbao I, De Baets B, Stevens R, Mironov V, Kuiper M. (2009) The Cell Cycle Ontology: an application ontology for the representation and integrated analysis of the cell cycle process. Genome Biol. 2009;10(5):R58. Epub 2009 May 29. [http://genomebiology.com/content/10/5/R58 PDF] '''[Ruijie S (Tim)]'''
+
==Session 3: Sequence Alignment/Machine learning==
 +
 +
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. (1990) Basic local alignment search tool. Journal of Molecular Biology, 215(3):403-10. PMID: 2231712. [http://www.gersteinlab.org/courses/452/10-spring/pdf/Altschul.pdf PDF]
-
===Session 5===
+
Yip, KY, Cheng, C, Gerstein, M (2013). Machine learning and genome annotation: a match meant to be?. Genome Biol., 14, 5:205. [http://archive.gersteinlab.org/proj/cbb752b14/Yip_Machine_Learning_2013.pdf PDF]
-
Sotiriou et al. (2006) Gene Expression Profiling in Breast Cancer: Understanding the Molecular Basis of Histologic Grade To Improve Prognosis. JNCI J Natl Cancer Inst (15 February 2006) 98 (4):262-272.doi: 10.1093/jnci/djj052 [http://jnci.oxfordjournals.org/content/98/4/262.full.pdf+html PDF] '''[Sebastian K.] ''' 
+
-
Ekman D, Light S, Björklund AK, Elofsson A. (2006) What properties characterize the hub proteins of the protein-protein interaction network of Saccharomyces cerevisiae? Genome Biol. 2006;7(6):R45. [http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1779539/pdf/gb-2006-7-6-r45.pdf PDF] '''[Kyle M.]'''
+
== Session 4: Bioinformatics for Next-Gen Sequencing ==
-
===Session 6===   
+
Rozowsky, J, Euskirchen, G, Auerbach, RK, Zhang, ZD, Gibson, T, Bjornson, R, Carriero, N, Snyder, M, Gerstein, MB (2009). PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls. Nat. Biotechnol., 27, 1:66-75 [http://archive.gersteinlab.org/papers/e-print/PeakSeq/preprint.pdf PDF]
-
'''November 1'''
+
-
Martin Karplus and J. Andrew McCammon. (2002) Molecular dynamics simulations of biomolecules. Nature Structural Biology,9, 646-52. PMID: 12198485.[http://www.gersteinlab.org/courses/452/10-spring/pdf/Karplus.pdf PDF] '''[Nathan P]'''
+
Cooper, GM, Shendure, J (2011). Needles in stacks of needles: finding disease-causal variants in a wealth of genomic data. Nat. Rev. Genet., 12, 9:628-40 [http://www.nature.com/nrg/journal/v12/n9/pdf/nrg3046.pdf PDF]
-
ML Connolly. (1983) Solvent-accessible surfaces of proteins and nucleic acids. Science, 221(4612): 709-13. PMID: 6879170.[http://www.gersteinlab.org/courses/452/10-spring/pdf/Connolly.pdf PDF] '''[Michael RS]'''
+
== Session 5: Bioinformatics for Next-Gen Sequencing 2==
-
===Session 7===
+
Lior Pachter. Models for Transcript Quantifications from RNA-Seq (2011) ArXiV [http://arxiv.org/pdf/1104.3889v2 PDF]
-
'''November 8'''
+
-
Presidential Commission for the study of Bioethical Issues - Privacy and Progress in Whole Genome Sequencing. October 2012.
+
==Session 6: Networks ==
-
You only need to read the Executive Summary and the Introduction (~35 pages total) for the write-up and presentation [http://archive.gersteinlab.org/cbb752a/dov-Privacy-and-Progress_PCSBI.pdf PDF]
+
-
===Session 8===
+
Ekman D, Light S, Björklund AK, Elofsson A. (2006) What properties characterize the hub proteins of the protein-protein interaction network of Saccharomyces cerevisiae? Genome Biol. 2006;7(6):R45. [http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1779539/pdf/gb-2006-7-6-r45.pdf PDF]
-
'''November 15'''
+
-
Dill KA, Ozkan SB, Shell MS, Weikl TR. (2008) The Protein Folding Problem.Annu Rev Biophys,9, 37:289-316. PMID: 2443096.[http://www.gersteinlab.org/courses/452/10-spring/pdf/proteinFolding.pdf PDF]
+
Barabási, AL, Oltvai, ZN (2004). Network biology: understanding the cell's functional organization. Nat. Rev. Genet., 5, 2:101-13. [http://www.nature.com/nrg/journal/v5/n2/pdf/nrg1272.pdf PDF]
-
Bowman GR, Beauchamp KA, Boxer G, Pande VS. “Progress and challenges in the automated construction of Markov state models for full protein systems,” J. Chem. Phys. 131 (2009) 124101 [http://www.gersteinlab.org/courses/452/10-spring/pdf/bowman.pdf PDF]
+
==Session 7: Immunological Modeling/Semantic Web==
-
===Session 9===
+
Perelson AS. Modelling viral and immune system dynamics. Nat Rev Immunol. 2002 Jan;2(1):28-36. [http://www.gersteinlab.org/courses/452/10-spring/pdf/perelson.pdf PDF]
-
'''November 29'''
+
-
Perelson AS. Modelling viral and immune system dynamics. Nat Rev Immunol. 2002 Jan;2(1):28-36. [http://www.gersteinlab.org/courses/452/10-spring/pdf/perelson.pdf PDF] '''[Caroline B.]'''
+
Antezana E, Egaña M, Blondé W, Illarramendi A, Bilbao I, De Baets B, Stevens R, Mironov V, Kuiper M. (2009) The Cell Cycle Ontology: an application ontology for the representation and integrated analysis of the cell cycle process. Genome Biol. 2009;10(5):R58. Epub 2009 May 29. [http://genomebiology.com/content/10/5/R58 PDF]
-
==Class Requirements==
+
==Session 8: Protein Simulation 1==
 +
 
 +
Martin Karplus and J. Andrew McCammon. (2002) Molecular dynamics simulations of biomolecules. Nature Structural Biology,9, 646-52. PMID: 12198485.[http://www.gersteinlab.org/courses/452/10-spring/pdf/Karplus.pdf PDF]
 +
 
 +
Zhou, AQ, O'Hern, CS, Regan, L (2011). Revisiting the Ramachandran plot from a new angle. Protein Sci., 20, 7:1166-71 [http://jamming.research.yale.edu/files/papers/rama.pdf PDF]
 +
 
 +
==Session 9: Protein Simulation 2==
 +
 
 +
Dill KA, Ozkan SB, Shell MS, Weikl TR. (2008) The Protein Folding Problem.Annu Rev Biophys,9, 37:289-316. PMID: 2443096.[http://www.gersteinlab.org/courses/452/10-spring/pdf/proteinFolding.pdf PDF]
 +
 
 +
Bowman GR, Beauchamp KA, Boxer G, Pande VS. “Progress and challenges in the automated construction of Markov state models for full protein systems,” J. Chem. Phys. 131 (2009) 124101 [http://www.gersteinlab.org/courses/452/10-spring/pdf/bowman.pdf PDF]
 +
 
 +
=Class Requirements=
===Discussion Section / Readings===
Line 348: Line 231:
===Bioinformatics quizzes===
There will be four short quizzes (25 minutes) in class comprising SIMPLE questions that you should be able to answer from the lectures plus the main readings.
-
 
Answer keys to Quizzes 1-4 cbb752a12: found [http://archive.gersteinlab.org/cbb752a/cbb752a12_quizzes_anskeys.zip here]
Line 361: Line 243:
===Assignment postings===
-
'''Assignment 1''': [http://archive.gersteinlab.org/cbb752a/assignment1/Cbb752-HW1.pdf PDF] DUE DATE: '''10th Oct 2012 11.59PM''' <br />
+
[http://archive.gersteinlab.org/proj/cbb752b14/CBB752b14_assignment1.pdf Assignment 1] '''DUE: 3 March 2014'''
-
Files for programming assignment: Download [http://archive.gersteinlab.org/cbb752a/assignment1/cbb752a12_assign1.zip here] <br/>
+
<br>Files for programming assignment: Download [http://archive.gersteinlab.org/proj/cbb752b14/cbb752b14_assign1.zip here] <br/>
-
Test input and output files: Download [http://archive.gersteinlab.org/cbb752a/assignment1/test-sample/sample-test.zip here]
+
Test input and output files: Download [http://archive.gersteinlab.org/proj/cbb752b14/sample-input-updated.tgz here]
-
'''Assignment 2''': All files are [http://archive.gersteinlab.org/cbb752a/Assignment2/ here.] DUE DATE: '''28th Nov. 2012 11.59PM'''
+
[http://archive.gersteinlab.org/proj/cbb752b14/Homework_Kleinstein.2014.doc Assignment 2] '''DUE: 7 April 2014'''
-
'''Assignment 3''': All files are [http://archive.gersteinlab.org/cbb752a/Homework_Kleinstein.F2012.pdf] DUE DATE: '''12th Dec. 2012 5.00PM'''
+
[http://archive.gersteinlab.org/proj/cbb752b14/hw3_2014_20April2014.pdf Assignment 3 (updated with reference containing values for model constants)] '''DUE: 24 April 2014'''
 +
 
 +
[http://archive.gersteinlab.org/proj/cbb752b14/config.dat supplementary file for programming assignment]
 +
 
 +
[http://archive.gersteinlab.org/proj/cbb752b14/hw3_ohern_non_programming.pdf Assignment 3 (non-programming)]
===Final Project===
-
MBB/MCDB : [http://archive.gersteinlab.org/cbb752a/finalproj/cbb752a12_galaxy_rnaseq_files.zip Files for Galaxy]
 
-
[http://archive.gersteinlab.org/cbb752a/finalproj/CBB752FinalProject.pdf ''' Final Project ''']
+
[http://archive.gersteinlab.org/proj/cbb752b14/CBB752b14_Final_Project_140409.pdf Final Project "Final version" - updated 4 April 2014 (filepaths of project materials corrected) ]  '''DUE: 30 Apr 2014 11.59pm'''
 +
 
 +
[http://archive.gersteinlab.org/proj/cbb752b14/cbb752b14_galaxy_rnaseq_files.zip Files for MBB/MCDB pseudocomputational section]
 +
 
 +
[http://archive.gersteinlab.org/proj/cbb752b14/Accessing_and_using_BulldogJ.pdf Accessing and Using BulldogJ]
===Grade Categories===
Line 446: Line 335:
Also, it might be of interest to people, to look at [http://www.yaledailynews.com/news/2012/sep/11/blurring-cheating-collaboration/ this recent article regarding academic dishonesty].
-
==Misc==
+
=Misc=
[http://info.gersteinlab.org/Permissions Permissions] on using website material
Line 453: Line 342:
If you're really motivated, take a look at http://gersteinlab.org/jobs for further Research Opportunities
-
==Audio recordings for selected lectures==
+
==Polls==
-
26th Sept 2012 Lecture 8 - Proteomics Part II by Prof J. Rinehart [http://archive.gersteinlab.org/cbb752a/recordings/120926-rinehart-cbb752a12_lecture.mp3 MP3] <br />
+
-
1st Oct 2012 Lecture 9 - Semantic Web & Databases by Prof KH Cheung [http://archive.gersteinlab.org/cbb752a/recordings/121001-kei-cbb752a12_lecture.mp3 MP3] <br />
+
'''[https://docs.google.com/forms/d/1bFtfOpCLaA7aEMzWXHgVVPtk53I22YjoYNFRefXVm_4/viewform Poll]''' for students' sign up and good times for the weekly discussion section 
-
==Pages from previous years==
+
'''[https://docs.google.com/forms/d/1A9ZWecS-YQ64BZpfsd72lPP3oz6wGD7VfOuiCmGgkgA/viewform Poll 2]''' Section sign-up
 +
=Pages from previous years=
 +
 
 +
2014 is the 17th time Bioinformatics has been taught at Yale. Pages for the 16 previous iterations of the class are available. Look at how things evolve!
 +
 
 +
[http://info.gersteinlab.org/Cbb752a12 2012 fall],
[http://info.gersteinlab.org/Cbb752b12 2012 spring]
([http://archive.gersteinlab.org/cbb752a/b12quizzes.zip quizzes]),
[http://info.gersteinlab.org/Cbb752b11 2011],
[http://www.gersteinlab.org/courses/452/10-spring/ 2010],
-
[http://www.gersteinlab.org/courses/452/10-spring/previous.html 2009 and earlier]
+
[http://www.gersteinlab.org/courses/452/10-spring/previous.html 2009 and earlier (12 years of classes, starting in '98)]
 +
 
([[Pointers on finding things on old class pages]])

Latest revision as of 00:32, 23 April 2014

Bioinformatics: Practical Application of Data Mining & Simulation

17th iteration at Yale, with material from all previous years available! (GersteinLab.org/courses/452)

News

In class poll on 3 March: which of these lectures did you like most: [INTRODUCTION] [ALIGNMENT] [UNSUPERVISED MINING] [SUPERVISED MINING] [NETWORK TOPOLOGY] [FUNSEQ APPLICATION] [NETWORK PREDICTION]

Quiz 2 is on Wednesday, 26 Feb, and will cover all of the material up through Monday, 24 Feb.

Quiz 1 is on Wednesday, 12 Feb, and will cover all of the material up through slide 31 of lecture 7 (3 Feb).

Discussion sections start this week (week of 27 Jan)! Both sections will be held in Bass 405 (directly above our lecture classroom). One will be Wed 2:30-3:30 pm, and the other will be Fri from 4:30-5:30pm. See readings. Please write a 1-2 paragraph summary of each paper, to be turned in before section.

If you are still not receiving class emails, please contact Michael at michael.rutenbergschoenberg (at) yale.edu.

Schedule

Class Schedule (including a list of topics and quiz dates)



Course Information

Course Description

Bioinformatics encompasses the analysis of gene sequences, macromolecular structures, and functional genomics data on a large scale. It represents a major practical application for modern techniques in data mining and simulation. Specific topics to be covered include sequence alignment, large-scale processing, next-generation sequencing data, comparative genomics, phylogenetics, biological database design, geometric analysis of protein structure, molecular-dynamics simulation, biological networks, normalization of microarray data, mining of functional genomics data sets, and machine learning approaches for data integration.

Concise undergraduate course description

Techniques in data mining and simulation applied to bioinformatics, the computational analysis of gene sequences, macromolecular structures, and functional genomics data on a large scale. Sequence alignment, comparative genomics and phylogenetics, biological databases, geometric analysis of protein structure, molecular-dynamics simulation, biological networks, microarray normalization, and machine-learning approaches to data integration.

See entry from undergraduate catalog: http://students.yale.edu/oci/resultDetail.jsp?course=23441&term=201401, viz:

MB&B 452 01 (23441) /MCDB452/CB&B752/MCDB752/CPSC752/MB&B452 
Bioinformatics:Mining&Simulatn
Mark Gerstein
MW 1.00-2.15
BASS 305
Fall 2014
No regular final examination
Areas Sc
Prerequisites: MB&B 301b and MATH 115a or b, or permission of instructor.
MCDB 120a or 200b is a prerequisite for courses numbered MCDB 202 and above.

Different headings for this class

MB&B452/MCDB452

This version of the course consists of lectures, written problem sets, and a final (semi-computational section and a literature survey) project.

MB&B752/MCDB752

This version of the course consists of lectures, written problem sets, and a final (semi-computational section and a literature survey) project.

CB&B752/CPSC752

This version of the course consists of lectures, programming assignments, and a final programming project.


For graduate students the course can be broken up into two "modules" (each counting 0.5 credit towards MB&B course requirement):

MB&B 753a3, Bioinformatics: Practical Application of Data Mining (1st half of term)

MB&B 754a4, Bioinformatics: Practical Application of Simulation (2nd half of term)

Each module consists of lectures, written problem sets, and a final, graduate level written project that is half the length of the full course's final project.


For the grade weighting schemes of each course version, see Class Requirements section.

Prerequisites

The course is keyed towards CBB graduate students as well as advanced MB&B undergraduates and graduate students wishing to learn about the types of large-scale quantitative analyses that whole-genome sequencing makes possible. It is also suitable for students from other fields, such as computer science or physics, who want to learn about an important new biological application of computation.

Students should have:

A basic knowledge of biochemistry and molecular biology.

A knowledge of basic quantitative concepts, such as single-variable calculus, some probability and statistics, and basic programming skills.

These can be fulfilled by the following prerequisites statement: "Prerequisites: MBB 200 and Mathematics 115 or permission of the instructor."


Timing & location

Class: Mondays and Wednesdays, 1:00-2:15 pm, in Bass 305. The first meeting is on Monday, 13 Jan 2014. The third meeting is on Friday, 17 Jan 2014, which makes up for the classes canceled on Monday, 20 Jan 2014, in observance of MLK Day. See Course Schedule for details.

Discussion section:

Section 1 (Michael): Wednesdays 2:30-3:30pm, starting 29 Jan 2014. Section 2 (Cong): Fridays 4:30-5:30pm, starting 31 Jan 2014.

Instructors

Consultation is available UPON REQUEST or according to times stipulated by the individual instructors. Email cbb752(at)gersteinlab.org to reach the instructor and the TFs.

Instructor-in-Charge

Name Office Email
Mark Gerstein Bass 432A mark.gerstein *at* yale.edu

Guest Instructors

Name Office Email
Corey O'Hern Mason Laboratory corey.ohern(at)yale.edu
Jesse Rinehart 300 George St jesse.rinehart(at)yale.edu
James Noonan 333 Cedar St james.noonan(at)yale.edu
Kei Cheung 300 George St kei.cheung(at)yale.edu
Steven Kleinstein 300 George St steven.kleinstein(at)yale.edu

Teaching Fellows

Name Office Email
Michael Rutenberg Schoenberg Bass 437 michael.rutenbergschoenberg(at)yale.edu
Cong Li 300 George, Suite 503 cong.li(at)yale.edu

Discussion Section

Section 1 (Michael): Wednesdays 3-4pm, starting 29 Jan 2014.

Section 2 (Cong): TBD, starting week of 27 Jan 2014. If you want to attend this section, email Cong at cong.li (at) yale.edu to help him with scheduling.

Each section will include discussion of the assigned papers (below). Students are expected to submit 1-2 paragraph summaries of each paper before the section. In Section 1 (Wed 3-4pm), students will give 15-20 min presentations of the papers. The second section will likely be much smaller, and will have a discussion format. The written assignment is the same for both sections, and students will be graded on a combination of the written assignments and their participation in discussions.

Discussion Section Readings

Session 1: Next Gen Sequencing (Experimental)

Metzker ML. "Sequencing technologies - the next generation," Nature Reviews Genetics. 11 (2010) PDF

Wheeler DA et al. "The complete genome of an individual by massively parallel DNA sequencing," Nature. 452:872-876 (2008) PDF

Session 2: Proteomics/Sequence Alignment

T.F. Smith and M.S. Waterman. (1981) Identification of common molecular subsequences. Journal of Molecular Biology, 147(1):195-7. PMID: 7265238. PDF

Nevan J. Krogan et al. (2006) Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature 440, 637-643 (30 March 2006). PDF

Additional readings suggested by Professor Rinehart

Session 3: Sequence Alignment/Machine learning

Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. (1990) Basic local alignment search tool. Journal of Molecular Biology, 215(3):403-10. PMID: 2231712. PDF

Yip, KY, Cheng, C, Gerstein, M (2013). Machine learning and genome annotation: a match meant to be? Genome Biol., 14, 5:205. PDF

Session 4: Bioinformatics for Next-Gen Sequencing

Rozowsky, J, Euskirchen, G, Auerbach, RK, Zhang, ZD, Gibson, T, Bjornson, R, Carriero, N, Snyder, M, Gerstein, MB (2009). PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls. Nat. Biotechnol., 27, 1:66-75 PDF

Cooper, GM, Shendure, J (2011). Needles in stacks of needles: finding disease-causal variants in a wealth of genomic data. Nat. Rev. Genet., 12, 9:628-40 PDF

Session 5: Bioinformatics for Next-Gen Sequencing 2

Lior Pachter. Models for Transcript Quantification from RNA-Seq (2011) arXiv PDF

Session 6: Networks

Ekman D, Light S, Björklund AK, Elofsson A. (2006) What properties characterize the hub proteins of the protein-protein interaction network of Saccharomyces cerevisiae? Genome Biol. 2006;7(6):R45. PDF

Barabási, AL, Oltvai, ZN (2004). Network biology: understanding the cell's functional organization. Nat. Rev. Genet., 5, 2:101-13. PDF

Session 7: Immunological Modeling/Semantic Web

Perelson AS. Modelling viral and immune system dynamics. Nat Rev Immunol. 2002 Jan;2(1):28-36. PDF

Antezana E, Egaña M, Blondé W, Illarramendi A, Bilbao I, De Baets B, Stevens R, Mironov V, Kuiper M. (2009) The Cell Cycle Ontology: an application ontology for the representation and integrated analysis of the cell cycle process. Genome Biol. 2009;10(5):R58. Epub 2009 May 29. PDF

Session 8: Protein Simulation 1

Martin Karplus and J. Andrew McCammon. (2002) Molecular dynamics simulations of biomolecules. Nature Structural Biology, 9, 646-52. PMID: 12198485. PDF

Zhou, AQ, O'Hern, CS, Regan, L (2011). Revisiting the Ramachandran plot from a new angle. Protein Sci., 20, 7:1166-71 PDF

Session 9: Protein Simulation 2

Dill KA, Ozkan SB, Shell MS, Weikl TR. (2008) The Protein Folding Problem. Annu Rev Biophys, 37:289-316. PMID: 2443096. PDF

Bowman GR, Beauchamp KA, Boxer G, Pande VS. "Progress and challenges in the automated construction of Markov state models for full protein systems," J. Chem. Phys. 131 (2009) 124101 PDF

Class Requirements

Discussion Section / Readings

Papers will be assigned throughout the course. These papers will be presented and discussed in weekly 60-minute sections with the TFs. A brief summary (a half-page per article) should be submitted at the beginning of the discussion session.

Bioinformatics quizzes

There will be four short quizzes (25 minutes) in class comprising SIMPLE questions that you should be able to answer from the lectures plus the main readings.

Answer keys to Quizzes 1-4 cbb752a12: found here

Programming Assignments (CBB and CS) and Programming issues

There will be several short programming assignments required for CBB and CS students taking this course. Acceptable languages and submission requirements will be discussed prior to the first assignment. These assignments are NOT required for students not taking the CBB or CS sections of the course.

These are the programming languages that we permit in the programming assignments and final project: Perl, Python, C, C++, MATLAB and R. If you really feel more comfortable with other languages, please email the TFs to discuss. Also, packages such as BioPerl and BioPython are not allowed in the assignments and final project. If in doubt, please consult the TFs.

We recommend the use of Perl for most of the programming. A useful resource is the following book: Programming Perl, 3rd Edition, in the O'Reilly series, by Larry Wall, Tom Christiansen, and Jon Orwant. The Yale Library also has older editions, which would work too. We would also recommend the following online resources: http://www.perlmonks.org/ and http://stackoverflow.com/. Otherwise, Google is your best friend.

Assignment postings

Assignment 1 DUE: 3 March 2014
Files for programming assignment: Download here
Test input and output files: Download here

Assignment 2 DUE: 7 April 2014

Assignment 3 (updated with reference containing values for model constants) DUE: 24 April 2014

supplementary file for programming assignment

Assignment 3 (non-programming)

Final Project

Final Project "Final version" - updated 4 April 2014 (filepaths of project materials corrected) DUE: 30 Apr 2014 11.59pm

Files for MBB/MCDB pseudocomputational section

Accessing and Using BulldogJ

Grade Categories


The following are the approximate grading systems:

CBB and CPSC Sections:

Category  % of Total Grade
Quizzes 33%
Final Project 33%
Discussion Section 9%
Programming Assignments 25%

MBB and MCDB Sections:

Category  % of Total Grade
Quizzes 33%
Final Project 33%
Discussion Section 17%
Problem Sets 17%
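
As a rough illustration of how the approximate weights above combine, the following Python sketch computes a weighted total for either track. The function name `weighted_grade`, the track labels, and the sample scores are hypothetical and not part of the course materials; the weights are stated as approximate, so actual grading may differ.

```python
# Approximate category weights (percent of total grade), copied from the tables above.
# The track labels used as dictionary keys are illustrative names, not official ones.
WEIGHTS = {
    "CBB/CPSC": {
        "Quizzes": 33,
        "Final Project": 33,
        "Discussion Section": 9,
        "Programming Assignments": 25,
    },
    "MBB/MCDB": {
        "Quizzes": 33,
        "Final Project": 33,
        "Discussion Section": 17,
        "Problem Sets": 17,
    },
}


def weighted_grade(track, scores):
    """Return the weighted average (0-100) for one track.

    `scores` maps each category name to a percent score between 0 and 100.
    This is a hypothetical helper for illustration only.
    """
    weights = WEIGHTS[track]
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError("missing scores for: " + ", ".join(sorted(missing)))
    total_weight = sum(weights.values())
    return sum(weights[c] * scores[c] for c in weights) / total_weight


if __name__ == "__main__":
    # Made-up scores for a student in the CBB/CPSC track.
    example = {
        "Quizzes": 90,
        "Final Project": 80,
        "Discussion Section": 100,
        "Programming Assignments": 70,
    }
    print(round(weighted_grade("CBB/CPSC", example), 1))
```

Dividing by the sum of the weights keeps the result on a 0-100 scale even if the approximate percentages were adjusted so that they no longer add up to exactly 100.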

Relevant Yale College Regulations

Students may have questions concerning end-of-term matters. Links to further information about these regulations can be found below:

http://yalecollege.yale.edu/content/reading-period-and-final-examination-period

http://yalecollege.yale.edu/content/completion-course-work

Brief presentation on how to cite correctly: http://archive.gersteinlab.org/mark/out/log/2012/06.12/cbb752b12/cbb752_cite.ppt

Plagiarism

Below is a message from Dean Mary Miller of Yale College about citing your references and sources of information and plagiarism:

" You need to cite all sources used for papers, including drafts of papers, and repeat the reference each time you use the source in your written work. You need to place quotation marks around any cited or cut-and-pasted materials, IN ADDITION TO footnoting or otherwise marking the source. If you do not quote directly – that is, if you paraphrase – you still need to mark your source each time you use borrowed material. Otherwise you have plagiarized. It is also advisable that you list all sources consulted for the draft or paper in the closing materials, such as a bibliography or roster of sources consulted.
You may not submit the same paper, or substantially the same paper, in more than one course. If topics for two courses coincide, you need written permission from both instructors before either combining work on two papers or revising an earlier paper for submission to a new course.

It is the policy of Yale College that all cases of academic dishonesty be reported to the chair of the Executive Committee.... "

You may also be interested in this recent article regarding academic dishonesty.

Misc

Permissions on using website material

Graphic for course homepage

If you're really motivated, take a look at http://gersteinlab.org/jobs for further Research Opportunities

Polls

Poll for students' sign up and good times for the weekly discussion section

Poll 2 Section sign-up

Pages from previous years

2014 is the 17th time Bioinformatics has been taught at Yale. Pages for the 16 previous iterations of the class are available. Look at how things evolve!

2012 fall, 2012 spring (quizzes), 2011, 2010, 2009 and earlier (12 years of classes, starting in '98)

(Pointers on finding things on old class pages)
