Cs545-07
This page contains general information for the class:
CPSC445/CPSC545/MBB334/MBB545/CBB545
Introduction to Data Mining
Course websites
- Main website
- Course wiki
- Yale classes server
Homework
- HW2: Decision trees.
- http://www.gersteinlab.org/courses/545/07-spr/hw/hw2.pdf
- Due Feb 20th, 2007.
- HW3: Multilinear regression.
- HW4: SVM.
Final Project
- Suggested Term Projects. Due May 7th, 2007. Project description due by March 27th, 2007.
- List of proposed projects
Slides
Week 1
- Introduction to Bioinformatics
Week 3
- Decision tree
- Ensemble methods
Week 4
- Multilinear and logistic regression
Week 5
- SVM
- Perceptron Models
- Links
Week 6
- Molecular Networks as Application of Mining
- Unsupervised Learning and Clustering
- Nice interactive k-means demo
Week 7
- Predicting Networks Through Bayesian Inference
Week 8
- Applications of Spectral methods (PCA/SVD)
Week 9, 10
Spring break
Week 11
- Determining Method of Action in Drug Discovery Using Affymetrix Microarray Data
- Dr. Max Kuhn, Pfizer Research Lab
- http://www.gersteinlab.org/courses/545/07-spr/slides/StaphYale.pdf
- http://www.gersteinlab.org/courses/545/07-spr/slides/caretYale.pdf
- An Introduction to Text Mining with an Application to the Life Sciences.
- Professor Michael Krauthammer, Yale Medical School
- http://www.gersteinlab.org/courses/545/07-spr/slides/text_mining.mk.ppt
Week 12
- A(n) (extremely) brief/crude introduction to the minimum description length (MDL) principle
- Kernel PCA
- Edo Liberty
Readings
Week 1
Intro. to Data Mining, Overview of Data Mining in Bioinformatics
- About Data Mining
- Data Mining Applications
- Data Mining In Depth: Description is Not Prediction
Week 2
Data Mining Workflow, Data Preprocessing and Cleaning, Intro. to R and Rattle
- Data mining workflow and preprocessing
- Rattle
- R
- Primary web page for R.
- Guide
Week 3
Intro. to Classification, Decision Trees
- Classification: Basic Concepts, Decision Trees and Model Evaluation
Week 4
Multilinear and Logistic Regression, Support Vector Machines (SVM)
The following chapters are available online, courtesy of the authors and publishers, for your personal use in this course. You may print a personal copy, but redistribution is prohibited.
- Chapter 6 of Jiawei Han and Micheline Kamber (2005). Data Mining: Concepts and Techniques. Morgan Kaufmann. 2nd Ed.
- Chapter 5 of Pang-Ning Tan, Michael Steinbach, and Vipin Kumar (2005). Introduction to Data Mining. Addison Wesley.
rpart
- rpart R package vignette
- An Introduction to Recursive Partitioning Using the RPART Routines. Terry M. Therneau and Elizabeth J. Atkinson.
Logistic regression
- MIT Sloan Lecture on Logistic Regression
Week 6
- Modern trends in data mining
Slides for courses based on reference text books
- Ian Witten and Eibe Frank