This page contains general information for the class:
Introduction to Data Mining
- Main website
- Course wiki
- Yale classes server
- HW2: Decision trees.
- Due Feb 20th, 2007.
- HW3: Multilinear regression.
- HW4: SVM.
- Suggested Term Projects. Due May 7th, 2007; project description due by March 27th, 2007.
- List of proposed projects
- Introduction to Bioinformatics
- Decision tree
- Ensemble methods
- Multilinear and logistic regression
- Perceptron Models
- Molecular Networks as Application of Mining
- Unsupervised Learning and Clustering
- Nice interactive k-means demo
- Predicting Networks Through Bayesian Inference
- Applications of Spectral methods (PCA/SVD)
Weeks 9 and 10
- Determining Method of Action in Drug Discovery Using Affymetrix Microarray Data
- Dr. Max Kuhn, Pfizer Research Lab
- An Introduction to Text Mining with an Application to the Life Sciences.
- Professor Michael Krauthammer, Yale Medical School
- An (extremely) brief/crude introduction to the minimum description length (MDL) principle
- Kernel PCA
- Edo Liberty
Intro. to Data Mining, Overview of Data Mining in Bioinformatics
- About Data Mining
- Data Mining Applications
- Data Mining In Depth: Description is Not Prediction
Data Mining Workflow, Data Preprocessing and Cleaning, Intro. to R and Rattle
- Data mining workflow and preprocessing
- Primary web page for R.
Intro. to Classification, Decision Trees
- Classification: Basic Concepts, Decision Trees and Model Evaluation
Multilinear and Logistic Regression, Support Vector Machines (SVM)
The following chapters are available online courtesy of the authors/publishers for your personal use in this course. You may print a personal copy but you are prohibited from redistribution.
- Chapter 6 of Jiawei Han and Micheline Kamber (2005). Data Mining: Concepts and Techniques. Morgan Kaufmann. 2nd ed.
- Chapter 5 of Pang-Ning Tan, Michael Steinbach, and Vipin Kumar (2005). Introduction to Data Mining. Addison-Wesley.
- rpart R package vignette
- An Introduction to Recursive Partitioning Using the RPART Routines. Terry M. Therneau and Elizabeth J. Atkinson.
- MIT Sloan Lecture on Logistic Regression
- Modern trends in data mining
Slides for courses based on reference text books
- Ian Witten and Eibe Frank