Local-copy-cs445
This page contains general information for the class:
CPSC445/CPSC545/MBB334/MBB545/CBB545 Spring 2008
Introduction to Data Mining
Course websites
- Course wiki
- Yale classes server
- Previous years' pages:
Homework
- Homework 1
- assignment
- Due: Feb 7, 2008
- submitted summaries
- Homework 2
- assignment
- Due: Feb 21, 2008
- Homework 3
- assignment
- hw3_data.csv
- hints
- Due: Mar 6, 2008
Final Project
- CPSC445b/545b (2008) Term Projects
- description
- One-paragraph description due: March 27, 2008 (send to martin.schultz@yale.edu AND jiang.du@yale.edu)
- Project reports due: April 28, 2008 (send to jiang.du@yale.edu)
- 15-20 minute project presentations: during class on April 1, 10, 15, 17, 22, 24 (to reserve a speaking time, please send your preferences to jiang.du@yale.edu)
- presentation schedule
Slides
Week 1
- Tu Jan 15, Martin Schultz (MS): Introduction to Data Mining
Week 2
- Tu Jan 22, MS: Introduction to Data Mining and Data storage/cleaning
- Thur Jan 24, MS: OLAP, Regression
Week 3
- Tu Jan 29, MS: Multilinear Regression, cross validation
- Thur Jan 31, MS: discriminant analysis, perceptrons, SVM
Week 4
- Tu Feb 5, Mark Gerstein (MG): Bayesian classification
Week 5
- Tu Feb 12, MS: Logistic regression
- Thur Feb 14, MG: PCA
Week 6
- Tu Feb 19, MS: k-nearest neighbors, neural networks
- Thur Feb 21, MS: Clustering
Week 7
- Tu Feb 26, MS: Mining Time series
- Thur Feb 28, Michael Krauthammer (MK): Text Mining
Week 8
- Tu Mar 4, Songhua Xu (SX): Web, image mining
- Thur Mar 6, MS: Association Analyses
- Thur Mar 6, JDU: Data Mining Packages in R: logistic regression & SVM
Week 9, 10
- Spring Break
Week 11
- Tu Mar 25: Max and Kjell: DM Lab (AKW 400)
- Thur Mar 27: MS
Week 12
- Tu Apr 1: student presentations
- Thur Apr 3: Max and Kjell: DM Lab
Week 13
- Tu Apr 8: Max and Kjell: DM Lab
- Thur Apr 10: student presentations
Week 14
- Tu Apr 15: student presentations
- Thur Apr 17: student presentations
Week 15
- Tu Apr 22: student presentations
- Thur Apr 24: student presentations
Suggested Readings
Week 1
- Super Crunchers
- Chapters 1-4, pp. 1-102.
- Keep your eyes open for potential application-oriented term projects
- Weka Book (Witten and Frank)
- Chapter 1, pp. 1-40.
Week 2
- Super Crunchers
- Chapters 5-6, pp. 103-155.
- Weka Book (Witten and Frank)
- Chapter 2, pp. 41-60.
- Chapter 4, Section 4.6, pp. 119-127.
Week 3
- Super Crunchers
- Chapters 7-8, pp. 156-218.
- Weka Book (Witten and Frank)
- Chapter 4, Section 4.6, pp. 119-127.
- Chapter 6, Section 6.3, pp. 214-235.
Week 4
- Weka Book (Witten and Frank)
- Bayesian Methods: Chapter 4, Section 4, pg. 141, Section 6, pp. 271-283.
- Decision Trees: Chapter 3, Section 3.3, pp. 62-65, Chapter 4, Section 4.3, pp. 97-105.
Week 5
- Weka Book (Witten and Frank)
- Logistic Regression: Chapter 4, Section 4.6, pp. 121-125.
- PCA, SVD, and LSI
Week 6
- Weka Book (Witten and Frank)
- K-nearest neighbors (instance-based learning): Chapter 4, Section 4.7, pp. 128-136, Chapter 6, Section 6, pp. 235-243.
- Neural Networks: Chapter 6, Section 6.3, pp. 223-226, 233.
- Clustering: Chapter 4, Section 4.8, pp. 136-139, Chapter 6, Section 6.6, pp. 254-271.
Week 8
- Weka Book (Witten and Frank)
- Bayesian Methods: Chapter 4, Section 4, pg. 141, Section 6, pp. 271-283.
- Decision Trees: Chapter 3, Section 3.3, pp. 62-65, Chapter 4, Section 4.3, pp. 97-105.
Other Online Materials
- Introduction to Data Mining (by Tan et al.)
- Data Mining (by Graham Williams)
- Draft Book for use only in this course
- Intro to R and Data Mining (by Luis Torgo)
- R Documentation