# Local-copy-cs445

This page contains general information for the class:

*CPSC445/CPSC545/MBB334/MBB545/CBB545* Spring 2008

**Introduction to Data Mining**

## Course websites

- Course wiki
- Yale classes server
- Previous years' pages:

## Homework

- Homework 1
  - assignment
  - Due: Feb 7, 2008
  - submitted summaries

- Homework 2
  - assignment
  - Due: Feb 21, 2008

- Homework 3
  - assignment
  - hw3_data.csv
  - hints
  - Due: Mar 6, 2008

## Final Project

- CPSC445b/545b (2008) Term Projects
  - description
  - One-paragraph description due: March 27, 2008 (send to martin.schultz@yale.edu AND jiang.du@yale.edu)
  - Project reports due: April 28, 2008 (send to jiang.du@yale.edu)
  - 15-20 minute project presentations: during class on April 1, 10, 15, 17, 22, 24 (to reserve a speaking time, send your preferences to jiang.du@yale.edu)
  - presentation schedule

## Slides

### Week 1

- Tu Jan 15, Martin Schultz (MS): Introduction to Data Mining

### Week 2

- Tu Jan 22, MS: Introduction to Data Mining and data storage/cleaning

- Thur Jan 24, MS: OLAP, Regression

### Week 3

- Tu Jan 29, MS: Multilinear Regression, cross validation

- Thur Jan 31, MS: discriminant analysis, perceptrons, SVM

### Week 4

- Tu Feb 5, Mark Gerstein (MG): Bayesian classification

### Week 5

- Tu Feb 12, MS: Logistic regression

- Thur Feb 14, MG: PCA

### Week 6

- Tu Feb 19, MS: k-nearest neighbors, neural networks

- Thur Feb 21, MS: Clustering

### Week 7

- Tu Feb 26, MS: Mining Time series

- Thur Feb 28, Michael Krauthammer (MK): Text Mining

### Week 8

- Tu Mar 4, Songhua Xu (SX): Web, image mining

- Thur Mar 6, MS: Association Analyses

- Thur Mar 6, JDU: Data Mining Packages in R: logistic regression & SVM

### Week 9, 10

- Spring Break

### Week 11

- Tu Mar 25: Max and Kjell: DM Lab (AKW 400)

- Thur Mar 27: MS

### Week 12

- Tu Apr 1: student presentations

- Thur Apr 3: Max and Kjell: DM Lab

### Week 13

- Tu Apr 8: Max and Kjell: DM Lab

- Thur Apr 10: student presentations

### Week 14

- Tu Apr 15: student presentations

- Thur Apr 17: student presentations

### Week 15

- Tu Apr 22: student presentations

- Thur Apr 24: student presentations

## Suggested Readings

### Week 1

- Super Crunchers
  - Chapters 1-4, pp. 1-102.
  - Keep your eyes open for potential application-oriented term projects.

- Weka Book (Witten and Frank)
  - Chapter 1, pp. 1-40.

### Week 2

- Super Crunchers
  - Chapters 5-6, pp. 103-155.

- Weka Book (Witten and Frank)
  - Chapter 2, pp. 41-60.
  - Chapter 4, Section 4.6, pp. 119-127.

### Week 3

- Super Crunchers
  - Chapters 7-8, pp. 156-218.

- Weka Book (Witten and Frank)
  - Chapter 4, Section 4.6, pp. 119-127.
  - Chapter 6, Section 6.3, pp. 214-235.

### Week 4

- Weka Book (Witten and Frank)
  - Bayesian Methods: Chapter 4, Section 4, p. 141, Section 6, pp. 271-283.
  - Decision Trees: Chapter 3, Section 3.3, pp. 62-65, Chapter 4, Section 4.3, pp. 97-105.

### Week 5

- Weka Book (Witten and Frank)
  - Logistic Regression: Chapter 4, Section 4.6, pp. 121-125.

- PCA, SVD, and LSI

### Week 6

- Weka Book (Witten and Frank)
  - K-nearest neighbors (instance-based learning): Chapter 4, Section 4.7, pp. 128-136, Chapter 6, Section 6, pp. 235-243.
  - Neural Networks: Chapter 6, Section 6.3, pp. 223-226, 233.
  - Clustering: Chapter 4, Section 4.8, pp. 136-139, Chapter 6, Section 6.6, pp. 254-271.

### Week 8

- Weka Book (Witten and Frank)
  - Bayesian Methods: Chapter 4, Section 4, p. 141, Section 6, pp. 271-283.
  - Decision Trees: Chapter 3, Section 3.3, pp. 62-65, Chapter 4, Section 4.3, pp. 97-105.

## Other Online Materials

- Introduction to Data Mining (by Tan et al.)

- Data Mining (by Graham Williams)
- Draft Book
*for use only in this course*

- Draft Book

- Intro to R and Data Mining (by Luis Torgo)

- R Documentation