Local-copy-cs445

This page contains general information for the class:

CPSC445/CPSC545/MBB334/MBB545/CBB545 Spring 2008

Introduction to Data Mining

Course websites

Homework

Final Project

  • CPSC445b/545b (2008) Term Projects
    • description
    • One-paragraph description due: March 27, 2008 (send to martin.schultz@yale.edu AND jiang.du@yale.edu)
    • Project reports due: April 28, 2008 (send to jiang.du@yale.edu)
    • 15-20 minute project presentations: during class on April 1, 10, 15, 17, 22, 24 (To make reservations for speaking times, please send your preferences to jiang.du@yale.edu)
    • presentation schedule

Slides

Week 1

Week 2

  • Tu Jan 22, MS: Introduction to Data Mining and Data storage/cleaning
  • Thur Jan 24, MS: OLAP, Regression

Week 3

  • Tu Jan 29, MS: Multilinear Regression, cross validation

Week 4

Week 5

  • Tu Feb 12, MS: Logistic regression

Week 6

  • Thur Feb 21, MS: Clustering

Week 7

  • Tu Feb 26, MS: Mining Time series

Week 8

  • Thur Mar 6, MS: Association Analyses
  • Thur Mar 6, JDU: Data Mining Packages in R: logistic regression & SVM

Weeks 9-10

  • Spring Break

Week 11

  • Thur Mar 27: MS

Week 12

  • Tu Apr 1: Student presentations

Week 13

  • Tu Apr 8: Max and Kjell: DM lab

Week 14

  • Tu Apr 15: student presentations
  • Thur Apr 17: student presentations

Week 15

  • Tu Apr 22: student presentations
  • Thur Apr 24: student presentations

Suggested Readings

Week 1

  • Super Crunchers
    • Chapters 1-4, pp. 1-102.
    • Keep an eye out for potential application-oriented term projects
  • Weka Book (Witten and Frank)
    • Chapter 1, pp. 1-40.

Week 2

  • Weka Book (Witten and Frank)
    • Chapter 2, pp. 41-60.
    • Chapter 4, Section 4.6, pp. 119-127.

Week 3

  • Weka Book (Witten and Frank)
    • Chapter 4, Section 4.6, pp. 119-127.
    • Chapter 6, Section 6.3, pp. 214-235.

Week 4

  • Weka Book (Witten and Frank)
    • Bayesian Methods: Chapter 4, Section 4, p. 141, Section 6, pp. 271-283.
    • Decision Trees: Chapter 3, Section 3.3, pp. 62-65, Chapter 4, Section 4.3, pp. 97-105.

Week 5

  • Weka Book (Witten and Frank)
    • Logistic Regression: Chapter 4, Section 4.6, pp. 121-125.
  • PCA, SVD, and LSI

Week 6

  • Weka Book (Witten and Frank)
    • K-nearest neighbors (instance-based learning): Chapter 4, Section 4.7, pp. 128-136, Chapter 6, Section 6, pp. 235-243.
    • Neural Networks: Chapter 6, Section 6.3, pp. 223-226, 233.
    • Clustering: Chapter 4, Section 4.8, pp. 136-139, Chapter 6, Section 6.6, pp. 254-271.

Week 8

  • Weka Book (Witten and Frank)
    • Bayesian Methods: Chapter 4, Section 4, p. 141, Section 6, pp. 271-283.
    • Decision Trees: Chapter 3, Section 3.3, pp. 62-65, Chapter 4, Section 4.3, pp. 97-105.

Other Online Materials

  • Data Mining (by Graham Williams)