<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://info.gersteinlab.org/index.php?action=history&amp;feed=atom&amp;title=Local-copy-cs445</id>
	<title>Local-copy-cs445 - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://info.gersteinlab.org/index.php?action=history&amp;feed=atom&amp;title=Local-copy-cs445"/>
	<link rel="alternate" type="text/html" href="https://info.gersteinlab.org/index.php?title=Local-copy-cs445&amp;action=history"/>
	<updated>2026-05-14T06:30:23Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.42.6</generator>
	<entry>
		<id>https://info.gersteinlab.org/index.php?title=Local-copy-cs445&amp;diff=90&amp;oldid=prev</id>
		<title>Infoadmin: Created page with &#039;This page contains general information for the class:  &#039;&#039;CPSC445/CPSC545/MBB334/MBB545/CBB545&#039;&#039; Spring 2008  &#039;&#039;&#039;Introduction to Data Mining&#039;&#039;&#039;  == Course websites ==  * Course wi…&#039;</title>
		<link rel="alternate" type="text/html" href="https://info.gersteinlab.org/index.php?title=Local-copy-cs445&amp;diff=90&amp;oldid=prev"/>
		<updated>2010-06-10T13:34:34Z</updated>

		<summary type="html">&lt;p&gt;Created page with &amp;#039;This page contains general information for the class:  &amp;#039;&amp;#039;CPSC445/CPSC545/MBB334/MBB545/CBB545&amp;#039;&amp;#039; Spring 2008  &amp;#039;&amp;#039;&amp;#039;Introduction to Data Mining&amp;#039;&amp;#039;&amp;#039;  == Course websites ==  * Course wi…&amp;#039;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;This page contains general information for the class:&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;CPSC445/CPSC545/MBB334/MBB545/CBB545&amp;#039;&amp;#039; Spring 2008&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Introduction to Data Mining&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
== Course websites ==&lt;br /&gt;
&lt;br /&gt;
* Course wiki&lt;br /&gt;
** http://lab.zoo.cs.yale.edu/cs445-wiki/&lt;br /&gt;
* Yale classes server&lt;br /&gt;
** http://classesv2.yale.edu/&lt;br /&gt;
* Previous years&amp;#039; pages:&lt;br /&gt;
** Spring 2007&lt;br /&gt;
*** http://www.gersteinlab.org/courses/545/07-spr/&lt;br /&gt;
*** http://wiki.gersteinlab.org/pubinfo/index.php/Cs545-07&lt;br /&gt;
&lt;br /&gt;
== Homework ==&lt;br /&gt;
&lt;br /&gt;
* Homework 1&lt;br /&gt;
** [http://zoo.cs.yale.edu/classes/cs445/hw/CPSC445_hw1_2008-2.doc assignment]&lt;br /&gt;
** Due: Feb 7, 2008&lt;br /&gt;
** [[hw1 summaries| submitted summaries]]&lt;br /&gt;
&lt;br /&gt;
* Homework 2&lt;br /&gt;
** [http://zoo.cs.yale.edu/classes/cs445/hw/CPSC445_hw2_2008-3.doc assignment]&lt;br /&gt;
** Due: Feb 21, 2008&lt;br /&gt;
&lt;br /&gt;
* Homework 3&lt;br /&gt;
** [http://zoo.cs.yale.edu/classes/cs445/hw/CPSC445_hw3_2008-2.doc assignment]&lt;br /&gt;
** [http://zoo.cs.yale.edu/classes/cs445/hw/hw3_data.csv hw3_data.csv]&lt;br /&gt;
** [http://zoo.cs.yale.edu/classes/cs445/hw/hw3_hints.doc hints]&lt;br /&gt;
** Due: Mar 6, 2008&lt;br /&gt;
&lt;br /&gt;
== Final Project ==&lt;br /&gt;
&lt;br /&gt;
* CPSC445b/545b (2008) Term Projects&lt;br /&gt;
** [http://zoo.cs.yale.edu/classes/cs445/hw/CPSC445_term_projects_2008-v2.doc description]&lt;br /&gt;
** One-paragraph description due: March 27, 2008 (send to martin.schultz@yale.edu AND jiang.du@yale.edu)&lt;br /&gt;
** Project reports due: April 28, 2008 (send to jiang.du@yale.edu)&lt;br /&gt;
** 15-20 minute project presentations: during class on April 1, 10, 15, 17, 22, 24 (to reserve a speaking time, send your preferences to jiang.du@yale.edu)&lt;br /&gt;
** [http://spreadsheets.google.com/pub?key=pXgR9Xs-YQoHGGHqva0_5Lw presentation schedule]&lt;br /&gt;
&lt;br /&gt;
== Slides ==&lt;br /&gt;
&lt;br /&gt;
=== Week 1 ===&lt;br /&gt;
&lt;br /&gt;
* Tu Jan 15, [http://www.cs.yale.edu/people/schultz.html Martin Schultz] (MS): Introduction to Data Mining&lt;br /&gt;
** [http://zoo.cs.yale.edu/classes/cs445/slides/kumar_1.ppt slides]&lt;br /&gt;
&lt;br /&gt;
* Thur Jan 17, [http://homes.gersteinlab.org/people/jiangdu/ Jiang Du] (JD): Introduction to R&lt;br /&gt;
** [http://zoo.cs.yale.edu/classes/cs445/slides/data_mining-08spring-Intro2R.ppt slides]&lt;br /&gt;
** [http://zoo.cs.yale.edu/classes/cs445/misc/wine.data wine.data] [http://zoo.cs.yale.edu/classes/cs445/misc/wine.r wine.r]&lt;br /&gt;
&lt;br /&gt;
=== Week 2 ===&lt;br /&gt;
&lt;br /&gt;
* Tu Jan 22, MS: Introduction to Data Mining and data storage/cleaning&lt;br /&gt;
&lt;br /&gt;
* Thur Jan 24, MS: OLAP, Regression&lt;br /&gt;
&lt;br /&gt;
=== Week 3 ===&lt;br /&gt;
&lt;br /&gt;
* Tu Jan 29, MS: Multiple linear regression, cross-validation&lt;br /&gt;
** [http://www.gersteinlab.org/courses/545/07-spr/slides/DM_multiple_regression.ppt slides]&lt;br /&gt;
&lt;br /&gt;
* Thur Jan 31, MS: Discriminant analysis, perceptrons, SVM&lt;br /&gt;
** [http://www.gersteinlab.org/courses/545/07-spr/slides/DM_SVM.ppt slides: SVM]&lt;br /&gt;
** [http://www.gersteinlab.org/courses/545/07-spr/slides/DM_SVM-law.ppt additional slides: SVM]&lt;br /&gt;
** [http://www.gersteinlab.org/courses/545/07-spr/slides/DM_perceptron.ppt slides: perceptron models]&lt;br /&gt;
&lt;br /&gt;
=== Week 4 ===&lt;br /&gt;
&lt;br /&gt;
* Tu Feb 5, [http://bioinfo.mbb.yale.edu/about Mark Gerstein] (MG): Bayesian classification&lt;br /&gt;
** [http://www.gersteinlab.org/courses/545/07-spr/slides/cbb545b-spr07-bioinfo3-bayes1.ppt slides: Predicting Networks through Bayesian Integration #1 - Theory]&lt;br /&gt;
** [http://www.gersteinlab.org/courses/545/07-spr/slides/cbb545b-spr07-bioinfo4-bayes2.ppt slides: Predicting Networks through Bayesian Integration #2 - Application]&lt;br /&gt;
&lt;br /&gt;
* Thur Feb 7, JD: Decision trees&lt;br /&gt;
** [http://zoo.cs.yale.edu/classes/cs445/slides/DM_DecisionTree-mod_jdu.ppt slides]&lt;br /&gt;
** [http://www.cise.ufl.edu/~ddd/cap6635/Fall-97/Short-papers/2.htm example]&lt;br /&gt;
&lt;br /&gt;
=== Week 5 ===&lt;br /&gt;
&lt;br /&gt;
* Tu Feb 12, MS: Logistic regression&lt;br /&gt;
&lt;br /&gt;
* Thur Feb 14, MG: PCA&lt;br /&gt;
** [http://www.gersteinlab.org/courses/545/07-spr/slides/cbb545b-spr07-bioinfo5-svd1.ppt slides: Theory]&lt;br /&gt;
** [http://www.gersteinlab.org/courses/545/07-spr/slides/cbb545b-spr07-bioinfo6-svd2.ppt slides: Application]&lt;br /&gt;
&lt;br /&gt;
=== Week 6 ===&lt;br /&gt;
&lt;br /&gt;
* Tu Feb 19, MS: k-nearest neighbors, neural networks&lt;br /&gt;
** [http://www.gapminder.org/video/talks/ted-2007---the-seemingly-impossible-is-possible.html link: Hans Rosling&amp;#039;s presentation]&lt;br /&gt;
&lt;br /&gt;
* Thur Feb 21, MS: Clustering&lt;br /&gt;
&lt;br /&gt;
=== Week 7 ===&lt;br /&gt;
&lt;br /&gt;
* Tu Feb 26, MS: Mining Time series&lt;br /&gt;
&lt;br /&gt;
* Thur Feb 28, [http://www.yalepath.org/faculty.lasso?id=KrauthammerM Michael Krauthammer] (MK): Text Mining&lt;br /&gt;
&lt;br /&gt;
=== Week 8 ===&lt;br /&gt;
&lt;br /&gt;
* Tu Mar 4, [http://www.cs.hku.hk/~songhua/ Songhua Xu] (SX): Web, image mining&lt;br /&gt;
&lt;br /&gt;
* Thur Mar 6, MS: Association Analyses&lt;br /&gt;
&lt;br /&gt;
* Thur Mar 6, JD: Data Mining Packages in R: logistic regression &amp;amp; SVM&lt;br /&gt;
** [http://zoo.cs.yale.edu/classes/cs445/slides/r_pkgs-jdu.ppt slides]&lt;br /&gt;
&lt;br /&gt;
=== Weeks 9-10 ===&lt;br /&gt;
&lt;br /&gt;
* Spring Break&lt;br /&gt;
&lt;br /&gt;
=== Week 11 ===&lt;br /&gt;
&lt;br /&gt;
* Tu Mar 25: Max and Kjell: DM Lab (AKW 400)&lt;br /&gt;
** [http://zoo.cs.yale.edu/classes/cs445/slides/Pfizer_Yale_Version.ppt slides]&lt;br /&gt;
** [http://zoo.cs.yale.edu/classes/cs445/slides/Yale1.pdf Model Building: General Strategies, Data Pre-processing, and Partial Least Squares]&lt;br /&gt;
&lt;br /&gt;
* Thur Mar 27: MS&lt;br /&gt;
&lt;br /&gt;
=== Week 12 ===&lt;br /&gt;
&lt;br /&gt;
* Tu Apr 1: Student presentations&lt;br /&gt;
&lt;br /&gt;
* Thur Apr 3: Max and Kjell: DM Lab&lt;br /&gt;
** [http://zoo.cs.yale.edu/classes/cs445/slides/Yale2.pdf Model Building: Ensemble Methods]&lt;br /&gt;
&lt;br /&gt;
=== Week 13 ===&lt;br /&gt;
&lt;br /&gt;
* Tu Apr 8: Max and Kjell: DM Lab&lt;br /&gt;
&lt;br /&gt;
* Thur Apr 10: student presentations&lt;br /&gt;
** [http://zoo.cs.yale.edu/classes/cs445/slides/caret.pdf An Introduction to caret]&lt;br /&gt;
&lt;br /&gt;
=== Week 14 ===&lt;br /&gt;
&lt;br /&gt;
* Tu Apr 15: student presentations&lt;br /&gt;
&lt;br /&gt;
* Thur Apr 17: student presentations&lt;br /&gt;
&lt;br /&gt;
=== Week 15 ===&lt;br /&gt;
&lt;br /&gt;
* Tu Apr 22: student presentations&lt;br /&gt;
&lt;br /&gt;
* Thur Apr 24: student presentations&lt;br /&gt;
&lt;br /&gt;
== Suggested Readings ==&lt;br /&gt;
&lt;br /&gt;
=== Week 1 ===&lt;br /&gt;
&lt;br /&gt;
* [http://www.randomhouse.com/bantamdell/supercrunchers/ Super Crunchers]&lt;br /&gt;
** Chapters 1-4, pp. 1-102.&lt;br /&gt;
** Keep your eyes open for potential application-oriented term projects&lt;br /&gt;
* [http://www.cs.waikato.ac.nz/~ml/weka/book.html Weka Book] (Witten and Frank)&lt;br /&gt;
** Chapter 1, pp. 1-40.&lt;br /&gt;
&lt;br /&gt;
=== Week 2 ===&lt;br /&gt;
&lt;br /&gt;
* [http://www.randomhouse.com/bantamdell/supercrunchers/ Super Crunchers]&lt;br /&gt;
** Chapters 5-6, pp. 103-155.&lt;br /&gt;
&lt;br /&gt;
* [http://www.cs.waikato.ac.nz/~ml/weka/book.html Weka Book] (Witten and Frank)&lt;br /&gt;
** Chapter 2, pp. 41-60.&lt;br /&gt;
** Chapter 4, Section 4.6, pp. 119-127.&lt;br /&gt;
&lt;br /&gt;
=== Week 3 ===&lt;br /&gt;
&lt;br /&gt;
* [http://www.randomhouse.com/bantamdell/supercrunchers/ Super Crunchers]&lt;br /&gt;
** Chapters 7-8, pp. 156-218.&lt;br /&gt;
&lt;br /&gt;
* [http://www.cs.waikato.ac.nz/~ml/weka/book.html Weka Book] (Witten and Frank)&lt;br /&gt;
** Chapter 4, Section 4.6, pp. 119-127.&lt;br /&gt;
** Chapter 6, Section 6.3, pp. 214-235.&lt;br /&gt;
&lt;br /&gt;
=== Week 4 ===&lt;br /&gt;
&lt;br /&gt;
* [http://www.cs.waikato.ac.nz/~ml/weka/book.html Weka Book] (Witten and Frank)&lt;br /&gt;
** Bayesian Methods: Chapter 4, Section 4, p. 141, Section 6, pp. 271-283.&lt;br /&gt;
** Decision Trees: Chapter 3, Section 3.3, pp. 62-65, Chapter 4, Section 4.3, pp. 97-105.&lt;br /&gt;
&lt;br /&gt;
=== Week 5 ===&lt;br /&gt;
&lt;br /&gt;
* [http://www.cs.waikato.ac.nz/~ml/weka/book.html Weka Book] (Witten and Frank)&lt;br /&gt;
** Logistic Regression: Chapter 4, Section 4.6, pp. 121-125.&lt;br /&gt;
* PCA, SVD, and LSI&lt;br /&gt;
** [http://www.cs.pitt.edu/~milos/courses/cs3750/Readings/Berry_etal-1999.pdf pdf]&lt;br /&gt;
&lt;br /&gt;
=== Week 6 ===&lt;br /&gt;
&lt;br /&gt;
* [http://www.cs.waikato.ac.nz/~ml/weka/book.html Weka Book] (Witten and Frank)&lt;br /&gt;
** K-nearest neighbors (instance-based learning): Chapter 4, Section 4.7, pp. 128-136, Chapter 6, Section 6, pp. 235-243.&lt;br /&gt;
** Neural Networks: Chapter 6, Section 6.3, pp. 223-226, 233.&lt;br /&gt;
** Clustering: Chapter 4, Section 4.8, pp. 136-139, Chapter 6, Section 6.6, pp. 254-271.&lt;br /&gt;
&lt;br /&gt;
=== Week 8 ===&lt;br /&gt;
&lt;br /&gt;
* [http://www.cs.waikato.ac.nz/~ml/weka/book.html Weka Book] (Witten and Frank)&lt;br /&gt;
** Bayesian Methods: Chapter 4, Section 4, p. 141, Section 6, pp. 271-283.&lt;br /&gt;
** Decision Trees: Chapter 3, Section 3.3, pp. 62-65, Chapter 4, Section 4.3, pp. 97-105.&lt;br /&gt;
&lt;br /&gt;
== Other Online Materials ==&lt;br /&gt;
&lt;br /&gt;
* Introduction to Data Mining (by Tan et al.)&lt;br /&gt;
** [http://www-users.cs.umn.edu/~kumar/dmbook/index.php Chapters 4, 6, and 8]&lt;br /&gt;
** [http://zoo.cs.yale.edu/classes/cs445/misc/chap5_other_classification.pdf Chapter 5]&lt;br /&gt;
&lt;br /&gt;
* Data Mining (by Graham Williams)&lt;br /&gt;
** [http://zoo.cs.yale.edu/classes/cs445/misc/mar13lae08.pdf Draft Book] &amp;#039;&amp;#039;for use only in this course&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
* Intro to R and Data Mining (by Luis Torgo)&lt;br /&gt;
** http://www.liaad.up.pt/~ltorgo/DataMiningWithR/&lt;br /&gt;
&lt;br /&gt;
* R Documentation&lt;br /&gt;
** http://www.cran.r-project.org/other-docs.html&lt;/div&gt;</summary>
		<author><name>Infoadmin</name></author>
	</entry>
</feed>