This course requires basic knowledge of statistics and artificial intelligence.

Course Description

The course introduces students to the fundamentals of data mining theory and algorithms. In addition to building a strong mathematical foundation, the course places heavy emphasis on the analysis and mining of real data sets using popular data mining tools such as Weka, KNIME and R. Topics covered include classification (k-nearest neighbors, classification trees, naïve Bayes, artificial neural networks), regression, clustering (k-means, fuzzy c-means, hierarchical clustering), association rules and text mining. Feature selection, data cleaning, data transformation, model evaluation and data visualization are also covered in detail. By the end of the course, students are expected to have learned the art of modeling and interpreting large, complex data sets using predictive and descriptive data mining methods.

Course Outline

  • Overview of Data Mining
    • Definition, Origins of Data Mining, Applications of Data Mining, Data Mining vs. OLAP and SQL
  • Data Preparation
    • Feature Ranking, Feature Discretization, Normalization, Outlier Detection Techniques
  • Classification
    • Classification Tree, Naïve Bayes, Neural Networks, k-NN Classifier, Logistic Regression
  • Clustering
    • K-Means, Fuzzy c-Means, Self-Organizing Map
  • Model Evaluation
    • Confusion Matrix, Recall and Precision, ROC Curve
  • Patterns and Association Mining
    • Apriori Algorithm
  • Text Mining

Reference Books

  1. Introduction to Data Mining by Tan, Steinbach and Kumar (2006)
  2. Data Mining: Concepts and Techniques by Han and Kamber (2006)
  3. Data Mining: Practical Machine Learning Tools and Techniques by Witten and Frank (2005)
  4. R and Data Mining: Examples and Case Studies by Yanchang Zhao (2013)

Marks Distribution

  • Two Midterms - 35%
  • Final - 40%
  • Projects - 25%