Principles of Data Mining
CAP5771, Spring 2017

Time: Tu/Th 7:55 pm - 9:05 pm
Place: PG6 112

Instructor: Ruogu Fang
    Office: ECS 333, (305) 348-7982
    Office Hours: Th 11:00 am - 12:00 pm
Teaching Assistant: Xiaolong Zhu
    Office: ECS 251
    TA Hours: Wed 3:00 pm - 5:00 pm

Data mining is the nontrivial extraction of implicit, previously unknown, and potentially useful information from data. It has gradually matured into a discipline that merges ideas from statistics, machine learning, databases, and related fields. This is an introductory graduate course on data mining for master's and PhD students in computer science. Topics include data mining applications, understanding your data, data exploration, core data mining techniques (such as classification, association analysis, clustering, and anomaly detection), and advanced topics in data mining.

  • Data Mining Introduction
  • Data Mining Applications
  • Understanding Data
  • Data Exploration
  • Classification and Prediction
  • Ensemble Methods
  • Deep Learning
  • Association Analysis
  • Clustering
  • Advanced Topics in Data Mining

Course Schedule

Course Calendar

For lecture notes, please access the Moodle system.

Policies on Assignments and Exams

All project deliverables and assignments should be submitted before midnight on the due date. The only valid excuses for missing an exam are verifiable illness, emergencies, and religious holidays. Please check the exam dates and inform me as early as possible of any conflict arising from these reasons.


The course assignments include projects and written homework. Projects will be designed to improve students' critical analysis and problem-solving skills. Class attendance is mandatory. In addition, occasional quizzes will be given in class. Evaluation will be a subjective process, but it will be based primarily on the students' understanding of the course material. Final grades will be calculated as follows:
  • Assignments: 30%
  • Exam: 30%
  • Final Project: 30%
  • Quizzes, Attendance & Surveys: 10%
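
As an illustration only, the weighting above can be sketched as a short Python calculation. The weights come from the breakdown above; the component names, the 0-100 score scale, and the example scores are assumptions made for this sketch, and since evaluation is described as partly subjective, the actual grade may not follow this arithmetic exactly.

```python
# Weights from the grading breakdown (fractions of the final grade).
# The dictionary keys are hypothetical names chosen for this sketch.
WEIGHTS = {
    "assignments": 0.30,
    "exam": 0.30,
    "final_project": 0.30,
    "quizzes_attendance_surveys": 0.10,
}


def final_grade(scores):
    """Return the weighted final grade for per-component scores (assumed 0-100)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Made-up example scores, for illustration only.
example = {
    "assignments": 90,
    "exam": 80,
    "final_project": 85,
    "quizzes_attendance_surveys": 100,
}
print(round(final_grade(example), 2))
```

Each of the three main components moves the final grade by 0.3 points per score point, so a 10-point change on the exam alone shifts the total by 3 points.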

Useful Links
Textbook & References

  • Pang-Ning Tan, Michael Steinbach and Vipin Kumar. Introduction to Data Mining. Addison Wesley, 2005.
  • Jiawei Han and Micheline Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers, 2006, Second Edition.
  • Tom Mitchell. Machine Learning. McGraw Hill, 1997.
  • Hastie, Tibshirani and Friedman. The Elements of Statistical Learning. Springer-Verlag, 2001.
  • Soumen Chakrabarti. Mining the Web: Discovering Knowledge from Hypertext Data. Morgan Kaufmann, 2003. Available online at the FIU Library.
Additional reading material from top conferences/journals and slides will be made available on Moodle for extended reading and learning.

Prerequisites

  • COP 3530 Data Structures and Algorithms
  • STA 3033 Introduction to Probability & Statistics for CS and Engineering, or equivalent

Academic Integrity

This course follows the Florida International University Code of Academic Integrity, and each student is expected to abide by it. Any work submitted by a student in this course for academic credit must be the student's own work. Violations will not be tolerated.

© 2017 Ruogu Fang. All rights reserved.