Machine Learning
    CAP5610, Spring 2015

Time: Tu/Th 11:00am - 12:15pm
Place: ECS 145

Instructor: Ruogu Fang (rfang@cs.fiu.edu)
    Office: ECS 333, (305)348-7982
    Office Hours: Tue / Thu 2:00 pm - 3:30 pm

TA: Haodi Jiang (hjian005@fiu.edu)
    Office: ECS 254
    Office Hours: Mon / Tue 3:00 pm - 5:00 pm


Machine learning is concerned with the question of how to make computers learn from experience. The ability to learn is not only central to most aspects of intelligent behavior; machine learning techniques have also become key components of many software systems. For example, machine learning techniques are used to create spam filters, analyze customer purchase data, understand natural language, and detect fraudulent credit card transactions.

Exam Time
Prelim 1: March 17, 2015
Prelim 2: April 16, 2015
Final Project Presentation: April 21 and 23, 2015
Final Project Writeup: April 28, 2015

Syllabus
This course will introduce the fundamental set of techniques and algorithms that constitute machine learning as of today, ranging from classification methods like decision trees, support vector machines, and neural networks, through structured models like hidden Markov models, to clustering and matrix factorization methods for recommendation. The course will not only discuss individual algorithms and methods, but also tie principles and approaches together from a theoretical perspective, as well as cover the fundamentals of applying machine learning techniques to real-world problems. In particular, the course will cover the following topics:
  • Concept Learning: Hypothesis space, version space
  • Instance-based Learning: K-Nearest Neighbors, collaborative filtering
  • Decision Trees: TDIDT, attribute selection, pruning and overfitting
  • ML Experimentation: Hypothesis tests, cross validation, resampling estimates
  • Linear Rules: Perceptron, logistic regression, linear regression, duality
  • Support Vector Machines: Optimal hyperplane, margin, kernels, stability
  • Generative Models: Naive Bayes, linear discriminant analysis
  • Hidden Markov Models: probabilistic model, estimation, Viterbi
  • Statistical Learning Theory: PAC learning, VC dimension, generalization error bounds
  • Clustering: HAC, k-means, mixture of Gaussians

Prerequisites
This course will be self-contained; students do not need prior machine learning background. A reasonable knowledge of linear algebra is assumed as a prerequisite. The programming assignments will be in MATLAB/Python, so familiarity with these languages is essential.
Please send me an email or speak to me if you are unsure whether you can take the course.

Lectures
  • 01/13: Introduction
    • What is learning?
    • What is machine learning used for?
    • Overview of course, course policies, and contact info.
  • 01/15: Instance-Based Learning
    • Definition of concept learning / binary classification, instance space, target function, training examples.
    • Unweighted k-nearest neighbor (kNN) rule.
    • Weighted kNN.
    • Effect of selecting k.
    • Supervised learning for binary classification, multi-class classification, regression, and structured output prediction.
    • kNN for regression and collaborative filtering.
    • Reading: Mitchell Ch. 1, 8.1 - 8.2. Amazon Recommendations
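
    A minimal, illustrative sketch (not part of the assigned reading) of the unweighted kNN rule covered in this lecture, in Python/NumPy with Euclidean distance; the toy data and the choice k=3 are assumptions for the example:

        import numpy as np

        def knn_predict(X_train, y_train, x, k=3):
            """Unweighted kNN: predict the majority label among the k nearest training points."""
            dists = np.linalg.norm(X_train - x, axis=1)   # Euclidean distance to every training point
            nearest = np.argsort(dists)[:k]               # indices of the k closest points
            labels, counts = np.unique(y_train[nearest], return_counts=True)
            return labels[np.argmax(counts)]              # majority vote

        # Toy example (illustrative data only)
        X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0], [5.0, 6.0]])
        y = np.array([0, 0, 0, 1, 1])
        print(knn_predict(X, y, np.array([0.5, 0.5]), k=3))   # -> 0
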
  • 01/20: Decision-Tree Learning
    • Hypothesis space, consistency, and version space
    • List-then-eliminate algorithm
    • Classifying with a decision tree
    • Representational power of decision trees
    • TDIDT decision tree learning algorithm
    • Splitting criteria for TDIDT learning
    • Reading: Mitchell 2.1-2.3, 2.5-2.5.2, 2.7, 3
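
    A small illustrative sketch of the entropy-based splitting criterion used by TDIDT-style learners (information gain); the toy attribute and label vectors are assumptions for the example:

        import numpy as np

        def entropy(labels):
            """Entropy of a label vector, in bits."""
            _, counts = np.unique(labels, return_counts=True)
            p = counts / counts.sum()
            return -np.sum(p * np.log2(p))

        def information_gain(attribute, labels):
            """Reduction in label entropy from splitting on a discrete attribute."""
            gain = entropy(labels)
            for v in np.unique(attribute):
                mask = attribute == v
                gain -= mask.mean() * entropy(labels[mask])   # weighted entropy of each branch
            return gain

        # Toy example: a perfectly informative attribute vs. an uninformative one
        y = np.array([1, 1, 0, 0])
        print(information_gain(np.array(['a', 'a', 'b', 'b']), y))   # -> 1.0 bit
        print(information_gain(np.array(['a', 'b', 'a', 'b']), y))   # -> 0.0 bits
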
  • 01/22: Prediction and Overfitting
    • Training error, test error, prediction error
    • Independent and identically distributed (iid) data
    • Overfitting
    • Occam's razor
    • Reading: Mitchell 3.6-3.7
  • 01/27: Model Selection and Assessment
  • 01/29: Linear Classifiers and Perceptrons
    • Model Assessment
      Is rule h1 more accurate than h2?
      Is learning algorithm A1 more accurate than A2?
      k-fold Cross-Validation
    • Linear classification rules
    • Batch Perceptron learning algorithm
    • Reading: Mitchell 4.4-4.4.2
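
    A minimal sketch of k-fold cross-validation for model assessment, as discussed in this lecture; train_fn and error_fn are placeholder callables assumed for the example, not course code:

        import numpy as np

        def k_fold_cv_error(X, y, train_fn, error_fn, k=10, seed=0):
            """Average held-out error over k folds.
            train_fn(X_train, y_train) returns a model; error_fn(model, X_test, y_test) returns its error."""
            rng = np.random.default_rng(seed)
            idx = rng.permutation(len(y))            # shuffle once, then split into k folds
            folds = np.array_split(idx, k)
            errors = []
            for i in range(k):
                test = folds[i]
                train = np.concatenate([folds[j] for j in range(k) if j != i])
                model = train_fn(X[train], y[train])
                errors.append(error_fn(model, X[test], y[test]))
            return float(np.mean(errors))
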
  • 02/03: Online Learning and Perceptron Mistake Bound
    • Online learning model
    • Online Perceptron learning algorithm
    • Margin of linear classifier
    • Perceptron mistake bound
    • Reading: Mitchell 7.5, Cristianini/Shawe-Taylor 2-2.1.1
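
    A minimal sketch of the online Perceptron algorithm from this lecture, assuming labels in {-1, +1} and a bias absorbed by an appended constant feature:

        import numpy as np

        def perceptron_train(X, y, epochs=10):
            """Online Perceptron: sweep over the data and update the weights only on mistakes."""
            X = np.hstack([X, np.ones((len(X), 1))])   # constant feature absorbs the bias term
            w = np.zeros(X.shape[1])
            for _ in range(epochs):
                for x_i, y_i in zip(X, y):
                    if y_i * np.dot(w, x_i) <= 0:      # mistake (or on the boundary): update
                        w += y_i * x_i
            return w

        def perceptron_predict(w, X):
            X = np.hstack([X, np.ones((len(X), 1))])
            return np.sign(X @ w)
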
  • 02/05: Support Vector Machines: Optimal Hyperplanes and Soft Margin
    • Optimal hyperplanes and margins
    • Hard-margin Support Vector Machine
    • Soft-margin Support Vector Machine
    • Primal optimization problems
    • Readings: Schoelkopf/Smola 7.1-7.3, 7.5 (excluding crossed out sections)
  • 02/10: Support Vector Machines: Duality and Leave-one-out
    • Dual Perceptron
    • Dual SVM optimization problem
    • Connection between primal and dual
    • Bounds on the leave-one-out error of an SVM
    • Reading: Schoelkopf/Smola 7.3, 7.5, Cristianini/Shawe-Taylor 2-2.1.1
  • 02/12: Support Vector Machines: Kernels
    • Input space and feature space
    • Kernels as a method for non-linear learning
    • Kernels for learning with non-vectorial data
    • Reading: Schoelkopf/Smola 7.4, 7.6, 7.8, Cristianini/Shawe-Taylor 3.1, 3.2, 3.3.2, 3.4.
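
    A small sketch of computing a Gaussian (RBF) kernel matrix, one standard example of the kernels discussed in this lecture; the bandwidth gamma and the toy data are assumptions for the example:

        import numpy as np

        def rbf_kernel(X1, X2, gamma=0.5):
            """K[i, j] = exp(-gamma * ||x1_i - x2_j||^2): an inner product in an implicit
            feature space, so dual linear methods (Perceptron, SVM) become non-linear in input space."""
            sq_dists = (np.sum(X1**2, axis=1)[:, None]
                        + np.sum(X2**2, axis=1)[None, :]
                        - 2 * X1 @ X2.T)
            return np.exp(-gamma * np.maximum(sq_dists, 0))   # clamp tiny negatives from round-off

        # Toy usage (illustrative data only)
        X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
        print(rbf_kernel(X, X))
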
  • 02/19: Discriminative vs. Generative Learning
    • Empirical Risk Minimization
    • Regularized training of linear models
    • Bayes decision rule and Bayes error rate
    • Multivariate naive Bayes classifier
    • Reading: Mitchell, Chapter 6.9 - 6.10, Duda, Hart & Stork, Pages 20-39
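
    A minimal sketch of a multivariate naive Bayes classifier over binary features, as covered in this lecture; the add-one (Laplace) smoothing is an illustrative choice:

        import numpy as np

        def train_naive_bayes(X, y):
            """Estimate class priors and per-class feature probabilities from a binary matrix X."""
            classes = np.unique(y)
            priors = np.array([(y == c).mean() for c in classes])
            # P(feature_j = 1 | class c), with add-one smoothing
            cond = np.array([(X[y == c].sum(axis=0) + 1) / ((y == c).sum() + 2) for c in classes])
            return classes, priors, cond

        def predict_naive_bayes(classes, priors, cond, x):
            """Pick the class maximizing log P(c) + sum_j log P(x_j | c)."""
            log_post = np.log(priors) + np.log(cond) @ x + np.log(1 - cond) @ (1 - x)
            return classes[np.argmax(log_post)]
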
  • 02/26: Generative Models for Classification
    • Linear discriminant analysis
    • Multinomial naive Bayes classifier
    • Multi-class and multi-label classification
    • Precision, recall, and F1 score
    • Reading: Mitchell, Chapter 6.9 - 6.10; Duda, Hart & Stork, Pages 20-39
  • 03/19: Modeling Sequence Data: Markov Models
    • Less naive Bayes classifier
    • Markov models
    • Part-of-speech tagging
    • Hidden Markov model (HMM)
    • Reading: Manning/Schuetze, Sections 9.1-9.3 (except 9.3.1).
    • Leeds Online HMM Tutorial (except Forward and Forward/Backward Algorithm)
  • 03/24: Modeling Sequence Data: HMMs and Viterbi
    • Hidden Markov model (HMM)
    • Estimation of HMM parameters
    • Viterbi algorithm
    • Experiments with POS tagging
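
    A minimal sketch of the Viterbi algorithm for decoding an HMM, working in log space; the parameter layout (start, transition, and emission arrays) is an assumption for the example:

        import numpy as np

        def viterbi(obs, start, trans, emit):
            """Most likely state sequence for an observation sequence (all parameters are NumPy arrays).
            start[s]: initial probability; trans[s, t]: s -> t transition; emit[s, o]: emission."""
            n_states, T = len(start), len(obs)
            delta = np.zeros((T, n_states))            # best log-probability of a path ending in each state
            back = np.zeros((T, n_states), dtype=int)  # backpointers to recover that path
            delta[0] = np.log(start) + np.log(emit[:, obs[0]])
            for t in range(1, T):
                scores = delta[t - 1][:, None] + np.log(trans)   # scores[s, s'] = best path into s' via s
                back[t] = np.argmax(scores, axis=0)
                delta[t] = np.max(scores, axis=0) + np.log(emit[:, obs[t]])
            path = [int(np.argmax(delta[-1]))]
            for t in range(T - 1, 0, -1):
                path.append(int(back[t][path[-1]]))
            return path[::-1]
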
  • 03/26: Statistical Learning Theory: PAC Learning
    • Psychic game
    • Generalization error bound for finite H and zero error
    • Sample complexity
    • Probably Approximately Correct (PAC) learning
    • Reading: Mitchell Chapter 7 (not 7.4.4 and 7.5)
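
    A worked instance of the sample complexity bound for a finite hypothesis space and a consistent (zero training error) learner, as in Mitchell Chapter 7: m >= (1/epsilon) * (ln|H| + ln(1/delta)); the numbers plugged in are illustrative:

        import math

        def pac_sample_complexity(H_size, epsilon, delta):
            """Training set size sufficient so that, with probability at least 1 - delta,
            every consistent hypothesis in a finite space H has true error at most epsilon."""
            return math.ceil((math.log(H_size) + math.log(1 / delta)) / epsilon)

        # Illustrative numbers: |H| = 2**20, epsilon = 0.1, delta = 0.05
        print(pac_sample_complexity(2**20, 0.1, 0.05))   # -> 169
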
  • 03/31: Statistical Learning Theory: Error Bounds and VC-Dimension
    • Generalization error bound for finite H and non-zero error
    • Generalization error bound for infinite H and non-zero error
    • Vapnik-Chervonenkis Dimension
    • Theoretical characterization of overfitting
  • 04/02: Clustering: Similarity-Based Clustering
    • Unsupervised learning problems
    • Hierarchical Agglomerative Clustering (HAC)
    • Single-link, complete-link, group-average similarity
    • Reading: Manning/Raghavan/Schuetze, Chapters 16 (not 16.3) and 17
  • 04/07: Clustering: k-Means and Mixtures of Gaussians
    • Flat clustering
    • k-Means algorithms
    • Mixture of Gaussians model
    • EM algorithm for mixtures of Gaussians
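
    A minimal sketch of the k-means algorithm from this lecture; initializing the centers from a random sample of the data and the iteration cap are illustrative choices:

        import numpy as np

        def k_means(X, k, n_iter=100, seed=0):
            """Alternate between assigning points to their nearest center and recomputing the centers."""
            rng = np.random.default_rng(seed)
            centers = X[rng.choice(len(X), size=k, replace=False)]   # initialize from the data
            for _ in range(n_iter):
                dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
                assign = np.argmin(dists, axis=1)                    # nearest center for each point
                new_centers = np.array([X[assign == j].mean(axis=0) if np.any(assign == j)
                                        else centers[j] for j in range(k)])
                if np.allclose(new_centers, centers):                # converged
                    break
                centers = new_centers
            return centers, assign
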
  • 04/14: Reinforcement Learning
    • Passive learning
    • Dynamic learning

Assignments and Exams
Homework assignments can be downloaded from Moodle. All assignments are due at the beginning of class on the due date. Assignments turned in late will be charged a 1 percentage point reduction of the cumulative final homework grade for each period of 24 hours for which the assignment is late. However, every student has a budget of 5 late days (i.e., 24-hour periods after the time the assignment was due) throughout the semester for which there is no late penalty. So, if you have perfect scores of 100 on all 4 homeworks and a total of 8 late days, your final homework score will be 97 (which then accounts for 20% of your course grade). No assignment will be accepted after the solution has been made public, which is typically 3-5 days after the time it was due. You can submit late assignments in class or by following the policy written on the homework handout.
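
A small sketch of how the late-day accounting above works out, assuming the worked example in the policy (perfect scores on all 4 homeworks, 8 late days used, a 5-day free budget, and 1 point per extra late day):

    def final_homework_score(average_score, late_days_used, free_late_days=5, penalty_per_day=1):
        """Each late day beyond the free budget costs penalty_per_day points
        off the cumulative homework grade."""
        penalized_days = max(0, late_days_used - free_late_days)
        return average_score - penalty_per_day * penalized_days

    # Example from the policy: perfect scores on all 4 homeworks and 8 late days total
    print(final_homework_score(100, 8))   # -> 97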

Graded homework assignments and prelims can be picked up from the TA's office in ECS 254.

Regrade requests can be submitted within 7 days after the grades have been made available on Moodle. Regrade requests have to be submitted in writing and in hardcopy using this form (or similar). They can be submitted in class or to ECS 254.

Grading
This is a 3-credit course. Grades will be determined based on two written exams, a final project, homework assignments, and class participation.
  • 40%: 2 Prelim Exams
  • 30%: Final Project
  • 20%: Homework (~4 assignments)
  • 5%: Quizzes (in class)
  • 5%: Class Participation
To eliminate outlier grades for homeworks and quizzes, the lowest grade is replaced by the second lowest grade when grades are totaled at the end of the semester.

We always appreciate interesting homework solutions that go beyond the minimum. To reward homework solutions that are particularly nice, we will give you "Bonus Points". Bonus points are collected in a special category on Moodle. Bonus points are not real points and are not summed up for the final grade, but they can nudge somebody who is right on the boundary to a higher grade.

All assignment, exam, and final grades (including + and - of that grade) are roughly on the following scale: A=92-100; B=82-88; C=72-78; D=60-68; F= below 60.

For the final project, we will use peer review similar to how scientific papers are reviewed for conferences and journals. This means that you will get to read and comment on other students' work, which provides input for the TAs and the professor to determine the project grades. The quality of your reviewing also becomes a small component of your own project grade.

Students taking the class S/U do all work, except for the projects, and need to receive at least a grade equivalent to a D to pass the course.

Textbook
The main textbook for the class is:
  • Tom Mitchell, "Machine Learning", McGraw Hill, 1997.
An additional textbook that can serve as an in-depth secondary reference on many topics in this class is:
  • Kevin Murphy, "Machine Learning - a Probabilistic Perspective", MIT Press, 2012. (online via FIU Library)

The readings in the course packet are taken from the following books. In addition, these are some books for further reading beyond the scope of the course:

  • Cristianini, Shawe-Taylor, "Introduction to Support Vector Machines", Cambridge University Press, 2000. (online via FIU Library)
  • Schoelkopf, Smola, "Learning with Kernels", MIT Press, 2001. (online)
  • Bishop, "Pattern Recognition and Machine Learning", Springer, 2006.
  • Ethem Alpaydin, "Introduction to Machine Learning", MIT Press, 2004.
  • Devroye, Gyoerfi, Lugosi, "A Probabilistic Theory of Pattern Recognition", Springer, 1997.
  • Duda, Hart, Stork, "Pattern Classification", Wiley, 2000.
  • Hastie, Tibshirani, Friedman, "The Elements of Statistical Learning", Springer, 2001.
  • Joachims, "Learning to Classify Text using Support Vector Machines", Kluwer, 2002.
  • Leeds Tutorial on HMMs (online)
  • Manning, Schuetze, "Foundations of Statistical Natural Language Processing", MIT Press, 1999. (online via FIU Library)
  • Manning, Raghavan, Schuetze, "Introduction to Information Retrieval", Cambridge, 2008. (online)
  • Vapnik, "Statistical Learning Theory", Wiley, 1998.
Academic Integrity
This course follows the Florida International University Code of Academic Integrity, and each student is expected to abide by it. Any work submitted by a student in this course for academic credit must be the student's own work. Violations of the rules will not be tolerated.


The design of this course website and the slides are adapted from the excellent class notes of Prof. Thorsten Joachims, Noah Snavely, and Aarti Singh at other schools. The instructor is thankful to these researchers for providing high-quality class materials online.