Talk Title: Matrix-based Learning Algorithms for Data Mining and Bioinformatics

Speaker: Dr. Chris Ding from Lawrence Berkeley National Laboratory

Talk abstract:

Matrix-based data mining and statistical learning are going through a renaissance, with many new developments. We describe several major advances in the area. We show that Principal Component Analysis (PCA) provides solutions to K-means clustering, thus connecting dimension reduction and clustering, two fundamental aspects of unsupervised learning. We describe state-of-the-art Laplacian-matrix-based spectral clustering methods, whose effectiveness results from a self-aggregation property due to the nonlinear mapping. We describe their applications in bioinformatics and the social sciences. These advances pave the way for a matrix-factorization-based learning framework, a powerful new direction in data mining. They benefit significantly from matrix knowledge accumulated over centuries and from the successful development of scientific and engineering computing over the last 30 years. We also describe large-scale data mining on distributed computers and over the Grid.
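
As a rough illustration of the PCA and K-means connection mentioned in the abstract, the sketch below compares a two-way split by the sign of the first principal-component score with the result of plain K-means for K = 2. It is not taken from the talk: the synthetic data, the function names (pca_two_way_split, two_means, agreement) and the farthest-point initialization are all assumptions made for this example, and the near-perfect agreement it reports relies on the two groups being well separated.

```python
import numpy as np


def pca_two_way_split(X):
    """Split the rows of X into two groups by the sign of the first PC score."""
    Xc = X - X.mean(axis=0)  # center the data
    # Rows of Vt are the principal directions of the centered data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[0]  # projection onto the first principal direction
    return (scores > 0).astype(int)


def two_means(X, n_iter=50):
    """Plain Lloyd iterations for K = 2 (hypothetical helper for this sketch)."""
    # Deterministic start: the first point and the point farthest from it.
    c0 = X[0]
    c1 = X[np.argmax(np.linalg.norm(X - c0, axis=1))]
    centers = np.stack([c0, c1])
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.stack([X[labels == k].mean(axis=0) for k in range(2)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def agreement(a, b):
    """Fraction of matching labels, ignoring which group is called 0 or 1."""
    match = np.mean(a == b)
    return max(match, 1.0 - match)


# Synthetic data (an assumption): two well-separated Gaussian groups in 5-D.
rng = np.random.default_rng(0)
shift = np.zeros(5)
shift[0] = 5.0
X = np.vstack([
    rng.normal(size=(100, 5)) - shift,
    rng.normal(size=(100, 5)) + shift,
])

pca_labels = pca_two_way_split(X)
km_labels = two_means(X)
print("agreement between PCA split and K-means:", agreement(pca_labels, km_labels))
```

On data this well separated, the two partitions should coincide up to a relabeling of the groups. On overlapping or higher-K data the relationship is looser, and the principal components are better read as a continuous relaxation of the cluster indicators than as the final assignment.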

Short bio:

Chris Ding is a staff computer scientist at Lawrence Berkeley National Laboratory. He received a Ph.D. from Columbia University and did research at the California Institute of Technology and the Jet Propulsion Laboratory. His research focuses on bioinformatics, machine learning, and data mining. He develops efficient graph algorithms using matrix computation.
More information about him can be found at http://crd.lbl.gov/~cding.