
Summary of the paper
Title 
Spectral Clustering for a Large Data Set by Reducing the Similarity Matrix Size 
Authors 
Hiroyuki Shinnou and Minoru Sasaki 
Abstract 
Spectral clustering is a powerful clustering method for document data sets. However, spectral clustering must solve an eigenvalue problem for a matrix derived from the similarity matrix of the data set, so it is impractical for large data sets. To overcome this problem, we propose a method that reduces the size of the similarity matrix. First, we cluster the given data set with k-means. From each cluster, we pick some data points that lie near the center of the cluster and treat them together as a single data point. We call such a set a “committee”. Data points that do not belong to any committee remain individual data points. We then construct the similarity matrix over the committees and the remaining data points. This similarity matrix is small enough that spectral clustering can be performed on it.
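The reduction step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the number of points merged per cluster (the hypothetical parameter `per_cluster`), representing a committee by the mean of its members, using cosine similarity, and initialising k-means with deterministic farthest-first seeding are all choices made here for illustration. All function names are hypothetical. The sketch stops at the reduced similarity matrix; the spectral clustering step itself is left to any routine that accepts a precomputed affinity matrix.

```python
import math


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity (an assumed measure); 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means with deterministic farthest-first initialisation."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = mean_vector(members)
    return labels, centers


def build_committees(points, labels, centers, per_cluster):
    """Collapse the `per_cluster` points nearest each centroid into one committee.

    Returns (groups, vectors): groups[r] lists the original indices behind row r
    of the reduced matrix, and vectors[r] is that row's representative vector.
    """
    groups, vectors, used = [], [], set()
    for c, center in enumerate(centers):
        members = [i for i, lab in enumerate(labels) if lab == c]
        members.sort(key=lambda i: sq_dist(points[i], center))
        committee = members[:per_cluster]
        if len(committee) > 1:  # a one-point "committee" saves nothing
            groups.append(committee)
            # Assumption: a committee is represented by the mean of its members.
            vectors.append(mean_vector([points[i] for i in committee]))
            used.update(committee)
    for i, p in enumerate(points):  # every other point stays its own row
        if i not in used:
            groups.append([i])
            vectors.append(list(p))
    return groups, vectors


def reduced_similarity(vectors):
    """Symmetric cosine-similarity matrix over the reduced set of rows."""
    n = len(vectors)
    sim = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for s in range(r, n):
            sim[r][s] = sim[s][r] = cosine(vectors[r], vectors[s])
    return sim


def expand_labels(groups, row_labels, n_points):
    """Give every original point the cluster label of the row that represents it."""
    labels = [None] * n_points
    for group, lab in zip(groups, row_labels):
        for i in group:
            labels[i] = lab
    return labels


if __name__ == "__main__":
    docs = [(1.0, 0.0), (0.9, 0.1), (0.95, 0.05), (0.0, 1.0), (0.1, 0.9), (0.05, 0.95)]
    labels, centers = kmeans(docs, k=2)
    groups, vectors = build_committees(docs, labels, centers, per_cluster=2)
    sim = reduced_similarity(vectors)
    print(len(docs), "points ->", len(sim), "x", len(sim), "similarity matrix")
```

On the toy data above, two committees of two points each replace four rows, so the script prints `6 points -> 4 x 4 similarity matrix`. The reduced matrix could then be passed to, for example, scikit-learn's `SpectralClustering(affinity="precomputed")`, which expects non-negative affinities (cosine similarities of non-negative document vectors satisfy this); `expand_labels` maps the resulting row labels back to every original document, so committee members share their committee's cluster.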
Language 
Language-independent
Topics 
Text mining, Document Classification, Text categorisation, Acquisition, Machine Learning 
Full paper 
Spectral Clustering for a Large Data Set by Reducing the Similarity Matrix Size 
Bibtex 
@InProceedings{SHINNOU08.62,
author = {Hiroyuki Shinnou and Minoru Sasaki},
title = {Spectral Clustering for a Large Data Set by Reducing the Similarity Matrix Size},
booktitle = {Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)},
year = {2008},
month = {may},
date = {28-30},
address = {Marrakech, Morocco},
editor = {Nicoletta Calzolari and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Daniel Tapias},
publisher = {European Language Resources Association (ELRA)},
isbn = {2951740840},
note = {http://www.lrec-conf.org/proceedings/lrec2008/},
language = {english}
} 

