


lyngby_km_main - Main function for K-means clustering
function [Y, c, Info] = lyngby_km_main(X, ...
'PropertyName', 'PropertyValue')
Input:    X                Datamatrix, size: examples x variables, i.e., if
                           voxels are to be clustered the datamatrix
                           should be (voxels x time).
Property: Type             [ {median} | mean ]
          Standardization  [ {None} | Std | Range ] Determines
                           the individual standardization
                           (normalization) of the variables
                           (the columns in the datamatrix).
                           'Std' will standardize
                           with the standard deviation, 'Range'
                           with the difference max-min.
          Clusters         [ {10} | Integer ] Number of clusters.
          Init             [ {ReverseLog} | Linear |
                           UpperLinear | Random ] Initial
                           cluster centers determination. The
                           variables are sorted according to
                           max of xcorr or std of variables and
                           the initial centers are chosen from
                           this list.
          DecayRate        Convergence control parameter, {0} <
                           DecayRate <= 1. Determines how the
                           cluster centers converge.
          Iterations       [ {20} | Integer ] Number of
                           iterations.
          Variable         [ {time} | xcorr ] Clustering with
                           cross-correlation or time.
          Paradigm         Paradigm, the vector that is
                           used in the cross-correlation
                           with the datamatrix. This
                           variable needs to be defined if
                           'Variable' is 'xcorr'.
          Components       [ {40} | Integer ] Number of
                           cross-correlation components in the
                           analysis. Not used if 'Variable' is
                           set to 'time'. Capped at
                           the number of columns in X.
          PositionWeight   Smoothing of the clustering. Weight
                           for the proximity part of the error
                           function.
Output:   Y                Cluster center matrix, size: Scans x 'Clusters'
          c                Assignment vector for all the voxels
          Info             Shows the convergence of the Y's (array of
                           length 'Iterations').
lyngby_km_main performs K-means or K-median clustering. The
number of clusters is specified with 'Clusters'. If 'Variable'
is 'time' then the datamatrix X will be used (directly) as the
input for the clustering algorithm. If 'Variable' is 'xcorr'
then the cross-correlation between the datamatrix and the
'Paradigm' will be used as input.
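As a rough illustration of what K-means versus K-median clustering
means, the sketch below runs a plain batch version on toy data. It is
not the code of lyngby_km_main: the names centers, assign and
useMedian are hypothetical, squared Euclidean distance is assumed for
the assignment step, and 'DecayRate' and 'PositionWeight' are ignored.

```matlab
% Toy data: two well-separated groups of examples (rows)
X = [0 0; 0 1; 1 0; 10 10; 10 11; 11 10];
centers = X([1 4], :);     % initial centers: one example from each group
useMedian = true;          % true: K-median update, false: K-means update
K = size(centers, 1);
N = size(X, 1);
for it = 1:5
  % Assignment step: squared Euclidean distance to every center (assumed)
  D = zeros(N, K);
  for k = 1:K
    diffs = X - repmat(centers(k, :), N, 1);
    D(:, k) = sum(diffs.^2, 2);
  end
  [dmin, assign] = min(D, [], 2);
  % Update step: per-column median or mean of the members of each cluster
  for k = 1:K
    members = X(assign == k, :);
    if ~isempty(members)
      if useMedian
        centers(k, :) = median(members, 1);
      else
        centers(k, :) = mean(members, 1);
      end
    end
  end
end
```

With useMedian set to false the update uses the mean instead, which is
more sensitive to outlying examples.
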
The individual variables (columns) in the datamatrix can be
scaled according to 'Standardization': With 'Std' the columns
are scaled to have equal standard deviation; with 'Range' the
difference between minimum and maximum in each column is used
to scale. Standardization should be used when the variables are
measured in different units or when the features that matter
for the discrimination lie in the variables with low
magnitude. Once the centers are found they are scaled back
to the original space.
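The column scaling and the scaling back of the centers can be
sketched as follows. This is a minimal illustration that assumes pure
scaling without mean subtraction; the names method, s and
centersScaled are hypothetical, and constant columns (zero standard
deviation or range) are not handled.

```matlab
X = [1 100; 2 300; 3 200];              % toy datamatrix: examples x variables
method = 'Range';                       % 'Std' or 'Range'
if strcmp(method, 'Std')
  s = std(X, 0, 1);                     % per-column standard deviation
else
  s = max(X, [], 1) - min(X, [], 1);    % per-column range (max - min)
end
Xs = X ./ repmat(s, size(X, 1), 1);     % scaled data used for the clustering
% Suppose the clustering returned these centers in the scaled space
centersScaled = Xs([1 3], :);
% Scale the centers back to the original space
centers = centersScaled .* repmat(s, size(centersScaled, 1), 1);
```
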
'Init' determines how the cluster centers are initialized. For
all types of 'Init', K specific objects (e.g., voxels) are
selected (K being the number of clusters). For
'Random' the cluster centers are initialized by
randomly picking K objects. For the other initialization
methods the selection is deterministic from a sorted list of
objects. The sorting is based on either the standard deviation
of the original data or the maximum of the cross-correlation
function between the data and the paradigm. 'Linear'
selects with linear spacing through the sorted list of objects,
while 'ReverseLog' selects logarithmically through the list, with
most cluster centers picked from the objects with the
largest standard deviation or cross-correlation. 'UpperLinear'
selects from the top of the list.
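The difference between the selection schemes can be sketched as
below. The exact spacing rule used by lyngby_km_main is not stated
here, so linspace and logspace are only one plausible reading; the
objects are assumed to be the rows of X, and all variable names are
hypothetical. For small lists or many clusters the rounded
logarithmic positions may coincide, which this sketch does not handle.

```matlab
N = 150;  K = 10;
X = (1:N)' * [-1 0 1];        % toy data: row n has standard deviation n
% Rank the objects (rows) by standard deviation, largest first
[sorted, order] = sort(std(X, 0, 2), 'descend');
% Positions in the sorted list for each deterministic scheme
posLinear      = round(linspace(1, N, K));          % evenly spaced
posReverseLog  = round(logspace(0, log10(N), K));   % dense near the top
posUpperLinear = 1:K;                               % the K top-ranked objects
% Random selection ignores the ranking
perm = randperm(N);
posRandom = perm(1:K);
% Initial centers, here for the logarithmic scheme
initCenters = X(order(posReverseLog), :);
```
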
Example:
% K-means clustering of Fisher's iris data
load iris.txt
[Y,c,Info] = lyngby_km_main(iris,'type','mean','clusters',3);
figure, plot(Info), title('Convergence'), xlabel('Iteration');
% Confusion matrix: rows are the true classes, columns the clusters
% (assumes the 150 rows of iris.txt are ordered by class, 50 each)
C = zeros(3,3);
for n = 1:150
  k = ceil(n/50);
  C(k,c(n)) = C(k,c(n)) + 1;
end
disp('Confusion matrix'), disp(C)
See also LYNGBY, LYNGBY_KM_CENTERSIM, LYNGBY_KM_PLOT_DIST,
LYNGBY_IKM_MAIN, LYNGBY_UI_KM_INIT, LYNGBY_XCORR.
$Id: lyngby_km_main.m,v 1.28 2003/11/21 11:33:57 fnielsen Exp $