Lecture: Cluster Analysis
Cluster Analysis
- common name for a whole collection of computational statistical procedures
- aim: to decompose the data into several homogeneous groups – clusters.
- the objects inside a cluster are as similar as possible;
- the objects from different clusters should resemble each other as little as possible.
Definition
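Let \(\mathbf X = \{\mathbf x_1, \mathbf x_2, \dotsc, \mathbf x_n \}\) be a set of objects and let \(D\) be a coefficient of dissimilarity between objects. A cluster is a subset \(C \subseteq \mathbf X\) of objects such that \[\max_{\mathbf x_i, \mathbf x_j \in C} D(\mathbf x_i, \mathbf x_j) < D(\mathbf x_k, \mathbf x_l)\] for each \(\mathbf x_l \in C\) and each \(\mathbf x_k \in \mathbf X \setminus C\). In words: any two objects inside the cluster are less dissimilar to each other than any object outside the cluster is to any object inside it.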
- the definition is not constructive:
- describes the property which the cluster has to satisfy
- but does not explain how the cluster should be constructed.
- many clustering methods (see below)
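
Although the definition does not say how to build a cluster, the property it states can be checked directly for any candidate subset. The following is a minimal sketch in Python under stated assumptions: objects are points in the plane, \(D\) is taken to be the Euclidean distance, and the names `is_cluster`, `euclidean` and `points` are hypothetical, introduced only for illustration.

```python
from itertools import combinations
import math


def euclidean(a, b):
    """Assumed dissimilarity D: Euclidean distance between two points."""
    return math.dist(a, b)


def is_cluster(points, members, dissimilarity=euclidean):
    """Check whether the index set `members` satisfies the cluster definition.

    The largest dissimilarity between two objects inside the candidate
    cluster must be strictly smaller than the dissimilarity between any
    object outside it and any object inside it.
    """
    member_set = set(members)
    inside = sorted(member_set)
    outside = [k for k in range(len(points)) if k not in member_set]

    # Largest within-cluster dissimilarity (0 for a single-object cluster).
    max_within = max(
        (dissimilarity(points[i], points[j]) for i, j in combinations(inside, 2)),
        default=0.0,
    )

    # Every (outside, inside) pair must be strictly more dissimilar.
    return all(
        dissimilarity(points[k], points[l]) > max_within
        for k in outside
        for l in inside
    )


# Three points near the origin and two points far away.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (6, 5)]

print(is_cluster(points, {0, 1, 2}))  # True: the tight group is well separated
print(is_cluster(points, {0, 1}))     # False: point 2 is as close to 0 as 1 is
print(is_cluster(points, {2, 3}))     # False: the pair is far apart internally
```

The check only verifies a given subset. Finding the subsets that pass it is the task of the clustering methods themselves.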