Clustering Methods
Student’s name
Institution affiliation
Professor’s name
Course name
Date
Centroid-based Clustering: K-means
K-means clustering is an unsupervised learning algorithm that partitions data into a
pre-defined number of clusters, k. The algorithm starts with k initial centroids, typically
chosen at random, and then assigns every data point to the closest centroid based on a measure
such as Euclidean distance. In each later iteration, every centroid is recalculated as the mean
of the points assigned to its cluster, and points are reassigned based on the new centroids. This
process continues until the centroids stop changing or the maximum number of iterations is
reached.
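The assignment and update steps described above can be illustrated with a minimal sketch in plain Python. The function name `kmeans`, its parameters, and the sample points are assumptions made for illustration; initial centroids are simply sampled from the data, which is only one of several possible initialization schemes.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Illustrative k-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Assumption: initial centroids are k distinct points drawn from the data.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to its closest centroid (Euclidean).
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centroids.append(mean)
            else:
                # Keep the old centroid if no point was assigned to it.
                new_centroids.append(centroids[c])
        # Stop once the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


# Hypothetical sample data: two well-separated groups of three points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

On data like this, the first three points should share one label and the last three the other; which group receives label 0 depends on the random initialization.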
Connectivity-based Clustering: Hierarchical Clustering
Hierarchical clustering can be constructed by the divisive method (splitting clusters from
the top down) or by the agglomerative method (merging clusters from the bottom up). In the
agglomerative approach, all data points begin as individual clusters, and the closest pair of
clusters is merged at each step as one moves up the hierarchy. Closeness is defined by a linkage
criterion such as single-linkage, complete-linkage, or average-linkage.
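A small sketch of the agglomerative procedure may make the role of the linkage criterion more concrete. The names `linkage_distance` and `agglomerative`, the stopping rule (halt at a requested number of clusters rather than building the full hierarchy), and the sample points are assumptions for illustration. The brute-force pair search is slow and is meant only to show the idea.

```python
import math


def linkage_distance(a, b, method="single"):
    """Distance between clusters a and b under the chosen linkage criterion."""
    dists = [math.dist(p, q) for p in a for q in b]
    if method == "single":      # closest pair of points
        return min(dists)
    if method == "complete":    # farthest pair of points
        return max(dists)
    if method == "average":     # mean over all cross-cluster pairs
        return sum(dists) / len(dists)
    raise ValueError(f"unknown linkage method: {method}")


def agglomerative(points, n_clusters, method="single"):
    """Merge the closest pair of clusters until n_clusters remain."""
    # Every point starts as its own cluster.
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        pairs = (
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        i, j = min(
            pairs,
            key=lambda ij: linkage_distance(clusters[ij[0]], clusters[ij[1]], method),
        )
        # j > i, so deleting index j leaves index i valid.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical sample data: two pairs of nearby points.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
clusters = agglomerative(points, n_clusters=2, method="single")
print(clusters)  # [[(0.0, 0.0), (0.0, 1.0)], [(5.0, 5.0), (5.0, 6.0)]]
```

Changing `method` to `"complete"` or `"average"` changes only how the distance between two clusters is measured; the merging loop stays the same.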
Density-based Clustering: DBSCAN
DBSCAN is designed to locate clusters as dense regions of points separated by low-density
regions. It starts with an arbitrary point, and if this point has at least a certain number of
neighbors within a particular radius, it starts a cluster (Bhattacharjee & Mitra, 2021). The
neighboring points within the radius are then examined and included in the cluster, and the
process continues until each density-connected cluster is determined.
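The expansion process described above can be sketched as follows. The names `region_query` and `dbscan`, the parameters `eps` (radius) and `min_pts` (required number of neighbors, counting the point itself), the label -1 for points that join no cluster, and the sample data are assumptions made for illustration rather than details taken from the cited survey.

```python
import math

NOISE = -1  # assumed label for points that end up in no cluster


def region_query(points, idx, eps):
    """Indices of all points within radius eps of points[idx] (including itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[idx], q) <= eps]


def dbscan(points, eps, min_pts):
    """Illustrative DBSCAN; returns one cluster label per point."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            # Too few neighbors: provisionally noise (may become a border point).
            labels[i] = NOISE
            continue
        # Dense enough: start a new cluster and expand it.
        cluster_id += 1
        labels[i] = cluster_id
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point joins the cluster
            if labels[j] is not None:
                continue  # already assigned; do not expand again
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                # j is itself dense, so its neighbors are density-connected too.
                queue.extend(j_neighbors)
    return labels


# Hypothetical sample data: two dense groups and one isolated point.
points = [
    (0.0, 0.0), (0.0, 0.5), (0.5, 0.0),
    (5.0, 5.0), (5.0, 5.5), (5.5, 5.0),
    (20.0, 20.0),
]
labels = dbscan(points, eps=1.0, min_pts=3)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

Unlike the k-means sketch, the number of clusters is not supplied in advance; it follows from the chosen radius and neighbor threshold.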
References
Bhattacharjee, P., & Mitra, P. (2021). A survey of density-based clustering algorithms. Frontiers of Computer Science, 15, 1-27. https://link.springer.com/article/10.1007/s11704-019-9059-3