Evaluating Clustering in Machine Learning | by David Farrugia | Jul, 2023


PYTHON | DATA | MACHINE LEARNING

A guide to why, how, and what

David Farrugia
Towards Data Science
Photo by Nareeta Martin on Unsplash

Clustering has always been one of those topics that caught my attention. Especially when I was first getting into machine learning, unsupervised clustering held a particular allure for me.

Put simply, clustering is the unsung knight in shining armour of machine learning. This form of unsupervised learning groups data points so that points in the same cluster are more similar to one another than to points in other clusters, and it does so without any labels.

Picture yourself at a social gathering where everyone is a stranger.

How would you decipher the crowd?

Perhaps by grouping people based on shared traits: those laughing at a joke, the football aficionados deep in conversation, or the group captivated by a literary discussion. That’s clustering in a nutshell!
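To make the idea concrete, here is a minimal sketch of k-means, one widely used clustering algorithm, in plain Python. The article does not prescribe a particular algorithm, so k-means, the function name `kmeans`, the toy data, and the farthest-first initialisation (used here only so the result is deterministic; k-means++ is the more common randomised choice) are all illustrative assumptions.

```python
import math


def kmeans(points, k, max_iter=100):
    """Illustrative k-means (Lloyd's algorithm); returns one cluster label per point."""
    # Assumption: farthest-first initialisation, chosen for reproducibility.
    # Start from the first point, then repeatedly add the point farthest
    # from every centroid chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))

    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[j] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return labels


# Hypothetical toy data: two visibly separate groups of 2-D points.
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels = kmeans(data, k=2)
print(labels)  # → [0, 0, 0, 1, 1, 1]
```

No labels are supplied, yet the three points near (1, 1) and the three near (8, 8) end up in separate groups, much like the strangers at the party sorting themselves by shared traits.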

You may wonder, “Why is it relevant?”

Clustering has numerous applications:

  • Customer segmentation: helping businesses categorise their customers by buying patterns so they can tailor their marketing.
  • Anomaly detection: identifying unusual data points, such as suspicious transactions in banking.
  • Resource utilisation: optimising how computing clusters are configured.
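The anomaly-detection use case can be sketched in a few lines: once a clustering step has produced centroids, points that lie far from every centroid become candidates for review. This is a minimal illustration rather than a production detector; the centroids, the `threshold` value, and the function name `flag_anomalies` are hypothetical and would need tuning on real data.

```python
import math


def flag_anomalies(points, centroids, threshold):
    """Return the indices of points farther than `threshold` from every centroid."""
    return [
        i
        for i, p in enumerate(points)
        # Distance to the nearest centroid; large values mean "fits no cluster well".
        if min(math.dist(p, c) for c in centroids) > threshold
    ]


# Hypothetical centroids, assumed to come from an earlier clustering step.
centroids = [(1.0, 1.0), (8.0, 8.0)]
points = [(1.1, 0.9), (7.9, 8.2), (4.5, 4.5), (1.0, 1.2)]
print(flag_anomalies(points, centroids, threshold=2.0))  # → [2]
```

Only the point at (4.5, 4.5), which sits between both groups, is flagged. In practice, choosing the threshold is the hard part.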

However, there’s a caveat.

How do we make sure that our clustering effort is successful?

How can we efficiently evaluate a clustering solution?

This is where robust evaluation methods come in.

Without a robust evaluation technique, we could end up with a model that looks promising on paper but underperforms drastically in practice.

In this article, we’ll examine two well-known clustering evaluation methods: the Silhouette score and Density-Based Clustering Validation (DBCV). We’ll dive into their strengths, limitations, and ideal use cases.
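As a preview of the first metric, here is a sketch of the Silhouette score in plain Python. For each point i, a(i) is its mean distance to the other members of its own cluster, and b(i) is its mean distance to the points of the nearest other cluster. The point's silhouette is s(i) = (b(i) - a(i)) / max(a(i), b(i)), and the overall score is the mean of s(i). It ranges from -1 to 1, with higher values indicating tighter, better-separated clusters. Scoring singleton clusters as 0 follows the usual convention (scikit-learn does the same), and Euclidean distance is an assumption. In practice you would likely call `sklearn.metrics.silhouette_score` instead. DBCV is more involved and is not sketched here.

```python
import math
from collections import defaultdict


def silhouette_score(points, labels):
    """Mean silhouette coefficient of a labelling, using Euclidean distance."""
    # Group the points by their cluster label.
    clusters = defaultdict(list)
    for p, lab in zip(points, labels):
        clusters[lab].append(p)
    if len(clusters) < 2:
        raise ValueError("the silhouette score needs at least two clusters")

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention: points in singleton clusters score 0
            continue
        # a(i): mean distance to the other members of p's own cluster.
        # p's distance to itself is 0, so dividing by len(own) - 1 excludes it.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b(i): mean distance to the members of the nearest other cluster.
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: two visibly separate groups of 2-D points.
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
good = silhouette_score(data, [0, 0, 0, 1, 1, 1])  # labels follow the real groups
bad = silhouette_score(data, [0, 1, 0, 1, 0, 1])   # labels mix the groups together
print(f"matching labelling: {good:.3f}, mixed labelling: {bad:.3f}")
```

The labelling that follows the real groups scores close to 1, while the mixed labelling scores far lower. The same comparison can help choose between candidate clusterings, for example different values of k.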




This post originally appeared on TechToday.