Source Of Summary:
Lec.2.D. M. FCDS spring 2024.pdf
Data Mining and Analytics – Lecture 2
1. Introduction to Cluster Analysis 🌟
What is Clustering?
- Definition:
Clustering is the process of grouping a set of data objects into multiple clusters so that objects within a cluster are highly similar to one another (🤝) but very dissimilar to objects in other clusters (🚫).
- How it Works:
Similarity (or dissimilarity) is assessed from the attribute values that describe the objects, often with a distance measure such as Manhattan distance 📏: the smaller the distance between two objects, the more similar they are.
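As a minimal sketch of one such distance measure: the Manhattan distance between two objects is the sum of the absolute differences of their attribute values, d(x, y) = |x1 - y1| + ... + |xn - yn|. The function name and the sample objects below are illustrative assumptions, not taken from the lecture.

```python
def manhattan_distance(x, y):
    """Sum of absolute attribute differences between two objects."""
    if len(x) != len(y):
        raise ValueError("objects must have the same number of attributes")
    return sum(abs(a - b) for a, b in zip(x, y))


# Hypothetical objects, each described by two numeric attributes.
a = (1, 2)
b = (2, 4)
c = (9, 8)

print(manhattan_distance(a, b))  # 3  -> a and b are close (similar)
print(manhattan_distance(a, c))  # 14 -> a and c are far apart (dissimilar)
```

A small distance suggests the two objects could belong to the same cluster; a large one suggests they belong to different clusters.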
2. What is a Cluster? 📦
- Definition: A cluster is a collection of data objects that are:
  - Similar (or related) within the same group, and
  - Dissimilar (or unrelated) to objects in other groups.
This separation between groups helps reveal hidden patterns in the data.
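One rough way to check this property is to compare the average distance between objects inside a cluster with the average distance between objects of different clusters. The sketch below assumes Manhattan distance and two hypothetical, hand-made groups; the helper names are illustrative only.

```python
from itertools import combinations, product


def manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))


def mean_within(cluster):
    """Average distance over all pairs of objects inside one cluster."""
    pairs = list(combinations(cluster, 2))
    return sum(manhattan(p, q) for p, q in pairs) / len(pairs)


def mean_between(cluster_a, cluster_b):
    """Average distance over all pairs drawn from two different clusters."""
    pairs = list(product(cluster_a, cluster_b))
    return sum(manhattan(p, q) for p, q in pairs) / len(pairs)


# Hypothetical clusters of objects with two attributes each.
group_1 = [(1, 1), (1, 2), (2, 1)]
group_2 = [(8, 8), (9, 8), (8, 9)]

within_1 = mean_within(group_1)
within_2 = mean_within(group_2)
between = mean_between(group_1, group_2)

# Well-separated clusters: small within-cluster, large between-cluster distance.
print(within_1, within_2, between)
```

For these sample groups the within-cluster averages come out far smaller than the between-cluster average, which is what "similar within, dissimilar between" looks like numerically.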
3. What is Cluster Analysis? 🔬
- Definition:
Cluster analysis (also known as clustering or data segmentation) is an exploratory data analysis method. It uncovers the underlying structure in the data by grouping similar objects together; it does not make predictions.
- Key Point:
It is a form of unsupervised learning because it does not use predefined class labels.
- Applications:
Cluster analysis is used in marketing, economics, and many branches of science.
4. Unsupervised Learning vs. Classification 🤖 vs. 🎯
- Unsupervised Learning (Clustering):
  - Learns useful structure in the data without labeled classes, optimization criteria, or feedback signals.
- Classification (Supervised Learning):
  - Starts from predefined classes and assigns each new object to one of those classes.
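The contrast can be sketched in a few lines. The two techniques used here (a simple leader-style clustering with a distance threshold, and 1-nearest-neighbour classification) are assumptions chosen for brevity; the lecture does not name them, and the data, the threshold, and the labels "low" and "high" are hypothetical.

```python
def manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))


def cluster_without_labels(objects, threshold):
    """Leader-style clustering: no class labels are supplied.

    Each object joins the first cluster whose leader (first member) lies
    within `threshold`; otherwise it starts a new cluster.
    """
    clusters = []
    for obj in objects:
        for members in clusters:
            if manhattan(obj, members[0]) <= threshold:
                members.append(obj)
                break
        else:
            clusters.append([obj])
    return clusters


def classify_with_labels(training, new_obj):
    """1-nearest-neighbour classification: classes are predefined by labels."""
    _, nearest_label = min(training, key=lambda pair: manhattan(pair[0], new_obj))
    return nearest_label


# Clustering: only raw objects go in; unnamed groups come out.
points = [(1, 1), (1, 2), (8, 8), (9, 8)]
groups = cluster_without_labels(points, threshold=3)
print(groups)  # [[(1, 1), (1, 2)], [(8, 8), (9, 8)]]

# Classification: labeled examples go in; a new object gets an existing label.
labeled = [((1, 1), "low"), ((1, 2), "low"), ((8, 8), "high"), ((9, 8), "high")]
predicted = classify_with_labels(labeled, (7, 9))
print(predicted)  # high
```

The clustering function never sees a label and only discovers groups, while the classifier can only answer with a class that already exists in its training data.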