Source Of The Section
DM_Sec 3_code.pdf
1. Installing scikit-learn-extra
# Installing scikit-learn-extra, which contains the KMedoids class
!pip install scikit-learn-extra
2. K-Medoids Clustering with Manual Data
2.1 Preparing the Data
import numpy as np
# Create a list of points (X, Y) and convert it into a NumPy array
data = np.array([
[0, 9],
[1, 0],
[4, 4],
[3, 8],
[9, 2],
[5, 1],
[8, 7],
[5, 9],
[6, 4],
[9, 3]
])
# Display the data array
print("Data:\n", data)
# Data:
# [[0 9]
# [1 0]
# [4 4]
# [3 8]
# [9 2]
# [5 1]
# [8 7]
# [5 9]
# [6 4]
# [9 3]]
2.2 Running K-Medoids
from sklearn_extra.cluster import KMedoids
# Choose the number of clusters
k = 2
# Fit KMedoids to the data
kmedoids = KMedoids(n_clusters=k, random_state=0).fit(data)
# Extract the cluster labels for each point
labels = kmedoids.labels_
# Print out the labels
print("Cluster Labels:", labels)
# Cluster Labels: [0 1 0 0 1 1 0 0 0 1]
# Print the points belonging to each cluster
for i in range(k):
    # Boolean mask selects the rows whose label equals i
    cluster_points = data[labels == i]
    print(f"Cluster {i}:\n", cluster_points)
# Cluster 0:
# [[0 9]
# [4 4]
# [3 8]
# [8 7]
# [5 9]
# [6 4]]
# Cluster 1:
# [[1 0]
# [9 2]
# [5 1]
# [9 3]]
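To make the meaning of the labels concrete, the following is a minimal sketch in plain Python of the assignment step that K-Medoids relies on: each point is assigned to its nearest medoid, and the distances are summed into a total cost. It is not the scikit-learn-extra implementation. The helper name assign_to_medoids and the two medoids are hypothetical choices made for illustration (they are not the medoids found in 2.2), and Euclidean distance is an assumption.

```python
import math

# Same ten points as in section 2.1
points = [(0, 9), (1, 0), (4, 4), (3, 8), (9, 2),
          (5, 1), (8, 7), (5, 9), (6, 4), (9, 3)]


def assign_to_medoids(points, medoids):
    """Assign each point to its nearest medoid (hypothetical helper).

    Returns the list of cluster labels and the total cost, i.e. the sum
    of Euclidean distances from every point to its assigned medoid.
    """
    labels = []
    total_cost = 0.0
    for p in points:
        distances = [math.dist(p, m) for m in medoids]
        nearest = distances.index(min(distances))
        labels.append(nearest)
        total_cost += distances[nearest]
    return labels, total_cost


# Hypothetical medoids chosen by hand; a medoid is always one of the data points
medoids = [(3, 8), (5, 1)]
labels, cost = assign_to_medoids(points, medoids)
print("Labels:", labels)
print("Total cost:", round(cost, 2))
```

A full K-Medoids run repeats this step while trying other data points as medoids and keeps the choice with the lowest total cost.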
3. K-Medoids Clustering with a CSV File
3.1 Reading the CSV File
import pandas as pd
# Read data from a CSV file into a DataFrame
# Replace the path below with the actual path to your CSV file
df = pd.read_csv("K_medoids.csv")
# Display the DataFrame
print("DataFrame:\n", df)
3.2 Checking for NaN Values
# Check how many NaN values exist in each column
nan_counts = df.isnull().sum()
print("NaN Counts:\n", nan_counts)
3.3 Removing Missing Values
# Drop rows that contain any NaN values
df = df.dropna()
# Display the DataFrame after removing rows with NaN
print("DataFrame After Dropping NaN:\n", df)
3.4 Removing Duplicate Rows
# Drop any duplicate rows
df = df.drop_duplicates()
# Display the DataFrame after removing duplicates
print("DataFrame After Dropping Duplicates:\n", df)
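The effect of the cleaning steps in 3.2 to 3.4 can be checked on a small in-memory DataFrame before running them on the real file. The sketch below is only an illustration: the column names X and Y and all values are made up, and K_medoids.csv may have different columns. The final conversion to a NumPy array is an assumption about how the cleaned data would be passed on to clustering.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for K_medoids.csv; the real file's columns may differ
toy = pd.DataFrame({
    "X": [1.0, 2.0, np.nan, 2.0, 5.0],
    "Y": [4.0, 3.0, 7.0, 3.0, np.nan],
})

# Count missing values per column
nan_counts = toy.isnull().sum()
print("NaN Counts:\n", nan_counts)

# Drop rows with any NaN, then drop exact duplicate rows
cleaned = toy.dropna()
cleaned = cleaned.drop_duplicates()

# Renumber the rows so the index has no gaps after dropping
cleaned = cleaned.reset_index(drop=True)
print("Cleaned DataFrame:\n", cleaned)

# Convert to a NumPy array, the same form as the manual data in section 2.1
points = cleaned.to_numpy()
print("Shape:", points.shape)
```

Dropping NaN rows before dropping duplicates keeps the order used in this section; the final set of rows would be the same in either order.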