Source of this section: DM_Sec 3_code.pdf

1. Installing scikit-learn-extra

# Installing scikit-learn-extra, which contains the KMedoids class
!pip install scikit-learn-extra
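The `!` prefix runs a shell command from a Jupyter or IPython cell. In a terminal, the equivalent is `pip install scikit-learn-extra`. The package is imported as `sklearn_extra`, so you can confirm the install without importing it by asking the standard library whether the module can be found. This is a minimal sketch. The helper name `is_installed` is hypothetical.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found on the current Python path.

    Hypothetical helper for this example; it does not import the module.
    """
    return importlib.util.find_spec(module_name) is not None


# scikit-learn-extra is imported under the module name `sklearn_extra`
print("sklearn_extra available:", is_installed("sklearn_extra"))
```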

2. K-Medoids Clustering with Manual Data

2.1 Preparing the Data

import numpy as np

# Create a list of points (X, Y) and convert it into a NumPy array
data = np.array([
    [0, 9],
    [1, 0],
    [4, 4],
    [3, 8],
    [9, 2],
    [5, 1],
    [8, 7],
    [5, 9],
    [6, 4],
    [9, 3]
])

# Display the data array
print("Data:\n", data)
# Data:
#  [[0 9]
#  [1 0]
#  [4 4]
#  [3 8]
#  [9 2]
#  [5 1]
#  [8 7]
#  [5 9]
#  [6 4]
#  [9 3]]
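K-Medoids works from pairwise dissimilarities between points, and `KMedoids` uses Euclidean distance by default. The sketch below computes the full distance matrix for the ten points above using NumPy broadcasting. It is plain NumPy, not the library's internals.

```python
import numpy as np

data = np.array([
    [0, 9], [1, 0], [4, 4], [3, 8], [9, 2],
    [5, 1], [8, 7], [5, 9], [6, 4], [9, 3],
])

# Broadcasting: (n, 1, 2) - (1, n, 2) -> (n, n, 2) array of coordinate differences
diff = data[:, None, :] - data[None, :, :]

# Euclidean distance between every pair of points -> (n, n) matrix
dist = np.sqrt((diff ** 2).sum(axis=-1))

print(dist.shape)
print(np.round(dist[0], 2))  # distances from the first point, (0, 9), to all points
```

Row `i` of `dist` holds the distances from point `i` to every point. The matrix is symmetric, and its diagonal is zero.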

2.2 Running K-Medoids

from sklearn_extra.cluster import KMedoids

# Choose the number of clusters
k = 2

# Fit KMedoids to the data
kmedoids = KMedoids(n_clusters=k, random_state=0).fit(data)

# Extract the cluster labels for each point
labels = kmedoids.labels_

# Print out the labels
print("Cluster Labels:", labels)
# Cluster Labels: [0 1 0 0 1 1 0 0 0 1]

# Print the points belonging to each cluster
for i in range(k):
    cluster_points = data[labels == i]
    print(f"Cluster {i}:\n", cluster_points)
# Cluster 0:
#  [[0 9]
#  [4 4]
#  [3 8]
#  [8 7]
#  [5 9]
#  [6 4]]

# Cluster 1:
#  [[1 0]
#  [9 2]
#  [5 1]
#  [9 3]]
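To make the `fit` step less of a black box, here is a simplified, from-scratch sketch of the alternating ("Voronoi iteration") idea behind K-Medoids. The loop has three steps:

1. Assign each point to its nearest medoid.
2. Move each medoid to the cluster member with the smallest total distance to the other members.
3. Repeat until the medoids stop changing.

scikit-learn-extra's `KMedoids` offers a `method="alternate"` option along these lines, but this is not its code. The function name `kmedoids_alternate` and the fixed starting medoids (rows 0 and 1) are assumptions for illustration. The labels this sketch produces therefore need not match the library output above.

```python
import numpy as np


def pairwise_euclidean(X):
    """(n, n) matrix of Euclidean distances between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def kmedoids_alternate(X, init_medoids, max_iter=100):
    """Simplified alternating K-Medoids (hypothetical helper, not the library's code).

    init_medoids: row indices of X used as the starting medoids (assumed here;
    the library chooses its own). Assumes the points are distinct, so no
    cluster can end up empty. Returns (medoid_indices, labels, total_cost).
    """
    D = pairwise_euclidean(np.asarray(X, dtype=float))
    medoids = np.array(init_medoids)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest medoid
        labels = np.argmin(D[medoids, :], axis=0)
        # Update step: each new medoid is the cluster member with the smallest
        # total distance to the other members of that cluster
        new_medoids = medoids.copy()
        for j in range(len(medoids)):
            members = np.flatnonzero(labels == j)
            within = D[np.ix_(members, members)].sum(axis=1)
            new_medoids[j] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break  # converged: the medoids did not change
        medoids = new_medoids
    labels = np.argmin(D[medoids, :], axis=0)
    # Sum of each point's distance to its nearest (assigned) medoid
    total_cost = D[medoids, :].min(axis=0).sum()
    return medoids, labels, total_cost


data = np.array([
    [0, 9], [1, 0], [4, 4], [3, 8], [9, 2],
    [5, 1], [8, 7], [5, 9], [6, 4], [9, 3],
])
medoids, labels, cost = kmedoids_alternate(data, init_medoids=[0, 1])
print("Medoids:\n", data[medoids])
print("Labels:", labels)
print("Total distance to medoids:", round(cost, 3))
```

The returned cost is the sum of distances from each point to its assigned medoid. This is the quantity K-Medoids tries to minimize. A fitted `KMedoids` estimator reports the same kind of total in its `inertia_` attribute.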

3. K-Medoids Clustering with a CSV File

3.1 Reading the CSV File

import pandas as pd

# Read data from a CSV file into a DataFrame
# Replace the path below with the actual path to your CSV file
df = pd.read_csv("K_medoids.csv")

# Display the DataFrame
print("DataFrame:\n", df)
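The contents of `K_medoids.csv` are not shown here. As a hypothetical stand-in, the sketch below feeds a small in-memory CSV with invented columns `X` and `Y` to the same `pd.read_csv` call, then inspects what was loaded. Checking `shape` and `dtypes` right after reading is a cheap way to spot problems. For example, an empty cell makes pandas read that column as floats containing `NaN`.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for K_medoids.csv (its real columns are unknown)
csv_text = """X,Y
0,9
1,0
4,
4,4
4,4
"""

# read_csv accepts any file-like object, not just a path
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)   # (number of rows, number of columns)
print(df.dtypes)  # Y is read as float64 because its empty cell became NaN
print(df.head())
```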

3.2 Checking for NaN Values

# Check how many NaN values exist in each column
nan_counts = df.isnull().sum()
print("NaN Counts:\n", nan_counts)
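Per-column counts show how much is missing, but not where. The minimal sketch below uses a hypothetical DataFrame and the same `isnull()` mask. It also reports the total number of missing cells and lists the affected rows.

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame with a few missing values
df = pd.DataFrame({"X": [0, 1, np.nan, 3], "Y": [9, np.nan, np.nan, 8]})

# Per-column counts, as in the step above
print(df.isnull().sum())

# Total number of missing cells in the whole DataFrame
print("Total NaN:", df.isnull().sum().sum())

# Rows that contain at least one NaN
print(df[df.isnull().any(axis=1)])
```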

3.3 Removing Missing Values

# Drop rows that contain any NaN values
df = df.dropna()

# Display the DataFrame after removing rows with NaN
print("DataFrame After Dropping NaN:\n", df)
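With no arguments, `dropna()` removes a row if any column is NaN. This can discard more data than needed when the file also has columns that are not used for clustering. The `how` and `subset` parameters narrow this down. The DataFrame and its `note` column below are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "X": [0, 1, np.nan, 3],
    "Y": [9, np.nan, np.nan, 8],
    "note": ["a", None, "c", None],  # hypothetical non-feature column
})

# Default how="any": drop a row if any column is NaN
print(len(df.dropna()))

# how="all": drop only rows where every column is NaN
print(len(df.dropna(how="all")))

# subset: only consider the feature columns when deciding what to drop
print(len(df.dropna(subset=["X", "Y"])))
```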

3.4 Removing Duplicate Rows

# Drop any duplicate rows
df = df.drop_duplicates()

# Display the DataFrame after removing duplicates
print("DataFrame After Dropping Duplicates:\n", df)
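Repeated rows matter for K-Medoids because every copy of a point adds its distance to the total cost being minimized. The sketch below uses hypothetical data. It counts duplicates with `duplicated()`, drops them, and renumbers the index. It then converts the numeric columns to a NumPy array like the `data` array that was passed to `KMedoids(...).fit` in section 2.2.

```python
import pandas as pd

# Hypothetical cleaned data with one point repeated three times
df = pd.DataFrame({"X": [4, 4, 9, 4], "Y": [4, 4, 2, 4]})

# Rows identical to an earlier row
print("Duplicate rows:", df.duplicated().sum())

# keep="first" (the default) keeps the first occurrence of each repeated row;
# reset_index closes the gaps left in the index by dropped rows
df = df.drop_duplicates().reset_index(drop=True)

# Keep only numeric columns, then convert to a 2-D NumPy array for clustering
X = df.select_dtypes(include="number").to_numpy()
print(X)
```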