Hierarchical Clustering with Python

In Python, both the SciPy and scikit-learn libraries provide functions for hierarchical clustering.


Importing Libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Setting a style for the seaborn plots
sns.set_style('darkgrid')

Defining the Sample Dataset

# Define a small sample dataset X1
X1 = np.array([
    [9, 3],
    [8, 2],
    [9, 1],
    [3, 7],
    [7, 2],
    [9, 7],
    [4, 8],
    [8, 3],
    [3, 1],
    [1, 4]
])

# Print the array to see its contents
X1

We'll now use this dataset to perform hierarchical clustering in Python.
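Agglomerative hierarchical clustering starts from the distances between points, so it can help to look at those first. The following is a minimal sketch, assuming Euclidean distance (the text has not fixed a metric at this point); the names `D`, `D_masked`, `i` and `j` are illustrative only.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Same sample dataset as above
X1 = np.array([
    [9, 3], [8, 2], [9, 1], [3, 7], [7, 2],
    [9, 7], [4, 8], [8, 3], [3, 1], [1, 4]
])

# Assumption: Euclidean distance between every pair of points
D = squareform(pdist(X1, metric='euclidean'))
print(D.shape)

# Ignore the zero diagonal when searching for the closest pair
D_masked = D.copy()
np.fill_diagonal(D_masked, np.inf)

# If several pairs are tied, argmin returns the first one it finds
i, j = np.unravel_index(np.argmin(D_masked), D_masked.shape)
print('Closest pair (1-based):', i + 1, j + 1, 'distance:', D_masked[i, j])
```

The 1-based indices printed here match the numbered labels used in the scatter plot.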

Scatter Plot of the Data

plt.figure(figsize=(6, 6))

# Plot the points in red
plt.scatter(X1[:,0], X1[:,1], c='r')

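# Label each point with its 1-based index, offset slightly from the marker
# (xytext/textcoords are arguments of plt.annotate, not plt.text)
for i in range(X1.shape[0]):
    plt.annotate('({0})'.format(i+1),
                 xy=(X1[i,0], X1[i,1]),
                 xytext=(3, 3),
                 textcoords='offset points')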

plt.xlabel('x coordinate')
plt.ylabel('y coordinate')
plt.title("Scatter Plot of the data")

# Set the range of the x and y ticks
plt.xticks(np.arange(0, 11, 1))
plt.yticks(np.arange(0, 11, 1))

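# Pass True explicitly: a bare plt.grid() toggles the grid,
# which would switch off the one already drawn by the 'darkgrid' style
plt.grid(True)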
plt.show()

Hierarchical Clustering using SciPy

First, import the dendrogram and linkage functions:

from scipy.cluster.hierarchy import dendrogram, linkage
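As a rough sketch of how these two functions fit together, the snippet below builds a linkage matrix for X1 and computes the dendrogram layout from it. The choice of `method='ward'` is an assumption made for illustration, since no linkage method has been chosen yet, and `no_plot=True` is used only so that the example runs without opening a figure.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

# Same sample dataset as above
X1 = np.array([
    [9, 3], [8, 2], [9, 1], [3, 7], [7, 2],
    [9, 7], [4, 8], [8, 3], [3, 1], [1, 4]
])

# Assumption: Ward linkage (on Euclidean distances), chosen for illustration
Z = linkage(X1, method='ward')

# One row per merge: [cluster a, cluster b, merge distance, size of new cluster]
print(Z.shape)

# no_plot=True returns the dendrogram layout as a dict without drawing it
d = dendrogram(Z, no_plot=True)

# Leaf order along the dendrogram axis, as zero-based row indices of X1
print(d['ivl'])
```

Dropping `no_plot=True` and calling `plt.show()` afterwards should draw the dendrogram with matplotlib.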