In Python, the SciPy and Scikit-Learn libraries provide built-in functions for hierarchical clustering.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Setting a style for the seaborn plots
sns.set_style('darkgrid')
# Define a small sample dataset X1
X1 = np.array([
[9, 3],
[8, 2],
[9, 1],
[3, 7],
[7, 2],
[9, 7],
[4, 8],
[8, 3],
[3, 1],
[1, 4]
])
# Print the array to see its contents
print(X1)
We'll now use this dataset to perform hierarchical clustering in Python. First, let's visualize the points in a scatter plot, labeling each one with its index.
plt.figure(figsize=(6, 6))
# Plot the points in red
plt.scatter(X1[:,0], X1[:,1], c='r')
# Create numbered labels for each point
for i in range(X1.shape[0]):
    # Annotate each point with its index (1-10), offset a few points
    # from the marker so the label doesn't overlap it
    plt.annotate('({0})'.format(i + 1),
                 (X1[i, 0], X1[i, 1]),
                 xytext=(3, 3),
                 textcoords='offset points')
plt.xlabel('x coordinate')
plt.ylabel('y coordinate')
plt.title("Scatter Plot of the data")
# Set the range of the x and y ticks
plt.xticks(np.arange(0, 11, 1))
plt.yticks(np.arange(0, 11, 1))
plt.grid()
plt.show()
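Hierarchical clustering builds on the pairwise distances between the data points. As a brief aside (the pdist and squareform utilities below come from scipy.spatial.distance and are not used elsewhere in this text), we can compute the full Euclidean distance matrix for X1 like this:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

X1 = np.array([
    [9, 3], [8, 2], [9, 1], [3, 7], [7, 2],
    [9, 7], [4, 8], [8, 3], [3, 1], [1, 4]
])

# pdist returns the condensed (upper-triangular) vector of pairwise
# Euclidean distances; squareform expands it into a full 10x10 matrix
D = squareform(pdist(X1))

# e.g. D[0, 1] is the distance between points 1 and 2,
# sqrt((9-8)^2 + (3-2)^2) = sqrt(2)
print(D.shape)  # (10, 10)
```

Agglomerative methods start from this matrix: each point begins in its own cluster, and the two closest clusters are merged repeatedly until one cluster remains.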
Next, we'll use SciPy's linkage function to perform hierarchical clustering. The linkage function has several methods available for calculating the distance between clusters: single, complete, average, weighted, centroid, median, and ward. We import dendrogram and linkage from scipy.cluster.hierarchy:
from scipy.cluster.hierarchy import dendrogram, linkage
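As a sketch of how these two functions fit together (Ward's method is chosen here purely as an example; any of the methods listed above can be passed via the method argument):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

X1 = np.array([
    [9, 3], [8, 2], [9, 1], [3, 7], [7, 2],
    [9, 7], [4, 8], [8, 3], [3, 1], [1, 4]
])

# Compute the linkage matrix with Ward's method.
# Each of the n-1 = 9 rows of Z records one merge: the indices of the
# two clusters merged, the distance between them, and the size of the
# newly formed cluster.
Z = linkage(X1, method='ward')
print(Z.shape)  # (9, 4)

# Plot the dendrogram; leaf labels 1-10 match the scatter-plot labels
plt.figure(figsize=(8, 5))
dendrogram(Z, labels=np.arange(1, 11))
plt.xlabel('data point')
plt.ylabel('distance')
plt.title('Dendrogram (Ward linkage)')
plt.show()
```

The height at which two branches join in the dendrogram is the distance recorded in the corresponding row of Z, so cutting the tree at a chosen height yields a flat clustering.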