In Python, the SciPy and Scikit-Learn libraries provide built-in functions for hierarchical clustering.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Setting a style for the seaborn plots
sns.set_style('darkgrid')
# Define a small sample dataset X1
X1 = np.array([
[9, 3],
[8, 2],
[9, 1],
[3, 7],
[7, 2],
[9, 7],
[4, 8],
[8, 3],
[3, 1],
[1, 4]
])
# Print the array to see its contents
print(X1)
We'll now use this dataset to perform hierarchical clustering in Python. First, let's visualize the points in a scatter plot, labeling each one with its index.
plt.figure(figsize=(6, 6))
# Plot the points in red
plt.scatter(X1[:,0], X1[:,1], c='r')
# Create numbered labels for each point
for i in range(X1.shape[0]):
    # Annotate each point with its index (1-10), offset a few points
    # from the marker so the label doesn't overlap it
    plt.annotate('({0})'.format(i + 1),
                 (X1[i, 0], X1[i, 1]),
                 xytext=(3, 3),
                 textcoords='offset points')
plt.xlabel('x coordinate')
plt.ylabel('y coordinate')
plt.title("Scatter Plot of the data")
# Set the range of the x and y ticks
plt.xticks(np.arange(0, 11, 1))
plt.yticks(np.arange(0, 11, 1))
plt.grid()
plt.show()
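Hierarchical clustering builds on the pairwise distances between the data points. As a brief aside (the pdist and squareform utilities below come from scipy.spatial.distance and are not used elsewhere in this text), we can compute the full Euclidean distance matrix for X1 like this:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

X1 = np.array([
    [9, 3], [8, 2], [9, 1], [3, 7], [7, 2],
    [9, 7], [4, 8], [8, 3], [3, 1], [1, 4]
])

# pdist returns the condensed (upper-triangular) vector of pairwise
# Euclidean distances; squareform expands it into a full 10x10 matrix
D = squareform(pdist(X1))

# e.g. D[0, 1] is the distance between points 1 and 2,
# sqrt((9-8)^2 + (3-2)^2) = sqrt(2)
print(D.shape)  # (10, 10)
```

Agglomerative methods start from this matrix: each point begins in its own cluster, and the two closest clusters are merged repeatedly until one cluster remains.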
Next, we'll use SciPy's linkage function to perform hierarchical clustering. The linkage function has several methods available for calculating the distance between clusters: single, complete, average, weighted, centroid, median, and ward. We import dendrogram and linkage from scipy.cluster.hierarchy:
from scipy.cluster.hierarchy import dendrogram, linkage
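As a sketch of how these two functions fit together (Ward's method is chosen here purely as an example; any of the methods listed above can be passed via the method argument):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

X1 = np.array([
    [9, 3], [8, 2], [9, 1], [3, 7], [7, 2],
    [9, 7], [4, 8], [8, 3], [3, 1], [1, 4]
])

# Compute the linkage matrix with Ward's method.
# Each of the n-1 = 9 rows of Z records one merge: the indices of the
# two clusters merged, the distance between them, and the size of the
# newly formed cluster.
Z = linkage(X1, method='ward')
print(Z.shape)  # (9, 4)

# Plot the dendrogram; leaf labels 1-10 match the scatter-plot labels
plt.figure(figsize=(8, 5))
dendrogram(Z, labels=np.arange(1, 11))
plt.xlabel('data point')
plt.ylabel('distance')
plt.title('Dendrogram (Ward linkage)')
plt.show()
```

The height at which two branches join in the dendrogram is the distance recorded in the corresponding row of Z, so cutting the tree at a chosen height yields a flat clustering.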