Data Science Tools & Software - Assignment No. 3


Q1. (Dissimilarity & Linkage)

Given three objects A, B, and C with the following pairwise dissimilarities (rows and columns ordered A, B, C):

<aside>

$$ P(X)=\begin{pmatrix}0&1&4\\1&0&2\\4&2&0\end{pmatrix} $$

</aside>

1.1. When using single‐linkage (nearest‐neighbor) clustering, which two objects merge first?

1.2. After the first merge in single‐linkage (from 1.1), what is the distance between the newly formed cluster and the remaining object?

1.3. Using complete‐linkage (farthest‐neighbor) on the same dissimilarity matrix, which two objects merge first?

1.4. After merging in complete‐linkage, what is the linkage distance between the new cluster and the remaining object?
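The linkage rules in 1.1 to 1.4 can be checked mechanically. The following is a minimal sketch, assuming the rows and columns of the matrix are ordered A, B, C; the helper names `closest_pair` and `linkage_distance` are illustrative and not part of the assignment. With three singleton objects, the first merge is the same under both rules, and only the distance to the remaining object differs (minimum for single linkage, maximum for complete linkage).

```python
from itertools import combinations

# Dissimilarity matrix from Q1, assuming rows/columns are ordered A, B, C.
LABELS = ["A", "B", "C"]
D = [
    [0, 1, 4],
    [1, 0, 2],
    [4, 2, 0],
]

def closest_pair(d):
    """Return the index pair (i, j), i < j, with the smallest dissimilarity."""
    return min(combinations(range(len(d)), 2), key=lambda p: d[p[0]][p[1]])

def linkage_distance(d, cluster, other, rule):
    """Distance from `cluster` (a tuple of indices) to the single object `other`.

    rule=min gives single linkage, rule=max gives complete linkage.
    """
    return rule(d[i][other] for i in cluster)

first = closest_pair(D)
remaining = next(k for k in range(len(D)) if k not in first)
single = linkage_distance(D, first, remaining, min)
complete = linkage_distance(D, first, remaining, max)

print("first merge:", [LABELS[i] for i in first])
print("single-linkage distance to", LABELS[remaining], "=", single)
print("complete-linkage distance to", LABELS[remaining], "=", complete)
```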

1.5. Suppose we treat objects A and B as initial centroids for two clusters (C₁ and C₂) in a “k‐means” style assignment, and object C is left unassigned. If

$x_1 = [1,\;3], \quad x_2 = [2,\;5], \quad x_3 = [3,\;7],$

which cluster does x₃ join (hard assignment) if using Euclidean distance?

1.6. In that same scenario (1.5), if x₃ joins whichever of C₁ or C₂ is closest, what will be the coordinates of the new centroid of the cluster that absorbs x₃?
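The assignment and update steps in 1.5 and 1.6 can be sketched as follows. This assumes that $x_1$, $x_2$, $x_3$ are the coordinates of A, B, C respectively, that $x_1$ and $x_2$ are the sole initial members of C₁ and C₂, and that the updated centroid is the mean of the cluster's members, as in standard k-means. The dictionary names are illustrative.

```python
import math

# Coordinates from 1.5; x1 and x2 are assumed to be the initial centroids.
x1, x2, x3 = (1, 3), (2, 5), (3, 7)
centroids = {"C1": x1, "C2": x2}
members = {"C1": [x1], "C2": [x2]}

# Hard assignment: pick the centroid with the smallest Euclidean distance to x3.
nearest = min(centroids, key=lambda name: math.dist(x3, centroids[name]))
members[nearest].append(x3)

# Centroid update: coordinate-wise mean of all members of the chosen cluster.
new_centroid = tuple(
    sum(coord) / len(members[nearest]) for coord in zip(*members[nearest])
)

print("x3 joins", nearest)
print("updated centroid:", new_centroid)
```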


Q2. (Clustering Cost, Linkage, DBSCAN)

Consider a 2‐D dataset and two cluster centers $m_1 = (1,1)$ and $m_2 = (3,3)$.

2.1. If you replace $m_1$ with the new center $(2,2)$, the change in cost (where the cost is the sum of squared distances from each point to its closest center) depends on the points' locations. In general, shifting a center closer to the points of its cluster will:

2.2. Given the following 4 points in the plane:

P₁=(1, 4), P₂=(2, 0), P₃=(4, 1), P₄=(0, 2)

When clustering these 4 points with single‐linkage, the first merge occurs between the two with the smallest pairwise distance. Which pair is it (report every pair if there is a tie)?
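One way to find the first single-linkage merge is to enumerate all pairwise distances. The sketch below uses the four points from 2.2 and compares squared Euclidean distances, which keeps the arithmetic in exact integers and preserves the ordering. It collects every pair that attains the minimum, since ties are possible. The names `squared_distance` and `closest_pairs` are illustrative.

```python
from itertools import combinations

# The four points from 2.2.
points = {"P1": (1, 4), "P2": (2, 0), "P3": (4, 1), "P4": (0, 2)}

def squared_distance(p, q):
    """Squared Euclidean distance; same ordering as the true distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

# Squared distance for every unordered pair of points.
sq = {
    (a, b): squared_distance(points[a], points[b])
    for a, b in combinations(points, 2)
}

smallest = min(sq.values())
closest_pairs = [pair for pair, d in sq.items() if d == smallest]

for pair, d in sq.items():
    print(pair, "squared distance =", d)
print("pair(s) at the minimum:", closest_pairs)
```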

2.3. In a k-distance graph for $k=3$ (plotting each point's distance to its 3rd-nearest neighbor, sorted), a sharp “elbow” typically suggests a good choice of ε (Epsilon) for DBSCAN. If the k-distance plot jumps sharply around value 1.5, a suitable ε for DBSCAN would be around:

2.4. Suppose you apply DBSCAN with $\varepsilon = \sqrt{2}$ and $\text{MinPts} = 2$, starting from point $(4,4)$. A point is a core point if it has at least MinPts points within radius $\varepsilon$. Which type of point is $(4,4)$ if its only neighbor within distance $\sqrt{2}$ is $(3,3)$?
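The core-point test in 2.4 depends on whether a point counts itself among its own neighbors. The original DBSCAN definition and scikit-learn's `min_samples` both include the point itself, but some course materials do not. The sketch below makes that convention an explicit, hypothetical `count_self` flag and uses only the two points named in the question. Squared distances are compared so that the boundary case (distance exactly $\sqrt{2}$) is decided in exact integer arithmetic.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def is_core(p, points, eps_sq, min_pts, count_self=True):
    """Return True if p has at least min_pts points within radius sqrt(eps_sq).

    count_self controls whether p itself is counted in its neighborhood
    (an assumption the caller must choose to match the course convention).
    """
    neighbors = [q for q in points if q != p and squared_distance(p, q) <= eps_sq]
    size = len(neighbors) + (1 if count_self else 0)
    return size >= min_pts

# Only the two points mentioned in 2.4; eps = sqrt(2), so eps squared is 2.
points = [(3, 3), (4, 4)]
core_if_self_counted = is_core((4, 4), points, eps_sq=2, min_pts=2, count_self=True)
core_if_self_excluded = is_core((4, 4), points, eps_sq=2, min_pts=2, count_self=False)

print("core when the point counts itself:", core_if_self_counted)
print("core when the point is excluded:", core_if_self_excluded)
```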

2.5. To produce the same two clusters from single‐linkage as from DBSCAN in 2.4, one must “cut” the dendrogram at a height equal to:


2.6. If single‐, complete‐, and average‐linkage each produce two clusters, the distances they measure between those two clusters (the minimum, maximum, and average pairwise distance between points of the two clusters, respectively) will generally satisfy:
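For a fixed pair of clusters, the three criteria reduce to the minimum, mean, and maximum of the same set of pairwise distances. The sketch below illustrates this on two small hypothetical clusters (the coordinates are made up for illustration and do not come from the assignment). It compares the measures on one fixed partition only; in practice the three methods may also produce different partitions.

```python
import math
from itertools import product
from statistics import mean

# Hypothetical clusters chosen only to illustrate the three linkage measures.
cluster_a = [(0, 0), (1, 0)]
cluster_b = [(4, 0), (6, 0)]

# All distances between a point of cluster_a and a point of cluster_b.
pairwise = [math.dist(p, q) for p, q in product(cluster_a, cluster_b)]

single = min(pairwise)     # single linkage: closest pair
average = mean(pairwise)   # average linkage: mean over all pairs
complete = max(pairwise)   # complete linkage: farthest pair

print("single:", single, "average:", average, "complete:", complete)
```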