CS224W Machine Learning with Graphs
Lecture 5
Title: Spectral Clustering
Date: 2020. 04. 07 (TUE) ~ 2020. 04. 08 (WED)
Materials: Slides YouTube
Additional Materials: ratsgo’s blog
Spectral Clustering
- Clusters a graph using eigenvectors, eigenvalues, and linear algebra
- Algorithms
- Pre-processing
- Build Laplacian matrix \(L\)
- Decomposition (compute eigenvalues and eigenvectors)
- Find eigenvalues \(\lambda\) and eigenvectors \(x\) of the matrix \(L\)
- Find the second smallest eigenvalue \(\lambda_2\) and corresponding eigenvector \(x_2\)
- Grouping
- Sort the components of \(x_2\) and identify clusters by a splitting point
- Split at 0 or at the median value
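The three steps above can be sketched end to end with plain numpy. The toy graph below (two triangles joined by a bridge edge) is my own illustration, not from the lecture:

```python
import numpy as np

# Toy undirected graph: two triangles {0,1,2} and {3,4,5} joined by
# one bridge edge (2,3). Hypothetical example graph.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

# 1) Pre-processing: build the Laplacian L = D - A.
D = np.diag(A.sum(axis=1))
L = D - A

# 2) Decomposition: eigenvalues/eigenvectors of the symmetric matrix L.
#    np.linalg.eigh returns eigenvalues in ascending order, so column 1
#    is x_2, the eigenvector of the second-smallest eigenvalue.
eigvals, eigvecs = np.linalg.eigh(L)
x2 = eigvecs[:, 1]

# 3) Grouping: split at 0 on the sign of x_2.
labels = (x2 > 0).astype(int)
print(labels)  # the two triangles land in different groups
```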
Graph Partitioning
- In terms of network modularity, a good partitioning maximizes the number of within-group connections and minimizes the number of between-group connections
Graph Cuts
- Set of edges with one endpoint in each group
- \[cut(A, B) = \sum_{i\in A,\, j\in B} w_{ij}\]
- Minimum-cut: \(\arg\min_{A,B} cut(A,B)\)
- However, this criterion does not give an optimal cut (degenerate case)
- It only considers external cluster connectivity and ignores internal cluster connectivity
- Conductance (smaller is better)
- \[\varnothing(A,B) = \frac{cut(A,B)}{\min(vol(A),\, vol(B))}\]
- where \(vol(A)\) is the total weighted degree of the nodes in \(A\): \(vol(A) = \sum_{i\in A} k_i\)
- It yields more balanced partitions
- However, computing the minimum-conductance cut is NP-hard
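A minimal sketch of computing conductance on a toy graph; the example graph and the helper name `conductance` are my own illustration. It also shows why conductance avoids the degenerate minimum-cut case:

```python
import numpy as np

# phi(A,B) = cut(A,B) / min(vol(A), vol(B)) on an unweighted toy graph:
# two triangles {0,1,2} and {3,4,5} joined by a bridge edge (2,3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
adj = np.zeros((n, n))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0

def conductance(adj, group_a):
    group_a = set(group_a)
    group_b = set(range(len(adj))) - group_a
    # cut(A,B): total weight of edges with one endpoint in each group
    cut = sum(adj[i, j] for i in group_a for j in group_b)
    # vol(A): total weighted degree of the nodes in A
    vol_a = adj[list(group_a)].sum()
    vol_b = adj[list(group_b)].sum()
    return cut / min(vol_a, vol_b)

print(conductance(adj, {0, 1, 2}))  # 1/7: the balanced cut scores well
print(conductance(adj, {0}))        # 2/2 = 1.0: a degenerate cut scores badly
```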
Spectral Graph Partitioning
- \(A\): adjacency matrix of an undirected graph \(G\)
- \(x\) is a vector in \(\mathbb{R}^n\) with one entry (a label/value) per node
- \(y_i\) is the sum of the labels \(x_j\) of the neighbors of \(i\) (\(y = A \cdot x\))
- \({A}\cdot{x} = {\lambda}\cdot{x}\) where \(x\) is eigenvector and \(\lambda\) is eigenvalue
- For a \(d\)-regular graph (every node has degree \(d\)), \(x = (1, 1, \dots, 1)\) is an eigenvector with eigenvalue \(d\)
- We can split the graph by looking at \(\lambda_n\) and \(\lambda_{n-1}\): if a \(d\)-regular graph is disconnected into two components, \(\lambda_{n-1} = \lambda_n\); if the two groups are only weakly connected, \(\lambda_{n-1}\) is close to \(\lambda_n\), and that weak connection is the split point
- So \(x_{n-1}\) splits the nodes into two groups by sign: nodes with \(x_{n-1}[i] > 0\) go to one group and nodes with \(x_{n-1}[i] < 0\) to the other
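These \(d\)-regular facts are easy to check numerically; the cycle graphs below are my own toy examples:

```python
import numpy as np

# For a connected 2-regular graph (a cycle), the all-ones vector is an
# eigenvector of A with eigenvalue d = 2. If the graph splits into two
# components, the largest eigenvalue d appears twice: lambda_n = lambda_{n-1}.
def ring_adjacency(n):
    A = np.zeros((n, n))
    for i in range(n):
        A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
    return A

A = ring_adjacency(6)  # one 6-cycle: connected, 2-regular
print(np.allclose(A @ np.ones(6), 2 * np.ones(6)))  # True: A.(1,...,1) = d.(1,...,1)

# Two disjoint 3-cycles: still 2-regular, but disconnected.
A2 = np.zeros((6, 6))
A2[:3, :3] = ring_adjacency(3)
A2[3:, 3:] = ring_adjacency(3)
eigvals = np.linalg.eigvalsh(A2)  # ascending order
print(np.isclose(eigvals[-1], eigvals[-2]))  # True: lambda_n == lambda_{n-1} = d
```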
Matrix Representations
- Adjacency matrix (\(A\))
- Degree matrix (\(D\))
- Laplacian matrix (\(L = D - A\))
- If \(x = (1,...,1)\) then \({L}\cdot{x} = 0\) and so \(\lambda = \lambda_1 = 0\)
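The Laplacian facts above can be verified directly; the small adjacency matrix is my own example:

```python
import numpy as np

# L = D - A, and L . (1,...,1)^T = 0, so the smallest eigenvalue is 0.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
D = np.diag(A.sum(axis=1))  # degree matrix
L = D - A                   # Laplacian matrix

# Every row of L sums to zero (degree minus the adjacency row sum),
# so the all-ones vector is in the nullspace of L.
ones = np.ones(4)
print(L @ ones)                  # [0. 0. 0. 0.]
print(np.linalg.eigvalsh(L)[0])  # smallest eigenvalue, numerically ~0
```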
\(\lambda_2\) Optimization
- \(\lambda_2\) is the second smallest eigenvalue (the smallest eigenvalue is always zero)
- \[\lambda_2 = \min_{x:\, \sum_i x_i = 0} \frac{\sum_{(i,j)\in E}(x_i - x_j)^2}{\sum_i x_i^2}\]
- \(\sum_{(i,j)\in E}(x_i - x_j)^2\): every edge \((i,j)\) that crosses the split contributes a nonzero term, so minimizing this sum places few edges between the groups
- \(\sum_i x_i^2\): this is just 1, since \(x\) is a unit vector
- So minimizing \(\lambda_2\) corresponds to optimal graph partitioning
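A quick numerical check of this quotient, using the standard identity \(x^T L x = \sum_{(i,j)\in E}(x_i - x_j)^2\); the small graph is my own example:

```python
import numpy as np

# Verify: for the Laplacian L and the unit eigenvector x_2,
#   x2^T L x2 = sum over edges (x_i - x_j)^2 = lambda_2,
# and x2 is orthogonal to (1,...,1), i.e. sum(x2) = 0.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]  # 4-cycle plus a chord
n = 4
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

eigvals, eigvecs = np.linalg.eigh(L)
x2 = eigvecs[:, 1]

quad = sum((x2[i] - x2[j]) ** 2 for i, j in edges)
print(np.isclose(x2 @ L @ x2, quad))        # True: the edge-sum identity
print(np.isclose(x2 @ L @ x2, eigvals[1]))  # True, since ||x2|| = 1
print(np.isclose(x2.sum(), 0.0))            # True: x2 is orthogonal to (1,...,1)
```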
Rayleigh Theorem
- Find Optimal Cut [Fiedler’73]
- \[{y_i} = \begin{cases} +1 & if\ {i} \in {A} \\ -1 & if\ {i} \in {B}\end{cases}\]
- Minimizing \(\sum_{(i,j)\in E}(y_i - y_j)^2\) subject to the balance constraint \(\sum_i y_i = 0\) gives the optimal cut of the partition \((A, B)\); relaxing \(y\) from \(\pm 1\) labels to real values leads back to the eigenvector \(x_2\)
k-Way Spectral Clustering
- Recursive bi-partitioning: recursively apply the bi-partitioning algorithm in a hierarchical, divisive manner
- Cluster multiple eigenvectors: build a reduced-dimensional space from multiple eigenvectors and cluster in it
- Advantages
- Approximates the optimal cut
- Emphasizes cohesive clusters
- Well-separated space
- Eigengap
- The difference between two consecutive eigenvalues
- \[\Delta_k = \left| \lambda_k - \lambda_{k-1} \right|\]
- Choose the \(k\) that maximizes the eigengap
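Choosing \(k\) by the eigengap can be sketched as follows; the three-cluster toy graph is my own example:

```python
import numpy as np

# Choose k by the largest eigengap Delta_k = |lambda_k - lambda_{k-1}|.
# Toy graph: three triangle clusters with two weak links between them.
edges = [(0, 1), (0, 2), (1, 2),   # cluster {0,1,2}
         (3, 4), (3, 5), (4, 5),   # cluster {3,4,5}
         (6, 7), (6, 8), (7, 8),   # cluster {6,7,8}
         (2, 3), (5, 6)]           # weak links between clusters
n = 9
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

eigvals = np.linalg.eigvalsh(L)  # ascending order
gaps = np.diff(eigvals)          # gaps[i] = eigvals[i+1] - eigvals[i]
k = int(np.argmax(gaps)) + 1     # suggested number of clusters
print(k)  # 3: the largest gap sits after the three smallest eigenvalues
```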
Motif-Based Spectral Clustering
- Find a specific motif in the graph and cluster based on it
- Different motifs reveal different modular structures
- Motif cuts and conductance
- Motif conductance: \(\varnothing(S) = \frac{\#(\text{motifs cut})}{vol_M(S)}\)
- \(\#(\text{motifs cut})\) is the number of motif instances cut by \(S\) (motifs with endpoints on both sides of the partition)
- \(vol_M(S)\) is the number of motif endpoints inside the subgraph \(S\)
- \(vol_M(S)\) includes the endpoints of cut motifs
- Finding set \(S\) with the minimal motif conductance is NP-hard
Motif Spectral Clustering
- Create a new weighted graph \(W^{(M)}\) and apply spectral clustering on it
- Algorithm
- Pre-processing: create the weighted graph \(W^{(M)}\)
- Remove irrelevant edges (edges that participate in no motif)
- Weight each edge by the number of times it participates in the motif \(M\)
- Decomposition: compute the eigenvalues and eigenvectors of the Laplacian of \(W^{(M)}\)
- Grouping (sweep procedure)
- Select the best cut using motif conductance
- Try multiple motifs and select the best one
- The best motif decreases conductance quickly and reaches a lower global minimum conductance
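For the triangle motif, the pre-processing step (building \(W^{(M)}\)) can be sketched as below. The toy graph is my own, and the sketch relies on the standard fact that \((A \cdot A)[i,j]\) counts common neighbors of \(i\) and \(j\):

```python
import numpy as np

# Pre-processing for motif spectral clustering with the triangle motif:
# W[i, j] counts the triangles that edge (i, j) participates in.
# Toy graph: two triangles {0,1,2} and {3,4,5} joined by a bridge (2,3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

# (A @ A)[i, j] counts common neighbors of i and j; masking by A keeps
# only actual edges, giving per-edge triangle counts.
W = (A @ A) * A

print(W[0, 1])  # 1.0: edge (0,1) lies in one triangle, 0-1-2
print(W[2, 3])  # 0.0: the bridge lies in no triangle, so it is removed

# Decomposition and grouping then proceed as in ordinary spectral
# clustering, but on the Laplacian of W instead of A.
```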