CS224W Machine Learning with Graphs
Lecture 4
Title: Community Structure in Networks
Date: 2020. 04. 03 (FRI) ~ 2020. 04. 06 (MON)
Materials: Slides YouTube
Networks and Communities
- Roles(RoIX) based on the feature, Communities(Fast Modularity) based on cluster
Flow of Information
- Questions
- How does information flow through the network?
- How do people find out about new job?
- Contacts were often acquaintances(less than friends)1 rather than close friends
-
Granovetter’s Answer
- Link perspectives: Structural perspective and Interpersonal perspective
- Structural role(Triadic Closure)
- Structure: Well-embedded edges are socially strong
- Information: Well-embedded heavily redundant in terms of2 information access
- Through the weak connection, the node can get newer information which can’t get the well-embedded parts
- So, in the job seeking problem, people contact to acquaintances rather than friends
- Strong structure(Socially Strong) means each neighbors are also neighbors of others
- Weight of edge means interpersonal strength
- Triadic Closure means High clustering coefficient
-
Edge overlap
- The ratio of the # of mutual friends and # of union friends
- Overlap = 0 means an edge is a local bridge
-
Edge Removal
- By Strength: The network structure are decomposed faster when remove low # of weighted edge than high
- By Overlap: The network structure are decomposed faster when remove low edge overlap than high
Network Communities
- Definition
- Sets of tightly connected nodes
- Sets of nodes with lots of internal connections and few external ones
-
Communities can exchange clusters, groups and modules
- Zachary’s Karate club network(Social Network Data Example)
- In the network, there are two different karate club
- Split could be explained by a minimum cut in the network
Modularity Q
- How well as network is partitioned into communities
- Q∝∑{s}∈S[( # edges within group s) - (expected # edges within group s)]
- To calculated expected # edges with group, need a null model
- Null Model(Configuration Model)
- The expected # of edges between nodes i and j
- ki⋅kj2m (ki,Kj means degrees i and j)
- Modularity of partitioning S of graph G
- Q(G,S)={1}2m∑{s}∈S∑{i}∈s∑{j}∈s(Aij−{kikj}2m)
- where Aij = 1 if i -> j 0 otherwise
- Modularity values take range [-1, 1]
- Q greater than 0.3 - 0.7 means significant community structure
- Negative Q means anti-communities
- Positive Q means that nodes in the same communities are connected stronger than the null model (More connection than the null model)
- If Q=0, this community doesn’t a difference with random connection. However, if Q>0, this community have more connections(edges) than the random model.
- We can identify communities by maximizing modularity
Louvain Algorithm
-
Using Greedy algorithm for community detection(Greedily maximizes modularity)
- Using to study large networks
- Fast, Rapid convergence, High modularity output
-
Steps
- Optimize modularity by local changes to node-communities memberships
- Changing community of each node, calculate modularity delta(ΔQ)
- Possible communities to change are only neighbor of node
- ΔQ=ΔQ(i→C)+ΔQ(D→i)
- Modularity delta means the aggregation of change of modularity when node i put into community C and change of modularity when node i pull from community D
- The nodes’ community determine the community with maximum modularity
- Repeat until no increase modularity
- Changing community of each node, calculate modularity delta(ΔQ)
- Aggregate nodes to super-nodes to build a new network
- Repeat these two steps until no increase of modularity
- Optimize modularity by local changes to node-communities memberships
Detecting Overlapping Communities: BigCLAM
-
Define generative model for graphs
- Community Affiliation Graph Model(AGM)
-
Discover communities
-
Given graph G, make the assumption that G was generated by AGM
-
Find the best AGM = discover communities
-
Community Affiliation Graph Model(AGM)
- V,C,M,{pc}
-
V: Nodes
-
C: Communities
-
M: Memberships, The link between nodes and communities
- Pc: The probability that Node v has Membership m to Community c
To detect communities with AGM
-
Given a Graph, find the model F
- Terms
- M: Affiliation graph
- C: Number of communities like k in k-means
- pc: Parameters
- Maximum likelihood estimation
- argFmaxP(G|F)
- P(G|F)=∏{(u,v)}∈GP(u,v)∏{(u,v)∉G}(1−P(u,v))
- Membership has strengths (FuA)
- How strong connect node to community
- FuA=0 means that node u is not a member of community A