Processing math: 100%

[CS224W Lecture 4] Community Structure in Networks

CS224W Machine Learning with Graphs

Lecture 4

Title: Community Structure in Networks
Date: 2020. 04. 03 (FRI) ~ 2020. 04. 06 (MON)
Materials: Slides YouTube

Networks and Communities

  • Roles(RoIX) based on the feature, Communities(Fast Modularity) based on cluster

Flow of Information

  • Questions
    • How does information flow through the network?
    • How do people find out about new job?
      • Contacts were often acquaintances(less than friends)1 rather than close friends
  • Granovetter’s Answer

    • Link perspectives: Structural perspective and Interpersonal perspective
    • Structural role(Triadic Closure)
      • Structure: Well-embedded edges are socially strong
      • Information: Well-embedded heavily redundant in terms of2 information access
        • Through the weak connection, the node can get newer information which can’t get the well-embedded parts
        • So, in the job seeking problem, people contact to acquaintances rather than friends
      • Strong structure(Socially Strong) means each neighbors are also neighbors of others
        • Weight of edge means interpersonal strength
      • Triadic Closure means High clustering coefficient

    Granovetter Structure

  • Edge overlap

    • The ratio of the # of mutual friends and # of union friends
    • Overlap = 0 means an edge is a local bridge
  • Edge Removal

    • By Strength: The network structure are decomposed faster when remove low # of weighted edge than high
    • By Overlap: The network structure are decomposed faster when remove low edge overlap than high

Network

Network Communities

  • Definition
    • Sets of tightly connected nodes
    • Sets of nodes with lots of internal connections and few external ones
  • Communities can exchange clusters, groups and modules

  • Zachary’s Karate club network(Social Network Data Example)
    • In the network, there are two different karate club
    • Split could be explained by a minimum cut in the network

Modularity Q

  • How well as network is partitioned into communities
  • Q{s}S[( # edges within group s) - (expected # edges within group s)]
    • To calculated expected # edges with group, need a null model
    • Null Model(Configuration Model)
      • The expected # of edges between nodes i and j
      • kikj2m (ki,Kj means degrees i and j)
  • Modularity of partitioning S of graph G
    • Q(G,S)={1}2m{s}S{i}s{j}s(Aij{kikj}2m)
    • where Aij = 1 if i -> j 0 otherwise
  • Modularity values take range [-1, 1]
    • Q greater than 0.3 - 0.7 means significant community structure
    • Negative Q means anti-communities
      • Positive Q means that nodes in the same communities are connected stronger than the null model (More connection than the null model)
      • If Q=0, this community doesn’t a difference with random connection. However, if Q>0, this community have more connections(edges) than the random model.
  • We can identify communities by maximizing modularity

Modularity

Louvain Algorithm

  • Using Greedy algorithm for community detection(Greedily maximizes modularity)

    • Using to study large networks
    • Fast, Rapid convergence, High modularity output
  • Steps

    1. Optimize modularity by local changes to node-communities memberships
      • Changing community of each node, calculate modularity delta(ΔQ)
        • Possible communities to change are only neighbor of node
        • ΔQ=ΔQ(iC)+ΔQ(Di)
        • Modularity delta means the aggregation of change of modularity when node i put into community C and change of modularity when node i pull from community D
      • The nodes’ community determine the community with maximum modularity
      • Repeat until no increase modularity
    2. Aggregate nodes to super-nodes to build a new network
    3. Repeat these two steps until no increase of modularity

    Louvain Algorithm

Detecting Overlapping Communities: BigCLAM

  1. Define generative model for graphs

    • Community Affiliation Graph Model(AGM)
  2. Discover communities

    • Given graph G, make the assumption that G was generated by AGM

    • Find the best AGM = discover communities

Community Affiliation Graph Model(AGM)

  • V,C,M,{pc}
  • V: Nodes

  • C: Communities

  • M: Memberships, The link between nodes and communities

  • Pc: The probability that Node v has Membership m to Community c
To detect communities with AGM
  • Given a Graph, find the model F

  • Terms
    • M: Affiliation graph
    • C: Number of communities like k in k-means
    • pc: Parameters
  • Maximum likelihood estimation
    • argFmaxP(G|F)
    • P(G|F)={(u,v)}GP(u,v){(u,v)G}(1P(u,v))
  • Membership has strengths (FuA)
    • How strong connect node to community
    • FuA=0 means that node u is not a member of community A
  1. Acquaintance: 지인(친구보다는 덜 친분이 있는) 

  2. In terms of: ~면에서, ~관점에서