CS224W Machine Learning with Graphs

Lecture 3

Title: Motifs and Structural Roles in Networks
Date: 2020. 04. 01 (WED) ~ 04. 02 (THU)
Materials: Slides YouTube

Subnetworks

Decomposition of a big network
Isomorphic: the same graph with the same edges size and the same directions
Significant is a metric capable of classifying the subgraph
- under-representation(negative significant)
- over-representation(positive significant)
- Significant is different by networks (the same subgraph is positive in Language network and negative in Neurons network)

Subgraphs, Motifs, and Graphlets

Network Motifs

Significant recurring patterns of interconnections in the network
- Pattern: Small induced¹ subgraph
  - ‘Induced subgraph’ means that one or more subgraphs which have the same structure of the subgraph of interest(Motif) can find in the full graph
- Recurring: High frequency
  - ‘Recurrence’ means how many find induced subgraphs(occurrence)
- Significant: More frequent than expected
  - It’s based on z-score
  - \[Z_i = (N^{real}_i-\bar{N}^{rand}_i) / std(N^{rand}_i)\]
  - \(N^{real}_i\) is #(subgraphs of type \(i\)) in network \(G^{real}\)
  - \(N^{rand}_i\) is #(subgraphs of type \(i\)) in randomized network \(G^{rand}\)
  - Network significant profile(SP) is a vector of normalized Z-scores
    - \[SP_i = Z_i /\sqrt{\sum_jZ^2_j}\]
    - Usually, \(G^rand\) generated 10,000~100,000 to estimated \(\bar{N}^{rand}_i\) and \(std(N^{rand}_i)\). It depends on the size of the real graph
understanding how networks work and prediction operation and reaction of the network in a given situation
- Feed-forward loop: Move toward A to B
- Parallel loop: Move toward A to B without skipped nodes
- Sing-input modules: Single input and multiple outputs
Generation random model
- Configuration Model
  - Null model(random model) which is compared with the real graph with the degree sequence of the real graph
  - Using spokes
  - Why connect directly –> it’s not random!
- Switching Model
  - Repeat exchange endpoints from the given Graph \(G\) again and again (randomize!)
Graphlets: Node feature vectors
- Graphlets: possible subgraphs which are made by n-nodes
  - use graphlets to obtain node-level subgraph metric
- Graphlet degree vector(GDV): # of non-isomorphic position in graphlet
  - non-isomorphic position means that unique nodes in the graphlet(node touches)
  - In the input graph, GDV means a vector is constructed by the number of graphlets that a node touches at a particular orbit
- Considering graphlets on 2 to 5 nodes(73 coordinates)
  - It means that graphs usually make within 4 hops
Finding Motifs and Graphlets
- Finding size-k motifs/graphlets is a hard computational problem(NP-complete)
- There are two big two challenges
  - Enumerating(To obtain all size-k connected subgraphs)
  - Counting(To obtain occurrences of each subgraph type)
  - Computation time grows exponentially as the size of the motif/graphlet increase
Algorithms
- Exact subgraph enumeration (ESU) [Wernicke 2006]
- Kavosh[Kashaniet al. 2009]
- Subgraph sampling [Kashtanet al. 2004]

Exact subgraph enumeration (ESU)

Terms
- \(V_{subgraph}\): currently constructed subgraph(motif)
- \(V_{extension}\): set of candidate nodes to extend the motif

Enumerating(ESU-Tree)
- ESU-Tree: Implementation as a recursive function
- 1st Step: Add Nodes(\(v\)) in \(V_{subgraph}\) and add nodes(\(w\)) which touched from \(v\) in \(V_{extension}\)
- 2nd Step: Add a pair of \(V_{subgraph}\) & \(V_{extension}\)(new \(v\)) in \(V_{subgraph}\) and add nodes which touched from \(v\) in \(V_{extension}\) (Only add \(w\) which have greater node id the node id of \(v\))
  - Try again 2nd step until the depth attain the \(k\)
Counting (McKay’s nauty algorithm [McKay 1981])
- Determine which subgraphs in ESU-Tree leaves are topologically equivalent(isomorphic)
- Graph Isomorphism: The condition that given two graphs are the same structure

Structural Roles in Networks

Meaning of Roles
- The function of nodes in a network(e.g., species in ecosystems, individuals in companies)
- A collection of nodes which have a similar position in a network(do not need to connect)
- Communities/Groups: A collection of nodes which are well-connected to each other
How to find roles
- Structural equivalence(same relationships to all other nodes) [Lorrain & White 1971]
- In the adjacency matrix, nodes have the same role have the same vector

Discovering Structural Roles in Networks

RolX: Automatic discovery of nodes’ structural roles in networks [Henderson, et al. 2011b]
- Unsupervised learning approach
- No prior knowledge required(automated and behavioral)

RoIX

Recursive Feature Extraction
- The matrix of nodes and columns(features)
- Ego means center node and egonet means the network of nodes are connected ego directly
  - Features that describe the node and its neighbors
  - Characterize the structure of egonet
  - Local features: All measure of the node degree (in-out degree, total degree, weighted feature versions)
  - Egonet features: # of within-egonet edges, edges entering/leaving egonet
- Additional feature: aggregate functions(mean, sum)
  - The number of possible recursive features grows exponentially with each recursive iteration
  - Reduce the number of features using a pruning technique
Role Extraction
- Cluster nodes based on extracted features using non-negative matrix factorization

유발하다, 유도하다 ↩

CS224W Graph Machine Learning

CS224W Machine Learning with Graphs

Lecture 3

Subnetworks

Subgraphs, Motifs, and Graphlets

Network Motifs

Graphlets: Node feature vectors

Finding Motifs and Graphlets

Exact subgraph enumeration (ESU)

Structural Roles in Networks

Discovering Structural Roles in Networks

Related Posts

[Deep Reinforcement Learning Nanodegree Chapter 4] Multi-Agent Reinforcement Learning 15 Dec 2020

[Deep Reinforcement Learning Nanodegree Chapter 3] Policy-Based Methods 10 Dec 2020

[Deep Reinforcement Learning Nanodegree Chapter 2] Value-Based Methods 27 Nov 2020