CS224W Machine Learning with Graphs
Lecture 3
Title: Motifs and Structural Roles in Networks
Date: 2020. 04. 01 (WED) ~ 04. 02 (THU)
Materials: Slides YouTube
Subnetworks
-
Decomposition of a big network
-
Isomorphic: the same graph with the same edges size and the same directions
-
Significant is a metric capable of classifying the subgraph
- under-representation(negative significant)
- over-representation(positive significant)
- Significant is different by networks (the same subgraph is positive in Language network and negative in Neurons network)
Subgraphs, Motifs, and Graphlets
Network Motifs
-
Significant recurring patterns of interconnections in the network
- Pattern: Small induced1 subgraph
- ‘Induced subgraph’ means that one or more subgraphs which have the same structure of the subgraph of interest(Motif) can find in the full graph
- Recurring: High frequency
- ‘Recurrence’ means how many find induced subgraphs(occurrence)
- Significant: More frequent than expected
- It’s based on z-score
- \[Z_i = (N^{real}_i-\bar{N}^{rand}_i) / std(N^{rand}_i)\]
- \(N^{real}_i\) is #(subgraphs of type \(i\)) in network \(G^{real}\)
- \(N^{rand}_i\) is #(subgraphs of type \(i\)) in randomized network \(G^{rand}\)
- Network significant profile(SP) is a vector of normalized Z-scores
- \[SP_i = Z_i /\sqrt{\sum_jZ^2_j}\]
- Usually, \(G^rand\) generated 10,000~100,000 to estimated \(\bar{N}^{rand}_i\) and \(std(N^{rand}_i)\). It depends on the size of the real graph
- Pattern: Small induced1 subgraph
-
understanding how networks work and prediction operation and reaction of the network in a given situation
- Feed-forward loop: Move toward A to B
- Parallel loop: Move toward A to B without skipped nodes
- Sing-input modules: Single input and multiple outputs
-
Generation random model
-
Configuration Model
- Null model(random model) which is compared with the real graph with the degree sequence of the real graph
- Using spokes
- Why connect directly –> it’s not random!
-
Switching Model
- Repeat exchange endpoints from the given Graph \(G\) again and again (randomize!)
Graphlets: Node feature vectors
- Graphlets: possible subgraphs which are made by n-nodes
- use graphlets to obtain node-level subgraph metric
- Graphlet degree vector(GDV): # of non-isomorphic position in graphlet
- non-isomorphic position means that unique nodes in the graphlet(node touches)
- In the input graph, GDV means a vector is constructed by the number of graphlets that a node touches at a particular orbit
- Considering graphlets on 2 to 5 nodes(73 coordinates)
- It means that graphs usually make within 4 hops
Finding Motifs and Graphlets
- Finding size-k motifs/graphlets is a hard computational problem(NP-complete)
- There are two big two challenges
- Enumerating(To obtain all size-k connected subgraphs)
- Counting(To obtain occurrences of each subgraph type)
- Computation time grows exponentially as the size of the motif/graphlet increase
-
-
Algorithms
- Exact subgraph enumeration (ESU) [Wernicke 2006]
- Kavosh[Kashaniet al. 2009]
- Subgraph sampling [Kashtanet al. 2004]
Exact subgraph enumeration (ESU)
- Terms
- \(V_{subgraph}\): currently constructed subgraph(motif)
- \(V_{extension}\): set of candidate nodes to extend the motif
-
Enumerating(ESU-Tree)
-
ESU-Tree: Implementation as a recursive function
- 1st Step: Add Nodes(\(v\)) in \(V_{subgraph}\) and add nodes(\(w\)) which touched from \(v\) in \(V_{extension}\)
- 2nd Step: Add a pair of \(V_{subgraph}\) & \(V_{extension}\)(new \(v\)) in \(V_{subgraph}\) and add nodes which touched from \(v\) in \(V_{extension}\) (Only add \(w\) which have greater node id the node id of \(v\))
- Try again 2nd step until the depth attain the \(k\)
-
-
Counting (McKay’s nauty algorithm [McKay 1981])
- Determine which subgraphs in ESU-Tree leaves are topologically equivalent(isomorphic)
- Graph Isomorphism: The condition that given two graphs are the same structure
Structural Roles in Networks
- Meaning of Roles
- The function of nodes in a network(e.g., species in ecosystems, individuals in companies)
- A collection of nodes which have a similar position in a network(do not need to connect)
- Communities/Groups: A collection of nodes which are well-connected to each other
- How to find roles
- Structural equivalence(same relationships to all other nodes) [Lorrain & White 1971]
- In the adjacency matrix, nodes have the same role have the same vector
Discovering Structural Roles in Networks
- RolX: Automatic discovery of nodes’ structural roles in networks [Henderson, et al. 2011b]
- Unsupervised learning approach
- No prior knowledge required(automated and behavioral)
-
Recursive Feature Extraction
-
The matrix of nodes and columns(features)
-
Ego means center node and egonet means the network of nodes are connected ego directly
- Features that describe the node and its neighbors
- Characterize the structure of egonet
-
Local features: All measure of the node degree (in-out degree, total degree, weighted feature versions)
-
Egonet features: # of within-egonet edges, edges entering/leaving egonet
-
Additional feature: aggregate functions(mean, sum)
- The number of possible recursive features grows exponentially with each recursive iteration
- Reduce the number of features using a pruning technique
-
-
Role Extraction
- Cluster nodes based on extracted features using non-negative matrix factorization
-
유발하다, 유도하다 ↩