node2vec: Scalable Feature Learning for Networks

Injung Hwang
KDD '16 paper, Stanford

Brief Introduction

Now considered a traditional, widely used node embedding technique
Proposes a scalable feature learning approach for graphs
Inspired by the word2vec paper
Adapts the "skip-gram" word embedding idea to the nodes of a graph
Proposes a biased random walk technique


Problems in Graph

Node classification (node-level task)
Node degree, centrality, clustering coefficient, graphlet
Link prediction (link-level task)
Distance, local neighborhood, global neighborhood
Graph classification (graph-level task)

Node Embedding in Graph

A mapping of nodes to a low-dimensional space of features
Maximizes the likelihood of preserving network neighborhoods of nodes
Traditional methods → node embeddings → GNNs → knowledge graphs → deep generative graph models

Conventional method (Linear algebraic approach)

PCA (Principal Component Analysis)

SVD (Singular Value Decomposition)

Shallow encoding
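
A shallow encoder is essentially an embedding lookup: each node owns one learnable vector in a table, and encoding is just indexing into it. A minimal pure-Python sketch (the sizes and the random initialization are illustrative, not from the paper):

```python
import random

rng = random.Random(0)
num_nodes, d = 5, 3  # illustrative sizes, not values from the paper

# Z holds one d-dimensional embedding per node; in a shallow method these
# entries are the parameters that are optimized directly.
Z = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(num_nodes)]

def encode(v):
    """ENC(v) is a simple lookup of node v's row in the embedding table."""
    return Z[v]
```

Because the encoder is only a lookup, all of the learning signal has to come from the objective that compares embeddings of related nodes.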

Classic search strategies

Problem of sampling neighborhoods of a source node
BFS (Breadth-First Sampling)
Learns structural equivalence
Obtains a microscopic view of the neighborhood
Sampled neighborhoods tend to repeat (low variance)
DFS (Depth-First Sampling)
Learns homophily (node closeness)
Reflects a macro-view of the neighborhood
Needed to infer and characterize the exact nature of node-to-node dependencies


A flexible neighborhood sampling strategy
Smoothly interpolate between BFS and DFS
Conditional likelihood with softmax
Objective function
Use negative sampling to approximate the partition function Z_u, which is expensive to compute exactly
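
Written out, the objective from the paper maximizes the log-probability of observing a node's sampled neighborhood N_S(u) given its embedding f(u), which under the conditional-independence and softmax assumptions reduces to:

```latex
\max_f \;\sum_{u \in V} \Big[ -\log Z_u \;+\; \sum_{n_i \in N_S(u)} f(n_i) \cdot f(u) \Big],
\qquad
Z_u = \sum_{v \in V} \exp\big(f(u) \cdot f(v)\big)
```

Z_u sums over every node in the graph, which is why it is expensive for large networks and is approximated with negative sampling.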

Search Bias

Unlike a plain random walk (uniform transition probabilities)
Return parameter (p)
Controls likelihood of immediately revisiting a node in the walk
In-out parameter (q)
Controls whether the walk moves "inward" or "outward"
if q > 1, biased inward
if q < 1, biased outward
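
The p/q bias above can be sketched as a second-order random walk: the unnormalized weight of stepping to a candidate node depends on its distance from the node visited one step earlier. A minimal sketch, assuming `graph` is a dict mapping each node to a set of its neighbors (helper names are mine, not the paper's):

```python
import random

def alpha(p, q, prev, nxt, graph):
    """Unnormalized search bias for stepping to nxt, given the node prev
    visited one step earlier (node2vec's alpha_pq)."""
    if nxt == prev:            # distance 0: return to the previous node,
        return 1.0 / p         # controlled by the return parameter p
    if nxt in graph[prev]:     # distance 1: stay near prev (inward, BFS-like)
        return 1.0
    return 1.0 / q             # distance 2: move outward (DFS-like)

def biased_walk(graph, start, length, p, q, rng=random):
    """Simulate one biased random walk of the given length."""
    walk = [start]
    while len(walk) < length:
        curr = walk[-1]
        nbrs = sorted(graph[curr])
        if not nbrs:
            break
        if len(walk) == 1:     # first step has no previous node: uniform
            walk.append(rng.choice(nbrs))
            continue
        weights = [alpha(p, q, walk[-2], x, graph) for x in nbrs]
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk
```

With q < 1 the distance-2 weight 1/q dominates and the walk drifts outward (DFS-like); with q > 1 it stays near the source (BFS-like), matching the two regimes above.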


"Les Misérables" network

77 nodes and 254 edges, and set d = 16
Top: p = 1, q = 0.5 → discover clusters/communities
Bottom: p = 1, q = 2 → Structural equivalence
Node2vec is not tied to a particular notion of equivalence

Multi-label classification for nodes and link prediction

Reference models
Spectral clustering: Matrix factorization approach (d eigenvectors)
DeepWalk: equivalent to node2vec with uniform random walks (p = q = 1)
LINE: learns in two phases: d/2 dimensions over immediate (BFS-style) neighbors, d/2 dimensions over nodes at a 2-hop distance
Use negative sampling rather than hierarchical softmax
Instead of normalizing with respect to all nodes, use k random "negative samples"
d = 128 (dimension)
r = 10, l = 80 (# of random walks, length)
k = 10 (# of negative samples)
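
As a sketch of how one positive pair and the k negative samples enter the objective, here is the standard skip-gram negative-sampling loss in pure Python (the embeddings passed in are hypothetical; this illustrates the technique, not the paper's exact code):

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def neg_sampling_loss(f_u, f_v, negatives):
    """Loss for one positive pair (u, v) plus k negative samples: instead of
    normalizing over all nodes (the expensive Z_u), only the k sampled
    'negative' embeddings contribute."""
    loss = -math.log(sigmoid(dot(f_u, f_v)))        # pull the true pair together
    for f_n in negatives:
        loss -= math.log(sigmoid(-dot(f_u, f_n)))   # push negatives apart
    return loss
```

The loss shrinks when u and its true neighbor v point the same way and the negatives point away, which is exactly the behavior the full softmax objective encourages at a fraction of the cost.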

Multi-label Classification

Macro F1 score
The added flexibility of node2vec outperforms the other benchmarks

Parameter sensitivity & Perturbation analysis

A low q encourages outward exploration, which is balanced by a low p that keeps the walk from straying too far from the start node
Node2vec shows robustness to missing & noisy edges


Link Prediction

Remove 50% of edges randomly
No feature learning algorithm had previously been used for link prediction
Evaluate against some popular heuristic scores
Facebook: user & friendship social network
PPI: protein-protein interaction network
arXiv ASTRO-PH: paper authors and collaborators
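
To score a candidate edge (u, v) from node embeddings, the paper combines the two endpoint vectors with a binary operator; a sketch of the four operators it evaluates (Average, Hadamard, Weighted-L1, Weighted-L2):

```python
def average(f_u, f_v):
    """Element-wise mean of the two endpoint embeddings."""
    return [(a + b) / 2.0 for a, b in zip(f_u, f_v)]

def hadamard(f_u, f_v):
    """Element-wise product."""
    return [a * b for a, b in zip(f_u, f_v)]

def weighted_l1(f_u, f_v):
    """Element-wise absolute difference."""
    return [abs(a - b) for a, b in zip(f_u, f_v)]

def weighted_l2(f_u, f_v):
    """Element-wise squared difference."""
    return [(a - b) ** 2 for a, b in zip(f_u, f_v)]
```

The resulting d-dimensional edge feature is fed to a standard classifier, so link prediction reuses the node embeddings without any extra graph-specific machinery.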