Of course, in practice, this is not usually how text and images are encoded: these graph representations are redundant since all images and all text will have very regular structures. For instance, images have a banded structure in their adjacency matrix because all nodes (pixels) are connected in a grid. The adjacency matrix for text is just a diagonal line, because each word only connects to the prior word, and to the next one.
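To make the "diagonal line" structure concrete, here is a small illustrative sketch (not from the original article) that builds the adjacency matrix of a short sentence with numpy; the sentence and variable names are made up for the example.

```python
import numpy as np

# Illustrative only: a sentence graph in which each word connects to the
# previous and the next word.
words = ["graphs", "are", "all", "around", "us"]
n = len(words)

adj = np.zeros((n, n), dtype=int)
for i in range(n - 1):
    adj[i, i + 1] = 1  # edge to the next word
    adj[i + 1, i] = 1  # edge back to the previous word

print(adj)
# Non-zero entries sit only on the bands next to the main diagonal,
# which is the regular, "diagonal line" structure described above.
```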
Graph-valued data in the wild
Molecules as graphs
Social networks as graphs
What types of problems have graph structured data?
As mentioned earlier, nodes, edges, and global context can each be represented by an embedding vector, but representing connectivity is not as straightforward. The simplest option is an adjacency matrix, but when the graph has many nodes this matrix becomes very sparse and therefore space-inefficient. Moreover, the same graph can be described by many equivalent adjacency matrices, and there is no guarantee that a neural network will produce the same output for each of these equivalent inputs (that is to say, adjacency matrices are not permutation invariant). A more elegant and memory-efficient representation is the adjacency list, which stores each edge from node $n_i$ to node $n_j$ as a tuple $(i, j)$; this representation is permutation invariant.
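As an illustration of this representation, here is a minimal numpy sketch with made-up sizes and names (`nodes`, `edges`, `global_vec`, `adjacency_list`); it only shows how the attributes and connectivity might be stored, not any particular library's API.

```python
import numpy as np

# Hypothetical container for a small graph: per-node and per-edge embedding
# vectors, one global vector, and connectivity as a list of (i, j) tuples.
num_nodes, num_edges = 4, 3
node_dim, edge_dim, global_dim = 8, 8, 8

nodes = np.random.randn(num_nodes, node_dim)   # one embedding per node
edges = np.random.randn(num_edges, edge_dim)   # one embedding per edge
global_vec = np.random.randn(global_dim)       # whole-graph embedding

# Edge k connects node adjacency_list[k][0] to node adjacency_list[k][1].
# Storage grows with the number of edges rather than num_nodes**2, and
# reordering the tuples still describes the same graph.
adjacency_list = [(0, 1), (1, 2), (2, 3)]
```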
In the figure above, the nodes, edges, and global context are each represented by a single scalar, but in practice it is more realistic to represent them with vectors.
Graph Neural Networks
A GNN is an optimizable transformation on all attributes of the graph (nodes, edges, global-context) that preserves graph symmetries (permutation invariances).
As is common with neural network modules or layers, we can stack these GNN layers together.
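As a rough sketch of what such a layer and the stacking could look like, the snippet below applies a separate one-layer MLP to the node, edge, and global vectors without touching connectivity; the function and parameter names are illustrative, not from any specific framework.

```python
import numpy as np

def mlp(x, w, b):
    """A tiny one-layer MLP standing in for the per-attribute update."""
    return np.maximum(x @ w + b, 0.0)

def simple_gnn_layer(nodes, edges, global_vec, params):
    """Simplest GNN layer: update each attribute independently.

    Connectivity is left untouched, so graph symmetries are preserved;
    message passing between attributes comes later.
    """
    nodes = mlp(nodes, *params["node"])
    edges = mlp(edges, *params["edge"])
    global_vec = mlp(global_vec, *params["global"])
    return nodes, edges, global_vec

def gnn(nodes, edges, global_vec, layer_params):
    """Stacking GNN layers works just like stacking ordinary NN layers."""
    for params in layer_params:
        nodes, edges, global_vec = simple_gnn_layer(nodes, edges, global_vec, params)
    return nodes, edges, global_vec

# Example parameters for two stacked layers (all embedding dims set to 8).
dim = 8
rng = np.random.default_rng(0)
layer_params = [
    {k: (rng.normal(size=(dim, dim)), np.zeros(dim)) for k in ("node", "edge", "global")}
    for _ in range(2)
]
```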
GNN Predictions by Pooling Information
Below we look at how the final layer of a GNN produces predictions.
Consider the simplest case first. To classify every node of the graph into one of several classes, we simply apply an MLP to the final node vectors to map them to the desired number of classes, followed by a softmax.

If we only have edge information and no node information, then there are no node embeddings. To still make predictions on nodes, we can use a pooling operation $\rho_{E_n \rightarrow V_n}$ to derive node information from the edges: for each node, sum the vectors of its incident edges together with the global vector and treat the result as the node vector. (If node and edge vectors have different dimensions, the edge vectors can be projected before summing, or concatenated first and then projected.)

If we only have node vectors and want to make predictions on edges, the classification head pools in the other direction, gathering the vectors of the nodes incident to each edge. If we only have node vectors and want a prediction for the whole graph, then, analogous to global average pooling, we can aggregate all node vectors and predict from the result.
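The edge-to-node pooling and the node classification head described above might look roughly like the following numpy sketch, assuming node, edge, and global vectors share the same dimension; all names are illustrative.

```python
import numpy as np

def pool_edges_to_nodes(edges, adjacency_list, num_nodes, global_vec):
    """rho_{E_n -> V_n}: build node vectors by summing incident edge vectors
    and the global vector (assumes node, edge and global dims all match)."""
    nodes = np.tile(global_vec, (num_nodes, 1)).astype(float)
    for k, (i, j) in enumerate(adjacency_list):
        nodes[i] += edges[k]   # edge k is incident to node i ...
        nodes[j] += edges[k]   # ... and to node j
    return nodes

def node_classifier(nodes, w, b):
    """Prediction head: an MLP (here a single linear layer) plus softmax."""
    logits = nodes @ w + b
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)
```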
An end-to-end prediction task with a GNN model
GCN: Passing messages between parts of the graph
Next, we describe how a GNN makes use of the graph's structure (its connectivity).
Inside a GNN layer, we can also use pooling to pass information between neighboring nodes or neighboring edges (message passing). Taking nodes as an example, the output node vector of such a layer is obtained by summing a node's input vector with the vectors of its neighboring nodes and passing the result through an MLP (the simplest type of message-passing GNN layer).
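A minimal sketch of this simplest message-passing layer, assuming an undirected graph given as an adjacency list and a one-layer MLP; names and shapes are illustrative.

```python
import numpy as np

def message_passing_layer(nodes, adjacency_list, w, b):
    """Simplest message-passing GNN layer for node vectors:
    aggregate each node with its neighbors by summation, then apply an MLP."""
    aggregated = nodes.copy()
    for i, j in adjacency_list:
        aggregated[i] += nodes[j]   # message from j to i
        aggregated[j] += nodes[i]   # message from i to j
    return np.maximum(aggregated @ w + b, 0.0)   # one-layer MLP with ReLU
```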
Each point in the figure represents a model trained with a particular choice of hyperparameters. The $x$-axis is the number of model parameters and the $y$-axis is the AUC.
Embedding dimensionality: We can notice that models with higher dimensionality tend to have better mean and lower-bound performance, but the same trend is not found for the maximum. Some of the top-performing models can be found at smaller dimensions.
Number of GNN layers: The box plot shows a similar trend: while the mean performance tends to increase with the number of layers, the best-performing models do not have three or four layers, but two. Furthermore, the lower bound on performance decreases with four layers. This effect has been observed before: GNNs with more layers broadcast information over a greater distance and risk having their node representations "diluted" by many successive iterations.