论文解读（GIN）《How Powerful are Graph Neural Networks》

论文解读（GIN）《How Powerful are Graph Neural Networks》
论文信息

论文标题：How Powerful are Graph Neural Networks
论文作者：Keyulu Xu, Weihua Hu, J. Leskovec, S. Jegelka
论文来源：2019, ICLR
论文地址：download
论文代码：download

1 Introduction

　　GNN 目前主流的做法是递归迭代聚合一阶邻域表征来更新节点表征，如 GCN 和 GraphSAGE，但这些方法大多是经验主义，缺乏理论去理解 GNN 到底做了什么，还有什么改进空间。

　　GNN 的变体均是遵循两个步骤：邻居聚合（neighborhood aggregation）和图池化（graph-level pooling）。　　

　　本文框架受 GNNs 和 WL 图同质测试的启发，若 GNNs 对不同构图能很好识别，则认为是具有较强的表征能力。

　　本文贡献：
- - 证明了GNN最多只和 Weisfeiler-Lehman (WL) test 一样有效，即 WL test 是GNN性能的上限；
  - 建立了邻域聚合（neighbor aggregation）和图读出函数（graph readout functions）的条件，在这些条件下，得到的 GNN 与 WL test 一样强大；
  - 提出图同构网络(Graph Isomorphism Network——GIN)，并证明了它的判别、表征能力等于 WL test 的能力；
2 Preliminaries

2.1 GNN steps

　　GNN 常见的两步走：1、聚合邻居信息；2、更新节点学习

　　GNN 的第 k 层表达式：

　　　　a(k)v= AGGREGATE (k)({h(k−1)u:u∈N(v)})

　　　　h(k)v=COMBINE(k)(h(k−1)v,a(k)v)

　　AGGREGATE 比较典型的例子是 GraphSAGE：

　　GraphSAGE 的 AGGREGATE 被定义为：

　　　　a(k)v=MAX({ReLU(W⋅h(k−1)u),∀u∈N(v)})

　　这里的代表的是 element-wise max-pooling 。

　　GraphSAGE 的 COMBINE 为：

　　　　

　　而在 GCN 中，AGGREGATE 和 COMBINE 集成为：

　　　　

　　对于节点分类任务，节点表示将作为预测的输入；对于图分类任务，READOUT 函数聚合了最后一次迭代输出的节点表示，并生成图表示 :

　　　　

　　其中：READOUT 函数是具有排列不变性的函数，如：summation。

2.2 Weisfeiler-Lehman test

　　图同构问题（ graph isomorphism problem）：询问这两个图在拓扑结构上是否相同。

　　WL test 为了辨别多标签图，具体步骤如下：[ 参考《Weisfeiler-Lehman(WL) 算法和WL Test》 ]
- - 迭代地聚合节点及其邻域的标签；
  - 将聚合后的标签散列为唯一的新标签。如果在某次迭代中，两个图之间的节点标签不同，则该算法判定两个图是非同构的；
　　基于 WL test 的多图相似性判别算法 WL subtree kernel 也被提出，图示如下：

　　　　

　　上述过程是将 2WL test 保存为树结构。

3 Theoretical framework：overview

　　Definition 1 (Multiset). A multiset is a generalized concept of a set that allows multiple instances for its elements. More formally, a multiset is a 2-tuple where is the underlying set of that is formed from its distinct elements, and gives the multiplicity of the elements.

　　假设节点及其邻居集合，假设节点的标签是 1 ，其对应的标签是 1、1、2、3、4，可以把邻居集合看成一个 Multiset 。

4 Building powerful graph neural networks

　　作者提出 Theorem 2：即为图同质测试。

　　Lemma 2. Let and be any two non-isomorphic graphs. If a graph neural network maps and to different embeddings, the Weisfeiler-Lehman graph isomorphism test also decides and are not isomorphic.

　　Lemma 2 证明：

　　假设：如果节点标签一致，那么节点表示也一致。

　　假设对节点做了次 WL test 标签聚合，其最终标签若相似，则节点表示也一致。那么如果在 GNN 中， hop 邻域一样，那么必然节点表示一样。WL test 过程和 GNN 聚合过程是一致的。

　　作者提出 Theorem 3：如果 GNN 中 Aggregate、Combine 和 Readout 函数是单射，GNN 可以和 WL test 一样强大。

　　Theorem 3. Let be a GNN . With a sufficient number of GNN layers, maps any graphs and that the Weisfeiler-Lehman test of isomorphism decides as non-isomorphic, to different embeddings if the following conditions hold:

　　 a) A aggregates and updates node features iteratively with

　　　　

　　 where the functions , which operates on multisets, and are injective(单射).

　　b) 's graph-level readout, which operates on the multiset of node features , is injective.

　　Theorem 3 证明和 Lemma2 证明思想类似，都是基于相同假设。

4.1 Graph isomorphism network(GIN)

　　为建模邻居聚合的单射多集函数。

　　下述 Lemma 5 阐述 sum aggregators 是单射的：

　　Lemma 5. Assume is countable. There exists a function so that is unique for each multiset of bounded size. Moreover, any multiset function can be decomposed as for some function .

　　Lemma5 证明：

　　出发点：考虑一个有个元素的 multiset ，对其进行任意划分，最多可以划分成个子集，所以很自然的可以使用个正整数对其打上唯一标记，因此证明可以是唯一的单射函数。

　　

　　Corollary 6. Assume is countable. There exists a function so that for infinitely many choices of , including all irrational numbers, is unique for each pair , where and is a multiset of bounded size. Moreover, any function over such pairs can be decomposed as for some function .

　　Corollary 6 证明：

　　

　　对于第一种情况利用 Lemma 5 解释，对于第二种情况利用无理数的性质。

　　Corollary 6 证明了是单射函数，同时本文也将和用 MLP 代替（由于 MLP 是万能近似函数，可以模拟单射性质），又根据单射的性质（若和皆为单射的，则亦为单射），得也是单射的，即：

　　　　

4.2 Graph-level readout of GIN

　　Readout 模块使用 concat+sum，对每次迭代得到的所有节点特征求和得到图的特征，然后拼接起来。

　　　　

　　即

　　　　

5 Less powerful but still interesting GNNs

　　本文研究不满足 Theorem 3 的 GraphSAGE 和 GCN，做了两个消融实验：
- - 1-layer perceptrons instead of MLPs .　　
  - mean or max-pooling instead of the sum.　　
5.1 1-layer perceotrons are not sufficient

　　许多GNN 任然采用 1 层的 perceptrons，对于某些 multiset 可能存在无法区别的问题。

　　Lemma 7. There exist finite multisets so that for any linear mapping , .

　　Lemma 7 证明：

　　

　　1 层的 perceptrons 表现得很像线性映射，因此 GNN 层退化为对邻域特征的简单求和。本文的证明建立在线性映射中缺乏偏差项这一事实之上。有了偏差项和足够大的输出维数，1 层的 perceptrons 可能能够区分不同的 multiset。

5.2 Structures that confuse mean and max-pooling

　　现在考虑将中的 sum 替换为 Mean-pooling 和 Max-pooling 将产生什么问题。

　　Mean-pooling 和 max-pooling aggregators 在某种程度上是一种好的 multiset functions [ 具有平移不变性 ]，但是他们不是单射的。

　　Figure 2 根据三个 aggregators 的表示能力进行了排序。

　　　　

　　三种不同的 Aggregate：
- - sum：学习全部的标签以及数量，可以学习精确的结构信息（不仅保存了分布信息，还保存了类别信息）；[ 蓝色：4个；红色：2 个 ]
  - mean：学习标签的比例（比如两个图标签比例相同，但是节点有倍数关系），偏向学习分布信息；[ 蓝色：的比例；红色：的比例 ]
  - max：学习最大标签，忽略多样，偏向学习有代表性的元素信息；[ 两类（类内相同），所以各一个 ]
　　Figure 3 说明了mean-pooling aggregators 和 max-pooling aggregators 无法区分的结构对。

　　　　

　　在 Figure 3a 中：Every node has the same feature and is the same across all nodes.
- - mean：左，右：，无法区分；
  - max：左 , 右无法区分；
  - sum：左 , 右 , 可以区分；
　　在 Figure 3b 中：Let denote node features transformed by .
- - mean: 左，右：，可以区分；
  - max : 左，右：，无法区分;
  - sum: 左 , 右 , 可以区分；
　　在 Figure 3c 中：
- - mean：左，右：，无法区分;
  - max：左，右：，无法区分；
  - sum：左，右：，可以区分；
5.3 Mean learns distrubutions

　　旨在说明等比例 multiset ，使用 mean 是无法区分的。

　　Corollary 8. Assume is countable. There exists a function so that for if and only if multisets and have the same distribution. That is, assuming and for some .

5.4 Max-pooling learns sets with distinct elements

　　Max-pooling 阐述的是，只要决定性元素（max value）一样，其他元素是否考虑无关紧要。显然这是不合理的。

　　Corollary 9. Assume is countable. Then there exists a function so that for if and only if and have the same underlying set.

6 Experiments

6.1 Training set performance of GINs

　　　　

　　在训练中，GIN 和WL test 一样，可以拟合所有数据集，这表说了 GIN 表达能力达到了上限

6.2 Generalization ability of GNNs

　　　　
- - GIN-0 比GIN-eps 泛化能力强：可能是因为更简单的缘故；
  - GIN 比 WL test 效果好：因为GIN进一步考虑了结构相似性，即WL test 最终是one-hot输出，而GIN是将WL test映射到低维的embedding；
  - max 在无节点特征的图（用度来表示特征）基本无效；
7 Conclusion

　　本文主要基于对 graph分类，证明了 sum 比 mean 、max 效果好，但是不能说明在node 分类上也是这样的效果，另外可能优先场景会更关注邻域特征分布，或者代表性，故需要都加入进来实验。

修改历史

2021-03-15 创建文章
2022-06-10 修订文章，大整理

论文解读目录
论文信息
1 Introduction
2 Preliminaries
    2.1 GNN steps
    2.2 Weisfeiler-Lehman test
3 Theoretical framework：overview
4 Building powerful graph neural networks
    4.1 Graph isomorphism network(GIN)
    4.2 Graph-level readout of GIN
5 Less powerful but still interesting GNNs
    5.1 1-layer perceotrons are not sufficient
    5.2 Structures that confuse mean and max-pooling
    5.3 Mean learns distrubutions
    5.4 Max-pooling learns sets with distinct elements
6 Experiments
    6.1 Training set performance of GINs
    6.2 Generalization ability of GNNs
7 Conclusion
修改历史
__EOF__
本文作者： Blair
本文链接： https://www.cnblogs.com/BlairGrowing/p/15961951.html
关于博主： 评论和私信会在第一时间回复。或者直接私信我。
版权声明： 本博客所有文章除特别声明外，均采用 BY-NC-SA 许可协议。转载请注明出处！
声援博主： 如果您觉得文章对您有帮助，可以点击文章右下角【推荐】一下。

论文信息

1 Introduction

2 Preliminaries

2.1 GNN steps

2.2 Weisfeiler-Lehman test

3 Theoretical framework：overview

4 Building powerful graph neural networks

4.1 Graph isomorphism network(GIN)

4.2 Graph-level readout of GIN

5 Less powerful but still interesting GNNs

5.1 1-layer perceotrons are not sufficient

5.2 Structures that confuse mean and max-pooling

5.3 Mean learns distrubutions

5.4 Max-pooling learns sets with distinct elements

6 Experiments

6.1 Training set performance of GINs

6.2 Generalization ability of GNNs

7 Conclusion

修改历史