码农知识堂 - 1000bd
  •   Python
  •   PHP
  •   JS/TS
  •   JAVA
  •   C/C++
  •   C#
  •   GO
  •   Kotlin
  •   Swift
  • GNN 101


    GNN 101

    姚伟峰
    http://www.cnblogs.com/Matrix_Yao/

    • GNN 101
      • Why
        • Graph无处不在
        • Graph Intelligence helps
        • It’s the right time now!
      • What
        • 如何建模图
          • Different Types of Graph
        • 如何表示图
        • 如何计算图
        • 不同的图数据集规模
        • 不同的图任务
      • How
        • Workflow
        • 软件栈
        • SW Challenges
          • Graph Sampler
          • Scatter-Gather
          • More
      • Finishing words
      • References

    Why

    Graph无处不在

    Alt text

    Graph Intelligence helps

    Alt text

    It’s the right time now!

    Gartner预测,graph技术在数据和分析创新中的使用率从2021年的10%,到2025年会增长到80%。我们目前正在经历从early adoption到early mainstream的穿越大峡谷期间,既不太早也不太晚,时间刚刚好。

    Alt text

    What

    如何建模图

    A graph 𝒢 is an ordered pair 𝒢 = (𝑉, 𝐸) comprising:

    • 𝑉, a set of vertices (or nodes)

    • 𝐸⊆{(𝑥,𝑦)|𝑥,𝑦∈𝑉}, a set of edges (or links), which are pairs of nodes

    Example:
    Alt text

    Different Types of Graph

    • Are edges directed?
      Directed Graph vs. Undirected Graph

    • Are there multiple types of nodes or multiple types of edges?
      Homogeneous Graph vs Heterogeneous Graph

    如何表示图

    不同的表示方式会指向不同的计算模式。

    Alt text

    如何计算图

    如下图所示,图的计算步骤如下:

    • 遍历图中的所有结点,或者采样图中的一些结点。每次选择其中一个结点,称为目标结点(target node);

    • 一个𝐿-层的GNN至少需要聚合目标结点的L-跳领域的信息。因此,我们需要以围绕目标结点构造一个L-跳的ego network。图中是一个2-跳ego network的例子,其中绿色结点是第1跳,蓝色结点是第2跳;

    • 计算并更新ego-network里的每个结点的embedding。embeddings会使用到图的结构信息和结点与边的特征。

    Alt text

    那么,这些embedding是如何计算和更新的呢?主要是使用Message Passing的计算方法。Message Passing有一些计算范式如GAS(Gather-ApplyEdge-Scatter), SAGA(Scatter-ApplyEdge-Gather-ApplyVertex)等。我们这里介绍归纳得比较全面的SAGA计算范式。假设需要计算和更新下图中的:

    Alt text

    • Scatter
      Propagate message from source vertex to edge.

    Alt text

    • ApplyEdge
      Transform message along each edge.

    Alt text

    • Gather
      Gather transformed message to the destination vertex.

    Alt text

    • ApplyVertex
      Transform the gathered output to get updated vertex.

    Alt text

    公式如下:

    Alt text
    分析一下,会发现,SAGA模式中ApplyEdge和ApplyVertex是传统deep learning中的NN(Neural Network)操作,我们可以复用;而Scatter和Gather是GNN新引入的操作。即,Graph Computing = Graph Ops + NN Ops。

    Alt text

    不同的图数据集规模

    • One big graph
      可能高达数十亿的结点,数百亿的边。

    Alt text

    • Many small graphs

    Alt text

    不同的图任务

    • Node-level prediction
      预测图中结点的类别或性质

    Alt text

    • Edge-level prediction
      预测图中两个结点是否存在边,以及边的类别或性质

    Alt text

    • Graph-level prediction
      预测整图或子图的类别或性质

    Alt text

    How

    Workflow

    Alt text

    以fraud detection为例:

    • Tabformer数据集
      Alt text

    • workflow
      Alt text

    软件栈

    Alt text

    • 计算平面

    Alt text

    • 数据平面

    Alt text

    SW Challenges

    Graph Sampler

    For many small graphs datasets, full batch training works most time. Full batch training means we can do training on whole graph;
    When it comes to one large graph datasets, in many real scenarios, we meet Neighbor Explosion problem;

    Neighbor Explosion:
    Alt text

    Graph sampler comes to rescue. Only sample a fraction of target nodes, and furthermore, for each target node, we sample a sub-graph of its ego-network for training. This is called mini-batch training.
    Graph sampling is triggered for each data loading. And the hops of the sampled graph equals the GNN layer number 𝐿. Which means graph sampler in data loader is important in GNN training.

    Alt text
    Challenge: How to optimize sampler both as standalone and in training pipe?

    When graph comes to huge(billions of nodes, tens of billions of edges), we meet new at-scale challenges:

    • How to store the huge graph across node? -> graph partition

    • How to build a training system w/ not only distributed model computing but also distributed graph store and sampling?

      • How to cut the graph while minimize cross partition connections?

    Alt text

    A possible GNN distributed training architecture:

    Alt text

    Scatter-Gather

    • Fuse adjacent graphs ops

      One common fuse pattern for GCN & GraphSAGE:
      Alt text

      Challenge:
      How to fuse more GNN patterns on different ApplyEdge and ApplyVertex,automatically?

    • How to implement fused Aggregate
      Alt text
      Challenge:

      • Different graph data structures lead to different implementations in same logic operations;

      • Different graph characteristics favors different data structures;(like low-degree graphs favor COO, high-degree graphs favor CSR)

      • How to find the applicable zone for each and hide such complexity to data scientists?

    More

    • Inference challenge

      • GNN inference needs full batch inference, how to make it efficient?

      • Distributed inference for big graph?

      • Vector quantization for node and edge features?

      • GNN distilled to MLP?

    • SW-HW co-design challenge

      • How to relief irregular memory access in scatter-gather?

      • Do we need some data flow engine for acceleration?

    • …

    Finishing words

    “There is plenty of room at the top” 对技术人员很重要。但为避免入宝山而空返,我们更需要建立起技术架构,这就像是地图一样,只有按图索骥才能更好地探索和利用好top里的plenty of room。

    Alt text

    References

    1. Graph + AI: What’s Next? Progress in Democratizing Graph for All

    2. Recent Advances in Efficient and Scalable Graph Neural Networks

    3. Crossing the Chasm – Technology adoption lifecycle

    4. Understanding and Bridging the Gaps in Current GNN Performance Optimizations

    5. Automatic Generation of High-Performance Inference Kernels for Graph Neural Networks on Multi-Core Systems

    6. Understanding GNN Computational Graph: A Coordinated Computation, IO, And Memory Perspective

    7. Graphiler: A Compiler For Graph Neural Networks

    8. Scatter-Add in Data Parallel Architectures

    9. fuseGNN: Accelerating Graph Convolutional Neural Network Training on GPGPU

    10. VQ-GNN: A Universal Framework to Scale up Graph Neural Networks using Vector Quantization

    11. NeuGraph: Parallel Deep Neural Network Computation on Large Graphs

    12. Completing a member knowledge graph with Graph Neural Networks

    13. PinnerFormer: Sequence Modeling for User Representation at Pinterest

    14. Gartner and Graph Analytics

  • 相关阅读:
    在VS里使用C#制作窗口应用
    问:未来5年的IT互联网行业,就业形势会是什么样的?
    【JAVA UI】HarmonyOS关系型数据增删改查
    基于JAVA文件发布系统计算机毕业设计源码+系统+mysql数据库+lw文档+部署
    APICloud开源O2O商城源码
    触控笔哪个牌子好用?主动电容笔和被动电容笔的区别
    《风向》——如何应对互联网变革下的知识焦虑不确定与个人成长
    国内较好的iPaaS供应商有哪些?
    Autojs 小游戏实践-潮玩宇宙开扭蛋
    Topaz Video AI:引领视频质量革命,让您的内容焕发新生
  • 原文地址:https://www.cnblogs.com/Matrix_Yao/p/16858882.html
  • 最新文章
  • 攻防演习之三天拿下官网站群
    数据安全治理学习——前期安全规划和安全管理体系建设
    企业安全 | 企业内一次钓鱼演练准备过程
    内网渗透测试 | Kerberos协议及其部分攻击手法
    0day的产生 | 不懂代码的"代码审计"
    安装scrcpy-client模块av模块异常,环境问题解决方案
    leetcode hot100【LeetCode 279. 完全平方数】java实现
    OpenWrt下安装Mosquitto
    AnatoMask论文汇总
    【AI日记】24.11.01 LangChain、openai api和github copilot
  • 热门文章
  • 十款代码表白小特效 一个比一个浪漫 赶紧收藏起来吧!!!
    奉劝各位学弟学妹们,该打造你的技术影响力了!
    五年了,我在 CSDN 的两个一百万。
    Java俄罗斯方块,老程序员花了一个周末,连接中学年代!
    面试官都震惊,你这网络基础可以啊!
    你真的会用百度吗?我不信 — 那些不为人知的搜索引擎语法
    心情不好的时候,用 Python 画棵樱花树送给自己吧
    通宵一晚做出来的一款类似CS的第一人称射击游戏Demo!原来做游戏也不是很难,连憨憨学妹都学会了!
    13 万字 C 语言从入门到精通保姆级教程2021 年版
    10行代码集2000张美女图,Python爬虫120例,再上征途
Copyright © 2022 侵权请联系2656653265@qq.com    京ICP备2022015340号-1
正则表达式工具 cron表达式工具 密码生成工具

京公网安备 11010502049817号