• iRDMA Flow Control Introduction


    1.0       Introduction

    This document introduces Ethernet flow control on Intel® Ethernet 800 Series Network Adapters that use the RDMA driver (irdma), with a focus on best practices for Linux RDMA traffic.

    It includes:

    • Background on Ethernet flow control (FC) and Data Center Bridging (DCB).
    • Differences between Link-level Flow Control (LFC) and Priority Flow Control (PFC).
    • Configuration steps for each type on 800 Series Linux hosts.
    • Verification tips.

    1.1         QoS/Flow Control Limitations on the 800 Series

    • Although the 800 Series hardware supports eight Traffic Classes (TCs), the maximum supported configuration is four TCs per port. Only one TC can have Priority Flow Control enabled per port.

    Number of Adapter Ports    Traffic Class Recommendation                          RDMA
    1, 2, or 4                 Up to four TCs, with one of them enabled with PFC.    Supported
    More than 4                No DCB Support                                        Not Supported

    • In RoCEv2 mode, if no flow control is detected (either LFC or PFC), the driver automatically de-tunes. This is an intentional design to allow RoCEv2 to operate without flow control, but with lower performance.
    • When the 800 Series is in firmware Link Layer Discovery Protocol (LLDP) mode, only three application priorities are supported; software LLDP supports 32. These limits apply to the LLDP APP TLV. See man lldptool-app for more information.
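
    The per-port limits listed in this section can be checked before any configuration is applied. The following is a minimal sketch in Python, not part of the irdma driver or any Intel tool; the names PortQosPlan and check_plan are hypothetical, and the numeric limits are taken directly from the bullets and table above.

```python
from dataclasses import dataclass
from typing import List

# Limits as stated in section 1.1 (not read from hardware).
MAX_TCS_PER_PORT = 4        # maximum supported TCs per port
MAX_PFC_TCS_PER_PORT = 1    # only one TC per port may have PFC enabled
MAX_PORTS_WITH_DCB = 4      # adapters with more than 4 ports have no DCB support


@dataclass
class PortQosPlan:
    """Hypothetical description of a planned per-port QoS layout."""
    adapter_ports: int      # number of ports on the adapter
    traffic_classes: int    # TCs planned for this port
    pfc_enabled_tcs: int    # how many of those TCs would have PFC enabled


def check_plan(plan: PortQosPlan) -> List[str]:
    """Return a list of human-readable problems; an empty list means the plan fits the limits."""
    problems = []
    if plan.adapter_ports > MAX_PORTS_WITH_DCB:
        problems.append("adapters with more than 4 ports do not support DCB or RDMA")
    if plan.traffic_classes > MAX_TCS_PER_PORT:
        problems.append("at most 4 traffic classes are supported per port")
    if plan.pfc_enabled_tcs > MAX_PFC_TCS_PER_PORT:
        problems.append("only one traffic class per port can have PFC enabled")
    return problems


if __name__ == "__main__":
    for issue in check_plan(PortQosPlan(adapter_ports=2, traffic_classes=5, pfc_enabled_tcs=2)):
        print("plan problem:", issue)
```

    A plan with two adapter ports, four TCs, and one PFC-enabled TC returns an empty list, which matches the supported row of the table.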


       

    2.0         Background

                                                                                                       

    2.1           Ethernet Flow Control

    By design, Ethernet is an unreliable protocol with no guarantee that packets arrive at their destination correctly and in order. Instead, Ethernet relies on upper-layer protocols (such as TCP) or applications to provide reliable service and error correction.

    The 802.3x standard introduced flow control to the Ethernet protocol, defining a mechanism for throttling the flow of data between two directly connected full-duplex network devices. If the sender transmits data faster than the receiver can accept it, the overwhelmed receiver can send a pause signal (Xoff or transmit off) to the sender, requesting that the sender stop transmitting data for a specified period of time. The sender resumes transmission either after the timeout period expires or if the receiver indicates that it is ready to accept more data by sending an Xon (transmit on) signal.

    Without flow control, data might be lost or need to be re-transmitted by an upper-layer protocol (ULP) or application, which can significantly affect performance.
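
    To make the pause mechanism concrete, the sketch below builds an 802.3x PAUSE frame and converts its timer into wall-clock time. It is illustrative only: adapters generate these frames in hardware, and the function names are hypothetical. It relies on the standard frame layout (destination 01-80-C2-00-00-01, EtherType 0x8808, opcode 0x0001, a 16-bit pause time counted in quanta of 512 bit times), where a pause time of 0 acts as the Xon signal.

```python
import struct

PAUSE_QUANTUM_BITS = 512                         # one pause quantum = 512 bit times
MAC_CONTROL_ETHERTYPE = 0x8808                   # MAC Control EtherType
PAUSE_OPCODE = 0x0001                            # PAUSE opcode
PAUSE_DEST_MAC = bytes.fromhex("0180c2000001")   # reserved multicast address


def pause_duration_seconds(quanta: int, link_bps: float) -> float:
    """Time the sender stays paused for a given pause_time value and link speed."""
    if not 0 <= quanta <= 0xFFFF:
        raise ValueError("pause time is a 16-bit field")
    return quanta * PAUSE_QUANTUM_BITS / link_bps


def build_pause_frame(src_mac: bytes, quanta: int) -> bytes:
    """Build a PAUSE frame without the FCS; quanta=0 is Xon, quanta>0 is Xoff."""
    if len(src_mac) != 6:
        raise ValueError("source MAC must be 6 bytes")
    if not 0 <= quanta <= 0xFFFF:
        raise ValueError("pause time is a 16-bit field")
    header = PAUSE_DEST_MAC + src_mac + struct.pack("!H", MAC_CONTROL_ETHERTYPE)
    payload = struct.pack("!HH", PAUSE_OPCODE, quanta)
    # Pad with zeros to the 60-byte minimum frame size (64 bytes once the FCS is added).
    return (header + payload).ljust(60, b"\x00")


if __name__ == "__main__":
    longest = pause_duration_seconds(0xFFFF, 100e9)   # maximum timer on a 100 Gb/s link
    print(f"longest single pause at 100 Gb/s: {longest * 1e6:.1f} microseconds")
    xon = build_pause_frame(bytes(6), 0)              # all-zero source MAC as a placeholder
    print(xon[:18].hex())
```

    Because the timer is counted in bit times, the same pause value lasts for a shorter interval on a faster link, so a receiver that is still congested has to refresh the Xoff before the timer expires.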

    2.2           Flow Control in RDMA Networks

    The 800 Series supports both iWARP and RoCEv2 RDMA transports. Flow control is strongly recommended for RoCEv2, but iWARP also benefits.

    Flow control requirements by base transport:

    iWARP (base transport: TCP)

    • iWARP runs over TCP, a reliable protocol that implements its own flow control.
    • TCP's flow control might be relatively slow to respond in a high-performance, low-latency RDMA environment, especially under bursty traffic patterns.
    • Ethernet flow control is optional, but can be beneficial for iWARP.
    • iWARP mode requires VLAN to be configured fully to enable PFC.

    RoCEv2 (base transport: UDP)

    • RoCEv2 runs over UDP, an unreliable protocol with no built-in flow control.
    • RoCEv2 therefore requires a lossless Ethernet network to ensure packet delivery.
    • If the irdma driver is in RoCEv2 mode and detects no flow control, it automatically de-tunes, causing lower performance.
    • Flow control is always recommended for RoCEv2.

    2.3          Types of Flow Control: LFC vs. PFC

    Ethernet standards define two types of flow control:

    • Link-level Flow Control (LFC)
    • Priority Flow Control (PFC)

    Both types use Xon/Xoff pause frames to control data transmission. The primary difference is that LFC pauses all traffic on a link, but PFC supports Quality-of-Service (QoS) by defining different traffic priorities that can be indiv

  • Original article: https://blog.csdn.net/mounter625/article/details/134553500