• Using lxd to do vlan test (by quqi99)


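    Author: Zhang Hua  Published: 2022-08-15
    Copyright notice: This article may be freely reproduced, provided that the original source, the author information and this copyright notice are indicated with a hyperlink.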

    Problem

    The customer reports that an SR-IOV VM does not receive ARP replies. Inside the VM, two SR-IOV NICs form pkt0 (a bond?), consisting of an active NIC (pkt0_p) and a standby NIC (pkt0_s).

    /fa:16:3e:d8:3f:b9(pkt0)
    /fa:16:3e:d8:3f:b9(pkt0_p)
    /fa:16:3e:70:be:ba(pkt0_s)
    151.2.143.1/151.2.143.2/fa:16:3e:d8:3f:b9(pkt0.610@pkt0)
    10.139.99.1/10.139.99.2/fa:16:3e:d8:3f:b9(pkt0.510@pkt0)
    10.139.160.10/10.139.160.11/10.139.160.12/fa:16:3e:d8:3f:b9(pkt0.700@pkt0)
    

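    He says the ICMP heartbeat check on the active NIC works, but the ARP heartbeat check to the gateway on the standby NIC receives no ARP reply (although the capture below seems to show the reply arriving?).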

    1, arp for active port(fa:16:3e:d8:3f:b9)
    
    $ tshark -r ./EXT_TMP-700.pcap-1.act.pcap eth.src==fa:16:3e:d8:3f:b9 and arp |tail -n1
    357602 8141.824956 fa:16:3e:d8:3f:b9 → IETF-VRRP-VRID_64 ARP 60 Who has 10.139.160.254? Tell 10.139.160.10
    $ tshark -r ./EXT_TMP-700.pcap-1.act.pcap eth.dst==fa:16:3e:d8:3f:b9 and arp |tail -n1
    357603 8141.825416 IETF-VRRP-VRID_64 → fa:16:3e:d8:3f:b9 ARP 60 10.139.160.254 is at 00:00:5e:00:01:64
    
    2, icmp for active port(fa:16:3e:d8:3f:b9)
    
    $ tshark -r ./EXT_TMP-700.pcap-1.act.pcap eth.dst==fa:16:3e:d8:3f:b9 and icmp |tail -n1
    358835 8169.867056 10.139.160.254 → 10.139.160.10 ICMP 102 Echo (ping) reply    id=0x000a, seq=15233/33083, ttl=64 (request in 358834)
    $ tshark -r ./EXT_TMP-700.pcap-1.act.pcap eth.src==fa:16:3e:d8:3f:b9 and icmp |tail -n1
    358834 8169.863263 10.139.160.10 → 10.139.160.254 ICMP 102 Echo (ping) request  id=0x000a, seq=15233/33083, ttl=64
    
    3, arp for standby port(fa:16:3e:70:be:ba)
    
    $ tshark -r ./EXT_TMP-700.pcap-1.act.pcap eth.src==fa:16:3e:70:be:ba and arp |tail -n1
    358848 8170.244743 fa:16:3e:70:be:ba → Broadcast    ARP 60 Who has 10.139.160.254? (ARP Probe)
    $ tshark -r ./EXT_TMP-700.pcap-1.act.pcap eth.dst==fa:16:3e:70:be:ba and arp |tail -n1
    358849 8170.245117 IETF-VRRP-VRID_64 → fa:16:3e:70:be:ba ARP 60 10.139.160.254 is at 00:00:5e:00:01:64
    
    4, icmp for standby port(fa:16:3e:70:be:ba)
    
    $ tshark -r ./EXT_TMP-700.pcap-1.act.pcap eth.src==fa:16:3e:70:be:ba and icmp |tail -n1
    
    $ tshark -r ./EXT_TMP-700.pcap-1.act.pcap eth.dst==fa:16:3e:70:be:ba and icmp |tail -n1
    
    

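    The following analysis has already been done:

    • Confirmed that in the SR-IOV OVN configuration below, br-data (used for the external network) does not contain an SR-IOV NIC. If it did, and the SR-IOV NIC were attached through macvtap rather than passthrough, a hairpin-mode problem could occur: a VM on a host cannot reach the network of its own chassis but can reach the networks of other chassis.
    $ juju config ovn-chassis-sriov-hugepages ovn-bridge-mappings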
    dcfabric:br-data sriovfabric1:br-data sriovfabric2:br-data
    $ juju config ovn-chassis-sriov-hugepages bridge-interface-mappings
    br-data:bond1
    $ juju config ovn-chassis-sriov-hugepages sriov-device-mappings
    sriovfabric1:ens3f0 sriovfabric1:ens6f0 sriovfabric2:ens3f1 sriovfabric2:ens6f1
    $ juju config ovn-chassis-sriov-hugepages sriov-numvfs
    ens3f0:32 ens3f1:32 ens6f0:32 ens6f1:32
    
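    • Ruled out LP bug 1875852: the customer does not use VLAN as the tenant network.
    • Seeing only ARP requests with tcpdump on the PF is normal. ARP requests are broadcast, so they are visible on the PF. ARP replies are unicast, so if the PF is not in promiscuous mode (some Intel SR-IOV NICs have a hardware bug and do not support promiscuous mode), not seeing ARP replies with tcpdump on the PF is expected. Also, tcpdump cannot be used on a VF.
    • DHCP is disabled. Generally, with SR-IOV on OVN, DHCP should be enabled on the SR-IOV subnet. It is disabled here, which should also be fine because the customer assigns IPs statically.
    • The IP the customer assigns statically (via heat) differs from the one allocated by nova, which should not matter either: SR-IOV bypasses the host, so the SG on the host (mainly the software IP/MAC anti-spoofing rules) does not affect it.
    • Since the actual IP differs from the nova-allocated IP and the OpenStack-level SG cannot affect it, what about SG-like filtering at the SR-IOV hardware level? Confirmed that spoof checking is off as well.
    $ grep -E 'fa:16:3e:f8:42:fe|fa:16:3e:70:be:ba|fa:16:3e:8f:56:5a|fa:16:3e:d8:3f:b9' sos_commands/networking/ip_-s_-d_link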
    vf 30 MAC fa:16:3e:70:be:ba, spoof checking off, link-state auto, trust on
    vf 31 MAC fa:16:3e:f8:42:fe, spoof checking off, link-state auto, trust on
    vf 29 MAC fa:16:3e:8f:56:5a, spoof checking off, link-state auto, trust on
    vf 30 MAC fa:16:3e:d8:3f:b9, spoof checking off, link-state auto, trust on
    
    • MAC filtering is ruled out (see spoof checking above). What about VLAN filtering? The tcpdump data suggests the customer defined a VLAN inside the VM (pkt0.700@pkt0).
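
The spoof-checking check above can be repeated on other hosts with a small filter. This is only a sketch: the function name `vfs_with_spoofchk_on` is hypothetical, and it assumes the "vf N MAC <mac>, spoof checking <on|off>, ..." line layout seen in the sosreport excerpt. The second sample line is invented purely to show a positive match.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original analysis): read
# `ip -s -d link` style text on stdin and list the VFs whose
# "spoof checking" flag is on.
vfs_with_spoofchk_on() {
    awk '
        $1 == "vf" && /spoof checking on/ {
            mac = $4
            sub(/,$/, "", mac)      # drop the comma that follows the MAC
            print "vf " $2 " " mac
        }'
}

# Demo on sample text. The first line comes from the excerpt above,
# the second one is made up for illustration.
vfs_with_spoofchk_on <<'EOF'
vf 30 MAC fa:16:3e:70:be:ba, spoof checking off, link-state auto, trust on
vf 7 MAC 02:00:00:00:00:07, spoof checking on, link-state auto, trust off
EOF
```

On a live host the same function could be fed from `ip -s -d link show ens3f0`. An empty result would mean no VF on that PF has spoof checking enabled.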

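    The test in this article mainly simulates that in-guest setup (a VLAN on top of an active/standby bond). No SR-IOV hardware is involved here.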

    Setting up the VLAN lab environment

    lxc remote add faster https://mirrors.tuna.tsinghua.edu.cn/lxc-images/ --protocol=simplestreams --public
    lxc image list faster:
    lxc remote list
    #Failed creating instance record: Failed detecting root disk device: No root device could be found
    #lxc profile device add default root disk path=/ pool=default
    #lxc profile show default
    #lxc launch ubuntu:focal master -p juju-default --config=user.network-config="$(cat network.yml)"
    lxc launch faster:ubuntu/jammy test1
    lxc launch faster:ubuntu/jammy test2
    
    #add two NICs from NET1 for two containers
    lxc network create NET1 ipv6.address=none ipv4.address=10.139.160.1/24
    lxc network attach NET1 test1 eth1
    lxc network attach NET1 test1 eth2
    lxc network attach NET1 test2 eth1
    lxc network attach NET1 test2 eth2
    
    #https://developers.redhat.com/blog/2018/10/22/introduction-to-linux-interfaces-for-virtual-networking#vlan
    #ip link add ptk0 type bond miimon 100 mode active-backup
    #ip link set eth2 master ptk0
    #ip link set eth1 master ptk0
    lxc exec test1 -- /bin/bash
    cat << EOF |tee /etc/netplan/11-test.yaml
    network:
      version: 2
      renderer: networkd
      ethernets:
        eth1:
          addresses: []
          dhcp4: false
          dhcp6: false
          macaddress: 00:16:3e:15:bd:58
        eth2:
          addresses: []
          dhcp4: false
          dhcp6: false
          macaddress: 00:16:3e:68:72:0f
      bonds:
        ptk0:
          addresses: []
          dhcp4: false
          dhcp6: false
          interfaces:
            - eth1
            - eth2
          parameters:
            mode: active-backup
            primary: eth1
      vlans:
        ptk0.700:
          id: 700
          link: ptk0
          dhcp4: no
          addresses: [ 10.139.160.10/24 ]
          nameservers:
            search: [ domain.local ]
            addresses: [ 8.8.8.8 ]
    EOF
    netplan apply
    
    lxc exec test2 -- /bin/bash
    cat << EOF |tee /etc/netplan/11-test.yaml
    network:
      version: 2
      renderer: networkd
      ethernets:
        eth1:
          addresses: []
          dhcp4: false
          dhcp6: false
          macaddress: 00:16:3e:1e:19:25
        eth2:
          addresses: []
          dhcp4: false
          dhcp6: false
          macaddress: 00:16:3e:f7:9e:22
      bonds:
        ptk0:
          addresses: []
          dhcp4: false
          dhcp6: false
          interfaces:
            - eth1
            - eth2
          parameters:
            mode: active-backup
            primary: eth1
      vlans:
        ptk0.700:
          id: 700
          link: ptk0
          dhcp4: no
          addresses: [ 10.139.160.11/24 ]
          nameservers:
            search: [ domain.local ]
            addresses: [ 8.8.8.8 ]
    EOF
    netplan apply
    

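    The steps above create two LXD containers, build an active/standby bond (ptk0) in each of them, and then create a VLAN (ptk0.700) on top of the bond. For this network to work, trunking must also be configured on the host; after that the VLAN network is reachable.
    Note: macaddress must be used above to set the MAC of the two NICs. Without it, all NICs end up with the same MAC after the bond and the VLAN are created.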
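
Since the two netplan files differ only in the MAC addresses and the VLAN address, they can be produced by one function. The sketch below is a trimmed variant of the configuration above (it omits the nameservers block and the empty `addresses: []` entries); the function name `gen_bond_vlan_netplan` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print a netplan file with an active-backup bond
# (ptk0) over eth1/eth2 and VLAN 700 on top of it.
# Arguments: MAC of eth1, MAC of eth2, address/prefix for ptk0.700.
gen_bond_vlan_netplan() {
    cat <<EOF
network:
  version: 2
  renderer: networkd
  ethernets:
    eth1:
      dhcp4: false
      dhcp6: false
      macaddress: $1
    eth2:
      dhcp4: false
      dhcp6: false
      macaddress: $2
  bonds:
    ptk0:
      dhcp4: false
      dhcp6: false
      interfaces: [eth1, eth2]
      parameters:
        mode: active-backup
        primary: eth1
  vlans:
    ptk0.700:
      id: 700
      link: ptk0
      dhcp4: false
      addresses: [ $3 ]
EOF
}

# Values for test1, taken from the configuration above.
gen_bond_vlan_netplan 00:16:3e:15:bd:58 00:16:3e:68:72:0f 10.139.160.10/24
```

The output could then be piped into the container, for example with `... | lxc exec test1 -- tee /etc/netplan/11-test.yaml`, followed by `lxc exec test1 -- netplan apply`.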

    $ sudo brctl show |grep NET1 -A3
    NET1		8000.00163eeb79c4	no		veth2af34c1d
    							veth3a5b458e
    							veth82c292b2
    							veth9b8e8cb6
    #sudo bridge vlan add vid 2-4094 dev NET1 self
    sudo bridge vlan add vid 700 dev NET1 self
    sudo bridge vlan add vid 700 dev veth2af34c1d
    sudo bridge vlan add vid 700 dev veth3a5b458e
    sudo bridge vlan add vid 700 dev veth82c292b2
    sudo bridge vlan add vid 700 dev veth9b8e8cb6
    sudo bridge vlan show
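
The veth names change every time the containers are recreated, so typing them by hand does not scale. As a rough sketch, the per-port commands can be derived from the `brctl show` text instead. The function name `print_vlan_cmds` is hypothetical, and it only prints the commands (dry run) rather than executing them.

```shell
#!/bin/sh
# Hypothetical helper: read `brctl show <bridge>` text on stdin and print
# one `bridge vlan add` command per bridge port.
# In that output a port name is the last field of a 4-field row (the row
# that also carries the bridge name) or the only field of a continuation
# row. The 6-field header row and port-less 3-field rows are skipped.
print_vlan_cmds() {
    vid=$1
    awk -v vid="$vid" 'NF == 1 || NF == 4 {
        print "sudo bridge vlan add vid " vid " dev " $NF
    }'
}

# Demo on the first two rows of the brctl output shown above.
print_vlan_cmds 700 <<'EOF'
NET1    8000.00163eeb79c4    no    veth2af34c1d
                                   veth3a5b458e
EOF
```

Once the printed commands look right, they could be applied with `brctl show NET1 | print_vlan_cmds 700 | sh`.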
    

    At this point test1 can ping test2 over VLAN 700:

    root@test1:~# ping 10.139.160.11 -c1
    PING 10.139.160.11 (10.139.160.11) 56(84) bytes of data.
    64 bytes from 10.139.160.11: icmp_seq=1 ttl=64 time=0.133 ms
    root@test2:~# tcpdump -i eth1 -nn -e -l
    tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
    listening on eth1, link-type EN10MB (Ethernet), snapshot length 262144 bytes
    05:54:36.128602 00:16:3e:15:bd:58 > 00:16:3e:1e:19:25, ethertype 802.1Q (0x8100), length 102: vlan 700, p 0, ethertype IPv4 (0x0800), 10.139.160.10 > 10.139.160.11: ICMP echo request, id 37135, seq 1, length 64
    05:54:36.128643 00:16:3e:1e:19:25 > 00:16:3e:15:bd:58, ethertype 802.1Q (0x8100), length 102: vlan 700, p 0, ethertype IPv4 (0x0800), 10.139.160.11 > 10.139.160.10: ICMP echo reply, id 37135, seq 1, length 64
    

    But the gateway still cannot be pinged:

    root@test1:~# ping 10.139.160.1 -c1
    PING 10.139.160.1 (10.139.160.1) 56(84) bytes of data.
    From 10.139.160.10 icmp_seq=1 Destination Host Unreachable
    $ sudo tcpdump -i NET1 -nn -e -l
    tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
    listening on NET1, link-type EN10MB (Ethernet), snapshot length 262144 bytes
    14:25:24.761131 00:16:3e:15:bd:58 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 700, p 0, ethertype ARP (0x0806), Request who-has 10.139.160.1 tell 10.139.160.10, length 28
    

    Neither creating eth0.700 nor creating a tap0 with vlan=700 makes the ping work:

    #use eth0.700
    sudo ip link add link eth0 name eth0.700 type vlan id 700
    sudo brctl addif NET1 eth0.700
    sudo ifconfig eth0.700 up
    sudo ip addr add 10.139.160.254/24 dev eth0.700
    sudo bridge vlan add vid 700 dev eth0.700
    
    #use a tap
    sudo ip tuntap add mode tap tap0
    sudo ip link set tap0 master NET1
    sudo bridge vlan add dev tap0 vid 700 pvid untagged master
    sudo ip addr add 10.139.160.254/24 dev tap0
    sudo bridge vlan show
    

    Test 1

    So let's treat test2 as the gateway, ping it from test1 and capture the packets.
    Using ICMP from the active port only:

    root@test1:~# ping -I eth1 10.139.160.1 -c1
    ping: Warning: source address might be selected on device other than: eth1
    PING 10.139.160.1 (10.139.160.1) from 192.168.121.88 eth1: 56(84) bytes of data.
    ^C
    --- 10.139.160.1 ping statistics ---
    1 packets transmitted, 0 received, 100% packet loss, time 0ms
    
    $ sudo tcpdump -i NET1 -nn -e -l
    14:32:04.483156 00:16:3e:15:bd:58 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 10.139.160.1 tell 192.168.121.88, length 28
    14:32:04.483185 00:16:3e:eb:79:c4 > 00:16:3e:15:bd:58, ethertype ARP (0x0806), length 42: Reply 10.139.160.1 is-at 00:16:3e:eb:79:c4, length 28
    

    Running 'ping -I eth1 10.139.160.11 -c1' and 'ping -I eth2 10.139.160.11 -c1' both produce no output.

    Test 2

    The arping command needs a source IP when it sends an ARP request, but the standby port has no IP, so one is specified with '-S'.

    root@test1:~# arping -I ptk0.700 10.139.160.11 -S 10.139.160.2 -C1
    ARPING 10.139.160.11
    42 bytes from 00:16:3e:1e:19:25 (10.139.160.11): index=0 time=8.119 usec
    root@test2:~# sudo tcpdump -i ptk0.700 -nn -e -l
    09:08:16.814374 00:16:3e:15:bd:58 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 58: Request who-has 10.139.160.11 tell 10.139.160.2, length 44
    09:08:16.814410 00:16:3e:1e:19:25 > 00:16:3e:15:bd:58, ethertype ARP (0x0806), length 42: Reply 10.139.160.11 is-at 00:16:3e:1e:19:25, length 28
    

    Running 'arping -I eth1 10.139.160.11 -S 10.139.160.2 -C1' and 'arping -I eth2 10.139.160.11 -S 10.139.160.2 -C1' both get no reply:

    root@test1:~# arping -I eth2 10.139.160.11 -S 10.139.160.2 -C1
    ARPING 10.139.160.11
    Timeout
    

    Is that because eth1 and eth2 are not on vlan=700?

    Some Outputs

    root@test1:~# cat /proc/net/bonding/ptk0 
    Ethernet Channel Bonding Driver: v5.15.0-43-generic
    
    Bonding Mode: fault-tolerance (active-backup)
    Primary Slave: eth1 (primary_reselect always)
    Currently Active Slave: eth1
    MII Status: up
    MII Polling Interval (ms): 100
    Up Delay (ms): 0
    Down Delay (ms): 0
    Peer Notification Delay (ms): 0
    
    Slave Interface: eth1
    MII Status: up
    Speed: 10000 Mbps
    Duplex: full
    Link Failure Count: 1
    Permanent HW addr: 00:16:3e:15:bd:58
    Slave queue ID: 0
    
    Slave Interface: eth2
    MII Status: up
    Speed: 10000 Mbps
    Duplex: full
    Link Failure Count: 1
    Permanent HW addr: 00:16:3e:68:72:0f
    Slave queue ID: 0
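
Because the whole investigation hinges on which NIC is active and which is standby, it can help to read that straight from the bonding status text. A minimal sketch, with hypothetical function names, assuming the /proc/net/bonding layout shown above (the "Currently Active Slave" line appears before the "Slave Interface" lines):

```shell
#!/bin/sh
# Hypothetical helpers: both read /proc/net/bonding/<bond> style text on stdin.

# Print the name of the currently active slave.
active_slave() {
    awk -F': ' '/^[[:space:]]*Currently Active Slave:/ { print $2 }'
}

# Print every slave interface that is not the active one (the standby NICs).
standby_slaves() {
    awk -F': ' '
        /^[[:space:]]*Currently Active Slave:/ { active = $2 }
        /^[[:space:]]*Slave Interface:/ && $2 != active { print $2 }'
}

# Demo on a few lines taken from the output above.
standby_slaves <<'EOF'
Primary Slave: eth1 (primary_reselect always)
Currently Active Slave: eth1
Slave Interface: eth1
Slave Interface: eth2
EOF
```

Inside a container this could be run as `standby_slaves < /proc/net/bonding/ptk0` to pick the interface for the standby-side ARP test.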
    

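    Another approach: pure CLI

    Instead of configuring the network with netplan as above, this approach creates the bond directly with plain CLI commands and does not use the vlan-filtering method - https://developers.redhat.com/blog/2017/09/14/vlan-filter-support-on-bridge#bridge_and_vlan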

    lxc launch faster:ubuntu/jammy test1
    lxc launch faster:ubuntu/jammy test2
    #add two NICs from NET1 for two containers
    lxc network create NET1 ipv6.address=none ipv4.address=10.139.160.1/24
    lxc network attach NET1 test1 eth1
    lxc network attach NET1 test1 eth2
    lxc network attach NET1 test2 eth1
    lxc network attach NET1 test2 eth2
    
    #inside test1
    lxc exec test1 -- /bin/bash
    sudo ip link add ptk0 type bond miimon 100 mode active-backup
    sudo ip link set eth1 down
    sudo ip link set eth1 master ptk0
    sudo ip link set eth2 down
    sudo ip link set eth2 master ptk0
    sudo ip link set dev ptk0 address 00:16:3e:15:bd:58
    sudo ip link set dev eth1 address 00:16:3e:15:bd:58
    sudo ip link set dev eth2 address 00:16:3e:68:72:0f
    sudo ip link set ptk0 up
    sudo ip link add link ptk0 name ptk0.700 type vlan id 700
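    sudo ip addr add 10.139.160.10/24 dev ptk0.700
    #a VLAN device created with 'ip link add' starts in the down state, so bring it up
    sudo ip link set ptk0.700 up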
    
    #inside test2
    lxc exec test2 -- /bin/bash
    sudo ip link add ptk0 type bond miimon 100 mode active-backup
    sudo ip link set eth1 down
    sudo ip link set eth1 master ptk0
    sudo ip link set eth2 down
    sudo ip link set eth2 master ptk0
    sudo ip link set dev ptk0 address 00:16:3e:1e:19:25
    sudo ip link set dev eth1 address 00:16:3e:1e:19:25
    sudo ip link set dev eth2 address 00:16:3e:f7:9e:22
    sudo ip link set ptk0 up
    sudo ip link add link ptk0 name ptk0.700 type vlan id 700
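    sudo ip addr add 10.139.160.11/24 dev ptk0.700
    #a VLAN device created with 'ip link add' starts in the down state, so bring it up
    sudo ip link set ptk0.700 up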
    
    #on host
    sudo bridge vlan add vid 700 dev NET1 self
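    #keep only the port name from each row: the first row of brctl output also carries the bridge name and id
    brctl show NET1 |grep -o 'veth[0-9a-z]*' |xargs -I{} sudo bridge vlan add vid 700 dev {}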
    sudo bridge vlan show
    

    reference

    [1] LACP bond configuration - https://blog.csdn.net/quqi99/article/details/51251210
    [2] Three ways to use VLAN - https://blog.csdn.net/quqi99/article/details/51218884
    [3] creating vlan over openstack - https://blog.csdn.net/quqi99/article/details/118341936
    [4] VLAN filter support on bridge - https://developers.redhat.com/blog/2017/09/14/vlan-filter-support-on-bridge#

  • Original article: https://blog.csdn.net/quqi99/article/details/126345528