• Analysis of Docker's Bridge Network


    Preface

    The article "Virtual LAN (VLAN)" described the roles of virtual NICs and virtual bridges, and showed how the VLAN reached the internet through iptables. Having studied that, it is natural to think of today's mainstream container technology, Docker, so next I want to look into how Docker's bridge network compares.

    Hypotheses

    As is well known, Docker has three network modes: host, bridge, and none; here we analyze only bridge mode. With the previous article as a foundation, the bridge concept should already be familiar: a bridge is a virtual switch that exchanges data at the data link layer based on MAC addresses.

    So we can now venture a bold guess: Docker implements its internal networking on exactly this model.

    • Guess 1: when creating a container, the Docker engine automatically creates a virtual NIC pair (veth) for it and assigns it a private IP, attaching one end of the veth pair to the docker0 bridge and the other end to the container's internal network (see the sketch after this list).
    • Guess 2: Docker likewise uses the NAT capability of iptables to forward container traffic to the internet.
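
    If guess one is correct, the same wiring can be reproduced by hand with ip(8). Below is a minimal sketch; the names (demo0, vethX, netns n-demo) and the 172.18.0.0/24 subnet are invented for illustration and are not what Docker actually uses:

    shell
    # Create a bridge and give it a gateway address
    ip link add demo0 type bridge && ip link set demo0 up
    ip addr add 172.18.0.1/24 dev demo0
    # Create a network namespace standing in for a container
    ip netns add n-demo
    # Create a veth pair: attach one end to the bridge, move the other into the namespace
    ip link add vethX type veth peer name vethX-peer
    ip link set vethX master demo0 && ip link set vethX up
    ip link set vethX-peer netns n-demo
    # Give the "container" end a private IP and a default route via the bridge
    ip netns exec n-demo ip addr add 172.18.0.2/24 dev vethX-peer
    ip netns exec n-demo ip link set vethX-peer up
    ip netns exec n-demo ip route add default via 172.18.0.1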

    Verification

    Checking the host NIC list

    Check the Docker containers and the host NIC list, looking for a Docker bridge and veth interfaces.

    shell
    # List the Docker containers running on this host (mysql, redis, halo, debian)
    [root@VM-8-10-centos ~]# docker ps 
    CONTAINER ID   IMAGE                COMMAND                  CREATED         STATUS         PORTS                                                  NAMES
    56ffaf39316a   debian               "bash"                   23 hours ago    Up 7 minutes                                                          debian
    c8a273ce122e   halohub/halo:1.5.3   "/bin/sh -c 'java -X…"   5 months ago    Up 47 hours    0.0.0.0:8090->8090/tcp, :::8090->8090/tcp              halo
    d09fcfa7de0f   redis                "docker-entrypoint.s…"   12 months ago   Up 5 weeks     0.0.0.0:8805->6379/tcp, :::8805->6379/tcp              redis
    87a2192f6db4   mysql:5.7            "docker-entrypoint.s…"   2 years ago     Up 5 weeks     0.0.0.0:3306->3306/tcp, :::3306->3306/tcp, 33060/tcp   mysql
    # Check the host NIC list (confirm that docker0 and the veth interfaces exist)
    [root@VM-12-15-centos ~]# ip link 
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
        link/ether 52:54:00:b3:6f:20 brd ff:ff:ff:ff:ff:ff
        altname enp0s5
        altname ens5
    3: br-67cf5bfe7a5c: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default 
        link/ether 02:42:c5:07:22:c7 brd ff:ff:ff:ff:ff:ff
    4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default 
        link/ether 02:42:38:d6:1b:ea brd ff:ff:ff:ff:ff:ff
    5: br-9fd151a807e7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default 
        link/ether 02:42:35:7f:ed:76 brd ff:ff:ff:ff:ff:ff
    315: vethf2afb37@if314: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-67cf5bfe7a5c state UP mode DEFAULT group default 
        link/ether 3a:06:f0:8d:06:f6 brd ff:ff:ff:ff:ff:ff link-netnsid 12
    317: veth1ec30f9@if316: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-9fd151a807e7 state UP mode DEFAULT group default 
        link/ether 4a:ad:1a:b0:5a:5f brd ff:ff:ff:ff:ff:ff link-netnsid 0
    319: vethc408286@if318: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-67cf5bfe7a5c state UP mode DEFAULT group default 
        link/ether 26:b0:3c:f4:c5:5b brd ff:ff:ff:ff:ff:ff link-netnsid 1
    321: veth68fb8c6@if320: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-67cf5bfe7a5c state UP mode DEFAULT group default 
        link/ether 96:ca:a9:42:f8:a8 brd ff:ff:ff:ff:ff:ff link-netnsid 9
    323: veth6dba394@if322: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-67cf5bfe7a5c state UP mode DEFAULT group default 
        link/ether 92:1c:5e:9c:a2:b3 brd ff:ff:ff:ff:ff:ff link-netnsid 4
    325: veth1509ed0@if324: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-67cf5bfe7a5c state UP mode DEFAULT group default 
        link/ether fa:22:33:da:12:e0 brd ff:ff:ff:ff:ff:ff link-netnsid 11
    329: vethef1dbac@if328: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-67cf5bfe7a5c state UP mode DEFAULT group default 
        link/ether aa:db:d2:10:36:60 brd ff:ff:ff:ff:ff:ff link-netnsid 3
    331: veth69d3e7d@if330: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-67cf5bfe7a5c state UP mode DEFAULT group default 
        link/ether 86:45:d0:0e:6b:a7 brd ff:ff:ff:ff:ff:ff link-netnsid 5
    335: veth98588ae@if334: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-67cf5bfe7a5c state UP mode DEFAULT group default 
        link/ether 86:59:55:39:17:ad brd ff:ff:ff:ff:ff:ff link-netnsid 7
    349: vetha84d717@if348: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-67cf5bfe7a5c state UP mode DEFAULT group default 
        link/ether ee:7f:d2:27:15:83 brd ff:ff:ff:ff:ff:ff link-netnsid 6
    354: veth1@if355: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-mybridge state UP mode DEFAULT group default qlen 1000
        link/ether 72:c8:9e:24:a6:a3 brd ff:ff:ff:ff:ff:ff link-netns n1
    356: br-mybridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
        link/ether 72:c8:9e:24:a6:a3 brd ff:ff:ff:ff:ff:ff
    

    Running ip link on the host shows a virtual bridge named docker0, with four veth pairs under Docker's bridges corresponding to the four containers debian, halo, redis, and mysql.
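
    As a cross-check, iproute2 can list only the interfaces enslaved to a particular bridge; this is a generic filter, not Docker-specific:

    shell
    # Show only the veth ends whose master is docker0 (output varies per host)
    [root@VM-8-10-centos ~]# ip link show master docker0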

    Checking the bridge IP and internal container communication

    shell
    # The default docker0 bridge has the IP address 172.17.0.1/16
    [root@VM-8-10-centos ~]# ip addr show docker0
    3: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
        link/ether 02:42:6f:d7:19:7e brd ff:ff:ff:ff:ff:ff
        inet 172.17.0.1/16 scope global docker0
           valid_lft forever preferred_lft forever
        inet6 fe80::42:6fff:fed7:197e/64 scope link 
           valid_lft forever preferred_lft forever
    # Check the IP addresses of the containers on the bridge network (172.17.0.2/16, 172.17.0.3/16, 172.17.0.4/16, and 172.17.0.5/16, respectively)
    [root@VM-8-10-centos ~]# docker network inspect bridge
    [
        {
            "Name": "bridge",
            "Id": "2dc75e446719be8cad37e1ea9ae7d1385fcc728b8177646a3c62929c2b289e94",
            "Created": "2024-04-24T09:46:14.399901891+08:00",
            "Scope": "local",
            "Driver": "bridge",
            "EnableIPv6": false,
            "IPAM": {
                "Driver": "default",
                "Options": null,
                "Config": [
                    {
                        "Subnet": "172.17.0.0/16",
                        "Gateway": "172.17.0.1"
                    }
                ]
            },
            "Internal": false,
            "Attachable": false,
            "Ingress": false,
            "ConfigFrom": {
                "Network": ""
            },
            "ConfigOnly": false,
            "Containers": {
                "56ffaf39316ac9f776c6b3e2a8a79e9f42dfab42aa1f7de7525bd26c686defaa": {
                    "Name": "debian",
                    "EndpointID": "47dd9441d4a4c8b09afea3bca23652b80ba35e6baa13d44ec21ec89522e722a6",
                    "MacAddress": "02:42:ac:11:00:05",
                    "IPv4Address": "172.17.0.5/16",
                    "IPv6Address": ""
                },
                "87a2192f6db48c9bf2996bf25c79d4c18c3ae2975cac9d55e7fdfdcec03f896b": {
                    "Name": "mysql",
                    "EndpointID": "00b93de23c5abf2ed1349bac1c2ec93bf7ed516370dabf23348b980f19cfaa9c",
                    "MacAddress": "02:42:ac:11:00:02",
                    "IPv4Address": "172.17.0.2/16",
                    "IPv6Address": ""
                },
                "c8a273ce122ef5479583908f40898141a90933a3c41c8028dc7966b9af4c465d": {
                    "Name": "halo",
                    "EndpointID": "ba8ef83c80f3edb6e7987c95ae6d56816a1fc00d07e8bb2bfbb0f19ef543badf",
                    "MacAddress": "02:42:ac:11:00:04",
                    "IPv4Address": "172.17.0.4/16",
                    "IPv6Address": ""
                },
                "d09fcfa7de0f2a7b3ef7927a7e53a8a53fb93021b119b1376fe4616381c5a57c": {
                    "Name": "redis",
                    "EndpointID": "afbc9128f7d27becfbf64e843a92d36ce23800cd42c131e550abea7afb6a131e",
                    "MacAddress": "02:42:ac:11:00:03",
                    "IPv4Address": "172.17.0.3/16",
                    "IPv6Address": ""
                }
            },
            "Options": {
                "com.docker.network.bridge.default_bridge": "true",
                "com.docker.network.bridge.enable_icc": "true",
                "com.docker.network.bridge.enable_ip_masquerade": "true",
                "com.docker.network.bridge.host_binding_ipv4": "0.0.0.0",
                "com.docker.network.bridge.name": "docker0",
                "com.docker.network.driver.mtu": "1500"
            },
            "Labels": {}
        }
    ] 
    # Enter the debian container and test internal and internet connectivity
    [root@VM-8-10-centos ~]# docker exec -it debian /bin/bash
    root@56ffaf39316a:/# ping 172.17.0.1
    PING 172.17.0.1 (172.17.0.1) 56(84) bytes of data.
    64 bytes from 172.17.0.1: icmp_seq=1 ttl=64 time=0.071 ms
    64 bytes from 172.17.0.1: icmp_seq=2 ttl=64 time=0.036 ms
    --- 172.17.0.1 ping statistics ---
    2 packets transmitted, 2 received, 0% packet loss, time 999ms
    rtt min/avg/max/mdev = 0.036/0.053/0.071/0.017 ms
    root@56ffaf39316a:/# ping 172.17.0.3
    PING 172.17.0.3 (172.17.0.3) 56(84) bytes of data.
    64 bytes from 172.17.0.3: icmp_seq=1 ttl=64 time=0.067 ms
    64 bytes from 172.17.0.3: icmp_seq=2 ttl=64 time=0.047 ms
    --- 172.17.0.3 ping statistics ---
    2 packets transmitted, 2 received, 0% packet loss, time 1000ms
    rtt min/avg/max/mdev = 0.047/0.057/0.067/0.010 ms
    root@56ffaf39316a:/# ping baidu.com
    PING baidu.com (39.156.66.10) 56(84) bytes of data.
    64 bytes from 39.156.66.10 (39.156.66.10): icmp_seq=1 ttl=247 time=59.0 ms
    64 bytes from 39.156.66.10 (39.156.66.10): icmp_seq=2 ttl=247 time=55.4 ms
    --- baidu.com ping statistics ---
    2 packets transmitted, 2 received, 0% packet loss, time 1001ms
    rtt min/avg/max/mdev = 55.400/57.221/59.043/1.821 ms
    
    Summary

    Analyzing the shell output: the docker0 bridge's IP is 172.17.0.1/16, containers within the docker0 subnet communicate normally, and ping baidu.com confirms that internet connectivity also works. We can therefore conclude that Docker's bridge mode matches the VLAN setup from the previous article: both achieve internal communication through a virtual bridge.
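
    As an aside, a Go template makes it easy to pull a single container's address out of docker inspect (standard docker syntax; the address below matches the inspect output above):

    shell
    [root@VM-8-10-centos ~]# docker inspect -f '{{ .NetworkSettings.IPAddress }}' debian
    172.17.0.5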

    [Figure] Docker internal communication topology

    Docker container communication with the internet

    In the previous article I accidentally left a loose thread. Because firewalld loads a large number of built-in rules into iptables, it makes traffic analysis very unpleasant, so I simply shut firewalld down. I then noticed a side effect: when firewalld stops, the iptables rules are flushed along with it. At the time this seemed harmless, but thinking it over, a big part of why the VLAN could reach the internet was the NAT capability of iptables; with iptables flushed, NAT is effectively disabled, so applications that depend on it lose their network connections. The shell session below reproduces and analyzes this phenomenon.

    shell
    # Stop firewalld
    [root@VM-8-10-centos ~]# systemctl stop firewalld
    # Inspect iptables: both the filter table and the nat table have been flushed
    [root@VM-8-10-centos ~]# iptables -nvL
    Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
     pkts bytes target     prot opt in     out     source               destination         
    
    Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
     pkts bytes target     prot opt in     out     source               destination         
    
    Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
     pkts bytes target     prot opt in     out     source               destination
    [root@VM-8-10-centos ~]# iptables -t nat -nvL
    Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
     pkts bytes target     prot opt in     out     source               destination         
    
    Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
     pkts bytes target     prot opt in     out     source               destination         
    
    Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
     pkts bytes target     prot opt in     out     source               destination         
    
    Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
     pkts bytes target     prot opt in     out     source               destination 
    # Check internet connectivity from the debian container
    [root@VM-8-10-centos ~]# docker exec -it debian /bin/bash
    root@56ffaf39316a:/# ping baidu.com
    PING baidu.com (110.242.68.66) 56(84) bytes of data.
    --- baidu.com ping statistics ---
    5 packets transmitted, 0 received, 100% packet loss, time 4000ms
    # Check internal network connectivity
    root@56ffaf39316a:/# ping 172.17.0.1
    PING 172.17.0.1 (172.17.0.1) 56(84) bytes of data.
    64 bytes from 172.17.0.1: icmp_seq=1 ttl=64 time=0.041 ms
    64 bytes from 172.17.0.1: icmp_seq=2 ttl=64 time=0.046 ms
    --- 172.17.0.1 ping statistics ---
    2 packets transmitted, 2 received, 0% packet loss, time 1000ms
    rtt min/avg/max/mdev = 0.041/0.043/0.046/0.002 ms
    root@56ffaf39316a:/# ping 172.17.0.2
    PING 172.17.0.2 (172.17.0.2) 56(84) bytes of data.
    64 bytes from 172.17.0.2: icmp_seq=1 ttl=64 time=0.081 ms
    64 bytes from 172.17.0.2: icmp_seq=2 ttl=64 time=0.055 ms
    --- 172.17.0.2 ping statistics ---
    2 packets transmitted, 2 received, 0% packet loss, time 1001ms
    rtt min/avg/max/mdev = 0.055/0.068/0.081/0.013 ms
    

    Flushing iptables did indeed cost the Docker containers their internet connectivity, while internal communication was unaffected.

    Manually adding a NAT rule to restore container internet access

    shell
    # Add an SNAT (masquerade) rule for traffic leaving through eth0
    [root@VM-8-10-centos ~]# iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
    # Check internet connectivity from the debian container again
    [root@VM-8-10-centos ~]# docker exec -it debian /bin/bash
    root@56ffaf39316a:/# ping baidu.com
    PING baidu.com (39.156.66.10) 56(84) bytes of data.
    64 bytes from 39.156.66.10 (39.156.66.10): icmp_seq=1 ttl=247 time=55.8 ms
    64 bytes from 39.156.66.10 (39.156.66.10): icmp_seq=2 ttl=247 time=55.4 ms
    --- baidu.com ping statistics ---
    2 packets transmitted, 2 received, 0% packet loss, time 1002ms
    rtt min/avg/max/mdev = 55.386/55.610/55.834/0.224 ms
    

    Personal summary: Docker containers really do depend on iptables to communicate with the internet, and the behavior is almost identical to the VLAN setup, so I would describe Docker's bridge networking as an advanced application of the VLAN-plus-iptables pattern.
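
    For reference, the masquerade rule Docker itself installs at startup is scoped more tightly than the blanket rule above; restarting the docker service should re-create something like this:

    shell
    [root@VM-8-10-centos ~]# iptables -t nat -S POSTROUTING | grep MASQUERADE
    -A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE
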
    [Figure] Docker container internet communication topology

    Further questions

    Is network communication inside Docker containers also based on layer-2 data exchange?

    From the earlier VLAN work we know that a bridge is a virtual switch working at the data link layer, exchanging frames according to MAC addresses. Working at layer 2 means it has no concept of IP during forwarding; it simply switches frames by MAC address. If that is true, then even after deleting its IP address (and the route that comes with it), data exchange through the bridge should still work.

    shell
    # Delete docker0's IP address, then test container-to-container connectivity
    [root@VM-8-10-centos ~]# ip addr del 172.17.0.1/16 dev docker0
    [root@VM-8-10-centos ~]# ip addr show docker0
    3: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
        link/ether 02:42:6f:d7:19:7e brd ff:ff:ff:ff:ff:ff
        inet6 fe80::42:6fff:fed7:197e/64 scope link 
           valid_lft forever preferred_lft forever
    [root@VM-8-10-centos ~]# docker exec -it debian /bin/bash
    root@56ffaf39316a:/# ping 172.17.0.3 
    PING 172.17.0.3 (172.17.0.3) 56(84) bytes of data.
    64 bytes from 172.17.0.3: icmp_seq=1 ttl=64 time=0.066 ms
    64 bytes from 172.17.0.3: icmp_seq=2 ttl=64 time=0.049 ms
    --- 172.17.0.3 ping statistics ---
    2 packets transmitted, 2 received, 0% packet loss, time 1000ms
    rtt min/avg/max/mdev = 0.049/0.057/0.066/0.008 ms
    
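    Container-to-container ping still succeeds even though docker0 no longer has an IP address, confirming that the bridge switches frames purely by MAC address. (Internet access from the containers should break, though, since their default gateway 172.17.0.1 no longer exists; re-adding the address with ip addr add 172.17.0.1/16 dev docker0 undoes the experiment.)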

    How are iptables and the routing table related, and how do they differ? Which one decides the egress NIC for traffic?

    A question that has lingered since studying VLANs: **when a virtual bridge communicates with the internet, what decides how traffic entering the bridge is forwarded to the egress NIC?** When configuring NAT for the VLAN, FORWARD and NAT rules had to be added to iptables, so it was natural to assume iptables does the forwarding. But if so, what is the routing table for? **So is it iptables or the routing table (ip route) that actually forwards the traffic?** Put concretely: what moves traffic from the docker0 NIC to the eth0 NIC?

    Answering this properly requires a deep dive into how iptables works, which I won't repeat here; I'll simply give my personal conclusion for reference.

    Personal conclusion: the routing table never modifies traffic; it only determines a packet's egress NIC. iptables can filter, modify, and forward IP packets, but the egress NIC is ultimately chosen by the routing table.
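
    A direct way to observe the routing decision is ip route get, which asks the kernel which interface it would pick for a given packet. This is illustrative; the addresses reuse those from earlier, and the exact output depends on the host's routes:

    shell
    # Where would a packet from the debian container (172.17.0.5) to baidu.com's IP exit?
    [root@VM-8-10-centos ~]# ip route get 39.156.66.10 from 172.17.0.5 iif docker0
    # The reply names eth0 as the egress device, confirming the routing table picks the exit NIC.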

    Even without SNAT, shouldn't packets still be able to reach the remote network?

    All IP traffic on the internet carries a source address and a destination address: the destination determines how the packet is delivered to the remote host, and the source lets other hosts tell who sent it. SNAT works by rewriting the source address. Does that mean that even without SNAT a packet can still reach the remote network, only the remote side has no way to reply?

    shell
    # Suppose I have two cloud servers with public IPv4 addresses, xxx.xxx.xxx.xx1
    # and xxx.xxx.xxx.xx2, and xx1's LAN contains another host, x10.

    # On host xx1:
    # rewrite the source IP of outgoing packets from xx1's address to x10's
    [root@VM-8-10-centos ~]# iptables -t nat -A POSTROUTING -s xx1 -o eth0 -j SNAT --to-source xxx.xxx.xxx.x10
    # capture ICMP packets on eth0 and watch for the rewritten source
    [root@VM-8-10-centos ~]# tcpdump -i eth0 -p icmp -nv | grep x10
    
    # On host xx2:
    # capture ICMP packets on eth0
    [root@VM-8-10-centos ~]# tcpdump -i eth0 -p icmp -nv
    

    The tcpdump capture shows that xx1 really did send packets with source address x10, yet the capture on xx2 saw nothing from xx1 or x10 at all. The packets were most likely dropped somewhere along the route: cloud providers and ISPs commonly apply source-address validation (anti-spoofing filters such as uRPF) and discard packets whose source IP does not belong to the sending host. Or perhaps I've misunderstood something?

  • Original post: https://blog.csdn.net/qq_39914581/article/details/139450123