Docker Container Network Models: Hands-on Experiments, Part 1


    The Essence of Containers

    A container is essentially just a process. It is isolated with Linux namespaces so it cannot see the outside world, its resource usage is limited with cgroups, and the pivot_root (or chroot) system call is used to switch the process's root directory, mounting the container image as the root filesystem (rootfs). The rootfs contains not only the application to run, but also all of its dependency libraries, along with the operating system's directories and files. Because the rootfs packages the complete environment the application runs in, it guarantees consistency across development, testing, production, and other environments.
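    As a rough illustration of these building blocks (not how Docker actually invokes them), they can be driven by hand with the util-linux tools; the rootfs path below is a hypothetical placeholder:

    # Start a shell in new UTS/PID/network/mount namespaces, with a container
    # image directory as its root; cgroup limits would be configured separately
    # (e.g. under /sys/fs/cgroup)
    unshare --uts --pid --net --mount --fork chroot /path/to/rootfs /bin/sh
    # Inside this shell, `hostname` changes affect only the new UTS namespace,
    # and `ip link` shows only the loopback device of the new network namespace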

    Once you see containers this way, many things become easier to understand. For example, we can run docker exec to enter a running container, as if logging into a separate virtual machine. In reality, this simply uses the setns system call to move the current process into the container process's namespaces, after which it can "see" the inside of the container.
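    Roughly the same thing can be done by hand with nsenter, a thin wrapper around setns (the container name below is a placeholder):

    # Find the PID of the container's main process
    PID=$(docker inspect -f '{{.State.Pid}}' mycontainer)
    # Join its mount, UTS, IPC, network and PID namespaces and start a shell;
    # commands such as `ip addr` and `ps` now show the container's view
    nsenter --target "$PID" --mount --uts --ipc --net --pid sh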

    Container Networking

    Docker provides several network models for connecting containers to each other and keeping traffic flowing. The most common are:

    • Single host: bridge

    • Cross-host: overlay

    This article focuses on understanding the bridge model through hands-on experiments.

    Basic Concepts

    Veth pairs: a veth is a pair of virtual network interfaces; a packet sent on one end is always received on the other. Using this property, we can "place" one end of the pair inside a container and plug the other end into a virtual switch. Containers attached to the same virtual switch can then reach each other over the network.

    Linux bridge: a switch is a data-link-layer device that forwards layer-2 frames. The simplest forwarding strategy is to broadcast every frame arriving at an input port to all output ports. A better strategy is to learn while forwarding, recording the mapping between switch ports and MAC addresses, so that later frames can be sent out the right port based on their destination MAC address.

    We can think of a Linux bridge as a virtual switch: containers attached to the same bridge form a LAN, while different bridges are isolated from each other. docker network create [NETWORK NAME] essentially creates such a virtual switch.
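    A minimal sketch, assuming Docker is installed on the host (the network name mynet is arbitrary):

    # Create a user-defined bridge network; Docker creates a matching Linux
    # bridge device, typically named br-<network id prefix>
    docker network create mynet
    ip link show type bridge
    # Inspect the subnet and gateway Docker assigned to the network
    docker network inspect mynet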

    iptables: containers need to reach the outside world and also expose services for external access; this is where iptables comes in. Isolation between different bridges also relies on iptables.

    When we say iptables, we mean both the user-space configuration tool (/sbin/iptables) and the kernel's netfilter modules; the iptables command is used to configure rules in the kernel's netfilter.
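    For example, on a host running Docker, the rules it maintains in netfilter can be listed with the same tool (chain names vary with the Docker version):

    # NAT rules, e.g. MASQUERADE for outbound traffic and DNAT for published ports
    iptables -t nat -L -n
    # Filter rules, including the chains used to isolate different bridges
    iptables -t filter -L -n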

    Experiments

    Experiment topology:

    [Topology diagram omitted: two network namespaces (docker1, docker2) attached via veth pairs to a Linux bridge (bridge1) on the host master1]

    Experiment 1: Container-to-Container Connectivity on a Single Host

    Steps:

    1. Create the "containers" (network namespaces)

    [root@master1 ~]# ip netns add docker1
    [root@master1 ~]# ip netns add docker2

    Verify:

    [root@master1 ~]# ip netns ls
    docker2
    docker1
    [root@master1 ~]# ls /var/run/netns/ -l
    -r--r--r-- 1 root root 0 May 24 15:28 docker1
    -r--r--r-- 1 root root 0 May 24 15:28 docker2

    2. Create the veth pairs

    [root@master1 ~]# ip link add veth10 type veth peer name veth11
    [root@master1 ~]# ip link add veth20 type veth peer name veth21

    Verify:

    [root@master1 ~]# ip link show type veth
    25: veth10@veth11: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
        link/ether 76:d1:d2:22:00:77 brd ff:ff:ff:ff:ff:ff
    26: veth11@veth10: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
        link/ether 92:4f:f5:bb:36:76 brd ff:ff:ff:ff:ff:ff
    27: veth20@veth21: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
        link/ether ba:d0:9d:df:4c:62 brd ff:ff:ff:ff:ff:ff
    28: veth21@veth20: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
        link/ether 56:96:e1:88:88:69 brd ff:ff:ff:ff:ff:ff

    3. Move one end of each veth pair into a container

    [root@master1 ~]# ip link set veth11 netns docker1
    [root@master1 ~]# ip link set veth21 netns docker2

    Verify:

    [root@master1 ~]# ip netns exec docker1 ip link
    1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    26: veth11@if25: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
        link/ether 92:4f:f5:bb:36:76 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    [root@master1 ~]# ip netns exec docker2 ip link
    1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    28: veth21@if27: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
        link/ether 56:96:e1:88:88:69 brd ff:ff:ff:ff:ff:ff link-netnsid 0

    4. Create the bridge device

    [root@master1 ~]# ip link add bridge1 type bridge

    The brctl tool can also be used, as sketched below.
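    A sketch of the equivalent with brctl (from the bridge-utils package):

    # Create the same bridge device with the legacy tool
    brctl addbr bridge1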

    Verify:

    [root@master1 ~]# ip link show type bridge
    29: bridge1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
        link/ether f2:10:a7:ae:a1:53 brd ff:ff:ff:ff:ff:ff

    5. Attach the other end of each veth pair to bridge1

    [root@master1 ~]# ip link set veth10 master bridge1
    [root@master1 ~]# ip link set veth20 master bridge1

    Verify:

    [root@master1 ~]# ip link show type veth |grep master
    25: veth10@if26: <BROADCAST,MULTICAST> mtu 1500 qdisc noop master bridge1 state DOWN mode DEFAULT group default qlen 1000
    27: veth20@if28: <BROADCAST,MULTICAST> mtu 1500 qdisc noop master bridge1 state DOWN mode DEFAULT group default qlen 1000
    [root@master1 ~]# brctl show bridge1
    bridge name     bridge id               STP enabled     interfaces
    bridge1         8000.76d1d2220077       no              veth10
                                                            veth20

    6. Assign IP addresses to the interfaces inside the containers and bring them up

    [root@master1 ~]# ip netns exec docker1 ip addr add 172.30.0.100/24 dev veth11
    [root@master1 ~]# ip netns exec docker1 ip link set dev veth11 up
    [root@master1 ~]# ip netns exec docker2 ip addr add 172.30.0.200/24 dev veth21
    [root@master1 ~]# ip netns exec docker2 ip link set dev veth21 up

    7. Assign an IP address to bridge1 and bring the bridge and veth ends up

    [root@master1 ~]# ip addr add 172.30.0.1/24 dev bridge1
    [root@master1 ~]# ip link set dev bridge1 up
    [root@master1 ~]# ip link set dev veth10 up
    [root@master1 ~]# ip link set dev veth20 up

    8. Ping test

    # 1. Capture traffic on bridge1 with tcpdump
    [root@master1 ~]# tcpdump -i bridge1 -n
    # 2. From the docker1 namespace, ping docker2's IP
    [root@master1 ~]# ip netns exec docker1 ping -c 2 172.30.0.200
    PING 172.30.0.200 (172.30.0.200) 56(84) bytes of data.
    64 bytes from 172.30.0.200: icmp_seq=1 ttl=64 time=0.285 ms
    64 bytes from 172.30.0.200: icmp_seq=2 ttl=64 time=0.068 ms
    # 3. The capture on bridge1 shows:
    # (1) ARP request/reply from docker1 to docker2
    15:43:35.290903 ARP, Request who-has 172.30.0.200 tell 172.30.0.100, length 28
    15:43:35.290946 ARP, Reply 172.30.0.200 is-at 56:96:e1:88:88:69, length 28
    # (2) ICMP echo request/reply
    15:43:35.291013 IP 172.30.0.100 > 172.30.0.200: ICMP echo request, id 47979, seq 1, length 64
    15:43:35.291068 IP 172.30.0.200 > 172.30.0.100: ICMP echo reply, id 47979, seq 1, length 64
    15:43:36.343154 IP 172.30.0.100 > 172.30.0.200: ICMP echo request, id 47979, seq 2, length 64
    15:43:36.343197 IP 172.30.0.200 > 172.30.0.100: ICMP echo reply, id 47979, seq 2, length 64
    # (3) ARP request/reply from docker2 back to docker1
    15:43:40.439240 ARP, Request who-has 172.30.0.100 tell 172.30.0.200, length 28
    15:43:40.439276 ARP, Reply 172.30.0.100 is-at 92:4f:f5:bb:36:76, length 28
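    As a quick follow-up to the MAC-learning behaviour described in the Basic Concepts section, the bridge's forwarding database can be inspected after the ping (not part of the original write-up; output will vary):

    # MAC addresses bridge1 has learned and the ports they were learned on
    bridge fdb show br bridge1
    # or, with bridge-utils:
    brctl showmacs bridge1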

    Experiment 2: Accessing the Container Network from the Host

    1. Start a service inside the container, listening on port 80

    [root@master1 ~]# ip netns exec docker1 nc -lp 80

    2. Connect from the host

    [root@master1 ~]# telnet 172.30.0.100 80
    Trying 172.30.0.100...
    Connected to 172.30.0.100.
    Escape character is '^]'.

    3. How it works

    [root@master1 ~]# ip r |grep 172.30
    172.30.0.0/24 dev bridge1 proto kernel scope link src 172.30.0.1
    [root@master1 ~]# ip netns exec docker1 ip r
    172.30.0.0/24 dev veth11 proto kernel scope link src 172.30.0.100
    [root@master1 ~]# ip netns exec docker1 tcpdump -i veth11 -n
    tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
    listening on veth11, link-type EN10MB (Ethernet), capture size 262144 bytes
    15:59:02.039154 IP6 fe80::74d1:d2ff:fe22:77 > ff02::2: ICMP6, router solicitation, length 16
    15:59:10.176221 IP 172.30.0.1.55530 > 172.30.0.100.http: Flags [S], seq 674548384, win 64240, options [mss 1460,sackOK,TS val 2826573941 ecr 0,nop,wscale 7], length 0
    15:59:10.176244 IP 172.30.0.100.http > 172.30.0.1.55530: Flags [S.], seq 773882025, ack 674548385, win 65160, options [mss 1460,sackOK,TS val 2679766513 ecr 2826573941,nop,wscale 7], length 0
    15:59:10.176278 IP 172.30.0.1.55530 > 172.30.0.100.http: Flags [.], ack 1, win 502, options [nop,nop,TS val 2826573941 ecr 2679766513], length 0
    15:59:15.351119 ARP, Request who-has 172.30.0.1 tell 172.30.0.100, length 28
    15:59:15.351279 ARP, Request who-has 172.30.0.100 tell 172.30.0.1, length 28
    15:59:15.351287 ARP, Reply 172.30.0.100 is-at 92:4f:f5:bb:36:76, length 28
    15:59:15.351289 ARP, Reply 172.30.0.1 is-at 76:d1:d2:22:00:77, length 28
    15:59:18.125125 IP 172.30.0.1.55530 > 172.30.0.100.http: Flags [P.], seq 1:6, ack 1, win 502, options [nop,nop,TS val 2826581890 ecr 2679766513], length 5: HTTP
    15:59:18.125156 IP 172.30.0.100.http > 172.30.0.1.55530: Flags [.], ack 6, win 510, options [nop,nop,TS val 2679774462 ecr 2826581890], length 0
    # Note: the source IP of the incoming connection is bridge1's IP (172.30.0.1)
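    The reason is source-address selection by the host's routing table: the 172.30.0.0/24 route points at bridge1 with src 172.30.0.1. A quick way to confirm which source address the kernel would pick (not part of the original capture):

    # Ask the kernel which route and source address it would use for the container IP
    ip route get 172.30.0.100
    # Expected form: 172.30.0.100 dev bridge1 src 172.30.0.1 ...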

    Experiment 3: Containers Accessing the Outside World (SNAT)

    1. Make sure IP forwarding is enabled on the host

    [root@master1 ~]# sysctl net.ipv4.conf.all.forwarding=1
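    To only check the current value without changing it, the key can simply be read back:

    # Should print 1 when forwarding is enabled
    sysctl -n net.ipv4.conf.all.forwarding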

    2. Check the FORWARD chain policy of iptables

    [root@master1 ~]# iptables -L |grep FORWARD
    Chain FORWARD (policy ACCEPT)

    3. Set each container's default gateway to bridge1's IP

    [root@master1 ~]# ip netns exec docker1 route add default gw 172.30.0.1 veth11
    [root@master1 ~]# ip netns exec docker2 route add default gw 172.30.0.1 veth21

    4. Configure the iptables SNAT rule

    The container's IP address means nothing to the outside world. For the container to reach external hosts, the packet's source address must be rewritten to the host's IP before the packet leaves, so that the external host can address its response to the host's IP.

    One more thing to note: the kernel's netfilter tracks connections, so when we add an SNAT rule the system automatically installs an implicit reverse translation. Packets on the return path have the host's IP translated back to the container's IP (a conntrack check is sketched after the commands below).

    [root@master1 ~]# iptables -t nat -A POSTROUTING -s 172.30.0.0/24 ! -o bridge1 -j MASQUERADE
    [root@master1 ~]# iptables -t nat -L |grep 172.30
    MASQUERADE  all  --  172.30.0.0/24        anywhere
    # The command above appends a rule to the POSTROUTING chain of the nat table:
    # when a packet's source address is in 172.30.0.0/24 and its output device is
    # not bridge1, apply the MASQUERADE action.
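    If the conntrack tool (from the conntrack-tools package) is installed, the tracked connections behind this implicit reverse translation can be listed; a quick sketch:

    # Connection tracking entries that involve the container subnet,
    # showing both the original and the NAT-translated address tuples
    conntrack -L | grep 172.30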

    5. Reach an external address from the container

    [root@master1 ~]# ip netns exec docker1 ping -c 2 114.114.114.114
    PING 114.114.114.114 (114.114.114.114) 56(84) bytes of data.
    64 bytes from 114.114.114.114: icmp_seq=1 ttl=83 time=33.5 ms
    64 bytes from 114.114.114.114: icmp_seq=2 ttl=73 time=30.1 ms

    6. Packet capture analysis

    # Capture on veth11 inside the docker1 namespace
    [root@master1 ~]# ip netns exec docker1 tcpdump -i veth11 -p icmp -n
    tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
    listening on veth11, link-type EN10MB (Ethernet), capture size 262144 bytes
    16:20:14.169626 IP 172.30.0.100 > 114.114.114.114: ICMP echo request, id 20867, seq 1, length 64
    16:20:14.200613 IP 114.114.114.114 > 172.30.0.100: ICMP echo reply, id 20867, seq 1, length 64
    16:20:15.170228 IP 172.30.0.100 > 114.114.114.114: ICMP echo request, id 20867, seq 2, length 64
    16:20:15.201280 IP 114.114.114.114 > 172.30.0.100: ICMP echo reply, id 20867, seq 2, length 64
    # Capture on eth0: the source address has been rewritten to the host's IP
    [root@master1 ~]# tcpdump -i eth0 -p icmp -n
    tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
    listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
    16:20:14.169683 IP 10.51.104.5 > 114.114.114.114: ICMP echo request, id 20867, seq 1, length 64
    16:20:14.200581 IP 114.114.114.114 > 10.51.104.5: ICMP echo reply, id 20867, seq 1, length 64
    16:20:15.170309 IP 10.51.104.5 > 114.114.114.114: ICMP echo request, id 20867, seq 2, length 64
    16:20:15.201238 IP 114.114.114.114 > 10.51.104.5: ICMP echo reply, id 20867, seq 2, length 64

    Experiment 4: External Access to the Container Network (DNAT)

    1. Configure the DNAT rule

    [root@master1 ~]# iptables -t nat -A PREROUTING ! -i bridge1 -p tcp --dport 80 -j DNAT --to-destination 172.30.0.100:80
    # The command above appends a rule to the PREROUTING chain of the nat table:
    # when the input device is not bridge1 and the destination TCP port is 80,
    # perform destination NAT, replacing the host IP with the container IP 172.30.0.100.
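    To double-check that the rule is in place, it can be listed with rule numbers:

    # Show the PREROUTING chain of the nat table with rule numbers
    iptables -t nat -L PREROUTING -n --line-numbers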

    2. Start a service in the container, listening on port 80

    [root@master1 ~]# ip netns exec docker1 nc -lp 80

    3. Test access from another host

    [root@node1 ~]# ip a |grep 51
        inet 10.51.104.8/24 brd 10.51.104.255 scope global dynamic eth0
           valid_lft 63517sec preferred_lft 63517sec
    [root@node1 ~]# telnet 10.51.104.5 80
    Trying 10.51.104.5...
    Connected to 10.51.104.5.
    Escape character is '^]'.

    4. Packet capture analysis

    # Capture on the host's eth0
    [root@master1 ~]# tcpdump -i eth0 tcp port 80 -n
    tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
    listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
    16:40:25.496628 IP 10.51.104.8.33818 > 10.51.104.5.http: Flags [S], seq 3457572293, win 64240, options [mss 1410,sackOK,TS val 317524495 ecr 0,nop,wscale 7], length 0
    16:40:25.496772 IP 10.51.104.5.http > 10.51.104.8.33818: Flags [S.], seq 1034954862, ack 3457572294, win 65160, options [mss 1460,sackOK,TS val 106860443 ecr 317524495,nop,wscale 7], length 0
    16:40:25.498787 IP 10.51.104.8.33818 > 10.51.104.5.http: Flags [.], ack 1, win 502, options [nop,nop,TS val 317524500 ecr 106860443], length 0
    16:40:26.487974 IP 10.51.104.5.39144 > 31.13.81.4.http: Flags [S], seq 149664692, win 64240, options [mss 1460,sackOK,TS val 291139983 ecr 0,nop,wscale 7], length 0
Original article: https://blog.csdn.net/qq_27815483/article/details/139880972