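Flannel is the default network plugin in LCK. Out of the box LCK runs it in vxlan mode. In on-premises deployments, if all of the customer's hosts are known to sit in a single subnet, host-gw mode can be used instead to improve network performance.
Flannel is installed as follows. The installation YAML contains two initContainers dedicated to installing the CNI plugin and the Flannel configuration, which is why they are named install-cni-plugin and install-cni.
How do these two containers perform the installation? It is simple: look at the args field. They use cp to copy the flannel binary and cni-conf.json (which lands as 10-flannel.conflist) into the target directories.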
initContainers:
- name: install-cni-plugin
  #image: flannelcni/flannel-cni-plugin:v1.1.0 for ppc64le and mips64le (dockerhub limitations may apply)
  image: rancher/mirrored-flannelcni-flannel-cni-plugin:v1.1.0
  command:
  - cp
  args:
  - -f
  - /flannel
  - /opt/cni/bin/flannel
  volumeMounts:
  - name: cni-plugin
    mountPath: /opt/cni/bin
- name: install-cni
  #image: flannelcni/flannel:v0.18.1 for ppc64le and mips64le (dockerhub limitations may apply)
  image: rancher/mirrored-flannelcni-flannel:v0.18.1
  command:
  - cp
  args:
  - -f
  - /etc/kube-flannel/cni-conf.json
  -
  volumeMounts:
  - name: cni
    mountPath: /etc/cni/net.d
  - name: flannel-cfg
    mountPath: /etc/kube-flannel/
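One way to confirm that the two initContainers did their job is to look for the copied files on the host. The following is a minimal sketch, not part of LCK or Flannel. The binary path comes from the args above; the conflist path is an assumption built from the /etc/cni/net.d mount and the 10-flannel.conflist name mentioned in the text. The helper name missing_cni_files is hypothetical.

```python
import os
import tempfile

# Paths relative to the host root. The first comes from the manifest args;
# the second is an ASSUMPTION (mount path + file name mentioned in the text).
EXPECTED_FILES = [
    "opt/cni/bin/flannel",
    "etc/cni/net.d/10-flannel.conflist",
]


def missing_cni_files(root="/"):
    """Return the expected files that do not exist under root."""
    return [p for p in EXPECTED_FILES
            if not os.path.isfile(os.path.join(root, p))]


if __name__ == "__main__":
    # Demo against a throwaway directory instead of the real host.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "opt/cni/bin"))
        open(os.path.join(root, "opt/cni/bin/flannel"), "w").close()
        print(missing_cni_files(root))
```

On a real node the function would be called with the default root, and an empty list would suggest both copies succeeded.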
Where do these configuration files come from? They come from a ConfigMap:
kind: ConfigMap
apiVersion: v1
metadata:
  name: kube-flannel-cfg
  namespace: kube-system
  labels:
    tier: node
    app: flannel
data:
  cni-conf.json: |
    {
      "name": "cbr0",
      "cniVersion": "0.3.1",
      "plugins": [
        {
          "type": "flannel",
          "delegate": {
            "hairpinMode": true,
            "isDefaultGateway": true
          }
        },
        {
          "type": "portmap",
          "capabilities": {
            "portMappings": true
          }
        }
      ]
    }
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "vxlan"
      }
    }
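The Backend.Type field in net-conf.json is what selects the mode. As an illustration of the change needed for host-gw, here is a small sketch that rewrites that one field. The helper name set_backend is hypothetical, and how the result gets back into the kube-flannel-cfg ConfigMap is left out.

```python
import json

# net-conf.json as it appears in the ConfigMap above.
NET_CONF = """
{
  "Network": "10.244.0.0/16",
  "Backend": {
    "Type": "vxlan"
  }
}
"""


def set_backend(net_conf_text, backend_type):
    """Return net-conf.json text with Backend.Type replaced."""
    conf = json.loads(net_conf_text)
    # Other keys under Backend, if any, are kept untouched.
    conf.setdefault("Backend", {})["Type"] = backend_type
    return json.dumps(conf, indent=2)


if __name__ == "__main__":
    print(set_backend(NET_CONF, "host-gw"))
```

Flannel logs the backend type when it starts, so the Flannel pods most likely have to be restarted before an edited ConfigMap takes effect.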
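Unlike the files handled by the initContainers, these configuration files are not written to the host. They are provided to the container that runs the Flannel binary through a volumeMount, so you will not find them under /etc/kube-flannel/ on the host; you have to enter the Flannel container to see them: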
# kiexec
Namespace: kube-system | Pod: ✔ kube-flannel-ds-82mww
/ # ls /etc/kube-flannel/
cni-conf.json net-conf.json
vxlan is the mode Flannel uses by default. In this mode the node's routes look like this:
# ip r
default via 172.22.0.1 dev eth0
10.244.1.0/24 via 10.244.1.0 dev flannel.1 onlink
10.244.2.0/24 via 10.244.2.0 dev flannel.1 onlink
10.244.3.0/24 via 10.244.3.0 dev flannel.1 onlink
10.244.4.0/24 via 10.244.4.0 dev flannel.1 onlink
10.244.5.0/24 via 10.244.5.0 dev flannel.1 onlink
169.254.0.0/16 dev eth0 scope link metric 1002
By changing the configuration, Flannel can also be switched to host-gw. In this mode the node's routes become:
# ip r
default via 172.22.0.1 dev eth0
10.4.0.0/24 dev nerdctl0 proto kernel scope link src 10.4.0.1
10.244.0.0/24 dev cni0 proto kernel scope link src 10.244.0.1
10.244.1.0/24 via 172.22.1.176 dev eth0
10.244.2.0/24 via 172.22.0.117 dev eth0
10.244.3.0/24 via 172.22.0.76 dev eth0
10.244.4.0/24 via 172.22.0.212 dev eth0
10.244.5.0/24 via 172.22.0.64 dev eth0
169.254.0.0/16 dev eth0 scope link metric 1002
172.22.0.0/20 dev eth0 proto kernel scope link src 172.22.0.239
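The difference between the two route tables can be put into a few lines of code. The sketch below is a heuristic, not something Flannel provides. It assumes the pod network is 10.244.0.0/16, as in net-conf.json, and that vxlan routes go through a device named flannel.N. The function name classify_route is hypothetical.

```python
import ipaddress

# Pod network from net-conf.json.
POD_NETWORK = ipaddress.ip_network("10.244.0.0/16")


def classify_route(line):
    """Guess which Flannel backend installed one line of `ip r` output."""
    fields = line.split()
    try:
        dst = ipaddress.ip_network(fields[0])
    except (ValueError, IndexError):
        return "other"  # "default" routes and blank lines
    if not dst.subnet_of(POD_NETWORK):
        return "other"  # not a pod subnet
    if "via" not in fields or "dev" not in fields:
        return "other"  # directly connected, e.g. the local cni0 subnet
    dev = fields[fields.index("dev") + 1]
    # vxlan routes leave through flannel.N; host-gw routes use the host NIC.
    return "vxlan" if dev.startswith("flannel.") else "host-gw"


if __name__ == "__main__":
    for route in (
        "10.244.1.0/24 via 10.244.1.0 dev flannel.1 onlink",
        "10.244.1.0/24 via 172.22.1.176 dev eth0",
        "default via 172.22.0.1 dev eth0",
    ):
        print(classify_route(route), "<-", route)
```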
After the switch, Flannel logs the following:
I0826 03:22:37.551391 1 main.go:463] Found network config - Backend type: host-gw
I0826 03:22:37.551432 1 match.go:195] Determining IP address of default interface
I0826 03:22:37.551838 1 match.go:248] Using interface with name eth0 and address 172.22.1.176
I0826 03:22:37.551860 1 match.go:270] Defaulting external address to interface address (172.22.1.176)
I0826 03:22:37.569614 1 kube.go:351] Setting NodeNetworkUnavailable
I0826 03:22:37.579433 1 main.go:341] Setting up masking rules
I0826 03:22:37.758215 1 main.go:362] Changing default FORWARD chain policy to ACCEPT
I0826 03:22:37.758315 1 main.go:375] Wrote subnet file to /run/flannel/subnet.env
I0826 03:22:37.758326 1 main.go:379] Running backend.
I0826 03:22:37.758343 1 main.go:400] Waiting for all goroutines to exit
I0826 03:22:37.761081 1 route_network.go:55] Watching for new subnet leases
I0826 03:22:37.761153 1 route_network.go:92] Subnet added: 10.244.0.0/24 via 172.22.0.239
W0826 03:22:37.761524 1 route_network.go:151] Replacing existing route to {Ifindex: 5 Dst: 10.244.0.0/24 Src: Gw: 10.244.0.0 Flags: [onlink] Table: 254 Realm: 0} with {Ifindex: 2 Dst: 10.244.0.0/24 Src: Gw: 172.22.0.239 Flags: [] Table: 0 Realm: 0}
I0826 03:22:37.848961 1 route_network.go:92] Subnet added: 10.244.2.0/24 via 172.22.0.117
W0826 03:22:37.849059 1 route_network.go:151] Replacing existing route to {Ifindex: 5 Dst: 10.244.2.0/24 Src: Gw: 10.244.2.0 Flags: [onlink] Table: 254 Realm: 0} with {Ifindex: 2 Dst: 10.244.2.0/24 Src: Gw: 172.22.0.117 Flags: [] Table: 0 Realm: 0}
I0826 03:22:37.849360 1 route_network.go:92] Subnet added: 10.244.3.0/24 via 172.22.0.76
W0826 03:22:37.849454 1 route_network.go:151] Replacing existing route to {Ifindex: 5 Dst: 10.244.3.0/24 Src: Gw: 10.244.3.0 Flags: [onlink] Table: 254 Realm: 0} with {Ifindex: 2 Dst: 10.244.3.0/24 Src: Gw: 172.22.0.76 Flags: [] Table: 0 Realm: 0}
I0826 03:22:37.850273 1 route_network.go:92] Subnet added: 10.244.4.0/24 via 172.22.0.212
W0826 03:22:37.850377 1 route_network.go:151] Replacing existing route to {Ifindex: 5 Dst: 10.244.4.0/24 Src: Gw: 10.244.4.0 Flags: [onlink] Table: 254 Realm: 0} with {Ifindex: 2 Dst: 10.244.4.0/24 Src: Gw: 172.22.0.212 Flags: [] Table: 0 Realm: 0}
I0826 03:22:37.850675 1 route_network.go:92] Subnet added: 10.244.5.0/24 via 172.22.0.64
W0826 03:22:37.850758 1 route_network.go:151] Replacing existing route to {Ifindex: 5 Dst: 10.244.5.0/24 Src: Gw: 10.244.5.0 Flags: [onlink] Table: 254 Realm: 0} with {Ifindex: 2 Dst: 10.244.5.0/24 Src: Gw: 172.22.0.64 Flags: [] Table: 0 Realm: 0}
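Each "Subnet added" line in the log above pairs a pod subnet with the node IP that becomes its gateway. A small sketch that pulls those pairs out of the log text is shown here; the regular expression is based only on the message format visible above, and the name routes_from_log is hypothetical.

```python
import re

# Matches the message format seen in the log above.
SUBNET_ADDED = re.compile(r"Subnet added: (\S+) via (\S+)")


def routes_from_log(log_text):
    """Map each pod subnet to the gateway (node IP) Flannel reported."""
    return dict(SUBNET_ADDED.findall(log_text))


if __name__ == "__main__":
    sample = (
        "I0826 03:22:37.761153 1 route_network.go:92] "
        "Subnet added: 10.244.0.0/24 via 172.22.0.239\n"
        "I0826 03:22:37.848961 1 route_network.go:92] "
        "Subnet added: 10.244.2.0/24 via 172.22.0.117\n"
    )
    for subnet, gateway in routes_from_log(sample).items():
        print(subnet, "->", gateway)
```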
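The log line Subnet added: 10.244.0.0/24 via 172.22.0.239 makes it very clear what is going on: the route being installed uses a node's IP as the gateway for a pod subnet, so packets can be routed straight to that node without encapsulation. And because host-gw needs no encapsulation or decapsulation, Flannel automatically sets the MTU to 1500 (vxlan mode uses 1450 here, as the benchmark output reports):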
# cat /run/flannel/subnet.env
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.0.1/24
FLANNEL_MTU=1500
FLANNEL_IPMASQ=true
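The gap between 1500 and the 1450 seen in vxlan mode matches the usual VXLAN-over-IPv4 overhead of 50 bytes. The sketch below spells out that arithmetic and reads the MTU from the subnet.env content above. The helper names parse_env and vxlan_mtu are hypothetical.

```python
# Header sizes in bytes for VXLAN over IPv4.
INNER_ETHERNET = 14
VXLAN_HEADER = 8
OUTER_UDP = 8
OUTER_IPV4 = 20

# Content of /run/flannel/subnet.env in host-gw mode, as shown above.
SUBNET_ENV = """\
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.0.1/24
FLANNEL_MTU=1500
FLANNEL_IPMASQ=true
"""


def parse_env(text):
    """Parse KEY=VALUE lines into a dict."""
    return dict(line.split("=", 1) for line in text.splitlines() if "=" in line)


def vxlan_mtu(link_mtu):
    """MTU left for pod traffic once the VXLAN headers are added."""
    overhead = INNER_ETHERNET + VXLAN_HEADER + OUTER_UDP + OUTER_IPV4  # 50
    return link_mtu - overhead


if __name__ == "__main__":
    link_mtu = int(parse_env(SUBNET_ENV)["FLANNEL_MTU"])
    print("host-gw MTU:", link_mtu)
    print("vxlan MTU:  ", vxlan_mtu(link_mtu))
```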
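Do other containers need to be restarted after the configuration change? Normally not, because a container's network stack only sends its packets to the cni0 device; whether they then travel over vxlan or host-gw depends entirely on the routing configuration. Still, some components may be sensitive to changes in routes or in the network scheme, so test carefully before rolling out the change. Also, although host-gw performs better, it comes with prerequisites: at a minimum, the worker nodes must be in the same subnet, that is, able to reach each other at layer 2.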
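A quick first check of the same-subnet prerequisite is to test whether every node IP falls inside the host subnet. The sketch below uses the node addresses and the 172.22.0.0/20 subnet that appear in the route table and logs above. It only checks addressing; it cannot prove that layer 2 connectivity actually exists. The function name all_in_subnet is hypothetical.

```python
import ipaddress

# Node IPs taken from the host-gw route table and logs above.
NODE_IPS = [
    "172.22.0.239",
    "172.22.1.176",
    "172.22.0.117",
    "172.22.0.76",
    "172.22.0.212",
    "172.22.0.64",
]


def all_in_subnet(node_ips, subnet):
    """True if every node IP belongs to the given subnet."""
    net = ipaddress.ip_network(subnet)
    return all(ipaddress.ip_address(ip) in net for ip in node_ips)


if __name__ == "__main__":
    print(all_in_subnet(NODE_IPS, "172.22.0.0/20"))
```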
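The benchmark tool is k8s-bench-suite, and the exact command is knb --verbose --client-node node2 --server-node node3. Tested on the same machines, vxlan mode costs up to roughly 10% more than host-gw mode; in the runs below its bandwidth is about 3% to 9% lower (the figures depend on hardware and network quality).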
vxlan mode
=========================================================
Benchmark Results
=========================================================
Name : knb-12885
Date : 2022-08-26 07:11:41 UTC
Generator : knb
Version : 1.5.0
Server : node2
Client : node3
UDP Socket size : auto
=========================================================
Discovered CPU : Intel Xeon Processor (Skylake, IBRS)
Discovered Kernel : 5.4.127-1.el7.elrepo.x86_64
Discovered k8s version : v1.21.7
Discovered MTU : 1450
Idle :
bandwidth = 0 Mbit/s
client cpu = total 6.97% (user 2.53%, nice 0.05%, system 4.21%, iowait 0.03%, steal 0.15%)
server cpu = total 8.09% (user 2.73%, nice 0.05%, system 5.18%, iowait 0.00%, steal 0.13%)
client ram = 1233 MB
server ram = 1198 MB
Pod to pod :
TCP :
bandwidth = 845 Mbit/s
client cpu = total 5.06% (user 1.35%, nice 0.05%, system 3.49%, iowait 0.07%, steal 0.10%)
server cpu = total 10.78% (user 1.76%, nice 0.02%, system 8.98%, iowait 0.02%, steal 0.00%)
client ram = 1235 MB
server ram = 1197 MB
UDP :
bandwidth = 877 Mbit/s
client cpu = total 26.54% (user 2.83%, nice 0.05%, system 23.57%, iowait 0.07%, steal 0.02%)
server cpu = total 13.43% (user 3.74%, nice 0.03%, system 9.56%, iowait 0.00%, steal 0.10%)
client ram = 1234 MB
server ram = 1198 MB
Pod to Service :
TCP :
bandwidth = 856 Mbit/s
client cpu = total 5.25% (user 1.40%, nice 0.05%, system 3.68%, iowait 0.05%, steal 0.07%)
server cpu = total 10.31% (user 1.92%, nice 0.02%, system 8.37%, iowait 0.00%, steal 0.00%)
client ram = 1233 MB
server ram = 1199 MB
UDP :
bandwidth = 835 Mbit/s
client cpu = total 27.90% (user 2.94%, nice 0.02%, system 24.82%, iowait 0.07%, steal 0.05%)
server cpu = total 13.29% (user 3.74%, nice 0.03%, system 9.49%, iowait 0.00%, steal 0.03%)
client ram = 1236 MB
server ram = 1203 MB
=========================================================
host-gw mode
=========================================================
Benchmark Results
=========================================================
Name : knb-8657
Date : 2022-08-26 07:08:07 UTC
Generator : knb
Version : 1.5.0
Server : node2
Client : node3
UDP Socket size : auto
=========================================================
Discovered CPU : Intel Xeon Processor (Skylake, IBRS)
Discovered Kernel : 5.4.127-1.el7.elrepo.x86_64
Discovered k8s version : v1.21.7
Discovered MTU : 1500
Idle :
bandwidth = 0 Mbit/s
client cpu = total 3.35% (user 1.56%, nice 0.02%, system 1.70%, iowait 0.07%, steal 0.00%)
server cpu = total 2.45% (user 1.14%, nice 0.09%, system 1.22%, iowait 0.00%, steal 0.00%)
client ram = 1258 MB
server ram = 1194 MB
Pod to pod :
TCP :
bandwidth = 875 Mbit/s
client cpu = total 4.53% (user 1.37%, nice 0.00%, system 3.00%, iowait 0.09%, steal 0.07%)
server cpu = total 7.61% (user 1.49%, nice 0.07%, system 5.98%, iowait 0.02%, steal 0.05%)
client ram = 1250 MB
server ram = 1197 MB
UDP :
bandwidth = 944 Mbit/s
client cpu = total 34.08% (user 4.70%, nice 0.03%, system 28.94%, iowait 0.03%, steal 0.38%)
server cpu = total 18.45% (user 4.81%, nice 0.02%, system 13.11%, iowait 0.02%, steal 0.49%)
client ram = 1245 MB
server ram = 1197 MB
Pod to Service :
TCP :
bandwidth = 931 Mbit/s
client cpu = total 4.01% (user 1.25%, nice 0.05%, system 2.62%, iowait 0.09%, steal 0.00%)
server cpu = total 8.14% (user 1.59%, nice 0.02%, system 6.48%, iowait 0.00%, steal 0.05%)
client ram = 1242 MB
server ram = 1197 MB
UDP :
bandwidth = 896 Mbit/s
client cpu = total 26.61% (user 2.79%, nice 0.02%, system 23.73%, iowait 0.07%, steal 0.00%)
server cpu = total 11.16% (user 3.18%, nice 0.03%, system 7.89%, iowait 0.00%, steal 0.06%)
client ram = 1236 MB
server ram = 1197 MB
=========================================================
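To compare the two runs, the bandwidth figures above can be turned into percentages. The sketch below only does that arithmetic on the numbers reported by knb; a single run per mode is a small sample, so the results are indicative at best. The function name gain_percent is hypothetical.

```python
# (vxlan, host-gw) bandwidth in Mbit/s, copied from the two reports above.
RESULTS = {
    "pod-to-pod TCP": (845, 875),
    "pod-to-pod UDP": (877, 944),
    "pod-to-service TCP": (856, 931),
    "pod-to-service UDP": (835, 896),
}


def gain_percent(vxlan, host_gw):
    """Bandwidth gain of host-gw over vxlan, relative to the vxlan figure."""
    return (host_gw - vxlan) / vxlan * 100


if __name__ == "__main__":
    for name, (vxlan, host_gw) in RESULTS.items():
        print(f"{name}: {gain_percent(vxlan, host_gw):.1f}%")
```

The gains stay below 10% in all four cases, which fits the rough estimate given in the text.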