Contents
Appendix: the kube-metric-server.yaml manifest
When I deployed K8S with kubeadm earlier, installing Metrics was straightforward. After switching to a binary deployment of K8S in production, however, the metrics-server add-on ran into one pitfall after another, so this post records how each was worked through. For the deployment steps themselves, see 《【K8S 三】部署 metrics-server 插件》.
To make the problems easier to untangle, here is a topology diagram first (the flanneld network plugin is interchangeable with calico):

E0725 05:27:26.638019 1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.11.191:10250/metrics/resource\": x509: cannot validate certificate for 192.168.11.191 because it doesn't contain any IP SANs" node="k8s-testing-02-191"
I0725 05:27:33.495998 1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
Fix:
- Add the flag:
- - --kubelet-insecure-tls
- or instead configure:
- - --tls-cert-file=/etc/ssl/pki/ca.pem
- - --tls-private-key-file=/etc/ssl/pki/ca-key.pem
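Before picking a fix, it helps to confirm the diagnosis by looking at the certificate the kubelet actually serves. A minimal sketch — a throwaway cert with only a CN is generated here so the commands run anywhere; the file paths and CN are illustrative:

```shell
# Generate a throwaway cert with a CN but no IP SANs, which is what a default
# kubelet serving cert can look like. Against a real node you would fetch the
# cert instead, e.g.:
#   openssl s_client -connect 192.168.11.191:10250 </dev/null 2>/dev/null | openssl x509
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -keyout /tmp/demo-key.pem -out /tmp/demo-cert.pem \
  -subj "/CN=k8s-testing-02-191" 2>/dev/null

# Look for IP entries in the SAN list; none means validating by node IP fails,
# exactly the "doesn't contain any IP SANs" error above.
openssl x509 -in /tmp/demo-cert.pem -noout -text \
  | grep 'IP Address' || echo "no IP SANs: x509 validation by IP will fail"
```

If the real kubelet cert shows no `IP Address` SAN, either skip verification (`--kubelet-insecure-tls`) or reissue the kubelet certificates with the node IPs included.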
scraper.go:140] "Failed to scrape node" err="Get \"https://linshi-k8s-54:10250/metrics/resource\": context deadline exceeded" node="linshi-k8s-54"
server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
Fix:
Keep --kubelet-preferred-address-types consistent with the kube-apiserver configuration.
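A quick sanity check is to pull the flag from both command lines and compare the lists. The values and file locations below are illustrative; in practice read them from your kube-apiserver unit file and the metrics-server Deployment args:

```shell
# Illustrative flag values; in a real cluster, e.g.:
#   grep -o 'kubelet-preferred-address-types=[^ ]*' /etc/systemd/system/kube-apiserver.service
apiserver='--kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname'
metrics='--kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname'

# Strip everything up to '=' and compare the ordered lists.
if [ "${apiserver#*=}" = "${metrics#*=}" ]; then
  echo "address types consistent"
else
  echo "MISMATCH: ${apiserver#*=} vs ${metrics#*=}"
fi
```

A mismatch makes metrics-server resolve nodes by an address type (e.g. Hostname) that is unreachable from its pod, which surfaces as the "context deadline exceeded" scrape error above.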
kubectl top node
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)
Troubleshooting:
#-- Check the events on the metrics APIService:
Message: failing or missing response from https://10.254.156.1:443/apis/metrics.k8s.io/v1beta1: Get "https://10.254.156.1:443/apis/metrics.k8s.io/v1beta1": dial tcp 10.254.156.1:443: i/o timeout
Reason: FailedDiscoveryCheck
#-- So the request to the metrics clusterIP timed out. After setting --enable-aggregator-routing=true on the kube-apiserver, the error changed to:
Message: failing or missing response from https://172.254.247.87:4443/apis/metrics.k8s.io/v1beta1: Get "https://172.254.247.87:4443/apis/metrics.k8s.io/v1beta1": dial tcp 172.254.247.87:4443: i/o timeout
Reason: FailedDiscoveryCheck
#-- i.e. going straight to the endpoint timed out as well.
#-- Aside: the metrics Service may only listen on port 443; configuring it as 4443 yields:
Message: service/metrics-server in "kube-system" is not listening on port 443
Reason: ServicePortError
The root cause is that this master cannot reach metrics-server at all. The master in this deployment runs neither kubelet nor kube-proxy. With --enable-aggregator-routing=true on the kube-apiserver, the aggregator proxies requests directly to the metrics endpoint, but the master cannot reach the node pod network (no kubelet, hence no CNI). Without --enable-aggregator-routing=true, requests go through the metrics Service's clusterIP instead, which is equally unreachable because there is no kube-proxy on the master to program it (see the topology diagram above).
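To see exactly which backend the aggregator could not reach, pull the URL out of the APIService condition and probe it by hand. A sketch, reusing the condition message quoted above (in practice take it from kubectl describe apiservice v1beta1.metrics.k8s.io):

```shell
# Condition message copied from the APIService events above.
msg='failing or missing response from https://10.254.156.1:443/apis/metrics.k8s.io/v1beta1: dial tcp 10.254.156.1:443: i/o timeout'

# Extract the first https://host:port the message mentions.
url=$(printf '%s\n' "$msg" | grep -oE 'https://[0-9.]+:[0-9]+' | head -n1)
echo "unreachable backend: $url"

# From the master, confirm the timeout by hand:
#   curl -k --connect-timeout 3 "$url"
```

Whether the URL is the clusterIP (no kube-proxy on the master) or the pod endpoint (no pod network on the master) tells you which leg of the topology diagram is broken.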
Fix:
- # Edit the metrics-server Deployment manifest:
- deployment.spec.template.spec.hostNetwork: true
- # or:
- # pin the metrics Service's clusterIP and add a manual route to it on the master.
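The hostNetwork fix is a one-line change in the Deployment's pod spec (the full manifest in the appendix already carries it). With it, metrics-server binds to the node's own IP, so the master can reach it over the node network instead of the unreachable pod network:

```yaml
# Fragment of the Deployment manifest (see the appendix for the full file):
spec:
  template:
    spec:
      hostNetwork: true  # pod shares the node's network namespace; --secure-port must be free on the node
```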
#-- metrics-server's startup flags can be listed with:
docker run --rm 192.168.11.101/library/metrics-server:v0.6.1 --help

--cert-dir=/tmp
#-- Directory to store the TLS certificates in; ignored if --tls-cert-file and --tls-private-key-file are provided.
--secure-port=4443
#-- Port to serve HTTPS with authentication and authorization on. If 0, HTTPS is not served. (default 443)
--kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
#-- Preferred NodeAddressTypes, in order, for connecting to kubelets; keep this consistent with the kube-apiserver configuration. (default [Hostname,InternalDNS,InternalIP,ExternalDNS,ExternalIP])
--kubelet-use-node-status-port
#-- Use the port reported in the node status; takes precedence over --kubelet-port.
--metric-resolution=30s
#-- Interval at which metrics-server scrapes the kubelets; must be at least 10s. (default 1m0s)
--kubelet-insecure-tls
#-- Do not verify the CA or serving certificates presented by kubelets. For testing only. Without this flag, pass --tls-cert-file and --tls-private-key-file to metrics-server instead.
--tls-cert-file
#-- File containing the default x509 certificate for HTTPS. If HTTPS serving is enabled and --tls-cert-file and --tls-private-key-file are not provided, a self-signed certificate and key are generated for the public address and saved to the directory given by --cert-dir.
--tls-private-key-file
#-- File containing the default x509 private key matching --tls-cert-file.
--kubelet-port
#-- The port to use to connect to Kubelets. (default 10250)

Appendix: the kube-metric-server.yaml manifest
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
    rbac.authorization.k8s.io/aggregate-to-admin: "true"
    rbac.authorization.k8s.io/aggregate-to-edit: "true"
    rbac.authorization.k8s.io/aggregate-to-view: "true"
  name: system:aggregated-metrics-reader
rules:
- apiGroups:
  - metrics.k8s.io
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
rules:
- apiGroups:
  - ""
  resources:
  - nodes/metrics
  verbs:
  - get
- apiGroups:
  - ""
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server-auth-reader
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server:system:auth-delegator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:metrics-server
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  ports:
  - name: https
    port: 443
    protocol: TCP
    targetPort: https
  selector:
    k8s-app: metrics-server
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  strategy:
    rollingUpdate:
      maxUnavailable: 0
  template:
    metadata:
      labels:
        k8s-app: metrics-server
    spec:
      containers:
      - args:
        - --cert-dir=/tmp
        - --secure-port=4443
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-use-node-status-port
        - --metric-resolution=30s
        - --kubelet-insecure-tls
        # - --tls-cert-file=/etc/ssl/pki/ca.pem
        # - --tls-private-key-file=/etc/ssl/pki/ca-key.pem
        image: HARBOR_HOST_NAME/library/metrics-server:v0.6.1
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /livez
            port: https
            scheme: HTTPS
          periodSeconds: 10
        name: metrics-server
        ports:
        - containerPort: 4443
          name: https
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /readyz
            port: https
            scheme: HTTPS
          initialDelaySeconds: 20
          periodSeconds: 10
        resources:
          requests:
            cpu: 100m
            memory: 200Mi
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 1000
        volumeMounts:
        - mountPath: /tmp
          name: tmp-dir
        # - mountPath: /etc/ssl/pki
        #   name: cert-dir
      nodeSelector:
        kubernetes.io/os: linux
      priorityClassName: system-cluster-critical
      serviceAccountName: metrics-server
      hostNetwork: true
      volumes:
      - emptyDir: {}
        name: tmp-dir
      # - name: cert-dir
      #   hostPath:
      #     path: /etc/ssl/certs/ca-certs/
---
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  labels:
    k8s-app: metrics-server
  name: v1beta1.metrics.k8s.io
spec:
  group: metrics.k8s.io
  groupPriorityMinimum: 100
  insecureSkipTLSVerify: true
  service:
    name: metrics-server
    namespace: kube-system
  version: v1beta1
  versionPriority: 100
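Note that HARBOR_HOST_NAME in the image line is a placeholder; substitute your own registry before applying the manifest. A sketch, using the registry address from the docker run example earlier:

```shell
# Substitute the registry placeholder in the image line. The same sed applied
# to the whole file prepares the manifest for kubectl apply:
#   sed -i 's/HARBOR_HOST_NAME/<your-registry>/' kube-metric-server.yaml
echo 'image: HARBOR_HOST_NAME/library/metrics-server:v0.6.1' \
  | sed 's/HARBOR_HOST_NAME/192.168.11.101/'
# -> image: 192.168.11.101/library/metrics-server:v0.6.1
```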