• [K8S Part 7] Problems Encountered When Deploying Metrics Server


    Contents

     Troubleshooting

    Problem 1: metrics server reports a certificate error on startup: "x509: cannot validate certificate for x.x.x.x because it doesn't contain any IP SANs" node="k8s-testing-02-191"

    Problem 2: metrics server never becomes ready, and the log shows: "Failed to scrape node" err="Get \"https://x.x.x.x:10250/metrics/resource\": context deadline exceeded"

    Problem 3: metrics server starts successfully, but kubectl top node fails with: Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)

    metrics server startup flags

     Attachment: kube-metric-server.yaml manifest


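    When I deployed K8S with kubeadm earlier, setting up Metrics was straightforward. Later, after deploying K8S from binaries in production, creating the Metrics add-on ran into one pitfall after another, so this post records how each one was resolved. For the deployment steps themselves, see "[K8S Part 3] Deploying the metrics-server add-on".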

    To make the problems easier to reason about, here is a topology diagram first (the flanneld network plugin can be swapped for calico).

     Troubleshooting

    Problem 1: metrics server reports a certificate error on startup: "x509: cannot validate certificate for x.x.x.x because it doesn't contain any IP SANs" node="k8s-testing-02-191"

     E0725 05:27:26.638019       1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.11.191:10250/metrics/resource\": x509: cannot validate certificate for 192.168.11.191 because it doesn't contain any IP SANs" node="k8s-testing-02-191"
     I0725 05:27:33.495998       1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"

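    Solution:

    Add this flag to the metrics-server container args. It skips verification of the kubelet serving certificates, so it is meant for testing only:

           - --kubelet-insecure-tls

    My original notes listed the following pair as an alternative:

           - --tls-cert-file=/etc/ssl/pki/ca.pem
           - --tls-private-key-file=/etc/ssl/pki/ca-key.pem

    Note, however, that these two flags set the certificate metrics server itself serves on its HTTPS port; they do not affect how kubelet certificates are checked. The flag that tells metrics server which CA to trust for kubelets is --kubelet-certificate-authority, and verification can only succeed if the kubelet serving certificate lists the node IP in its SANs (or if metrics server connects by a name the certificate does cover).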
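
    To see which names and addresses a kubelet serving certificate actually covers, the certificate can be read straight off port 10250. The following is a minimal sketch, assuming openssl is installed on the machine running it; the helper name kubelet_cert_sans is made up for this example.

```shell
# Print the Subject Alternative Names of the certificate served by a kubelet.
# $1 = node address, $2 = kubelet port (defaults to 10250).
# Prints nothing if the certificate has no SAN extension.
kubelet_cert_sans() {
  openssl s_client -connect "$1:${2:-10250}" </dev/null 2>/dev/null \
    | openssl x509 -noout -text \
    | grep -A1 'Subject Alternative Name' \
    | tail -n 1 \
    | sed 's/^ *//'
}

# Usage (address taken from the log above):
#   kubelet_cert_sans 192.168.11.191
```

    If the output contains no "IP Address:" entry for the node, the x509 error above is what one would expect whenever metrics server connects by IP.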

    Problem 2: metrics server never becomes ready, and the log shows: "Failed to scrape node" err="Get \"https://x.x.x.x:10250/metrics/resource\": context deadline exceeded"

     scraper.go:140] "Failed to scrape node" err="Get \"https://linshi-k8s-54:10250/metrics/resource\": context deadline exceeded" node="linshi-k8s-54"
     server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
    
    

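    Solution:

    Keep metrics server's --kubelet-preferred-address-types consistent with the value configured on kube-apiserver. In the log above metrics server is contacting the kubelet by hostname (linshi-k8s-54) and timing out; the manifest at the end of this post uses InternalIP,ExternalIP,Hostname.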
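
    One way to compare the two settings is to read the flag off the running kube-apiserver process on a master and set it beside the value in the metrics-server args. The following is a small sketch: flag_value is a hypothetical helper, and the ps invocation in the usage comment assumes a Linux master where the process is named kube-apiserver and the flag is passed in --flag=value form.

```shell
# Print the value of a --flag=value option found in a command line.
# $1 = flag name including the leading dashes; the command line is read from stdin.
# Prints nothing if the flag is absent.
flag_value() {
  tr -s ' ' '\n' | sed -n "s/^$1=//p" | head -n 1
}

# Usage on a master:
#   ps -o args= -C kube-apiserver | flag_value --kubelet-preferred-address-types
```

    Empty output means the flag is not set explicitly on that process, in which case the component falls back to its built-in default order.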

    Problem 3: metrics server starts successfully, but kubectl top node fails with: Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)

     kubectl top node
     Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)

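    Diagnosis:

    #-- Inspect the status of the metrics APIService
    Message:               failing or missing response from 
    https://10.254.156.1:443/apis/metrics.k8s.io/v1beta1: Get 
    "https://10.254.156.1:443/apis/metrics.k8s.io/v1beta1": dial tcp 10.254.156.1:443: i/o timeout
    Reason:                FailedDiscoveryCheck
    #-- The request to the metrics-server clusterIP timed out. After setting --enable-aggregator-routing=true on kube-apiserver, the error became
    Message:               failing or missing response from 
    https://172.254.247.87:4443/apis/metrics.k8s.io/v1beta1: Get 
    "https://172.254.247.87:4443/apis/metrics.k8s.io/v1beta1": dial tcp 172.254.247.87:4443: i/o timeout
    Reason:                FailedDiscoveryCheck
    #-- The request straight to the endpoint timed out as well
    #-- Also: the metrics-server Service has to expose port 443 (the port the APIService targets by default); setting it to 4443 by hand produces this error
    Message:               service/metrics-server in "kube-system" is not listening on port 443
    Reason:                ServicePortError
    The cause is that this master cannot reach metrics server. The masters in this deployment run neither kubelet nor kube-proxy. With --enable-aggregator-routing=true set on kube-apiserver, the kubectl request is proxied by kube-apiserver straight to the metrics-server endpoint (the pod IP), but the master cannot reach the pod network on the nodes (it runs no kubelet and is not part of that network). Without --enable-aggregator-routing=true the request goes to the clusterIP of the metrics-server Service instead, and that is unreachable too, because there is no kube-proxy on the master to translate the clusterIP (see the topology diagram above).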
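
    The Message and Reason fields shown above belong to the Available condition of the APIService object. The following sketch pulls out just those two fields, assuming kubectl is configured against the cluster; the function name metrics_apiservice_status is made up for this example.

```shell
# Print "<reason>: <message>" for the Available condition of the metrics
# APIService (v1beta1.metrics.k8s.io is the name used in the manifest).
metrics_apiservice_status() {
  kubectl get apiservice v1beta1.metrics.k8s.io \
    -o jsonpath='{.status.conditions[?(@.type=="Available")].reason}{": "}{.status.conditions[?(@.type=="Available")].message}{"\n"}'
}

# Usage:
#   metrics_apiservice_status
```

    Re-running it after each change (aggregator routing, Service port, hostNetwork) shows whether the reason moves away from FailedDiscoveryCheck or ServicePortError.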
    
    

    Solution:

    # Edit the metrics server manifest so the pod uses the host network:
     deployment.spec.template.spec.hostNetwork: true
    # or
    # pin the address of the metrics-server Service, then add the routing rules by hand.

    metrics server startup flags

    #--- The startup flags of metrics server can be listed with the following command
    docker run --rm 192.168.11.101/library/metrics-server:v0.6.1 --help

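    --cert-dir=/tmp
    #-- Directory holding the TLS certificates. Ignored if --tls-cert-file and --tls-private-key-file are set.
    --secure-port=4443
    #-- Port on which to serve HTTPS with authentication and authorization. If 0, HTTPS is not served at all. (default 443)
    --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
    #-- Priority list of NodeAddressTypes to use when connecting to a kubelet. Keep this consistent with the kube-apiserver setting. (default [Hostname,InternalDNS,InternalIP,ExternalDNS,ExternalIP])
    --kubelet-use-node-status-port
    #-- Use the port from the node status. Takes precedence over --kubelet-port.
    --metric-resolution=30s
    #-- Interval at which metrics server scrapes the kubelets. Must be at least 10s. (default 1m0s)
    --kubelet-insecure-tls
    #-- Do not verify the CA of the serving certificates presented by kubelets. For testing only. Without this flag, metrics server has to be able to verify those certificates, which means giving it the signing CA via --kubelet-certificate-authority (not via --tls-cert-file / --tls-private-key-file, which only configure its own serving certificate).
    --tls-cert-file
    #-- File containing the default x509 certificate for HTTPS. If HTTPS serving is enabled and --tls-cert-file and --tls-private-key-file are not provided, a self-signed certificate and key are generated for the public address and saved to the directory given by --cert-dir.
    --tls-private-key-file
    #-- File containing the default x509 private key matching --tls-cert-file.
    --kubelet-port
    #-- The port to use to connect to kubelets. (default 10250)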

     Attachment: kube-metric-server.yaml manifest

    apiVersion: v1
    kind: ServiceAccount
    metadata:
      labels:
        k8s-app: metrics-server
      name: metrics-server
      namespace: kube-system
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      labels:
        k8s-app: metrics-server
        rbac.authorization.k8s.io/aggregate-to-admin: "true"
        rbac.authorization.k8s.io/aggregate-to-edit: "true"
        rbac.authorization.k8s.io/aggregate-to-view: "true"
      name: system:aggregated-metrics-reader
    rules:
    - apiGroups:
      - metrics.k8s.io
      resources:
      - pods
      - nodes
      verbs:
      - get
      - list
      - watch
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      labels:
        k8s-app: metrics-server
      name: system:metrics-server
    rules:
    - apiGroups:
      - ""
      resources:
      - nodes/metrics
      verbs:
      - get
    - apiGroups:
      - ""
      resources:
      - pods
      - nodes
      verbs:
      - get
      - list
      - watch
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      labels:
        k8s-app: metrics-server
      name: metrics-server-auth-reader
      namespace: kube-system
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: Role
      name: extension-apiserver-authentication-reader
    subjects:
    - kind: ServiceAccount
      name: metrics-server
      namespace: kube-system
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      labels:
        k8s-app: metrics-server
      name: metrics-server:system:auth-delegator
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: system:auth-delegator
    subjects:
    - kind: ServiceAccount
      name: metrics-server
      namespace: kube-system
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      labels:
        k8s-app: metrics-server
      name: system:metrics-server
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: system:metrics-server
    subjects:
    - kind: ServiceAccount
      name: metrics-server
      namespace: kube-system
    ---
    apiVersion: v1
    kind: Service
    metadata:
      labels:
        k8s-app: metrics-server
      name: metrics-server
      namespace: kube-system
    spec:
      ports:
      - name: https
        port: 443
        protocol: TCP
        targetPort: https
      selector:
        k8s-app: metrics-server
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        k8s-app: metrics-server
      name: metrics-server
      namespace: kube-system
    spec:
      selector:
        matchLabels:
          k8s-app: metrics-server
      strategy:
        rollingUpdate:
          maxUnavailable: 0
      template:
        metadata:
          labels:
            k8s-app: metrics-server
        spec:
          containers:
          - args:
            - --cert-dir=/tmp
            - --secure-port=4443
            - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
            - --kubelet-use-node-status-port
            - --metric-resolution=30s
            - --kubelet-insecure-tls
            # - --tls-cert-file=/etc/ssl/pki/ca.pem
            # - --tls-private-key-file=/etc/ssl/pki/ca-key.pem
            image: HARBOR_HOST_NAME/library/metrics-server:v0.6.1
            imagePullPolicy: IfNotPresent
            livenessProbe:
              failureThreshold: 3
              httpGet:
                path: /livez
                port: https
                scheme: HTTPS
              periodSeconds: 10
            name: metrics-server
            ports:
            - containerPort: 4443
              name: https
              protocol: TCP
            readinessProbe:
              failureThreshold: 3
              httpGet:
                path: /readyz
                port: https
                scheme: HTTPS
              initialDelaySeconds: 20
              periodSeconds: 10
            resources:
              requests:
                cpu: 100m
                memory: 200Mi
            securityContext:
              allowPrivilegeEscalation: false
              readOnlyRootFilesystem: true
              runAsNonRoot: true
              runAsUser: 1000
            volumeMounts:
            - mountPath: /tmp
              name: tmp-dir
            # - mountPath: /etc/ssl/pki
            #   name: cert-dir
          nodeSelector:
            kubernetes.io/os: linux
          priorityClassName: system-cluster-critical
          serviceAccountName: metrics-server
          hostNetwork: true
          volumes:
          - emptyDir: {}
            name: tmp-dir
          # - name: cert-dir
          #   hostPath:
          #     path: /etc/ssl/certs/ca-certs/
    ---
    apiVersion: apiregistration.k8s.io/v1
    kind: APIService
    metadata:
      labels:
        k8s-app: metrics-server
      name: v1beta1.metrics.k8s.io
    spec:
      group: metrics.k8s.io
      groupPriorityMinimum: 100
      insecureSkipTLSVerify: true
      service:
        name: metrics-server
        namespace: kube-system
      version: v1beta1
      versionPriority: 100

  • Original article: https://blog.csdn.net/avatar_2009/article/details/126016679