• Prometheus/Grafana monitoring: data collection and display — k8s from beginner to high concurrency, part 9


After our automated pipeline packaged the committed code into an image and deployed it to the k8s cluster, JMeter load testing showed rather disappointing results: both response correctness and response time had serious problems. This is not because the code itself is wrong, since the same code succeeds about half the time. Is the end of code theology? No, the end of code is operations! To find the cause, we first build a monitoring system, then hunt for the problem in the Docker container, php-fpm, and nginx monitoring charts.

     

     

Installing Prometheus

Prometheus ships with a built-in time-series database and is used to collect and display system runtime data.

The amd64 Docker image for Prometheus is prom/prometheus, while the arm64 image is prom/prometheus-linux-arm64. Data is stored under /prometheus, port 9090 must be exposed for external access, and the configuration file lives at /etc/prometheus/prometheus.yml.

First create a storage volume, promethues-data:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: promethues-data
  namespace: promethues
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 250Mi
  storageClassName: local-path
  volumeMode: Filesystem
```

Create an initial Prometheus configuration file:

```yaml
apiVersion: v1
data:
  prometheus.yml: |-
    global:
      scrape_interval: 2s
      evaluation_interval: 2s
    scrape_configs:
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: promethues
```

Because a monitoring system needs special permissions, first set up a prometheus service account:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: promethues
rules:
  - apiGroups: [""]
    resources:
      - nodes
      - nodes/proxy
      - services
      - endpoints
      - pods
    verbs: ["get", "list", "watch"]
  - apiGroups:
      - extensions
    resources:
      - ingresses
    verbs: ["get", "list", "watch"]
  - nonResourceURLs: ["/metrics"]
    verbs: ["get"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: promethues
  namespace: promethues
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: promethues
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: promethues
subjects:
  - kind: ServiceAccount
    name: promethues
    namespace: promethues
```

Create the Prometheus Deployment. It uses the service account created above and mounts its credentials into the container at /var/run/secrets/kubernetes.io/serviceaccount/:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    k8s.kuboard.cn/layer: monitor
    k8s.kuboard.cn/name: promethues-k8s
  name: promethues-k8s
  namespace: promethues
spec:
  selector:
    matchLabels:
      k8s.kuboard.cn/layer: monitor
      k8s.kuboard.cn/name: promethues-k8s
  template:
    metadata:
      labels:
        k8s.kuboard.cn/layer: monitor
        k8s.kuboard.cn/name: promethues-k8s
    spec:
      automountServiceAccountToken: true
      containers:
        - image: prom/prometheus-linux-arm64
          name: promethues
          ports:
            - containerPort: 9090
              name: api
              protocol: TCP
          volumeMounts:
            - mountPath: /etc/prometheus
              name: volume-jpcw8
      serviceAccount: promethues
      serviceAccountName: promethues
      volumes:
        - configMap:
            defaultMode: 420
            name: prometheus-config
          name: volume-jpcw8
```

Expose port 9090 to the outside world on node port 30044:

```yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s.kuboard.cn/layer: monitor
    k8s.kuboard.cn/name: promethues-k8s
  name: promethues-k8s
  namespace: promethues
spec:
  ports:
    - name: 8jmgrm
      nodePort: 30044
      port: 9090
      protocol: TCP
      targetPort: 9090
  selector:
    k8s.kuboard.cn/layer: monitor
    k8s.kuboard.cn/name: promethues-k8s
  type: NodePort
```

The Prometheus web UI is now reachable at http://127.0.0.1:30044/.

Scraping container CPU and memory with cadvisor

Add the following job to prometheus.yml to scrape per-container CPU and memory metrics from each node's cadvisor:

```yaml
- job_name: 'kubernetes-pods'
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
    - role: node
  relabel_configs:
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.+)
    - target_label: __address__
      replacement: kubernetes.default.svc:443
    - source_labels: [__meta_kubernetes_node_name]
      regex: (.+)
      target_label: __metrics_path__
      replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
```
• The promethues service account's token is used to authenticate the scrape requests against the nodes
• Container CPU and memory metrics are fetched over HTTPS through the Kubernetes /proxy/metrics/cadvisor API
• __address__ is the address the target instance is scraped at; here it is rewritten to the API server, kubernetes.default.svc:443
• __metrics_path__ is the HTTP path scraped on the target; the node name captured from __meta_kubernetes_node_name is substituted into it
• The labelmap rule keeps every discovered label beginning with __meta_kubernetes_node_label_
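The effect of the __address__ and __metrics_path__ rewrites can be sketched in Python (a toy illustration of the relabeling, not actual Prometheus code; "primary" is the node name of the cluster used here):

```python
import re

def relabel_cadvisor_target(node_name: str) -> str:
    """Mimic the relabel rules above: pin __address__ to the API server
    and build __metrics_path__ from the discovered node name."""
    address = "kubernetes.default.svc:443"  # replacement for __address__
    # source_labels: [__meta_kubernetes_node_name], regex: (.+), replacement uses ${1}
    node = re.fullmatch(r"(.+)", node_name).group(1)
    metrics_path = f"/api/v1/nodes/{node}/proxy/metrics/cadvisor"
    return f"https://{address}{metrics_path}"

print(relabel_cadvisor_target("primary"))
# https://kubernetes.default.svc:443/api/v1/nodes/primary/proxy/metrics/cadvisor
```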

Restart the Deployment; Prometheus's Status -> Targets page should now show the following.

     

The scrape URL for this kubernetes-pods job is https://kubernetes.default.svc/api/v1/nodes/primary/proxy/metrics/cadvisor

Run kubectl proxy; it prints:

Starting to serve on 127.0.0.1:8001

In a new terminal, replace https://kubernetes.default.svc with http://127.0.0.1:8001 and try it with curl:

    curl http://127.0.0.1:8001/api/v1/nodes/primary/proxy/metrics/cadvisor | grep HELP | grep cpu

The metric for container CPU load turns out to be container_cpu_load_average_10s (a 10-second load average rather than a strict percentage):

container_cpu_load_average_10s{namespace="test-project1",image=~".*mustafa_project.*"}

Judging from the chart, CPU barely moves even while requests are failing.

The memory usage ratio can be queried as usage divided by the limit:

    container_memory_usage_bytes{namespace="test-project1",image=~".*mustafa_project.*"}/container_spec_memory_limit_bytes{namespace="test-project1",image=~".*mustafa_project.*"}

This chart shows memory usage below 10% even though requests are already failing, so at this point the cause is neither CPU nor memory.
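As a sanity check on that ratio, a quick sketch with hypothetical numbers (a 20 MiB working set against a 250 MiB limit; both values are made up for illustration):

```python
# container_memory_usage_bytes / container_spec_memory_limit_bytes,
# with hypothetical sample values
usage_bytes = 20 * 1024 * 1024   # 20 MiB in use
limit_bytes = 250 * 1024 * 1024  # 250 MiB limit
ratio = usage_bytes / limit_bytes
print(f"{ratio:.0%}")  # 8%, i.e. well under 10%
```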

Next we monitor php-fpm.

Installing php-fpm-exporter

php-fpm-exporter is hosted at https://github.com/bakins/php-fpm-exporter.git. We use a multi-stage Docker build: a Go container compiles the exporter, and the resulting binary is copied into the /usr/local/bin directory of our own php-fpm image.

Add the following at the top of the php-fpm project's Dockerfile:

```dockerfile
FROM golang:buster as builder-golang
RUN git clone https://ghproxy.com/https://github.com/bakins/php-fpm-exporter.git /tmp/php-fpm-exporter \
    && cd /tmp/php-fpm-exporter && sed -i 's/amd64/arm64/g' script/build \
    && ./script/build && chmod +x php-fpm-exporter.linux.arm64

FROM php:7.2-fpm as final
COPY --from=builder-golang /tmp/php-fpm-exporter/php-fpm-exporter.linux.arm64 /usr/local/bin/php-fpm-exporter
```

In other words: patch that repository's script/build to target arm64 instead of amd64, build it, and copy the compiled binary into our own image.

Enabling php-fpm status

Edit php-fpm's www.conf and set:

```ini
pm.status_path = /php_status
ping.path = /ping
```

Now requesting /php_status returns php-fpm's status information.

Start php-fpm-exporter to publish the php_status data; edit entry.sh:

```sh
#!/bin/sh
php-fpm -D
nginx
php-fpm-exporter --addr="0.0.0.0:9190" --fastcgi="tcp://127.0.0.1:9000/php_status"
```

Port 9190 now serves the php_status metrics.

A Service must expose port 9190 so Prometheus can query it:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: test-client1
spec:
  ports:
    - name: http-api
      protocol: TCP
      port: 80
      targetPort: 80
    - name: http-php-fpm
      protocol: TCP
      port: 9190
      targetPort: 9190
  selector:
    app: test-client1
```

Configuring Prometheus to scrape php-fpm

Earlier, Prometheus auto-discovered nodes and pulled container CPU and memory data from each node's cadvisor endpoint. This time it auto-discovers pods and scrapes php-fpm metrics from port 9190 of each pod, keeping only pods in the project1 namespaces:

```yaml
- job_name: 'php-fpm'
  scheme: http
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
    - action: labelmap
      regex: __meta_kubernetes_pod_label_(.+)
    - source_labels: [__meta_kubernetes_namespace]
      action: keep
      regex: .*project1.*
    - source_labels: [__meta_kubernetes_namespace]
      action: replace
      target_label: kubernetes_namespace
    - source_labels: [__meta_kubernetes_pod_ip]
      action: replace
      regex: (.+)
      target_label: __address__
      replacement: ${1}:9190
```
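The keep-and-rewrite logic of this job can be sketched in Python (a simplified model with made-up pod data, not Prometheus internals):

```python
import re

# Discovered pods as (namespace, pod_ip) pairs -- hypothetical sample data
discovered = [
    ("test-project1", "10.42.0.20"),
    ("kube-system", "10.42.0.3"),
]

def php_fpm_targets(pods):
    targets = []
    for namespace, pod_ip in pods:
        # action: keep -- drop any pod whose namespace misses the regex
        if not re.fullmatch(r".*project1.*", namespace):
            continue
        # __address__ is rewritten to ${1}:9190 from __meta_kubernetes_pod_ip
        targets.append(f"{pod_ip}:9190")
    return targets

print(php_fpm_targets(discovered))  # ['10.42.0.20:9190']
```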

In practice this scrapes php-fpm metrics from two pods.

Curl one of the endpoints to inspect the output:

```
➜  ~ curl http://10.42.0.20:9190/metrics
# HELP phpfpm_accepted_connections_total Total number of accepted connections
# TYPE phpfpm_accepted_connections_total counter
phpfpm_accepted_connections_total 145
# HELP phpfpm_active_max_processes Maximum active process count
# TYPE phpfpm_active_max_processes counter
phpfpm_active_max_processes 1
# HELP phpfpm_listen_queue_connections Number of connections that have been initiated but not yet accepted
# TYPE phpfpm_listen_queue_connections gauge
phpfpm_listen_queue_connections 0
# HELP phpfpm_listen_queue_length_connections The length of the socket queue, dictating maximum number of pending connections
# TYPE phpfpm_listen_queue_length_connections gauge
phpfpm_listen_queue_length_connections 511
# HELP phpfpm_listen_queue_max_connections Max number of connections the listen queue has reached since FPM start
# TYPE phpfpm_listen_queue_max_connections counter
phpfpm_listen_queue_max_connections 0
# HELP phpfpm_max_children_reached_total Number of times the process limit has been reached
# TYPE phpfpm_max_children_reached_total counter
phpfpm_max_children_reached_total 0
# HELP phpfpm_processes_total process count
# TYPE phpfpm_processes_total gauge
phpfpm_processes_total{state="active"} 1
phpfpm_processes_total{state="idle"} 1
# HELP phpfpm_scrape_failures_total Number of errors while scraping php_fpm
# TYPE phpfpm_scrape_failures_total counter
phpfpm_scrape_failures_total 0
# HELP phpfpm_slow_requests_total Number of requests that exceed request_slowlog_timeout
# TYPE phpfpm_slow_requests_total counter
phpfpm_slow_requests_total 0
# HELP phpfpm_up able to contact php-fpm
# TYPE phpfpm_up gauge
phpfpm_up 1
```

To see how the request volume changes:

    irate(phpfpm_accepted_connections_total{app="test-client1"}[1m])
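irate divides the counter delta of the last two samples in the window by their time delta; a rough Python sketch of the idea (simplified, with the counter-reset case handled minimally):

```python
def irate(samples):
    """samples: list of (timestamp, counter_value), oldest first.
    Like PromQL's irate(), only the last two points matter."""
    (t1, v1), (t2, v2) = samples[-2:]
    delta = v2 - v1
    if delta < 0:  # counter reset: assume the counter restarted from zero
        delta = v2
    return delta / (t2 - t1)

# phpfpm_accepted_connections_total sampled at the 2s scrape_interval
print(irate([(100, 145), (102, 151)]))  # 6 connections over 2s -> 3.0/s
```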

The length of the php-fpm listen queue:

    phpfpm_listen_queue_connections

The number of active php-fpm workers:

    phpfpm_processes_total{state="active"}

The api project occasionally has 5 php-fpm workers running, while the client1 project always runs exactly one. That single worker sometimes cannot keep up with incoming calls, so microservice requests time out.

Tuning the php-fpm worker count

We pin the php-fpm worker count in each Docker container to a fixed number, and rely on Kubernetes autoscaling for extra concurrency when request volume grows. A rule of thumb for the worker count is container memory / 30 MB, which here works out to about 4 or 5.

```ini
pm = static
pm.max_children = 5
```

With this setting, querying the total worker count always returns 5:

sum(phpfpm_processes_total{app="test-client1"})
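The memory / 30 MB rule of thumb above, as a worked example (the 150 MB container size is hypothetical):

```python
def max_children(container_mem_mb: int, per_child_mb: int = 30) -> int:
    """php-fpm worker count per container: memory budget / ~30 MB per child."""
    return container_mem_mb // per_child_mb

print(max_children(150))  # -> 5, matching pm.max_children above
print(max_children(128))  # -> 4
```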

Installing nginx-exporter

Monitoring nginx requires nginx-prometheus-exporter; add this to the Dockerfile:

```dockerfile
# install nginx-exporter
RUN curl -L https://ghproxy.com/https://github.com/nginxinc/nginx-prometheus-exporter/releases/download/v0.11.0/nginx-prometheus-exporter_0.11.0_linux_arm64.tar.gz -o /tmp/nginx-prometheus-exporter.tar.gz \
    && cd /tmp && tar zxvf nginx-prometheus-exporter.tar.gz \
    && mv nginx-prometheus-exporter /usr/local/bin/nginx-prometheus-exporter \
    && rm -rf /tmp/*
```

To enable monitoring, nginx needs a status endpoint in the site configuration:

```nginx
location /nginx-status {
    stub_status;
    access_log off;
    allow 127.0.0.1;
    deny all;
}
```

Update the startup script so the nginx status is exported on port 9113:

```sh
#!/bin/sh
php-fpm -D
nginx
nohup php-fpm-exporter --addr="0.0.0.0:9190" --fastcgi="tcp://127.0.0.1:9000/php_status" &
nginx-prometheus-exporter -nginx.scrape-uri=http://127.0.0.1/nginx-status
```

Add a Service port entry exposing 9113 to Prometheus:

```yaml
- name: http-nginx-exporter
  protocol: TCP
  port: 9113
  targetPort: 9113
```

Configure Prometheus to scrape the nginx-exporter status:

```yaml
- job_name: 'nginx-exporter'
  scheme: http
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
    - action: labelmap
      regex: __meta_kubernetes_pod_label_(.+)
    - source_labels: [__meta_kubernetes_namespace]
      action: keep
      regex: .*project1.*
    - source_labels: [__meta_kubernetes_namespace]
      action: replace
      target_label: kubernetes_namespace
    - source_labels: [__meta_kubernetes_pod_ip]
      action: replace
      regex: (.+)
      target_label: __address__
      replacement: ${1}:9113
```

The metrics nginx-exporter produces:

```
# HELP nginx_connections_accepted Accepted client connections
# TYPE nginx_connections_accepted counter
nginx_connections_accepted 2
# HELP nginx_connections_active Active client connections
# TYPE nginx_connections_active gauge
nginx_connections_active 1
# HELP nginx_connections_handled Handled client connections
# TYPE nginx_connections_handled counter
nginx_connections_handled 2
# HELP nginx_connections_reading Connections where NGINX is reading the request header
# TYPE nginx_connections_reading gauge
nginx_connections_reading 0
# HELP nginx_connections_waiting Idle client connections
# TYPE nginx_connections_waiting gauge
nginx_connections_waiting 0
# HELP nginx_connections_writing Connections where NGINX is writing the response back to the client
# TYPE nginx_connections_writing gauge
nginx_connections_writing 1
# HELP nginx_http_requests_total Total http requests
# TYPE nginx_http_requests_total counter
nginx_http_requests_total 23
# HELP nginx_up Status of the last metric scrape
# TYPE nginx_up gauge
nginx_up 1
# HELP nginxexporter_build_info Exporter build information
# TYPE nginxexporter_build_info gauge
nginxexporter_build_info{arch="linux/arm64",commit="e4a6810d4f0b776f7fde37fea1d84e4c7284b72a",date="2022-09-07T21:09:51Z",dirty="false",go="go1.19",version="0.11.0"} 1
```

Query the nginx request rate:

    irate(nginx_http_requests_total{app="test-api"}[1m])

Query the number of connections in use:

    nginx_connections_active{app="test-api"}
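One more derived figure worth watching: nginx counts a connection as dropped when it was accepted but could not be handled, so the gap between the two counters flags overload. A minimal sketch:

```python
def dropped_connections(accepted: int, handled: int) -> int:
    """stub_status semantics: handled lags accepted only when
    connections are dropped (e.g. worker limits exhausted)."""
    return accepted - handled

# Values from the exporter output above: nothing dropped yet
print(dropped_connections(2, 2))  # 0
```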

     

Building dashboards in Grafana

Create a data volume for Grafana:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    k8s.kuboard.cn/pvcType: Dynamic
  name: grafana
  namespace: promethues
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Mi
  storageClassName: local-path
  volumeMode: Filesystem
```

Create the Grafana Deployment:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    k8s.kuboard.cn/layer: web
    k8s.kuboard.cn/name: grafana-k8s
  name: grafana-k8s
  namespace: promethues
spec:
  selector:
    matchLabels:
      k8s.kuboard.cn/layer: web
      k8s.kuboard.cn/name: grafana-k8s
  template:
    metadata:
      labels:
        k8s.kuboard.cn/layer: web
        k8s.kuboard.cn/name: grafana-k8s
    spec:
      containers:
        - image: grafana/grafana
          imagePullPolicy: IfNotPresent
          name: grafana
          ports:
            - containerPort: 3000
              name: grafana
              protocol: TCP
          volumeMounts:
            - mountPath: /var/lib/grafana
              name: volume-62hxi
      volumes:
        - name: volume-62hxi
          persistentVolumeClaim:
            claimName: grafana
```

Create the Grafana Service:

```yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s.kuboard.cn/layer: web
    k8s.kuboard.cn/name: grafana-k8s
  name: grafana-k8s
  namespace: promethues
spec:
  ports:
    - name: ytfnyw
      nodePort: 31968
      port: 3000
      protocol: TCP
      targetPort: 3000
  selector:
    k8s.kuboard.cn/layer: web
    k8s.kuboard.cn/name: grafana-k8s
  type: NodePort
```

Grafana is now available at http://127.0.0.1:31968/login; log in with username admin, password admin.

Configuring the Prometheus data source in Grafana

Go to Configuration -> Data sources, choose Prometheus, and set the data source URL to http://promethues-k8s:9090.

Create a new dashboard and select "add new panel".

Memory usage panel

Request rate panel

Waiting php-fpm connections panel

php-fpm worker count panel

Overall dashboard



• Original article: https://blog.csdn.net/fanghailiang2016/article/details/126796117