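This picks up from the previous post on enabling AMP (Amazon Managed Service for Prometheus) monitoring plus AMG (Amazon Managed Grafana) for EKS (AWS managed Kubernetes). Last time we only got the EKS + AMP + AMG monitoring path working. This time we use a Grafana dashboard published by David, available here: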
https://grafana.com/grafana/dashboards/15757-kubernetes-views-global/
To expose some useful performance metrics to Prometheus, kube-state-metrics has to be installed in the k8s cluster.
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install kube-state-metrics prometheus-community/kube-state-metrics -n kube-system
Verify the installation:
kubectl port-forward svc/kube-state-metrics -n kube-system 8080:8080
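With the port-forward running, a quick way to confirm that the exporter is serving data is to fetch /metrics and look for a single metric family. The following is a minimal sketch, assuming curl and grep are available locally. `show_metric` is a hypothetical helper written for this post, not part of kubectl or kube-state-metrics.

```shell
# Hypothetical helper: read Prometheus text exposition on stdin and print
# the first N samples (default 5) of exactly one metric family.
show_metric() {
  grep -E "^$1(\{| )" | head -n "${2:-5}"
}

# Usage, in a second terminal while the port-forward is running:
#   curl -s http://localhost:8080/metrics | show_metric kube_pod_status_ready
```

If nothing is printed, the exporter is either not reachable on that port or does not expose the metric.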
Test with PromQL:
count(kube_pod_status_ready{condition="false"}) by (namespace, pod)
Then add a scrape job for kube-state-metrics to the Prometheus scrape configuration:
scrape_configs:
  - job_name: kube-state-metrics
    honor_timestamps: true
    scrape_interval: 1m
    scrape_timeout: 1m
    metrics_path: /metrics
    scheme: http
    static_configs:
      - targets:
          - kube-state-metrics.kube-system.svc.cluster.local:8080
Next, install node-exporter to expose node-level metrics:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus-node-exporter prometheus-community/prometheus-node-exporter -n kube-system
Test:
export POD_NAME=$(kubectl get pods --namespace kube-system -l "app.kubernetes.io/name=prometheus-node-exporter,app.kubernetes.io/instance=prometheus-node-exporter" -o jsonpath="{.items[0].metadata.name}")
kubectl port-forward --namespace kube-system $POD_NAME 9100
Scrape configuration for node-exporter:
scrape_configs:
  - job_name: 'node-exporter'
    kubernetes_sd_configs:
      - role: node
    relabel_configs:
      - action: replace
        source_labels: [__address__]
        regex: '(.*):10250'
        replacement: '${1}:9100'
        target_label: __address__
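The relabel rule in the node-exporter job takes the kubelet address that `role: node` discovery returns (port 10250) and rewrites it to the node-exporter port 9100. The effect can be reproduced locally with sed, as a rough sketch. Prometheus anchors relabel regexes at both ends, so the sed pattern adds `^` and `$` explicitly. `rewrite_address` is a hypothetical helper and the IP address is made up.

```shell
# Mimic the relabel rule: regex '(.*):10250' -> replacement '${1}:9100'.
# Prometheus fully anchors relabel regexes, hence the explicit ^ and $.
rewrite_address() {
  printf '%s\n' "$1" | sed -E 's/^(.*):10250$/\1:9100/'
}

rewrite_address "10.0.1.23:10250"   # prints 10.0.1.23:9100
```

An address that does not end in :10250 passes through unchanged, which matches how the `replace` action behaves when the regex does not match.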
The complete scrape configuration, including the two jobs above:
global:
  scrape_interval: 30s
  # external_labels:
    # clusterArn:
scrape_configs:
  # pod metrics
  - job_name: pod_exporter
    kubernetes_sd_configs:
      - role: pod
  # container metrics
  - job_name: cadvisor
    scheme: https
    authorization:
      credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    kubernetes_sd_configs:
      - role: node
    relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - replacement: kubernetes.default.svc:443
        target_label: __address__
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/$1/proxy/metrics/cadvisor
  # apiserver metrics
  - bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    job_name: kubernetes-apiservers
    kubernetes_sd_configs:
      - role: endpoints
    relabel_configs:
      - action: keep
        regex: default;kubernetes;https
        source_labels:
          - __meta_kubernetes_namespace
          - __meta_kubernetes_service_name
          - __meta_kubernetes_endpoint_port_name
    scheme: https
  # kube proxy metrics
  - job_name: kube-proxy
    honor_labels: true
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - action: keep
        source_labels:
          - __meta_kubernetes_namespace
          - __meta_kubernetes_pod_name
        separator: '/'
        regex: 'kube-system/kube-proxy.+'
      - source_labels:
          - __address__
        action: replace
        target_label: __address__
        # Single backslashes: in an unquoted YAML scalar, "\\" would be read
        # as a literal backslash and the optional port group would never match.
        regex: (.+?)(\:\d+)?
        replacement: $1:10249
  # kube-state-metrics
  - job_name: kube-state-metrics
    honor_timestamps: true
    scrape_interval: 1m
    scrape_timeout: 1m
    metrics_path: /metrics
    scheme: http
    static_configs:
      - targets:
          - kube-state-metrics.kube-system.svc.cluster.local:8080
  # node-exporter
  - job_name: 'node-exporter'
    kubernetes_sd_configs:
      - role: node
    relabel_configs:
      - action: replace
        source_labels: [__address__]
        regex: '(.*):10250'
        replacement: '${1}:9100'
        target_label: __address__
At this point the scraper has to be recreated.
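If the scraper in question is an AMP managed collector (an assumption; the post does not say how the original one was created), recreating it with the AWS CLI might look like the sketch below. The scrape configuration is passed as a base64-encoded blob. The file name `scraper.yaml`, the helper names, and the variables `CLUSTER_ARN`, `SUBNET_IDS` and `WORKSPACE_ARN` are all placeholders to adapt.

```shell
# Base64-encode the scrape configuration onto a single line
# (piping through tr keeps this portable across GNU and BSD base64).
encode_config() {
  base64 < "$1" | tr -d '\n'
}

# Sketch only: create a new managed scraper from scraper.yaml.
# CLUSTER_ARN, SUBNET_IDS (comma-separated) and WORKSPACE_ARN are
# placeholders that must be set beforehand.
create_scraper() {
  aws amp create-scraper \
    --source "eksConfiguration={clusterArn='${CLUSTER_ARN}',subnetIds=[${SUBNET_IDS}]}" \
    --scrape-configuration "configurationBlob=$(encode_config scraper.yaml)" \
    --destination "ampConfiguration={workspaceArn='${WORKSPACE_ARN}'}"
}
```

Once the variables are set, calling `create_scraper` submits the request. The old scraper can be deleted after the new one is active.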