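This article was verified on OpenShift 4.13.
OpenShift provides VPA (Vertical Pod Autoscaler) and HPA (Horizontal Pod Autoscaler), which absorb changes in application load by dynamically adjusting either the resources allocated to each Pod or the number of Pods. However, HPA and VPA mainly decide whether to scale based on how much CPU and memory the Pods consume, which limits where they can be used. The OpenShift Custom Metrics Autoscaler instead lets you fully define your own scaling metrics and scaling rules, so containers can scale far more flexibly.
The OpenShift Custom Metrics Autoscaler is based on KEDA (Kubernetes Event-driven Autoscaling), a CNCF project. Under the hood it still scales through an HPA, but it decides whether to scale using custom runtime metrics.
The Custom Metrics Autoscaler is installed and run by an Operator. Before it can act, we define a ScaledObject that records the scale target (usually a Deployment), the custom metrics, and the conditions that trigger scaling. At runtime, the monitoring stack must collect these custom metrics periodically; in OpenShift we can use Prometheus for application monitoring. The Custom Metrics Autoscaler then reads the metric values and decides whether the Deployment needs to be scaled.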
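The decision step described above can be sketched in Python. This is a simplified model, not KEDA's code. KEDA creates an HPA for each ScaledObject and passes it the trigger metric, and the replica arithmetic is done by that HPA. The sketch assumes the Prometheus trigger's default AverageValue target, so the HPA aims for `metric / replicas ≈ threshold`. It also applies the HPA's default 10% tolerance. The function name `desired_replicas` is hypothetical.

```python
import math


def desired_replicas(metric_total: float, threshold: float,
                     current: int, min_replicas: int, max_replicas: int,
                     tolerance: float = 0.1) -> int:
    """Simplified HPA calculation for an AverageValue metric target.

    metric_total: value returned by the trigger query (summed over all Pods)
    threshold:    target value per replica (the ScaledObject `threshold`)
    """
    if current <= 0:
        raise ValueError("current replica count must be positive")
    # Average metric value per replica, relative to the per-replica target.
    ratio = (metric_total / current) / threshold
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: the HPA leaves the replica count unchanged.
        return current
    proposed = math.ceil(current * ratio)
    # Clamp to the ScaledObject's minReplicaCount / maxReplicaCount.
    return max(min_replicas, min(max_replicas, proposed))
```

This article later uses threshold 5, minimum 1 and maximum 10. With those values, a query result of 12 on a single replica yields `desired_replicas(12, 5, 1, 1, 10) == 3`.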

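After installing the Custom Metrics Autoscaler Operator from OperatorHub, create the following KedaController in the openshift-keda namespace:
apiVersion: keda.sh/v1alpha1
kind: KedaController
metadata:
  name: keda
  namespace: openshift-keda
spec:
  operator:
    logLevel: info
    logEncoder: console
  metricsServer:
    logLevel: '0'
  serviceAccount: {}
  watchNamespace: '' # empty: scale applications in all namespaces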
Confirm that the Operator and the KEDA components are running:
$ oc get deployment -n openshift-keda
NAME                                 READY   UP-TO-DATE   AVAILABLE   AGE
custom-metrics-autoscaler-operator   1/1     1            1           5m
keda-metrics-apiserver               1/1     1            1           3m
keda-operator                        1/1     1            1           3m
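A small script can check the READY column instead of reading the table by eye. The sketch below assumes the whitespace-separated table format that `oc get` prints, as shown above. It also assumes that no column value contains spaces. The function names are hypothetical.

```python
def parse_table(text: str) -> list[dict[str, str]]:
    """Parse whitespace-aligned `oc get` output into one dict per row.

    Assumes no column value contains spaces (true for the tables shown here).
    """
    lines = [line for line in text.strip().splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        values = line.split()
        if len(values) != len(header):
            raise ValueError(f"unexpected row: {line!r}")
        rows.append(dict(zip(header, values)))
    return rows


def not_ready(rows: list[dict[str, str]]) -> list[str]:
    """Names of rows whose READY column (e.g. '1/1') is not fully ready."""
    pending = []
    for row in rows:
        ready, _, desired = row["READY"].partition("/")
        if ready != desired:
            pending.append(row["NAME"])
    return pending


# The output shown above.
OUTPUT = """
NAME                                 READY   UP-TO-DATE   AVAILABLE   AGE
custom-metrics-autoscaler-operator   1/1     1            1           5m
keda-metrics-apiserver               1/1     1            1           3m
keda-operator                        1/1     1            1           3m
"""
print(not_ready(parse_table(OUTPUT)))  # []
```

The same parser also accepts the `oc get pod` table in the next step, because that table's READY column uses the same `n/n` form.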
Enable monitoring for user-defined projects so that Prometheus can scrape application metrics:
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    enableUserWorkload: true
Check that the user workload monitoring Pods have started:
$ oc get pod -n openshift-user-workload-monitoring
NAME                                   READY   STATUS    RESTARTS   AGE
prometheus-operator-79dc5458f7-vz98q   2/2     Running   0          102s
prometheus-user-workload-0             6/6     Running   0          99s
thanos-ruler-user-workload-0           3/3     Running   0          95s
In the test project (the later steps assume the namespace is test), deploy a sample application that exposes the http_requests_total metric. Also create a Service and a ServiceMonitor so that the user workload Prometheus scrapes it:
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: test-app
  name: test-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test-app
  template:
    metadata:
      labels:
        app: test-app
        type: keda-testing
    spec:
      containers:
      - name: prom-test-app
        image: quay.io/zroubalik/prometheus-app:latest
        imagePullPolicy: IfNotPresent
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: test-app
  annotations:
    prometheus.io/scrape: "true"
  name: test-app
spec:
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 8080
  selector:
    type: keda-testing
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: keda-testing-sm
spec:
  endpoints:
  - scheme: http
    port: http
  namespaceSelector: {}
  selector:
    matchLabels:
      app: test-app
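The three objects above are linked only by labels:

- The Service selects Pods labeled `type: keda-testing`.
- The ServiceMonitor selects Services labeled `app: test-app`.
- The ServiceMonitor's endpoint `port: http` names the Service port.

A `matchLabels` selector matches when every key/value pair in it appears in the object's labels. The Python sketch below models that rule and ignores `matchExpressions`. It checks the selectors against the labels copied from the manifests; the function name `matches` is hypothetical.

```python
def matches(match_labels: dict[str, str], labels: dict[str, str]) -> bool:
    """True if every key/value pair in the selector is present in `labels`.

    Simplified model of Kubernetes `matchLabels`; `matchExpressions` is ignored.
    """
    return all(labels.get(key) == value for key, value in match_labels.items())


# Labels copied from the manifests above.
pod_labels = {"app": "test-app", "type": "keda-testing"}
service_labels = {"app": "test-app"}
service_selector = {"type": "keda-testing"}  # Service spec.selector
monitor_selector = {"app": "test-app"}       # ServiceMonitor spec.selector.matchLabels

print(matches(service_selector, pod_labels))      # True: Service -> Pod
print(matches(monitor_selector, service_labels))  # True: ServiceMonitor -> Service
```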
Confirm that the application is running:
$ oc get deploy test-app
NAME       READY   UP-TO-DATE   AVAILABLE   AGE
test-app   1/1     1            1           94s
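Create a service account named thanos. KEDA will use its token to authenticate to Thanos Querier, so note the token secret name listed under Tokens (here thanos-token-gjprx):
$ oc create serviceaccount thanos
$ oc describe serviceaccount thanos
Name:                thanos
Namespace:           test
Labels:              <none>
Annotations:         <none>
Image pull secrets:  thanos-dockercfg-zbh7g
Mountable secrets:   thanos-token-nmqpv
                     thanos-dockercfg-zbh7g
Tokens:              thanos-token-gjprx
                     thanos-token-nmqpv
Events:              <none>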
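Create a TriggerAuthentication that passes the token and the CA certificate from that secret to the Prometheus trigger:
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: keda-trigger-auth-prometheus
spec:
  secretTargetRef:
  - parameter: bearerToken
    name: thanos-token-gjprx # replace with your token secret name
    key: token
  - parameter: ca
    name: thanos-token-gjprx # replace with your token secret name
    key: ca.crt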
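The Prometheus trigger is configured with `authModes: "bearer"`, as in the ScaledObject created later. In that mode it sends the token from this secret in an `Authorization: Bearer` header and trusts the CA certificate referenced by the `ca` parameter. Port 9092 of thanos-querier is the tenancy port, which only returns data for the namespace passed in the `namespace` query parameter. That is why the trigger also carries a `namespace` setting. The sketch below builds such a request with Python's standard library to show its shape. The helper names and the use of the Prometheus `/api/v1/query` endpoint are illustrative assumptions, and sending the request needs network access to the cluster.

```python
import ssl
import urllib.parse
import urllib.request

THANOS_URL = "https://thanos-querier.openshift-monitoring.svc.cluster.local:9092"


def build_query_request(query: str, namespace: str, token: str) -> urllib.request.Request:
    """Build an instant query against the Prometheus HTTP API (hypothetical helper).

    The `namespace` parameter is required by the Thanos Querier tenancy port (9092).
    """
    params = urllib.parse.urlencode({"query": query, "namespace": namespace})
    request = urllib.request.Request(f"{THANOS_URL}/api/v1/query?{params}")
    request.add_header("Authorization", f"Bearer {token}")
    return request


def send(request: urllib.request.Request, ca_file: str) -> bytes:
    """Send the request, verifying the server against the given CA file."""
    context = ssl.create_default_context(cafile=ca_file)
    with urllib.request.urlopen(request, timeout=10, context=context) as response:
        return response.read()
```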
Create a Role that lets the service account read Pods and their metrics, and bind it to the thanos service account:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: thanos-metrics-reader
rules:
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - get
- apiGroups:
  - metrics.k8s.io
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
  - watch
$ oc adm policy add-role-to-user thanos-metrics-reader -z thanos --role-namespace=test
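RBAC rules are purely additive. A request is allowed if any rule lists its API group, resource and verb. The Python sketch below models only that matching, and evaluates the rules of the Role above for a few requests. It is a hypothetical simplification that ignores wildcards (`*`), subresources and `resourceNames`.

```python
# The rules of the thanos-metrics-reader Role above, as plain data.
RULES = [
    {"apiGroups": [""], "resources": ["pods"], "verbs": ["get"]},
    {"apiGroups": ["metrics.k8s.io"], "resources": ["pods", "nodes"],
     "verbs": ["get", "list", "watch"]},
]


def allowed(rules: list[dict[str, list[str]]], api_group: str,
            resource: str, verb: str) -> bool:
    """True if any rule grants `verb` on `resource` in `api_group`.

    RBAC has no deny rules, so a single matching rule is enough.
    Wildcards, subresources and resourceNames are not modeled.
    """
    return any(
        api_group in rule["apiGroups"]
        and resource in rule["resources"]
        and verb in rule["verbs"]
        for rule in rules
    )


print(allowed(RULES, "", "pods", "get"))                 # True
print(allowed(RULES, "", "pods", "list"))                # False: only "get" on core pods
print(allowed(RULES, "metrics.k8s.io", "pods", "list"))  # True
```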
Create the ScaledObject. Every 5 seconds (pollingInterval) KEDA runs the query against Thanos Querier. It scales test-app between 1 and 10 replicas, targeting 5 requests per second per replica (threshold):
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: prometheus-scaledobject
spec:
  scaleTargetRef:
    name: test-app
  minReplicaCount: 1
  maxReplicaCount: 10
  pollingInterval: 5
  cooldownPeriod: 10
  triggers:
  - type: prometheus
    metadata:
      serverAddress: https://thanos-querier.openshift-monitoring.svc.cluster.local:9092
      namespace: test # replace with your namespace
      metricName: http_requests_total
      threshold: '5'
      query: sum(rate(http_requests_total{job="test-app"}[1m]))
      authModes: "bearer"
    authenticationRef:
      name: keda-trigger-auth-prometheus
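The trigger query `sum(rate(http_requests_total{job="test-app"}[1m]))` turns the request counter into a per-second request rate summed across all Pods. That total is then compared with `threshold: '5'` per replica. The Python sketch below estimates this with a simplified rate: (last sample - first sample) / elapsed seconds inside the window. That is only an approximation of PromQL `rate()`, which also handles counter resets and extrapolates to the window edges. The sketch ignores the HPA tolerance, and the sample data is made up for illustration.

```python
import math

Sample = tuple[float, float]  # (timestamp in seconds, counter value)


def simple_rate(samples: list[Sample], window: float = 60.0) -> float:
    """Per-second increase of a counter over the last `window` seconds.

    Simplified: no counter-reset handling and no extrapolation, unlike PromQL rate().
    """
    end = samples[-1][0]
    in_window = [s for s in samples if s[0] >= end - window]
    if len(in_window) < 2:
        return 0.0
    (t0, v0), (t1, v1) = in_window[0], in_window[-1]
    return (v1 - v0) / (t1 - t0)


def target_replicas(series: list[list[Sample]], threshold: float = 5.0,
                    min_replicas: int = 1, max_replicas: int = 10) -> int:
    """sum(rate(...)) over all series, divided by the per-replica threshold."""
    total = sum(simple_rate(s) for s in series)
    return max(min_replicas, min(max_replicas, math.ceil(total / threshold)))


# Two hypothetical Pods, each serving 6 requests per second for one minute.
pod_a = [(0.0, 0.0), (30.0, 180.0), (60.0, 360.0)]
pod_b = [(0.0, 100.0), (30.0, 280.0), (60.0, 460.0)]
print(target_replicas([pod_a, pod_b]))  # 12 req/s / 5 -> 3
```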
Check that the ScaledObject is ready:
$ oc get scaledobject prometheus-scaledobject -n test
NAME                      SCALETARGETKIND      SCALETARGETNAME   MIN   MAX   TRIGGERS     AUTHENTICATION                 READY   ACTIVE   FALLBACK   AGE
prometheus-scaledobject   apps/v1.Deployment   test-app          1     10    prometheus   keda-trigger-auth-prometheus   True    False    False      9m
Generate load with the following Job. Because it uses generateName, create it with oc create -f rather than oc apply:
apiVersion: batch/v1
kind: Job
metadata:
  generateName: generate-requests-
spec:
  template:
    spec:
      containers:
      - image: quay.io/zroubalik/hey
        name: test
        command: ["/bin/sh"]
        args: ["-c", "for i in $(seq 1 30);do echo $i;/hey -c 5 -n 100 http://test-app.test.svc;sleep 1;done"] # replace test in the URL with your namespace
      restartPolicy: Never
  activeDeadlineSeconds: 120
  backoffLimit: 2
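The Job's shell loop runs `hey -c 5 -n 100` 30 times with a one-second pause between runs. That adds up to 3,000 requests in total (30 × 100), unless the 120-second activeDeadlineSeconds stops it first. To make the shape of that load concrete, here is a rough Python equivalent. It is a hypothetical sketch, not a reimplementation of hey: `burst` sends `n` GET requests using `c` worker threads. The `fetch` parameter is an assumption added so the loop can be exercised without a live Service.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def http_get(url: str) -> int:
    """Fetch `url` and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.status


def burst(url: str, n: int = 100, c: int = 5,
          fetch: Callable[[str], int] = http_get) -> list[int]:
    """Send `n` requests to `url` with `c` concurrent workers (like `hey -c 5 -n 100`)."""
    with ThreadPoolExecutor(max_workers=c) as pool:
        return list(pool.map(fetch, [url] * n))


def generate_load(url: str, rounds: int = 30, pause: float = 1.0,
                  fetch: Callable[[str], int] = http_get) -> int:
    """Repeat `burst` like the Job's shell loop; return the number of requests sent."""
    sent = 0
    for i in range(1, rounds + 1):
        print(i)  # mirrors `echo $i` in the Job
        sent += len(burst(url, fetch=fetch))
        time.sleep(pause)
    return sent
```

Run from a Pod inside the cluster, `generate_load("http://test-app.test.svc")` would send the same 3,000 requests as the Job.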
Watch the Deployment scale out as the load increases:
$ watch oc get deployment test-app
