• Kubernetes Kubelet Thread Leak


    xx Bank, operations record, 2021-11-02

     

    Orphaned pod
    Inspecting pods that use a large number of threads on a node
    Commands to find the corresponding pod
    Fix
    livenessProbe/readinessProbe health-check best practices

     

     

    Orphaned pod


    Checking the log /var/log/messages shows that an orphaned pod exists:

    Nov 2 14:45:56 bots-hrx-ksw7 kubelet: E1102 14:45:56.853519 6990 kubelet_volumes.go:154] orphaned pod "bfd406e1-b35a-41fe-9bbd-fb6783747383" found, but volume paths are still present on disk : There were a total of 1 errors similar to this. Turn up verbosity to see them.
    Nov 2 14:45:58 bots-hrx-ksw7 kubelet: E1102 14:45:58.846728 6990 kubelet_volumes.go:154] orphaned pod "bfd406e1-b35a-41fe-9bbd-fb6783747383" found, but volume paths are still present on disk : There were a total of 1 errors similar to this. Turn up verbosity to see them.
    Nov 2 14:46:00 bots-hrx-ksw7 kubelet: E1102 14:46:00.853502 6990 kubelet_volumes.go:154] orphaned pod "bfd406e1-b35a-41fe-9bbd-fb6783747383" found, but volume paths are still present on disk : There were a total of 1 errors similar to this. Turn up verbosity to see them.
    Nov 2 14:46:02 bots-hrx-ksw7 kubelet: E1102 14:46:02.858094 6990 kubelet_volumes.go:154] orphaned pod "bfd406e1-b35a-41fe-9bbd-fb6783747383" found, but volume paths are still present on disk : There were a total of 1 errors similar to this. Turn up verbosity to see them.

    Digging into older log entries shows that this pod was a leftover from ks-controller-manager:

    Oct 6 08:00:19 bots-hrx-ksw7 kubelet: I1006 08:00:19.623538 6990 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume "host-time" (UniqueName: "kubernetes.io/host-path/bfd406e1-b35a-41fe-9bbd-fb6783747383-host-time") pod "ks-controller-manager-756d467d5c-5x8dl" (UID: "bfd406e1-b35a-41fe-9bbd-fb6783747383")
    Oct 6 08:00:19 bots-hrx-ksw7 kubelet: I1006 08:00:19.623602 6990 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume "webhook-secret" (UniqueName: "kubernetes.io/secret/bfd406e1-b35a-41fe-9bbd-fb6783747383-webhook-secret") pod "ks-controller-manager-756d467d5c-5x8dl" (UID: "bfd406e1-b35a-41fe-9bbd-fb6783747383")
    Oct 6 08:00:19 bots-hrx-ksw7 kubelet: I1006 08:00:19.623623 6990 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume "kubesphere-config" (UniqueName: "kubernetes.io/configmap/bfd406e1-b35a-41fe-9bbd-fb6783747383-kubesphere-config") pod "ks-controller-manager-756d467d5c-5x8dl" (UID: "bfd406e1-b35a-41fe-9bbd-fb6783747383")
    Oct 6 08:00:19 bots-hrx-ksw7 kubelet: I1006 08:00:19.623643 6990 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume "kubesphere-token-49nzh" (UniqueName: "kubernetes.io/secret/bfd406e1-b35a-41fe-9bbd-fb6783747383-kubesphere-token-49nzh") pod "ks-controller-manager-756d467d5c-5x8dl" (UID: "bfd406e1-b35a-41fe-9bbd-fb6783747383")
    Oct 6 08:00:19 bots-hrx-ksw7 kubelet: Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/bfd406e1-b35a-41fe-9bbd-fb6783747383/volumes/kubernetes.io~secret/webhook-secret --scope -- mount -t tmpfs tmpfs /var/lib/kubelet/pods/bfd406e1-b35a-41fe-9bbd-fb6783747383/volumes/kubernetes.io~secret/webhook-secret
    Oct 6 08:00:19 bots-hrx-ksw7 kubelet: Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/bfd406e1-b35a-41fe-9bbd-fb6783747383/volumes/kubernetes.io~secret/kubesphere-token-49nzh --scope -- mount -t tmpfs tmpfs /var/lib/kubelet/pods/bfd406e1-b35a-41fe-9bbd-fb6783747383/volumes/kubernetes.io~secret/kubesphere-token-49nzh
    Oct 6 08:00:19 bots-hrx-ksw7 kubelet: E1006 08:00:19.728563 6990 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/secret/bfd406e1-b35a-41fe-9bbd-fb6783747383-webhook-secret podName:bfd406e1-b35a-41fe-9bbd-fb6783747383 nodeName:}" failed. No retries permitted until 2021-10-06 08:00:20.226577433 +0800 CST m=+2154075.895598614 (durationBeforeRetry 500ms). Error: "MountVolume.SetUp failed for volume \"webhook-secret\" (UniqueName: \"kubernetes.io/secret/bfd406e1-b35a-41fe-9bbd-fb6783747383-webhook-secret\") pod \"ks-controller-manager-756d467d5c-5x8dl\" (UID: \"bfd406e1-b35a-41fe-9bbd-fb6783747383\") : mount failed: fork/exec /usr/bin/systemd-run: resource temporarily unavailable\nMounting command: systemd-run\nMounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/bfd406e1-b35a-41fe-9bbd-fb6783747383/volumes/kubernetes.io~secret/webhook-secret --scope -- mount -t tmpfs tmpfs /var/lib/kubelet/pods/bfd406e1-b35a-41fe-9bbd-fb6783747383/volumes/kubernetes.io~secret/webhook-secret\nOutput: "
    Oct 6 08:00:19 bots-hrx-ksw7 kubelet: E1006 08:00:19.728615 6990 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/secret/bfd406e1-b35a-41fe-9bbd-fb6783747383-kubesphere-token-49nzh podName:bfd406e1-b35a-41fe-9bbd-fb6783747383 nodeName:}" failed. No retries permitted until 2021-10-06 08:00:20.228590839 +0800 CST m=+2154075.897612007 (durationBeforeRetry 500ms). Error: "MountVolume.SetUp failed for volume \"kubesphere-token-49nzh\" (UniqueName: \"kubernetes.io/secret/bfd406e1-b35a-41fe-9bbd-fb6783747383-kubesphere-token-49nzh\") pod \"ks-controller-manager-756d467d5c-5x8dl\" (UID: \"bfd406e1-b35a-41fe-9bbd-fb6783747383\") : mount failed: fork/exec /usr/bin/systemd-run: resource temporarily unavailable\nMounting command: systemd-run\nMounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/bfd406e1-b35a-41fe-9bbd-fb6783747383/volumes/kubernetes.io~secret/kubesphere-token-49nzh --scope -- mount -t tmpfs tmpfs /var/lib/kubelet/pods/bfd406e1-b35a-41fe-9bbd-fb6783747383/volumes/kubernetes.io~secret/kubesphere-token-49nzh\nOutput: "
    Oct 6 08:00:20 bots-hrx-ksw7 kubelet: Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/bfd406e1-b35a-41fe-9bbd-fb6783747383/volumes/kubernetes.io~secret/kubesphere-token-49nzh --scope -- mount -t tmpfs tmpfs /var/lib/kubelet/pods/bfd406e1-b35a-41fe-9bbd-fb6783747383/volumes/kubernetes.io~secret/kubesphere-token-49nzh
    Fix
    The mount failures above ("fork/exec /usr/bin/systemd-run: resource temporarily unavailable") indicate the node could not spawn new processes, i.e. it was running short of PIDs/threads. The leftover pod directory itself can simply be removed:
    rm -rf /var/lib/kubelet/pods/bfd406e1-b35a-41fe-9bbd-fb6783747383/
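Before running the rm -rf, it can be worth double-checking which pod UIDs the kubelet is actually complaining about and that no volumes are still mounted under the directory. A small sketch (the log path and UID are the ones from this incident; adjust as needed):

```shell
# List the orphaned pod UIDs reported by kubelet in the system log
grep -o 'orphaned pod "[0-9a-f-]*"' /var/log/messages | sort -u

# Make sure nothing is still mounted under the pod directory before deleting it
POD_UID=bfd406e1-b35a-41fe-9bbd-fb6783747383
mount | grep "$POD_UID" || echo "no live mounts, safe to remove"
```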

     

     

    Inspecting pods that use a large number of threads on a node


    Run the following on ksw7:

    [root@bots-hrx-ksw7 ~]# printf "NUM\tPID\tCOMMAND\n" && ps -eLf | awk '{$1=null;$3=null;$4=null;$5=null;$6=null;$7=null;$8=null;$9=null;print}' | sort | uniq -c | sort -rn | head -10
    NUM	PID	COMMAND
    31628 15128 /usr/lib/jvm/java-1.8-openjdk/bin/java -Djava.util.logging.config.file=/usr/local/tomcat/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Dspring.profiles.active=standard -Duser.timezone=Asia/Shanghai -Dhippo.log.home=/usr/local/tomcat/logs -XX:HeapDumpPath=/wls/heapdump/ -Xms4096M -Xmx4096M -XX:MetaspaceSize=256m -XX:MaxMetaspaceSize=256m -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -Djdk.tls.ephemeralDHKeySize=2048 -Djava.protocol.handler.pkgs=org.apache.catalina.webresources -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -XX:MaxRAMFraction=2 -Djava.util.concurrent.ForkJoinPool.common.parallelism=2 -Dorg.apache.catalina.security.SecurityListener.UMASK=0027 -Xms4096M -Xmx4096M -Xmn384m -XX:MetaspaceSize=256m -XX:MaxMetaspaceSize=256m -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled -XX:SoftRefLRUPolicyMSPerMB=0 -XX:+CMSClassUnloadingEnabled -XX:SurvivorRatio=8 -XX:-OmitStackTraceInFastThrow -Dfile.encoding=UTF-8 -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/usr/local/tomcat/logs/data-dsp-databus-758d79c7db-scxhd/gc.log -Dspring.profiles.active=standard -Duser.timezone=Asia/Shanghai -Dfile.encoding=UTF-8 -javaagent:/usr/local/tomcat/lib/jmx_prometheus_javaagent-0.13.0.jar=1234:/usr/local/tomcat/conf/prometheus.yaml -Dignore.endorsed.dirs= -classpath /usr/local/tomcat/bin/bootstrap.jar:/usr/local/tomcat/bin/tomcat-juli.jar -Dcatalina.base=/usr/local/tomcat -Dcatalina.home=/usr/local/tomcat -Djava.io.tmpdir=/usr/local/tomcat/temp org.apache.catalina.startup.Bootstrap start
    291 24645 /docker-java-home/jre/bin/java -Djava.util.logging.config.file=/usr/local/tomcat/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Dspring.profiles.active=standard -Duser.timezone=Asia/Shanghai -Dhippo.log.home=/usr/local/tomcat/logs -XX:HeapDumpPath=/wls/heapdump/ -Xms2048M -Xmx2048M -XX:MetaspaceSize=512m -XX:MaxMetaspaceSize=512m -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -Dfile.encoding=UTF-8 -Duser.timezone=GMT+08 -Djdk.tls.ephemeralDHKeySize=2048 -Djava.protocol.handler.pkgs=org.apache.catalina.webresources -Dorg.apache.catalina.security.SecurityListener.UMASK=0027 -Dignore.endorsed.dirs= -classpath /usr/local/tomcat/bin/bootstrap.jar:/usr/local/tomcat/bin/tomcat-juli.jar -Dcatalina.base=/usr/local/tomcat -Dcatalina.home=/usr/local/tomcat -Djava.io.tmpdir=/usr/local/tomcat/temp org.apache.catalina.startup.Bootstrap start
    215 12781 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.292.b10-1.el7_9.x86_64/jre/bin/java -Djava.util.logging.config.file=/usr/local/tomcat/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Dspring.profiles.active=standard -Duser.timezone=Asia/Shanghai -DAsyncLoggerConfig.RingBufferSize=1024 -Dlog4j2.AsyncQueueFullPolicy=Discard -Dlog4j2.configurationFactory=com.pingan.hippo.config.log4j2.HippoXMLConfigurationFactory -Dlog4j2.DiscardThreshold=ERROR -Dskywalking.agent.service_name=hmp-batch-service -Dskywalking.collector.backend_service=hippo-oap.bots-hrx-app.svc.cluster.local:11800 -javaagent:/wls/agent/agent/skywalking-agent.jar -Xloggc:/usr/local/tomcat/logs/hmp-batch-service-7cf4bc6868-vr82j/gc.log -XX:+CMSClassUnloadingEnabled -XX:+CMSParallelRemarkEnabled -XX:+HeapDumpOnOutOfMemoryError -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:-OmitStackTraceInFastThrow -XX:SoftRefLRUPolicyMSPerMB=0 -XX:SurvivorRatio=8 -Xms4096M -Xmx4096M -XX:MaxMetaspaceSize=386m -Djdk.tls.ephemeralDHKeySize=2048 -Djava.protocol.handler.pkgs=org.apache.catalina.webresources -Dorg.apache.catalina.security.SecurityListener.UMASK=0027 -Dignore.endorsed.dirs= -classpath /usr/local/tomcat/bin/bootstrap.jar:/usr/local/tomcat/bin/tomcat-juli.jar -Dcatalina.base=/usr/local/tomcat -Dcatalina.home=/usr/local/tomcat -Djava.io.tmpdir=/usr/local/tomcat/temp org.apache.catalina.startup.Bootstrap start
    44 62942 /opt/jdk-12/bin/java -Xms1g -Xmx1g -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true -XX:-OmitStackTraceInFastThrow -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dio.netty.recycler.maxCapacityPerThread=0 -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -Djava.io.tmpdir=/tmp/elasticsearch-14584481903416358876 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=data -XX:ErrorFile=logs/hs_err_pid%p.log -Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log:utctime,pid,tags:filecount=32,filesize=64m -Djava.locale.providers=COMPAT -XX:UseAVX=2 -Des.cgroups.hierarchy.override=/ -Djava.net.preferIPv4Stack=true -Xms512m -Xmx512m -Des.path.home=/usr/share/elasticsearch -Des.path.conf=/usr/share/elasticsearch/config -Des.distribution.flavor=oss -Des.distribution.type=docker -cp /usr/share/elasticsearch/lib/* org.elasticsearch.bootstrap.Elasticsearch
    36 37594 /opt/jdk-12/bin/java -Xms1g -Xmx1g -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true -XX:-OmitStackTraceInFastThrow -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dio.netty.recycler.maxCapacityPerThread=0 -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -Djava.io.tmpdir=/tmp/elasticsearch-18371654322352460532 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=data -XX:ErrorFile=logs/hs_err_pid%p.log -Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log:utctime,pid,tags:filecount=32,filesize=64m -Djava.locale.providers=COMPAT -XX:UseAVX=2 -Des.cgroups.hierarchy.override=/ -Djava.net.preferIPv4Stack=true -Xms1536m -Xmx1536m -Des.path.home=/usr/share/elasticsearch -Des.path.conf=/usr/share/elasticsearch/config -Des.distribution.flavor=oss -Des.distribution.type=docker -cp /usr/share/elasticsearch/lib/* org.elasticsearch.bootstrap.Elasticsearch
    33 14025 nginx: worker process
    33 14024 nginx: worker process
    33 14023 nginx: worker process
    33 14022 nginx: worker process
    33 14021 nginx: worker process
    The process with pid=15128 is using 31,628 threads, more than 100 times the runner-up.
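The same ranking can be cross-checked without ps -eLf by reading the Threads: field straight out of procfs (standard on any Linux node; nothing beyond /proc is assumed):

```shell
# Rank processes by thread count using /proc/<pid>/status (Threads: field)
for d in /proc/[0-9]*; do
  t=$(awk '/^Threads:/{print $2}' "$d/status" 2>/dev/null)
  [ -n "$t" ] && printf '%s\t%s\n' "$t" "${d#/proc/}"
done | sort -rn | head -10
```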

    Run the same command on ksw3:

    [root@bots-hrx-ksw3 ~]# printf "NUM\tPID\tCOMMAND\n" && ps -eLf | awk '{$1=null;$3=null;$4=null;$5=null;$6=null;$7=null;$8=null;$9=null;print}' | sort | uniq -c | sort -rn | head -10
    NUM	PID	COMMAND
    31888 32516 /usr/lib/jvm/java-1.8-openjdk/bin/java -Djava.util.logging.config.file=/usr/local/tomcat/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Dspring.profiles.active=standard -Duser.timezone=Asia/Shanghai -Dhippo.log.home=/usr/local/tomcat/logs -XX:HeapDumpPath=/wls/heapdump/ -Xms4096M -Xmx4096M -XX:MetaspaceSize=256m -XX:MaxMetaspaceSize=256m -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -Djdk.tls.ephemeralDHKeySize=2048 -Djava.protocol.handler.pkgs=org.apache.catalina.webresources -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -XX:MaxRAMFraction=2 -Djava.util.concurrent.ForkJoinPool.common.parallelism=2 -Dorg.apache.catalina.security.SecurityListener.UMASK=0027 -Xms4096M -Xmx4096M -Xmn384m -XX:MetaspaceSize=256m -XX:MaxMetaspaceSize=256m -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled -XX:SoftRefLRUPolicyMSPerMB=0 -XX:+CMSClassUnloadingEnabled -XX:SurvivorRatio=8 -XX:-OmitStackTraceInFastThrow -Dfile.encoding=UTF-8 -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/usr/local/tomcat/logs/data-dsp-databus-758d79c7db-2nbxh/gc.log -Dspring.profiles.active=standard -Duser.timezone=Asia/Shanghai -Dfile.encoding=UTF-8 -javaagent:/usr/local/tomcat/lib/jmx_prometheus_javaagent-0.13.0.jar=1234:/usr/local/tomcat/conf/prometheus.yaml -Dignore.endorsed.dirs= -classpath /usr/local/tomcat/bin/bootstrap.jar:/usr/local/tomcat/bin/tomcat-juli.jar -Dcatalina.base=/usr/local/tomcat -Dcatalina.home=/usr/local/tomcat -Djava.io.tmpdir=/usr/local/tomcat/temp org.apache.catalina.startup.Bootstrap start
    153 10928 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.292.b10-1.el7_9.x86_64/jre/bin/java -Djava.util.logging.config.file=/usr/local/tomcat/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Dobs.bucket.public=hrx-bucket -Dobs.addr.idc=http://obs-cn-beijing-internal.cloud.papub -Dobs.addr.outer=https://hrx2.obs-cn-beijing.pinganyun.com -Dobs.addr.inner=http://obs-cn-beijing-internal.cloud.papub -Dobs.accessKey=QTVEQTE3MTFGOUU3NERGNEI5QzlDMUNGNkVGODY4MzE -Dobs.secretKey=QkMxQTFGMDM1NTBDNEEwOTk2RTkwMDVBNUM3QjkyQTM -Dspring.profiles.active=standard -Duser.timezone=Asia/Shanghai -Dhippo.log.home=/usr/local/tomcat/logs -Dlog4j2.configurationFactory=com.pingan.hippo.config.log4j2.HippoXMLConfigurationFactory -Dskywalking.agent.service_name=hbp-s3adapter -Dskywalking.collector.backend_service=hippo-oap.bots-hrx-app.svc.cluster.local:11800 -javaagent:/wls/agent/agent/skywalking-agent.jar -Xloggc:/usr/local/tomcat/logs/hbp-s3adapter-5dfc68c957-4sgws/gc.log -Xmn384m -XX:+CMSClassUnloadingEnabled -XX:+CMSParallelRemarkEnabled -XX:+HeapDumpOnOutOfMemoryError -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:-OmitStackTraceInFastThrow -XX:SoftRefLRUPolicyMSPerMB=0 -XX:SurvivorRatio=8 -XX:HeapDumpPath=/wls/heapdump/ -Xms4096M -Xmx4096M -XX:MetaspaceSize=512m -XX:MaxMetaspaceSize=512m -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -Dfile.encoding=UTF-8 -Duser.timezone=GMT+08 -Djdk.tls.ephemeralDHKeySize=2048 -Djava.protocol.handler.pkgs=org.apache.catalina.webresources -Dorg.apache.catalina.security.SecurityListener.UMASK=0027 -Dignore.endorsed.dirs= -classpath /usr/local/tomcat/bin/bootstrap.jar:/usr/local/tomcat/bin/tomcat-juli.jar -Dcatalina.base=/usr/local/tomcat -Dcatalina.home=/usr/local/tomcat -Djava.io.tmpdir=/usr/local/tomcat/temp org.apache.catalina.startup.Bootstrap start
    113 21687 java -jar /wls/hippoboot/apps/tag-center-web.jar
    93 12677 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.292.b10-1.el7_9.x86_64/jre/bin/java -Djava.util.logging.config.file=/usr/local/tomcat/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Dspring.profiles.active=standard -Dhippo.log.home=/usr/local/tomcat/logs -Xms2048M -Xmx2048M -Xmn512m -XX:MetaspaceSize=256m -XX:MaxMetaspaceSize=256m -DLog4jContextSelector=org.apache.logging.log4j.core.async.AsyncLoggerContextSelector -Dlog4j2.enable.threadlocals=true -DAsyncLoggerConfig.RingBufferSize=8192 -Dlog4j2.AsyncQueueFullPolicy=Discard -Dlog4j2.DiscardThreshold=ERROR -XX:+CMSParallelRemarkEnabled -XX:SoftRefLRUPolicyMSPerMB=0 -XX:+CMSClassUnloadingEnabled -XX:SurvivorRatio=8 -XX:-OmitStackTraceInFastThrow -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/usr/local/tomcat/logs/corehr-report-service-7555c5c86d-g6zr6/gc.log -Djdk.tls.ephemeralDHKeySize=2048 -Djava.protocol.handler.pkgs=org.apache.catalina.webresources -Dorg.apache.catalina.security.SecurityListener.UMASK=0027 -Dignore.endorsed.dirs= -classpath /usr/local/tomcat/bin/bootstrap.jar:/usr/local/tomcat/bin/tomcat-juli.jar -Dcatalina.base=/usr/local/tomcat -Dcatalina.home=/usr/local/tomcat -Djava.io.tmpdir=/usr/local/tomcat/temp org.apache.catalina.startup.Bootstrap start
    92 10612 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.282.b08-1.el7_9.x86_64/jre/bin/java -Djava.util.logging.config.file=/usr/local/tomcat/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Dspring.profiles.active=standard -Duser.timezone=Asia/Shanghai -Dhippo.log.home=/usr/local/tomcat/logs -XX:HeapDumpPath=/wls/heapdump/ -Xms2048M -Xmx2048M -XX:MetaspaceSize=512m -XX:MaxMetaspaceSize=512m -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -Dfile.encoding=UTF-8 -Duser.timezone=GMT+08 -DAsyncLoggerConfig.RingBufferSize=1024 -Dlog4j2.AsyncQueueFullPolicy=Discard -Dlog4j2.configurationFactory=com.pingan.hippo.config.log4j2.HippoXMLConfigurationFactory -Dlog4j2.DiscardThreshold=ERROR -Dskywalking.agent.service_name=hbp-admin-web -Dskywalking.collector.backend_service=hippo-oap.bots-hrx-app.svc.cluster.local:11800 -javaagent:/wls/agent/agent/skywalking-agent.jar -Xloggc:/usr/local/tomcat/logs/hbp-admin-web-5b985459-zhm48/gc.log -XX:+CMSClassUnloadingEnabled -XX:+CMSParallelRemarkEnabled -XX:+HeapDumpOnOutOfMemoryError -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:-OmitStackTraceInFastThrow -XX:SoftRefLRUPolicyMSPerMB=0 -XX:SurvivorRatio=8 -XX:MaxMetaspaceSize=386m -Djdk.tls.ephemeralDHKeySize=2048 -Djava.protocol.handler.pkgs=org.apache.catalina.webresources -Dorg.apache.catalina.security.SecurityListener.UMASK=0027 -Dignore.endorsed.dirs= -classpath /usr/local/tomcat/bin/bootstrap.jar:/usr/local/tomcat/bin/tomcat-juli.jar -Dcatalina.base=/usr/local/tomcat -Dcatalina.home=/usr/local/tomcat -Djava.io.tmpdir=/usr/local/tomcat/temp org.apache.catalina.startup.Bootstrap start
    40 15198 /data/jdk1.8.0_181/bin/java -Xmx4096m -XX:MaxNewSize=512m -jar /data/bin/newresume-0.0.1-SNAPSHOT.jar
    37 15202 /data/jdk1.8.0_181/bin/java -jar -Dspring.profiles.active=dev168 gw-router-lite-1.0.1.jar
    33 15200 /data/jdk1.8.0_181/bin/java -Xmx4096m -XX:MaxNewSize=512m -jar /data/bin/tika-server-1.24.1.jar
    31 7410 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
    27 8953 /home/weave/scope --mode=probe --probe-only --probe.processes=false --probe.kubernetes.role=host --probe.publish.interval=4500ms --probe.spy.interval=3s --probe.docker.bridge=docker0 --probe.docker=true --weave=false weave-scope-app.weave:80
    Here the process with pid=32516 is using 31,888 threads.

    Commands to find the corresponding pod

    The following was run in a test environment; assume pid=2448 is the process using the most threads.
    [root@i-9kergf6v ~]# nsenter -n -t 2448 ip a
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
    2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
        link/ipip 0.0.0.0 brd 0.0.0.0
    4: eth0@if35: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1440 qdisc noqueue state UP group default
        link/ether a2:71:4f:fb:45:27 brd ff:ff:ff:ff:ff:ff link-netnsid 0
        inet 10.233.86.25/32 brd 10.233.86.25 scope global eth0
           valid_lft forever preferred_lft forever
    This shows the pod's IP is 10.233.86.25; with that IP you can locate the pod on its detail page in the console.
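Besides nsenter plus the console, the pod can usually be identified from the shell alone. On nodes using the cgroupfs driver the process's cgroup path embeds the pod UID (the path format differs under the systemd driver and cgroup v2, so treat the pattern below as an assumption), and the IP found above can be resolved with kubectl; pid 2448 is the same hypothetical PID as in the example:

```shell
PID=2448  # hypothetical: the thread-heavy process found earlier

# With the cgroupfs driver the cgroup path contains pod<uid> (36-char UID)
grep -o 'pod[0-9a-f-]\{36\}' "/proc/$PID/cgroup" | head -1

# Or resolve via the pod IP obtained from nsenter (requires kubectl access)
kubectl get pods -A -o wide | grep 10.233.86.25
```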
    Fix
    Configure kubelet to cap the number of PIDs each pod may use.
    Delete the affected pod so that it gets redeployed.

     

     

     

    livenessProbe/readinessProbe health-check best practices


    KubeSphere itself provides a visual way to configure livenessProbe/readinessProbe. See: https://v3-1.docs.kubesphere.io/zh/docs/project-user-guide/application-workloads/container-image-settings/#%E5%81%A5%E5%BA%B7%E6%A3%80%E6%9F%A5%E5%99%A8

     

    If you need the equivalent YAML configuration, switch to edit mode in the upper-right corner to view the corresponding YAML.
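For reference, a typical probe block in a container spec looks roughly like the following. This is a minimal sketch: the paths, port, and timings are placeholders to be adapted per service, not values taken from this cluster.

```yaml
livenessProbe:
  httpGet:
    path: /healthz      # placeholder endpoint
    port: 8080          # placeholder port
  initialDelaySeconds: 30
  periodSeconds: 10
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /ready        # placeholder endpoint
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5
```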
    2021-11-02 operation record
    Run the following on every worker node, then restart the business pods:
    sysctl -w kernel.pid_max=65536
    vim /var/lib/kubelet/config.yaml
    # add/adjust in config.yaml:
    podPidsLimit: 10000
    evictionHard:
      memory.available: 5%
      pid.available: 10%
    service kubelet status
    service kubelet restart
    service kubelet status
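After the restart, a quick way to watch overall thread pressure on the node is to compare the kernel-wide thread count against the configured ceiling (plain procfs, no extra tooling assumed):

```shell
# Kernel-wide thread count vs. the configured ceiling
echo "threads in use: $(ls -d /proc/[0-9]*/task/[0-9]* 2>/dev/null | wc -l)"
echo "pid_max:        $(cat /proc/sys/kernel/pid_max)"
```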
    Commands needed for follow-up inspections:
    # On the node, list the processes using the most threads
    printf "NUM\tPID\tCOMMAND\n" && ps -eLf | awk '{$1=null;$3=null;$4=null;$5=null;$6=null;$7=null;$8=null;$9=null;print}' | sort | uniq -c | sort -rn | head -10
  • Original source: https://blog.csdn.net/qq_34556414/article/details/126420611