• Java process exit


    JVM startup parameters:

    -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/ -XX:+PrintGCDetails -Xloggc:/home/app/logs/web/gc.log -XX:+PrintGCDateStamps

     

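    1. Killed by Linux

    1) A memory leak.
    2) The process needs too much memory. For a Java process, the maximum heap size set by -Xmx is only part of the footprint: metaspace, off-heap memory, and direct memory also have to be counted.

    3) Another process needs a large amount of memory, but the OOM Killer selects the current process as its victim.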
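To make the second point concrete, here is a minimal sketch of a memory budget for one JVM. The function names and the example values are hypothetical; the inputs correspond to the -Xmx, -XX:MaxMetaspaceSize, -XX:MaxDirectMemorySize, and -Xss settings. It is only a rough estimate, since it ignores the code cache, GC bookkeeping, and native allocations made through JNI.

```python
import re

# Multipliers for JVM-style size suffixes (no suffix means bytes).
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(text):
    """Parse a JVM-style size such as '512m' or '8g' into bytes."""
    match = re.fullmatch(r"(\d+)([kKmMgG]?)", text)
    if not match:
        raise ValueError(f"unrecognized size: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2).lower()]


def estimate_jvm_footprint(heap, metaspace, direct, thread_count, thread_stack="1m"):
    """Rough estimate, in bytes, of what one JVM may ask the OS for.

    heap, metaspace, direct and thread_stack are size strings in the style of
    -Xmx, -XX:MaxMetaspaceSize, -XX:MaxDirectMemorySize and -Xss.
    """
    return (
        parse_size(heap)
        + parse_size(metaspace)
        + parse_size(direct)
        + thread_count * parse_size(thread_stack)
    )


if __name__ == "__main__":
    # Hypothetical settings: 8g heap, 512m metaspace, 1g direct memory, 400 threads.
    total = estimate_jvm_footprint("8g", "512m", "1g", thread_count=400)
    print(f"{total / 1024 ** 2:.0f} MiB")
```

If the sum for all JVMs on a host approaches physical RAM, and there is no swap, the kernel has little room left before it has to kill something.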

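    OOM Killer mechanism:

    The Linux kernel has a mechanism called the OOM killer (Out Of Memory killer). When the system runs short of memory and cannot reclaim enough, the kernel picks a process, typically one that occupies a large amount of memory, and kills it so that memory is not exhausted completely. The detection and selection logic lives in the kernel source file linux/mm/oom_kill.c: when memory runs out, out_of_memory() is triggered and calls select_bad_process() to choose a "bad" process to kill. How is a "bad" process chosen? Linux scores each candidate with oom_badness(), and the idea is simple: the "baddest" process is the one that uses the most memory (resident pages, swap entries, and page tables), shifted up or down by the process's oom_score_adj.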
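The scoring idea can be sketched in a few lines. This is a simplified model of the heuristic as it appears in 3.10-era kernels, not the kernel code itself; later kernels differ in details. The function name is hypothetical, and the assumption that the usable page count equals "pages RAM" minus "pages reserved" is mine. The figures come from the kernel log quoted in this article (PID 6340, a root-owned java process).

```python
def oom_badness_score(rss_pages, nr_ptes, swapents, oom_score_adj, total_pages, is_root):
    """Approximate the 0..1000 score printed in 'Kill process ... score N'.

    A simplified model of a 3.10-era oom_badness(); all sizes are in pages.
    """
    if oom_score_adj == -1000:
        # The process is exempt from OOM killing.
        return 0
    # Base cost: resident pages + page-table pages + swap entries.
    points = rss_pages + nr_ptes + swapents
    if is_root:
        # Root-owned processes get a 3% discount in this kernel generation.
        points -= points * 3 // 100
    # oom_score_adj shifts the result in thousandths of total memory.
    points += oom_score_adj * (total_pages // 1000)
    if points <= 0:
        return 0
    return points * 1000 // total_pages


if __name__ == "__main__":
    # From the log: 4194174 pages RAM, 127629 pages reserved, no swap.
    total_pages = 4194174 - 127629
    # PID 6340 (java): rss=2140635, nr_ptes=4582, swapents=0, oom_score_adj=0, uid 0.
    print(oom_badness_score(2140635, 4582, 0, 0, total_pages, is_root=True))
```

With those inputs the sketch arrives at the same score the kernel reported for PID 6340 in the log, which suggests the model is close enough for reasoning about which process will be picked.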

    References:

    Linux kernel OOM killer mechanism (chirpyli's blog, CSDN)

    Analysis of the OOM Killer mechanism (Zhihu)

    Causes of and solutions for OOM Killer events (Alibaba Cloud)

    Check the logs

    more /var/log/messages

    egrep -i -r 'killed process' /var/log

    dmesg -T

    Find the approximate time the process was killed, then look for the keywords "Out of memory" and "Kill process xxx".


    Sep 30 08:56:11 ecs-hn1-app-007 kernel: AliYunDun invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
    Sep 30 08:56:12 ecs-hn1-app-007 kernel: AliYunDun cpuset=/ mems_allowed=0
    Sep 30 08:56:12 ecs-hn1-app-007 kernel: CPU: 3 PID: 7859 Comm: AliYunDun Tainted: G OE ------------ T 3.10.0-1160.25.1.el7.x86_64 #1
    Sep 30 08:56:12 ecs-hn1-app-007 kernel: Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 8c24b4c 04/01/2014
    Sep 30 08:56:12 ecs-hn1-app-007 kernel: Call Trace:
    Sep 30 08:56:12 ecs-hn1-app-007 kernel: [] dump_stack+0x19/0x1b
    Sep 30 08:56:12 ecs-hn1-app-007 kernel: [] dump_header+0x90/0x229
    Sep 30 08:56:12 ecs-hn1-app-007 kernel: [] ? ktime_get_ts64+0x52/0xf0
    Sep 30 08:56:12 ecs-hn1-app-007 kernel: [] ? delayacct_end+0x8f/0xb0
    Sep 30 08:56:12 ecs-hn1-app-007 kernel: [] oom_kill_process+0x2cd/0x490
    Sep 30 08:56:12 ecs-hn1-app-007 kernel: [] ? oom_unkillable_task+0xcd/0x120
    Sep 30 08:56:12 ecs-hn1-app-007 kernel: [] out_of_memory+0x31a/0x500
    Sep 30 08:56:12 ecs-hn1-app-007 kernel: [] __alloc_pages_slowpath+0x5db/0x729
    Sep 30 08:56:12 ecs-hn1-app-007 kernel: [] __alloc_pages_nodemask+0x436/0x450
    Sep 30 08:56:12 ecs-hn1-app-007 kernel: [] alloc_pages_current+0x98/0x110
    Sep 30 08:56:12 ecs-hn1-app-007 kernel: [] __page_cache_alloc+0x97/0xb0
    Sep 30 08:56:12 ecs-hn1-app-007 kernel: [] filemap_fault+0x270/0x420
    Sep 30 08:56:12 ecs-hn1-app-007 kernel: [] ext4_filemap_fault+0x36/0x50 [ext4]
    Sep 30 08:56:12 ecs-hn1-app-007 kernel: [] __do_fault.isra.61+0x8a/0x100
    Sep 30 08:56:12 ecs-hn1-app-007 kernel: [] ? put_prev_entity+0x31/0x400
    Sep 30 08:56:12 ecs-hn1-app-007 kernel: [] do_read_fault.isra.63+0x4c/0x1b0
    Sep 30 08:56:12 ecs-hn1-app-007 kernel: [] handle_mm_fault+0xa20/0xfb0
    Sep 30 08:56:12 ecs-hn1-app-007 kernel: [] ? hrtimer_start_range_ns+0x1fd/0x3c0
    Sep 30 08:56:12 ecs-hn1-app-007 kernel: [] __do_page_fault+0x213/0x500
    Sep 30 08:56:12 ecs-hn1-app-007 kernel: [] trace_do_page_fault+0x56/0x150
    Sep 30 08:56:12 ecs-hn1-app-007 kernel: [] do_async_page_fault+0x22/0xf0
    Sep 30 08:56:12 ecs-hn1-app-007 kernel: [] async_page_fault+0x28/0x30
    Sep 30 08:56:12 ecs-hn1-app-007 kernel: Mem-Info:
    Sep 30 08:56:12 ecs-hn1-app-007 kernel: active_anon:3848363 inactive_anon:154 isolated_anon:0#012 active_file:3374 inactive_file:6125 isolated_file:20#012 unevictable:0 dirty:22 writeback:16 unstable:0#012 slab_reclaimable:15257 slab_unreclaimable:7621#012 mapped:326 shmem:239 pagetables:9772 bounce:0#012 free:33797 free_pcp:0 free_cma:0
    Sep 30 08:56:12 ecs-hn1-app-007 kernel: Node 0 DMA free:15908kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
    Sep 30 08:56:12 ecs-hn1-app-007 kernel: lowmem_reserve[]: 0 2830 15869 15869
    Sep 30 08:56:12 ecs-hn1-app-007 kernel: Node 0 DMA32 free:64028kB min:12044kB low:15052kB high:18064kB active_anon:2770000kB inactive_anon:120kB active_file:4412kB inactive_file:7456kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3129216kB managed:2898784kB mlocked:0kB dirty:12kB writeback:0kB mapped:288kB shmem:180kB slab_reclaimable:14368kB slab_unreclaimable:4704kB kernel_stack:1776kB pagetables:6652kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:18456 all_unreclaimable? yes
    Sep 30 08:56:12 ecs-hn1-app-007 kernel: lowmem_reserve[]: 0 0 13038 13038
    Sep 30 08:56:12 ecs-hn1-app-007 kernel: Node 0 Normal free:55252kB min:55468kB low:69332kB high:83200kB active_anon:12623452kB inactive_anon:496kB active_file:9084kB inactive_file:17044kB unevictable:0kB isolated(anon):0kB isolated(file):80kB present:13631488kB managed:13351488kB mlocked:0kB dirty:76kB writeback:64kB mapped:1016kB shmem:776kB slab_reclaimable:46660kB slab_unreclaimable:25780kB kernel_stack:10192kB pagetables:32436kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:43384 all_unreclaimable? yes
    Sep 30 08:56:12 ecs-hn1-app-007 kernel: lowmem_reserve[]: 0 0 0 0
    Sep 30 08:56:12 ecs-hn1-app-007 kernel: Node 0 DMA: 1*4kB (U) 0*8kB 0*16kB 1*32kB (U) 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15908kB
    Sep 30 08:56:12 ecs-hn1-app-007 kernel: Node 0 DMA32: 1026*4kB (UEM) 589*8kB (UEM) 885*16kB (UEM) 539*32kB (UEM) 235*64kB (UEM) 50*128kB (UEM) 6*256kB (EM) 2*512kB (EM) 0*1024kB 0*2048kB 0*4096kB = 64224kB
    Sep 30 08:56:12 ecs-hn1-app-007 kernel: Node 0 Normal: 5898*4kB (UEM) 1677*8kB (UEM) 1170*16kB (UEM) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 55728kB
    Sep 30 08:56:12 ecs-hn1-app-007 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
    Sep 30 08:56:12 ecs-hn1-app-007 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
    Sep 30 08:56:13 ecs-hn1-app-007 kernel: 9982 total pagecache pages
    Sep 30 08:56:13 ecs-hn1-app-007 kernel: 0 pages in swap cache
    Sep 30 08:56:13 ecs-hn1-app-007 kernel: Swap cache stats: add 0, delete 0, find 0/0
    Sep 30 08:56:13 ecs-hn1-app-007 kernel: Free swap = 0kB
    Sep 30 08:56:13 ecs-hn1-app-007 kernel: Total swap = 0kB
    Sep 30 08:56:13 ecs-hn1-app-007 kernel: 4194174 pages RAM
    Sep 30 08:56:13 ecs-hn1-app-007 kernel: 0 pages HighMem/MovableOnly
    Sep 30 08:56:13 ecs-hn1-app-007 kernel: 127629 pages reserved
    Sep 30 08:56:13 ecs-hn1-app-007 kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
    Sep 30 08:56:13 ecs-hn1-app-007 kernel: [ 380] 0 380 35369 101 75 0 0 systemd-journal
    Sep 30 08:56:13 ecs-hn1-app-007 kernel: [ 404] 0 404 11332 135 23 0 -1000 systemd-udevd
    Sep 30 08:56:13 ecs-hn1-app-007 kernel: [ 510] 0 510 13883 127 28 0 -1000 auditd
    Sep 30 08:56:13 ecs-hn1-app-007 kernel: [ 562] 999 562 153089 2162 64 0 0 polkitd
    Sep 30 08:56:13 ecs-hn1-app-007 kernel: [ 566] 81 566 14559 180 34 0 -900 dbus-daemon
    Sep 30 08:56:13 ecs-hn1-app-007 kernel: [ 578] 0 578 6702 214 18 0 0 systemd-logind
    Sep 30 08:56:13 ecs-hn1-app-007 kernel: [ 582] 998 582 29483 148 29 0 0 chronyd
    Sep 30 08:56:13 ecs-hn1-app-007 kernel: [ 815] 0 815 25736 514 48 0 0 dhclient
    Sep 30 08:56:13 ecs-hn1-app-007 kernel: [ 884] 0 884 143570 3348 96 0 0 tuned
    Sep 30 08:56:13 ecs-hn1-app-007 kernel: [ 1124] 0 1124 31605 180 17 0 0 crond
    Sep 30 08:56:13 ecs-hn1-app-007 kernel: [ 1127] 0 1127 6477 50 18 0 0 atd
    Sep 30 08:56:13 ecs-hn1-app-007 kernel: [ 1131] 0 1131 27552 42 10 0 0 agetty
    Sep 30 08:56:13 ecs-hn1-app-007 kernel: [ 1132] 0 1132 27552 42 10 0 0 agetty
    Sep 30 08:56:13 ecs-hn1-app-007 kernel: [23747] 0 23747 185468 413 218 0 0 rsyslogd
    Sep 30 08:56:13 ecs-hn1-app-007 kernel: [26575] 0 26575 28235 275 59 0 -1000 sshd
    Sep 30 08:56:13 ecs-hn1-app-007 kernel: [31369] 997 31369 19777 217 39 0 0 zabbix_agentd
    Sep 30 08:56:13 ecs-hn1-app-007 kernel: [31370] 997 31370 19777 309 39 0 0 zabbix_agentd
    Sep 30 08:56:13 ecs-hn1-app-007 kernel: [31371] 997 31371 20339 372 43 0 0 zabbix_agentd
    Sep 30 08:56:13 ecs-hn1-app-007 kernel: [31372] 997 31372 20339 372 43 0 0 zabbix_agentd
    Sep 30 08:56:13 ecs-hn1-app-007 kernel: [31373] 997 31373 20339 372 43 0 0 zabbix_agentd
    Sep 30 08:56:13 ecs-hn1-app-007 kernel: [31374] 997 31374 20341 260 43 0 0 zabbix_agentd
    Sep 30 08:56:13 ecs-hn1-app-007 kernel: [16737] 0 16737 187420 8093 67 0 -999 containerd
    Sep 30 08:56:13 ecs-hn1-app-007 kernel: [20647] 0 20647 207923 15091 116 0 -500 dockerd
    Sep 30 08:56:13 ecs-hn1-app-007 kernel: [19807] 0 19807 12241 383 20 0 0 ilogtail
    Sep 30 08:56:13 ecs-hn1-app-007 kernel: [19808] 0 19808 102293 12131 88 0 0 ilogtail
    Sep 30 08:56:13 ecs-hn1-app-007 kernel: [10576] 0 10576 109338 260 29 0 0 AliSecGuard
    Sep 30 08:56:13 ecs-hn1-app-007 kernel: [22495] 0 22495 5970 90 16 0 0 argusagent
    Sep 30 08:56:13 ecs-hn1-app-007 kernel: [22497] 0 22497 280326 41489 136 0 0 /usr/local/clou
    Sep 30 08:56:13 ecs-hn1-app-007 kernel: [32483] 0 32483 1572349 658453 1464 0 0 java
    Sep 30 08:56:13 ecs-hn1-app-007 kernel: [ 9304] 0 9304 201546 950 13 0 0 aliyun-service
    Sep 30 08:56:13 ecs-hn1-app-007 kernel: [ 9471] 0 9471 4469 121 12 0 0 assist_daemon
    Sep 30 08:56:13 ecs-hn1-app-007 kernel: [ 5993] 0 5993 10614 378 22 0 0 AliYunDunUpdate
    Sep 30 08:56:13 ecs-hn1-app-007 kernel: [ 7826] 0 7826 34877 1708 68 0 0 AliYunDun
    Sep 30 08:56:13 ecs-hn1-app-007 kernel: [ 6340] 0 6340 3083814 2140635 4582 0 0 java
    Sep 30 08:56:13 ecs-hn1-app-007 kernel: [19473] 0 19473 2193507 954577 2113 0 0 java
    Sep 30 08:56:13 ecs-hn1-app-007 kernel: Out of memory: Kill process 6340 (java) score 511 or sacrifice child
    Sep 30 08:56:13 ecs-hn1-app-007 kernel: Killed process 6340 (java), UID 0, total-vm:12335256kB, anon-rss:8562504kB, file-rss:36kB, shmem-rss:0kB
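When there are many hosts or long logs, scanning for the final "Killed process" line by hand gets tedious. The sketch below pulls the victim's PID, name, and memory figures out of log text. The regular expression is written against the line format shown in the log above; other kernel versions word this line slightly differently, so treat the pattern as an assumption to adjust. The function name is hypothetical.

```python
import re

# Matches lines such as:
#   ... kernel: Killed process 6340 (java), UID 0, total-vm:12335256kB, anon-rss:8562504kB, ...
KILLED_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\)"
    r".*?total-vm:(?P<total_vm_kb>\d+)kB, anon-rss:(?P<anon_rss_kb>\d+)kB"
)


def find_oom_kills(log_text):
    """Return one dict per 'Killed process' line found in log_text."""
    kills = []
    for line in log_text.splitlines():
        match = KILLED_RE.search(line)
        if match:
            kills.append(
                {
                    "pid": int(match["pid"]),
                    "name": match["name"],
                    "total_vm_kb": int(match["total_vm_kb"]),
                    "anon_rss_kb": int(match["anon_rss_kb"]),
                }
            )
    return kills


if __name__ == "__main__":
    sample = (
        "Sep 30 08:56:13 ecs-hn1-app-007 kernel: Killed process 6340 (java), UID 0, "
        "total-vm:12335256kB, anon-rss:8562504kB, file-rss:36kB, shmem-rss:0kB"
    )
    for kill in find_oom_kills(sample):
        print(kill)
```

For the log above, the single match is PID 6340 (java) with an anonymous RSS of 8562504 kB, a little over 8 GiB. Comparing that figure with the -Xmx setting is a quick way to tell whether heap or off-heap memory made up most of the footprint.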

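    How can I keep the OOM Killer from killing my business process by mistake?


    Ways to avoid the OOM killer
    1. Write -1000 directly to the /proc/<pid>/oom_score_adj file, which exempts that process from OOM killing.
    Older kernels controlled this through /proc/<pid>/oom_adj, but newer Linux versions use oom_score_adj in its place.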

    Reference: https://github.com/tinganho/linux-kernel/blob/master/Documentation/feature-removal-schedule.txt#L171
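As an illustration of the first option, the following sketch reads and sets oom_score_adj for a given PID. The helper names and the proc_root parameter are hypothetical (the parameter exists only so the code can be tried against a scratch directory). Lowering the value on a real system normally needs root privileges, and the setting is lost when the process restarts, so it has to be reapplied after every start.

```python
from pathlib import Path


def read_oom_score_adj(pid, proc_root="/proc"):
    """Return the current oom_score_adj of a process as an int."""
    return int(Path(proc_root, str(pid), "oom_score_adj").read_text().strip())


def write_oom_score_adj(pid, value, proc_root="/proc"):
    """Set oom_score_adj for a process; -1000 exempts it from the OOM killer."""
    if not -1000 <= value <= 1000:
        raise ValueError("oom_score_adj must be between -1000 and 1000")
    Path(proc_root, str(pid), "oom_score_adj").write_text(f"{value}\n")


if __name__ == "__main__":
    import os

    # Reading our own value needs no special privileges on Linux.
    print(read_oom_score_adj(os.getpid()))
```

Setting -1000 on every important process only moves the problem: if nothing is killable, the kernel has fewer ways out when memory really is exhausted. A moderate negative value for the main service is often the safer choice.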

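    2. Disable the OOM killer directly. Not every kernel exposes /proc/sys/vm/oom-kill, so check that the file exists first.

    Disable:
    echo "0" > /proc/sys/vm/oom-kill

    Enable:

    echo "1" > /proc/sys/vm/oom-kill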

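    JVM OOM

    When the JVM itself runs out of memory, the following startup parameters make it write a heap dump file at the moment of failure:

    -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/

    Analyze the dump with Eclipse Memory Analyzer or JProfiler.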

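    JVM faults

    When the JVM itself hits a fatal error, it generates an hs_err_pid_xxx.log file containing the fault information. By default the file is saved in the directory the application was started from; the path can be changed with a JVM parameter: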

    -XX:ErrorFile=/var/log/hs_err_pid.log

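    The file contains:

    • Log header
    • Information about the thread that caused the crash
    • Information about all threads
    • Safepoint and lock information
    • Heap information
    • Native code cache
    • Compilation events
    • GC-related records
    • JVM memory map
    • JVM startup parameters
    • Server information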
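These parts are easier to navigate once the file is split at its banner lines. The sketch below assumes the banner style commonly seen in HotSpot error logs, a row of dashes around a spaced-out title such as "T H R E A D" or "P R O C E S S"; the exact set of sections varies between JDK versions, and the function name is hypothetical.

```python
import re

# Banner lines look like: "---------------  T H R E A D  ---------------"
SECTION_RE = re.compile(r"^-{3,}\s+(?P<title>.+?)\s+-{3,}\s*$")


def split_hs_err_sections(text):
    """Split hs_err log text into {section_name: [lines]}.

    Everything before the first banner is stored under 'HEADER'.
    """
    sections = {"HEADER": []}
    current = "HEADER"
    for line in text.splitlines():
        match = SECTION_RE.match(line)
        if match:
            # "T H R E A D" becomes "THREAD".
            current = match["title"].replace(" ", "")
            sections.setdefault(current, [])
        else:
            sections[current].append(line)
    return sections


if __name__ == "__main__":
    import sys

    # Usage: python split_hs_err.py /path/to/hs_err_pid<pid>.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for name, lines in split_hs_err_sections(handle.read()).items():
            print(f"{name}: {len(lines)} lines")
```

The header usually names the signal or internal error and the problematic frame, so it is the first place to look; the THREAD section then shows what the crashing thread was doing.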

    References:

    Analyzing the JVM fatal error log (hs_err_pid.log) (51CTO blog)

    How to analyze an hs_err_pidxxx.log file (BannerEva's blog, CSDN)

    Reading the JVM fatal error log (hs_err_pid.log) (江畔独步's blog, CSDN)

    Reference document:

    What causes a Java process to exit?

  • Original article: https://blog.csdn.net/u013071311/article/details/127118615