JVM startup parameters:
-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/ -XX:+PrintGCDetails -Xloggc:/home/app/logs/web/gc.log -XX:+PrintGCDateStamps
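1) Memory leak;
2) The process needs too much memory. For a Java process, -Xmx only caps the heap; metaspace, off-heap memory, and direct memory also have to be accounted for;
3) Other processes need a lot of resources, but the OOM Killer picks the current process as its victim;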
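For cause 2, a quick check is to compare the resident memory of the Java process with its configured heap limit. The sketch below reads the VmRSS field from a /proc/<pid>/status file; the function names rss_kb and rss_beyond_heap_mib are made up for this example, and the heap limit has to be passed in by hand as the MiB value given to -Xmx.

```shell
# Print the resident set size (VmRSS, in kB) recorded in a status file.
# $1: path to the file (for a live process: /proc/<pid>/status)
rss_kb() {
    awk '/^VmRSS:/ { print $2 }' "$1"
}

# Print how many MiB the resident size lies above the heap limit.
# $1: status file, $2: max heap in MiB (the value passed to -Xmx)
rss_beyond_heap_mib() {
    rss=$(rss_kb "$1")
    echo $(( rss / 1024 - $2 ))
}
```

Called as `rss_beyond_heap_mib /proc/<pid>/status 4096`, a positive result is memory held outside the heap limit, which has to come from areas such as metaspace, thread stacks, or direct buffers. A negative result only means the heap has not been fully touched yet.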
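The Linux kernel has a mechanism called the OOM killer (Out Of Memory killer). It targets processes that use too much memory, especially ones whose usage grows very quickly, and kills such a process to keep the system from running out of memory. How the kernel detects a memory shortage and then selects and kills a process can be followed in the kernel source file linux/mm/oom_kill.c: when the system runs short of memory, out_of_memory() is triggered, and it calls select_bad_process() to pick a "bad" process to kill. How is a "bad" process judged and chosen? Linux picks it by calling oom_badness(), and the idea behind the algorithm is simple: the "baddest" process is the one that uses the most memory.
Reference:
Linux内核OOM killer机制 (chirpyli's blog, CSDN)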
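To make the "most memory wins" rule concrete, the sketch below approximates the score that a 3.10-series kernel (the version in the log further down) prints in its "Kill process ... score N" line. It assumes that kernel adds rss, page-table pages, and swap entries, takes 3% off for root-owned processes, and scales the result to 0..1000 against usable RAM plus swap. This is an approximation, not a copy of the kernel code, and it treats oom_score_adj as 0. The function name oom_score_estimate and its argument order are made up for this example.

```shell
# Estimate the OOM score of one process (3.10-era heuristic, oom_score_adj = 0).
# $1: rss in pages        $2: nr_ptes (page-table pages)   $3: swapents
# $4: total usable pages  (pages RAM - pages reserved + swap pages)
# $5: 1 if the process runs as root (gets a 3% discount), otherwise 0
oom_score_estimate() {
    points=$(( $1 + $2 + $3 ))
    if [ "$5" -eq 1 ]; then
        points=$(( points - points * 3 / 100 ))
    fi
    echo $(( points * 1000 / $4 ))
}
```

With the figures for java PID 6340 from the log below (rss 2140635, nr_ptes 4582, swapents 0, 4194174 pages RAM minus 127629 reserved = 4066545, UID 0), `oom_score_estimate 2140635 4582 0 4066545 1` yields 511, the score the kernel reported for that process.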
Check the logs:
more /var/log/messages
egrep -i -r 'killed process' /var/log
dmesg -T
Find the approximate time the process was killed and check whether keywords such as "Out of memory" or "Kill process xxx" appear around it.
- Sep 30 08:56:11 ecs-hn1-app-007 kernel: AliYunDun invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
- Sep 30 08:56:12 ecs-hn1-app-007 kernel: AliYunDun cpuset=/ mems_allowed=0
- Sep 30 08:56:12 ecs-hn1-app-007 kernel: CPU: 3 PID: 7859 Comm: AliYunDun Tainted: G OE ------------ T 3.10.0-1160.25.1.el7.x86_64 #1
- Sep 30 08:56:12 ecs-hn1-app-007 kernel: Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 8c24b4c 04/01/2014
- Sep 30 08:56:12 ecs-hn1-app-007 kernel: Call Trace:
- Sep 30 08:56:12 ecs-hn1-app-007 kernel: [] dump_stack+0x19/0x1b
- Sep 30 08:56:12 ecs-hn1-app-007 kernel: [] dump_header+0x90/0x229
- Sep 30 08:56:12 ecs-hn1-app-007 kernel: [] ? ktime_get_ts64+0x52/0xf0
- Sep 30 08:56:12 ecs-hn1-app-007 kernel: [] ? delayacct_end+0x8f/0xb0
- Sep 30 08:56:12 ecs-hn1-app-007 kernel: [] oom_kill_process+0x2cd/0x490
- Sep 30 08:56:12 ecs-hn1-app-007 kernel: [] ? oom_unkillable_task+0xcd/0x120
- Sep 30 08:56:12 ecs-hn1-app-007 kernel: [] out_of_memory+0x31a/0x500
- Sep 30 08:56:12 ecs-hn1-app-007 kernel: [] __alloc_pages_slowpath+0x5db/0x729
- Sep 30 08:56:12 ecs-hn1-app-007 kernel: [] __alloc_pages_nodemask+0x436/0x450
- Sep 30 08:56:12 ecs-hn1-app-007 kernel: [] alloc_pages_current+0x98/0x110
- Sep 30 08:56:12 ecs-hn1-app-007 kernel: [] __page_cache_alloc+0x97/0xb0
- Sep 30 08:56:12 ecs-hn1-app-007 kernel: [] filemap_fault+0x270/0x420
- Sep 30 08:56:12 ecs-hn1-app-007 kernel: [] ext4_filemap_fault+0x36/0x50 [ext4]
- Sep 30 08:56:12 ecs-hn1-app-007 kernel: [] __do_fault.isra.61+0x8a/0x100
- Sep 30 08:56:12 ecs-hn1-app-007 kernel: [] ? put_prev_entity+0x31/0x400
- Sep 30 08:56:12 ecs-hn1-app-007 kernel: [] do_read_fault.isra.63+0x4c/0x1b0
- Sep 30 08:56:12 ecs-hn1-app-007 kernel: [] handle_mm_fault+0xa20/0xfb0
- Sep 30 08:56:12 ecs-hn1-app-007 kernel: [] ? hrtimer_start_range_ns+0x1fd/0x3c0
- Sep 30 08:56:12 ecs-hn1-app-007 kernel: [] __do_page_fault+0x213/0x500
- Sep 30 08:56:12 ecs-hn1-app-007 kernel: [] trace_do_page_fault+0x56/0x150
- Sep 30 08:56:12 ecs-hn1-app-007 kernel: [] do_async_page_fault+0x22/0xf0
- Sep 30 08:56:12 ecs-hn1-app-007 kernel: [] async_page_fault+0x28/0x30
- Sep 30 08:56:12 ecs-hn1-app-007 kernel: Mem-Info:
- Sep 30 08:56:12 ecs-hn1-app-007 kernel: active_anon:3848363 inactive_anon:154 isolated_anon:0#012 active_file:3374 inactive_file:6125 isolated_file:20#012 unevictable:0 dirty:22 writeback:16 unstable:0#012 slab_reclaimable:15257 slab_unreclaimable:7621#012 mapped:326 shmem:239 pagetables:9772 bounce:0#012 free:33797 free_pcp:0 free_cma:0
- Sep 30 08:56:12 ecs-hn1-app-007 kernel: Node 0 DMA free:15908kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
- Sep 30 08:56:12 ecs-hn1-app-007 kernel: lowmem_reserve[]: 0 2830 15869 15869
- Sep 30 08:56:12 ecs-hn1-app-007 kernel: Node 0 DMA32 free:64028kB min:12044kB low:15052kB high:18064kB active_anon:2770000kB inactive_anon:120kB active_file:4412kB inactive_file:7456kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3129216kB managed:2898784kB mlocked:0kB dirty:12kB writeback:0kB mapped:288kB shmem:180kB slab_reclaimable:14368kB slab_unreclaimable:4704kB kernel_stack:1776kB pagetables:6652kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:18456 all_unreclaimable? yes
- Sep 30 08:56:12 ecs-hn1-app-007 kernel: lowmem_reserve[]: 0 0 13038 13038
- Sep 30 08:56:12 ecs-hn1-app-007 kernel: Node 0 Normal free:55252kB min:55468kB low:69332kB high:83200kB active_anon:12623452kB inactive_anon:496kB active_file:9084kB inactive_file:17044kB unevictable:0kB isolated(anon):0kB isolated(file):80kB present:13631488kB managed:13351488kB mlocked:0kB dirty:76kB writeback:64kB mapped:1016kB shmem:776kB slab_reclaimable:46660kB slab_unreclaimable:25780kB kernel_stack:10192kB pagetables:32436kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:43384 all_unreclaimable? yes
- Sep 30 08:56:12 ecs-hn1-app-007 kernel: lowmem_reserve[]: 0 0 0 0
- Sep 30 08:56:12 ecs-hn1-app-007 kernel: Node 0 DMA: 1*4kB (U) 0*8kB 0*16kB 1*32kB (U) 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15908kB
- Sep 30 08:56:12 ecs-hn1-app-007 kernel: Node 0 DMA32: 1026*4kB (UEM) 589*8kB (UEM) 885*16kB (UEM) 539*32kB (UEM) 235*64kB (UEM) 50*128kB (UEM) 6*256kB (EM) 2*512kB (EM) 0*1024kB 0*2048kB 0*4096kB = 64224kB
- Sep 30 08:56:12 ecs-hn1-app-007 kernel: Node 0 Normal: 5898*4kB (UEM) 1677*8kB (UEM) 1170*16kB (UEM) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 55728kB
- Sep 30 08:56:12 ecs-hn1-app-007 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
- Sep 30 08:56:12 ecs-hn1-app-007 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
- Sep 30 08:56:13 ecs-hn1-app-007 kernel: 9982 total pagecache pages
- Sep 30 08:56:13 ecs-hn1-app-007 kernel: 0 pages in swap cache
- Sep 30 08:56:13 ecs-hn1-app-007 kernel: Swap cache stats: add 0, delete 0, find 0/0
- Sep 30 08:56:13 ecs-hn1-app-007 kernel: Free swap = 0kB
- Sep 30 08:56:13 ecs-hn1-app-007 kernel: Total swap = 0kB
- Sep 30 08:56:13 ecs-hn1-app-007 kernel: 4194174 pages RAM
- Sep 30 08:56:13 ecs-hn1-app-007 kernel: 0 pages HighMem/MovableOnly
- Sep 30 08:56:13 ecs-hn1-app-007 kernel: 127629 pages reserved
- Sep 30 08:56:13 ecs-hn1-app-007 kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
- Sep 30 08:56:13 ecs-hn1-app-007 kernel: [ 380] 0 380 35369 101 75 0 0 systemd-journal
- Sep 30 08:56:13 ecs-hn1-app-007 kernel: [ 404] 0 404 11332 135 23 0 -1000 systemd-udevd
- Sep 30 08:56:13 ecs-hn1-app-007 kernel: [ 510] 0 510 13883 127 28 0 -1000 auditd
- Sep 30 08:56:13 ecs-hn1-app-007 kernel: [ 562] 999 562 153089 2162 64 0 0 polkitd
- Sep 30 08:56:13 ecs-hn1-app-007 kernel: [ 566] 81 566 14559 180 34 0 -900 dbus-daemon
- Sep 30 08:56:13 ecs-hn1-app-007 kernel: [ 578] 0 578 6702 214 18 0 0 systemd-logind
- Sep 30 08:56:13 ecs-hn1-app-007 kernel: [ 582] 998 582 29483 148 29 0 0 chronyd
- Sep 30 08:56:13 ecs-hn1-app-007 kernel: [ 815] 0 815 25736 514 48 0 0 dhclient
- Sep 30 08:56:13 ecs-hn1-app-007 kernel: [ 884] 0 884 143570 3348 96 0 0 tuned
- Sep 30 08:56:13 ecs-hn1-app-007 kernel: [ 1124] 0 1124 31605 180 17 0 0 crond
- Sep 30 08:56:13 ecs-hn1-app-007 kernel: [ 1127] 0 1127 6477 50 18 0 0 atd
- Sep 30 08:56:13 ecs-hn1-app-007 kernel: [ 1131] 0 1131 27552 42 10 0 0 agetty
- Sep 30 08:56:13 ecs-hn1-app-007 kernel: [ 1132] 0 1132 27552 42 10 0 0 agetty
- Sep 30 08:56:13 ecs-hn1-app-007 kernel: [23747] 0 23747 185468 413 218 0 0 rsyslogd
- Sep 30 08:56:13 ecs-hn1-app-007 kernel: [26575] 0 26575 28235 275 59 0 -1000 sshd
- Sep 30 08:56:13 ecs-hn1-app-007 kernel: [31369] 997 31369 19777 217 39 0 0 zabbix_agentd
- Sep 30 08:56:13 ecs-hn1-app-007 kernel: [31370] 997 31370 19777 309 39 0 0 zabbix_agentd
- Sep 30 08:56:13 ecs-hn1-app-007 kernel: [31371] 997 31371 20339 372 43 0 0 zabbix_agentd
- Sep 30 08:56:13 ecs-hn1-app-007 kernel: [31372] 997 31372 20339 372 43 0 0 zabbix_agentd
- Sep 30 08:56:13 ecs-hn1-app-007 kernel: [31373] 997 31373 20339 372 43 0 0 zabbix_agentd
- Sep 30 08:56:13 ecs-hn1-app-007 kernel: [31374] 997 31374 20341 260 43 0 0 zabbix_agentd
- Sep 30 08:56:13 ecs-hn1-app-007 kernel: [16737] 0 16737 187420 8093 67 0 -999 containerd
- Sep 30 08:56:13 ecs-hn1-app-007 kernel: [20647] 0 20647 207923 15091 116 0 -500 dockerd
- Sep 30 08:56:13 ecs-hn1-app-007 kernel: [19807] 0 19807 12241 383 20 0 0 ilogtail
- Sep 30 08:56:13 ecs-hn1-app-007 kernel: [19808] 0 19808 102293 12131 88 0 0 ilogtail
- Sep 30 08:56:13 ecs-hn1-app-007 kernel: [10576] 0 10576 109338 260 29 0 0 AliSecGuard
- Sep 30 08:56:13 ecs-hn1-app-007 kernel: [22495] 0 22495 5970 90 16 0 0 argusagent
- Sep 30 08:56:13 ecs-hn1-app-007 kernel: [22497] 0 22497 280326 41489 136 0 0 /usr/local/clou
- Sep 30 08:56:13 ecs-hn1-app-007 kernel: [32483] 0 32483 1572349 658453 1464 0 0 java
- Sep 30 08:56:13 ecs-hn1-app-007 kernel: [ 9304] 0 9304 201546 950 13 0 0 aliyun-service
- Sep 30 08:56:13 ecs-hn1-app-007 kernel: [ 9471] 0 9471 4469 121 12 0 0 assist_daemon
- Sep 30 08:56:13 ecs-hn1-app-007 kernel: [ 5993] 0 5993 10614 378 22 0 0 AliYunDunUpdate
- Sep 30 08:56:13 ecs-hn1-app-007 kernel: [ 7826] 0 7826 34877 1708 68 0 0 AliYunDun
- Sep 30 08:56:13 ecs-hn1-app-007 kernel: [ 6340] 0 6340 3083814 2140635 4582 0 0 java
- Sep 30 08:56:13 ecs-hn1-app-007 kernel: [19473] 0 19473 2193507 954577 2113 0 0 java
- Sep 30 08:56:13 ecs-hn1-app-007 kernel: Out of memory: Kill process 6340 (java) score 511 or sacrifice child
- Sep 30 08:56:13 ecs-hn1-app-007 kernel: Killed process 6340 (java), UID 0, total-vm:12335256kB, anon-rss:8562504kB, file-rss:36kB, shmem-rss:0kB
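The final "Killed process" line carries what is usually wanted first: which process died and how much memory it held. As a minimal sketch, the filter below pulls the PID, command name, and anonymous RSS (converted to MiB) out of kernel log lines read from stdin. The function name oom_kills is made up for this example, and the pattern assumes the message layout shown in the log above and a command name without spaces.

```shell
# Read kernel log lines on stdin and print "pid name anon_rss_mib"
# for every "Killed process" line found.
oom_kills() {
    sed -n 's/.*Killed process \([0-9][0-9]*\) (\([^)]*\)).*anon-rss:\([0-9][0-9]*\)kB.*/\1 \2 \3/p' |
    while read -r pid name kb; do
        # anon-rss is logged in kB; integer-divide to get MiB
        echo "$pid $name $(( kb / 1024 ))"
    done
}
```

Typical use would be `dmesg -T | oom_kills` or `oom_kills < /var/log/messages`.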
Ways to avoid the OOM killer
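1. Write -1000 directly to the /proc/<pid>/oom_score_adj file (<pid> is the process ID)
This used to be controlled through /proc/<pid>/oom_adj, but newer Linux kernels use oom_score_adj in place of the old oom_adj.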
Reference: https://github.com/tinganho/linux-kernel/blob/master/Documentation/feature-removal-schedule.txt#L171
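A small helper makes option 1 harder to get wrong. The sketch below checks that the value lies in the range oom_score_adj accepts (-1000 to 1000) before writing it. The function name set_oom_score_adj is made up for this example, and the optional third argument (the proc root) exists only so the helper can be tried against a scratch directory.

```shell
# Write an oom_score_adj value for one process.
# $1: pid   $2: value in [-1000, 1000]   $3: proc root (default: /proc)
set_oom_score_adj() {
    pid=$1
    adj=$2
    root=${3:-/proc}
    if [ "$adj" -lt -1000 ] || [ "$adj" -gt 1000 ]; then
        echo "oom_score_adj must be between -1000 and 1000" >&2
        return 1
    fi
    printf '%s\n' "$adj" > "$root/$pid/oom_score_adj"
}
```

For example, `set_oom_score_adj <pid> -1000` exempts that process from the OOM killer. Lowering the value normally needs root privileges, and the setting is lost when the process restarts, so it is usually applied from the start script.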
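2. Disable the OOM killer outright (the /proc/sys/vm/oom-kill file is not present on every kernel, so check that it exists first)
Disable:
echo "0" > /proc/sys/vm/oom-kill
Enable:
echo "1" > /proc/sys/vm/oom-kill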
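Because the oom-kill switch is missing on many kernels, it is worth probing for it before relying on option 2. The sketch below prints the current value of the switch, or "unsupported" when the file does not exist. The function name oom_kill_switch is made up for this example.

```shell
# Print the current value of the oom-kill switch, or "unsupported"
# when the kernel does not expose it.
# $1: path to the switch (default: /proc/sys/vm/oom-kill)
oom_kill_switch() {
    f=${1:-/proc/sys/vm/oom-kill}
    if [ -e "$f" ]; then
        cat "$f"
    else
        echo "unsupported"
    fi
}
```

If it prints "unsupported", option 1 (oom_score_adj) is the one to use.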
For an out-of-memory error inside the JVM, add startup parameters so that a dump file is written when the failure occurs:
-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/
Analyze the dump with Eclipse Memory Analyzer or JProfiler.
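A heap dump can be as large as the heap itself, so it helps to point -XX:HeapDumpPath at a directory that is known to exist and be writable rather than at /. As a minimal sketch, the helper below verifies the directory and prints the two flags. The function name heap_dump_opts is made up for this example.

```shell
# Print the heap-dump flags for a given dump directory.
# Fails if the directory is missing or not writable.
# $1: directory that should receive the .hprof file
heap_dump_opts() {
    dir=$1
    if [ ! -d "$dir" ] || [ ! -w "$dir" ]; then
        echo "not a writable directory: $dir" >&2
        return 1
    fi
    echo "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=$dir"
}
```

A start script could then run something like `java $(heap_dump_opts /home/app/logs/web) -jar app.jar`, where app.jar is a placeholder for the real application.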
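When the JVM itself hits a fatal error, it writes an hs_err_pid<pid>.log file containing information about the failure. By default the file is saved in the directory the application was started from; the path can be set with a JVM option (%p expands to the process ID):
-XX:ErrorFile=/var/log/hs_err_pid%p.log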
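After a crash, the first step is usually to find the newest error file and read its header. The sketch below does both, assuming the files follow the hs_err_pid*.log naming and that the summary at the top of the file consists of lines starting with "#". The function names latest_hs_err and hs_err_header are made up for this example.

```shell
# Print the most recently modified hs_err_pid*.log in a directory
# (prints nothing if there is none).
# $1: directory to search
latest_hs_err() {
    ls -t "$1"/hs_err_pid*.log 2>/dev/null | head -n 1
}

# Print the '#'-prefixed header lines of an error file.
# $1: path to the hs_err file
hs_err_header() {
    grep '^#' "$1"
}
```

For example: `hs_err_header "$(latest_hs_err /var/log)"`.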
Contents of the file:
References:
JVM致命错误日志(hs_err_pid.log)分析 (51CTO blog)
如何分析hs_err_pidxxx.log文件 (BannerEva's blog, CSDN)
JVM致命错误日志(hs_err_pid.log)解读 (江畔独步's blog, CSDN)