Refer to the blog post 解读内核 sysctl 配置中 panic、oops 相关项目 for background on the kernel configuration items related to panic and oops.
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/kthread.h>
#include <linux/slab.h>
#include <linux/cpumask.h>
#include <linux/sched.h>
#include <linux/string.h>
#include <linux/err.h>
#include <linux/printk.h>
static struct task_struct **pthread;

static int kthread_loop(void *cpu)
{
	/* Busy-loop to keep this CPU at 100% until the module is unloaded.
	 * `result` is volatile (and initialized) so the compiler cannot
	 * optimize the inner loops away. */
	while (!kthread_should_stop()) {
		volatile long long result = 0;
		int i, j;

		for (i = 0; i < 10000; i++)
			for (j = 0; j < 10000; j++)
				result += i * j;
	}
	return 0;
}
static void percpu_thread_stop(void)
{
	int cpu = 0;

	for_each_cpu(cpu, cpu_online_mask) {
		if (pthread[cpu] == NULL)
			continue;
		kthread_stop(pthread[cpu]);
	}
	kfree(pthread);
}
static int __init cpu_full_init(void)
{
	int cpu = 0;

	pthread = kzalloc(nr_cpu_ids * sizeof(*pthread), GFP_KERNEL);
	if (!pthread) {
		printk(KERN_ERR "failed to allocate pthread array\n");
		return -ENOMEM;
	}
	for_each_cpu(cpu, cpu_online_mask) {
		pthread[cpu] = kthread_create(kthread_loop, NULL, "loop_%d", cpu);
		if (IS_ERR(pthread[cpu])) {
			/* kthread_create() returns an ERR_PTR() on failure,
			 * not NULL; clear the slot before cleanup */
			pthread[cpu] = NULL;
			goto err;
		}
		kthread_bind(pthread[cpu], cpu);
		wake_up_process(pthread[cpu]);
	}
	return 0;
err:
	percpu_thread_stop();
	return -EINVAL;
}
static void __exit cpu_full_exit(void)
{
	percpu_thread_stop();
}
module_init(cpu_full_init);
module_exit(cpu_full_exit);
MODULE_AUTHOR("longyu");
MODULE_LICENSE("GPL");
How it works: at module initialization, iterate over every online logical CPU, create a continuously running kernel thread for each core, bind the thread to that core, and finally wake the thread up to start executing.
Save the source above as cpu_usage_panic.c.
openEuler version: openEuler release 20.03 (LTS-SP2)
Command to install the kernel headers: yum install kernel-devel.x86_64
Makefile for building the demo:
MYPROC = cpu_usage_panic
obj-m += cpu_usage_panic.o
KROOT = /usr/src/kernels/4.19.90-2204.4.0.0146.oe1.x86_64

allofit: modules

modules:
	@${MAKE} -C ${KROOT} M=${PWD} modules

modules_install:
	@$(MAKE) -C $(KROOT) M=$(PWD) modules_install

kernel_clean:
	@$(MAKE) -C $(KROOT) M=$(PWD) clean

clean: kernel_clean
	rm -rf Module.symvers modules.order

insert: modules
	sudo dmesg -c
	sudo insmod ${MYPROC}.ko

remove: clean
	sudo rmmod ${MYPROC}
The panic-related sysctl settings on the test system are as follows:
kernel.hardlockup_panic = 1
kernel.hung_task_panic = 0
kernel.panic = 0
kernel.panic_on_io_nmi = 0
kernel.panic_on_oops = 1
kernel.panic_on_rcu_stall = 0
kernel.panic_on_stackoverflow = 0
kernel.panic_on_unrecovered_nmi = 0
kernel.panic_on_warn = 0
kernel.panic_print = 0
kernel.softlockup_panic = 0
kernel.unknown_nmi_panic = 0
vm.panic_on_oom = 0
In the configuration above, kernel.softlockup_panic is set to 0, so the kernel will not panic when it detects a soft lockup. Meanwhile, panic_on_oops is 1, meaning the system panics immediately when an oops or BUG occurs.
First, raise the kernel console log level:
echo 7 4 1 7 > /proc/sys/kernel/printk
After the module is loaded, the serial console keeps printing the following BUG message, and the system does not reboot on its own:
[ 569.491631] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [loop_0:2138]
..............................................................................
[ 569.500646] CPU: 0 PID: 2138 Comm: loop_0 Kdump: loaded Tainted: G OEL 4.19.90-2106.3.0.0095.oe1.x86_64 #1
[ 569.501676] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
[ 569.502480] RIP: 0010:kthread_should_stop+0x0/0x40
......................................................
[ 569.510960] Call Trace:
[ 569.511207] kthread_loop+0xa/0x11 [cpu_usage_panic]
[ 569.511688] ? kthread+0x113/0x130
[ 569.512022] ? kthread_create_worker_on_cpu+0x70/0x70
[ 569.512511] ? ret_from_fork+0x35/0x40
The first line of the oops output above shows that the watchdog detected a soft lockup on CPU 0. The kernel's subsequent behavior is governed first by kernel.softlockup_panic; since it is 0, no panic is triggered, and the serial console simply keeps printing the error messages.
In this state, the only way to recover is to forcibly kill the virtual machine!
Enable kernel.softlockup_panic with the following command:
[root@openeuler]# sysctl -w kernel.softlockup_panic=1
kernel.softlockup_panic = 1
Reload the module. This time the serial console prints the following messages and the system then reboots automatically:
[ 209.515206] watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [loop_0:2863]
..............................................................................
[ 209.529837] CPU: 0 PID: 2863 Comm: loop_0 Kdump: loaded Tainted: G OE 4.19.90-2106.3.0.0095.oe1.x86_64 #1
[ 209.531595] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
[ 209.532950] RIP: 0010:kthread_should_stop+0x0/0x40
..............................................................................
[ 209.548442] Call Trace:
[ 209.548837] kthread_loop+0x14/0x1b [cpu_usage_panic]
[ 209.549649] ? kthread+0x113/0x130
[ 209.550278] ? kthread_create_worker_on_cpu+0x70/0x70
[ 209.551149] ? ret_from_fork+0x35/0x40
[ 209.551785] Kernel panic - not syncing: softlockup: hung tasks
[ 209.552715] CPU: 0 PID: 2863 Comm: loop_0 Kdump: loaded Tainted: G OEL 4.19.90-2106.3.0.0095.oe1.x86_64 #1
[ 209.554470] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
[ 209.555772] Call Trace:
[ 209.556138] <IRQ>
[ 209.556454] dump_stack+0x66/0x8b
[ 209.556953] panic+0xe4/0x292
[ 209.557310] watchdog_timer_fn+0x25b/0x270
[ 209.557824] ? softlockup_fn+0x40/0x40
[ 209.558297] __hrtimer_run_queues+0x108/0x290
[ 209.558895] hrtimer_interrupt+0xe5/0x240
[ 209.559355] ? kvm_sched_clock_read+0xd/0x20
[ 209.559989] smp_apic_timer_interrupt+0x6a/0x130
[ 209.560680] apic_timer_interrupt+0xf/0x20
[ 209.561365] </IRQ>
..................................................................
[ 0.000000] Linux version 4.19.90-2106.3.0.0095.oe1.x86_64 (abuild@ecs-obsworker-0015) (gcc version 7.3.0 (GCC)) #1 SMP Wed Jun 23 15:18:59 UTC 2021
[ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-4.19.90-2106.3.0.0095.oe1.x86_64 ro resume=/dev/mapper/openeuler-swap rhgb console=ttyS0,115200 irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off numa=off udev.children-max=2 panic=10 acpi_no_memhotplug transparent_hugepage=never nokaslr hest_disable novmcoredd disable_cpu_apicid=0 elfcorehdr=851316K
The first line of the oops is the same as described above; after the oops completes, the kernel triggers a panic, summarized by this line:
[ 209.551785] Kernel panic - not syncing: softlockup: hung tasks
"Kernel panic" marks this as a kernel panic event; "softlockup" shows that the panic was triggered by a soft lockup, whose direct cause is a hung task.
The panic backtrace shows that the CPU was servicing a timer interrupt: while running the event queue of an hrtimer, it invoked the watchdog's hrtimer callback watchdog_timer_fn, which detected the soft lockup and called panic(). (The `? softlockup_fn` entry is an unreliable stack hint, as indicated by the question mark.)
Also note the following kdump messages during the reboot:
kdump: dump target /dev/mapper/openeuler-root is not mounted, trying to mount...
[ 4.987404] EXT4-fs (dm-0): recovery complete
[ 4.989475] EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts: (null)
kdump: saving to /sysroot//var/crash/127.0.0.1-2022-08-29-07:33:46/
kdump: saving vmcore-dmesg.txt
kdump: saving vmcore-dmesg.txt complete
kdump: saving vmcore
Copying data : [100.0 %] / eta: 0s
[ 5.239421] rngd[191]: Enabling JITTER rng support
kdump: saving vmcore complete
[ 5.273260] systemd[1]: sysroot.mount: Succeeded.
[ 5.292700] systemd[1]: Shutting down.
.........................................................................................
[ 5.453897] reboot: Restarting system
[ 5.454316] reboot: machine restart
Once kdump has written the kernel vmcore file, it deliberately reboots the machine; after the system comes back up, the generated vmcore can be used to locate the problem.
After the reboot, log back in and enter /var/crash/. There is a subdirectory named after the crash time, with the following contents:
[root@openeuler 127.0.0.1-2022-08-29-07:33:46]# file *
vmcore: Kdump compressed dump v6, system Linux, node openeuler, release 4.19.90-2106.3.0.0095.oe1.x86_64, version #1 SMP Wed Jun 23 15:18:59 UTC 2021, machine x86_64, domain (none)
vmcore-dmesg.txt: JSON data
vmcore-dmesg.txt stores the kernel's dmesg at the moment of the crash, and vmcore stores the compressed kernel dump. To work with vmcore, first install the following dependencies:
yum install crash
yum install kernel-debuginfo
After installation, start crash for debugging with the following command:
[root@openeuler longyu]# crash /usr/lib/debug/lib/modules/4.19.90-2204.4.0.0146.oe1.x86_64/vmlinux /var/crash/127.0.0.1-2022-08-29-07\:33\:46/vmcore
crash 7.2.8-3.oe1
..................................................................................................
WARNING: kernel relocated [762MB]: patching 91709 gdb minimal_symbol values
crash: invalid kernel virtual address: ffffffffb1e673a0 type: "possible"
WARNING: cannot read cpu_possible_map
crash: invalid kernel virtual address: ffffffffb1e66ba0 type: "present"
WARNING: cannot read cpu_present_map
crash: invalid kernel virtual address: ffffffffb1e66fa0 type: "online"
WARNING: cannot read cpu_online_map
crash: invalid kernel virtual address: ffffffffb1e667a0 type: "active"
WARNING: cannot read cpu_active_map
crash: invalid kernel virtual address: ffffffffb1c2ef80 type: "pv_init_ops"
crash: invalid kernel virtual address: ffffffffb1c12544 type: "init_uts_ns"
WARNING: invalid linux_banner pointer: ffffffffb1800180
crash: /usr/lib/debug/lib/modules/4.19.90-2204.4.0.0146.oe1.x86_64/vmlinux and /var/crash/127.0.0.1-2022-08-29-07:33:46/vmcore do not match!
The last line reports that the vmlinux and vmcore files do not match. Some investigation showed that the kernel needed to be updated; after installing the matching kernel and repeating the steps above, crash worked normally.
Its output is recorded below:
KERNEL: /usr/lib/debug/lib/modules/4.19.90-2204.4.0.0146.oe1.x86_64/vmlinux
DUMPFILE: ./vmcore [PARTIAL DUMP]
CPUS: 1
DATE: Mon Aug 29 08:36:59 2022
UPTIME: 00:01:57
LOAD AVERAGE: 5.25, 1.35, 0.46
TASKS: 127
NODENAME: openeuler
RELEASE: 4.19.90-2204.4.0.0146.oe1.x86_64
VERSION: #1 SMP Wed Apr 27 16:04:34 UTC 2022
MACHINE: x86_64 (1799 Mhz)
MEMORY: 1 GB
PANIC: "Kernel panic - not syncing: softlockup: hung tasks"
PID: 5134
COMMAND: "loop_0"
TASK: ffff9f5803cec680 [THREAD_INFO: ffff9f5803cec680]
CPU: 0
STATE: TASK_RUNNING (PANIC)
crash> bt
PID: 5134 TASK: ffff9f5803cec680 CPU: 0 COMMAND: "loop_0"
#0 [ffff9f583ea03d50] machine_kexec at ffffffff8a05518f
#1 [ffff9f583ea03da8] __crash_kexec at ffffffff8a1571d1
#2 [ffff9f583ea03e68] panic at ffffffff8a0ae0f2
#3 [ffff9f583ea03ef0] watchdog_timer_fn at ffffffff8a18abab
#4 [ffff9f583ea03f20] __hrtimer_run_queues at ffffffff8a136f58
#5 [ffff9f583ea03f80] hrtimer_interrupt at ffffffff8a137745
#6 [ffff9f583ea03fd8] smp_apic_timer_interrupt at ffffffff8aa025aa
#7 [ffff9f583ea03ff0] apic_timer_interrupt at ffffffff8aa01b0f
--- <IRQ stack> ---
#8 [ffffad38005e3e58] apic_timer_interrupt at ffffffff8aa01b0f
[exception RIP: kthread_should_stop+32]
RIP: ffffffff8a0d29d0 RSP: ffffad38005e3f08 RFLAGS: 00000202
RAX: 0000000000000000 RBX: ffff9f583809a120 RCX: 0000000000000000
RDX: 0000000000000001 RSI: 0000000000000246 RDI: 0000000000000000
RBP: ffff9f5805728b80 R8: ffff9f5803ce80b0 R9: 0000000000000297
R10: ffffad38001dfcb8 R11: 000000000000b903 R12: ffffad380080bbc0
R13: ffff9f5803cec680 R14: 0000000000000000 R15: ffffffffc0975000
ORIG_RAX: ffffffffffffff13 CS: 0010 SS: 0018
#9 [ffffad38005e3f08] kthread_loop at ffffffffc097500a [cpu_usage_panic]
#10 [ffffad38005e3f10] kthread at ffffffff8a0d2773
#11 [ffffad38005e3f50] ret_from_fork at ffffffff8aa00215
The remaining details will not be elaborated further here.
This article is a worked example of a kernel crash test. With openEuler's default kernel panic configuration, the example module does not cause an automatic reboot; the system just loops on the error indefinitely, which is unacceptable in a customer environment. In general, automatic reboot on severe kernel faults should be enabled so that user workloads recover quickly; combined with kdump, the generated vmcore file still allows the problem to be diagnosed afterwards, which makes for a better overall solution.