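The complete oops output is reproduced at the end of this document; here it is split into small chunks that are analyzed one at a time.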
- [ 2620.950912] oops_tryv1:try_oops_init():37: Lets Oops!
- Now attempting to write something to the NULL address 0x0000000000000000
- [ 2620.950919] BUG: kernel NULL pointer dereference, address: 0000000000000000
- [ 2620.950922] #PF: supervisor write access in kernel mode
- [ 2620.950923] #PF: error_code(0x0002) - not-present page
- [ 2620.950925] PGD 0 P4D 0
- [ 2620.950927] Oops: 0002 [#1] SMP PTI
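The MMU sets the page-fault error code to a specific bit pattern. On x86, 0x0002 has bit 1 set (a write access) and bits 0 and 2 clear (page not present, kernel mode), which matches the two #PF lines above.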
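The bit layout can be made concrete with a small decoder. This is a minimal sketch covering only the five low bits of the x86 page-fault error code; the function name `decode_pf_error_code` and the dictionary keys are made up for illustration.

```python
# Low five bits of the x86 page-fault error code.
PF_PROT = 1 << 0   # 0: page not present, 1: protection violation
PF_WRITE = 1 << 1  # 0: read access, 1: write access
PF_USER = 1 << 2   # 0: supervisor (kernel) mode, 1: user mode
PF_RSVD = 1 << 3   # a reserved bit was set in a page-table entry
PF_INSTR = 1 << 4  # the fault happened on an instruction fetch


def decode_pf_error_code(code):
    """Describe the low bits of an x86 page-fault error code (hypothetical helper)."""
    return {
        "page": "protection violation" if code & PF_PROT else "not-present page",
        "access": "write" if code & PF_WRITE else "read",
        "mode": "user" if code & PF_USER else "supervisor",
        "reserved_bit": bool(code & PF_RSVD),
        "instruction_fetch": bool(code & PF_INSTR),
    }


if __name__ == "__main__":
    # 0x0002 is the value printed in the Oops analyzed here.
    for key, value in decode_pf_error_code(0x0002).items():
        print(f"{key}: {value}")
```

For 0x0002 the decoder reports a supervisor-mode write to a not-present page, the same reading the kernel prints in its #PF lines.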
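- [ 2620.950929] CPU: 7 PID: 4959 Comm: insmod Tainted: G OE 5.13.0-52-generic #59-Ubuntu
See https://www.kernel.org/doc/html/latest/admin-guide/tainted-kernels.html for the meaning of the taint flags. Here G means that no proprietary module has been loaded, O that an out-of-tree module has been loaded, and E that an unsigned module has been loaded.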
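As a rough illustration, the taint string can be expanded letter by letter. The table below is deliberately partial (only a handful of flags from the kernel's tainted-kernels documentation), and `decode_taint` is a hypothetical helper name, not a kernel tool.

```python
# Partial table of taint letters; the kernel documentation lists more.
TAINT_FLAGS = {
    "G": "no proprietary module has been loaded",
    "P": "a proprietary module has been loaded",
    "O": "an out-of-tree module has been loaded",
    "E": "an unsigned module has been loaded",
    "W": "the kernel has issued a warning",
    "D": "the kernel has died recently (Oops or BUG)",
}


def decode_taint(flags):
    """Return one description per non-blank taint letter (hypothetical helper)."""
    return [
        f"{letter}: {TAINT_FLAGS.get(letter, 'not in this partial table')}"
        for letter in flags
        if not letter.isspace()
    ]


if __name__ == "__main__":
    # "G OE" is the taint string from the CPU line of the Oops.
    for description in decode_taint("G OE"):
        print(description)
```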
- [ 2620.950931] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/22/2020
- [ 2620.950932] RIP: 0010:try_oops_init+0x88/0x1000 [oops_tryv1]
The format is:
function_name+off_from_func/size_of_func [module-name]
0x88: offset of the faulting instruction from the start of the function, in bytes
0x1000: size of the function, in bytes
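The format described above is regular enough to pull apart with a regular expression. The following is a minimal sketch; `parse_rip` and the pattern are illustrative and only handle the kernel-mode symbol+offset/size form shown in this Oops.

```python
import re

# Matches "RIP: 0010:func+0xOFF/0xSIZE [module]"; the module part is optional.
RIP_RE = re.compile(
    r"RIP:\s+[0-9a-f]{4}:(?P<func>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_rip(line):
    """Split a kernel-mode RIP line into its fields (hypothetical helper)."""
    match = RIP_RE.search(line)
    if match is None:
        return None  # e.g. a user-mode RIP that is only a raw address
    return {
        "function": match.group("func"),
        "offset": int(match.group("off"), 16),
        "size": int(match.group("size"), 16),
        "module": match.group("module"),
    }


if __name__ == "__main__":
    line = "[ 2620.950932] RIP: 0010:try_oops_init+0x88/0x1000 [oops_tryv1]"
    print(parse_rip(line))
```

A user-mode line such as `RIP: 0033:0x7f526a4c470d` carries no symbol, so the sketch returns `None` for it.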
- [ 2620.950937] Code: 29 4c 63 04 25 00 00 00 00 b9 32 00 00 00 48 c7 c2 30 c1 7e c0 48 c7 c6 67 c0 7e c0 48 c7 c7 72 c0 7e c0 e8 df 9f 85 f7 eb 0b <c7> 04 25 00 00 00 00 78 00 00 00 c9 31 c0 c3 00 00 00 00 00 00 00
These machine-code bytes can be decoded with helper scripts from the kernel source tree, scripts/decodecode and scripts/decode_stacktrace.sh, both covered later in this document.
- [ 2620.950938] RSP: 0018:ffffad3c8578bc78 EFLAGS: 00010246
- [ 2620.950940] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
- [ 2620.950941] RDX: 0000000000000000 RSI: ffff912c741a0980 RDI: ffff912c741a0980
- [ 2620.950942] RBP: ffffad3c8578bc78 R08: 0000000000000000 R09: ffffad3c8578ba68
- [ 2620.950943] R10: ffffad3c8578ba60 R11: ffff912c7fec83e8 R12: ffffffffc0781000
- [ 2620.950943] R13: ffff912b450a6110 R14: 0000000000000000 R15: ffffffffc07ed000
- [ 2620.950945] FS: 00007f526a3a1b80(0000) GS:ffff912c74180000(0000) knlGS:0000000000000000
- [ 2620.950946] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
- [ 2620.950947] CR2: 0000000000000000 CR3: 0000000111048003 CR4: 00000000003706e0
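Register information: the general-purpose registers, segment registers and flags at the time of the fault.
Control registers: CR0, CR2, CR3 and CR4. On a page fault, CR2 holds the faulting virtual address (here 0, the NULL address).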
- [ 2620.950969] Call Trace:
- [ 2620.950971]
- [ 2620.950974] do_one_initcall+0x48/0x1d0
- [ 2620.950978] ? kmem_cache_alloc_trace+0xfb/0x240
- [ 2620.950984] do_init_module+0x52/0x270
- [ 2620.950987] load_module+0xa8f/0xb10
- [ 2620.950989] __do_sys_finit_module+0xc2/0x120
- [ 2620.950992] __x64_sys_finit_module+0x18/0x20
- [ 2620.950994] do_syscall_64+0x61/0xb0
- [ 2620.950997] ? fput+0x13/0x20
- [ 2620.950999] ? ksys_mmap_pgoff+0x135/0x260
- [ 2620.951000] ? exit_to_user_mode_prepare+0x37/0xb0
- [ 2620.951002] ? syscall_exit_to_user_mode+0x27/0x50
- [ 2620.951004] ? __x64_sys_mmap+0x33/0x40
- [ 2620.951005] ? do_syscall_64+0x6e/0xb0
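This is the call stack: read it from bottom to top to follow the order in which the functions called each other. Entries prefixed with ? are addresses found on the stack that the unwinder could not confirm as part of the call chain, so they may be stale.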
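Reading the trace bottom-up can be automated. The sketch below keeps only the reliable frames (those without a leading ?) and returns them in call order; `call_chain` and the regular expression are illustrative, and the sample lines are copied from the trace in this document.

```python
import re

# "] func+0xOFF/0xSIZE", optionally preceded by "? " for unreliable frames.
FRAME_RE = re.compile(
    r"\]\s+(?P<unreliable>\?\s+)?(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def call_chain(trace_lines):
    """Return reliable frames ordered from outermost caller to innermost callee."""
    frames = []
    for line in trace_lines:
        match = FRAME_RE.search(line)
        if match and not match.group("unreliable"):
            frames.append(match.group("func"))
    # The kernel prints the innermost frame first, so reverse for call order.
    return list(reversed(frames))


if __name__ == "__main__":
    sample = [
        "[ 2620.950969] Call Trace:",
        "[ 2620.950974] do_one_initcall+0x48/0x1d0",
        "[ 2620.950978] ? kmem_cache_alloc_trace+0xfb/0x240",
        "[ 2620.950984] do_init_module+0x52/0x270",
        "[ 2620.950987] load_module+0xa8f/0xb10",
        "[ 2620.950989] __do_sys_finit_module+0xc2/0x120",
    ]
    print(" -> ".join(call_chain(sample)))
```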
- [ 2620.951007] entry_SYSCALL_64_after_hwframe+0x44/0xae
- [ 2620.951009] RIP: 0033:0x7f526a4c470d
- [ 2620.951012] Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d f3 66 0f 00 f7 d8 64 89 01 48
- [ 2620.951013] RSP: 002b:00007ffd87e48ef8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
- [ 2620.951014] RAX: ffffffffffffffda RBX: 0000558a1da0b7c0 RCX: 00007f526a4c470d
- [ 2620.951015] RDX: 0000000000000000 RSI: 0000558a1c698c02 RDI: 0000000000000003
- [ 2620.951016] RBP: 0000000000000000 R08: 0000000000000000 R09: 00007f526a5bbc60
- [ 2620.951017] R10: 0000000000000003 R11: 0000000000000246 R12: 0000558a1c698c02
- [ 2620.951018] R13: 0000558a1da0b760 R14: 00007ffd87e49148 R15: 0000558a1da0b8d0
- [ 2620.951019]
- [ 2620.951020] Modules linked in: oops_tryv1(OE+) kcsan_datarace(OE) xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo nft_counter xt_addrtype nft_compat nf_tables libcrc32c nfnetlink br_netfilter bridge stp llc overlay vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock binfmt_misc nls_iso8859_1 intel_rapl_msr intel_rapl_common snd_ens1371 snd_ac97_codec gameport ac97_bus snd_pcm crct10dif_pclmul ghash_clmulni_intel snd_seq_midi snd_seq_midi_event snd_rawmidi aesni_intel snd_seq crypto_simd cryptd snd_seq_device snd_timer snd rapl vmw_balloon soundcore joydev input_leds serio_raw vmw_vmci mac_hid sch_fq_codel vmwgfx ttm drm_kms_helper cec rc_core fb_sys_fops syscopyarea sysfillrect sysimgblt msr parport_pc ppdev lp parport drm ip_tables x_tables autofs4 hid_generic mptspi usbhid mptscsih mptbase ahci crc32_pclmul psmouse hid e1000 libahci scsi_transport_spi pata_acpi i2c_piix4
- [ 2620.951056] CR2: 0000000000000000
- [ 2620.951058] ---[ end trace eabb70a32207bb48 ]---
- [ 2620.951059] RIP: 0010:try_oops_init+0x88/0x1000 [oops_tryv1]
- [ 2620.951061] Code: 29 4c 63 04 25 00 00 00 00 b9 32 00 00 00 48 c7 c2 30 c1 7e c0 48 c7 c6 67 c0 7e c0 48 c7 c7 72 c0 7e c0 e8 df 9f 85 f7 eb 0b <c7> 04 25 00 00 00 00 78 00 00 00 c9 31 c0 c3 00 00 00 00 00 00 00
- [ 2620.951062] RSP: 0018:ffffad3c8578bc78 EFLAGS: 00010246
- [ 2620.951063] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
- [ 2620.951064] RDX: 0000000000000000 RSI: ffff912c741a0980 RDI: ffff912c741a0980
- [ 2620.951065] RBP: ffffad3c8578bc78 R08: 0000000000000000 R09: ffffad3c8578ba68
- [ 2620.951066] R10: ffffad3c8578ba60 R11: ffff912c7fec83e8 R12: ffffffffc0781000
- [ 2620.951067] R13: ffff912b450a6110 R14: 0000000000000000 R15: ffffffffc07ed000
- [ 2620.951068] FS: 00007f526a3a1b80(0000) GS:ffff912c74180000(0000) knlGS:0000000000000000
- [ 2620.951069] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
- [ 2620.951070] CR2: 0000000000000000 CR3: 0000000111048003 CR4: 00000000003706e0
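Modules linked in: lists every module loaded in the kernel at the time of the Oops. Out-of-tree (third-party) modules are the prime suspects when the kernel hits an error. Here oops_tryv1 is the faulting module; the + after its flags means it was still being loaded. The O and E flags were explained with the Tainted field earlier.
The final CR2 line repeats the (virtual) address whose access caused the Oops.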
- // look up the module's load address
- # grep oops_tryv1 /proc/modules
- oops_tryv1 24576 1 - Loading 0xffffffffc0223000 (OE+)
-
- // use that address as the base of the disassembly
- # objdump -dS --adjust-vma=0xffffffffc0223000 ./oops_tryv1.ko > oops_tryv1.disas
In an embedded (cross-compiled) environment, use ${CROSS_COMPILE}objdump -dS instead.
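The address arithmetic behind --adjust-vma can be sketched as follows. The helper `faulting_address` is hypothetical; it only adds the offset from the RIP line to the base taken from the sixth field of a /proc/modules line, and it assumes (as the objdump listing in this document does) that the function starts at that base. It does not check whether that assumption holds for a function placed in an init section.

```python
def faulting_address(proc_modules_line, offset):
    """Add an in-function offset to the load address from a /proc/modules line."""
    # Fields: name, size, refcount, dependencies, state, load address, taint.
    fields = proc_modules_line.split()
    base = int(fields[5], 16)
    return base + offset


if __name__ == "__main__":
    line = "oops_tryv1 24576 1 - Loading 0xffffffffc0223000 (OE+)"
    print(hex(faulting_address(line, 0x88)))
```

With the line shown above and the offset 0x88, the sketch prints 0xffffffffc0223088.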
Inspect the disassembly in oops_tryv1.disas:
- static int __init try_oops_init(void)
- {
- ffffffffc071b000: e8 00 00 00 00 call ffffffffc071b005 <try_oops_init+0x5>
- ...
-
- *(int *)val = 'x';
- ffffffffc071b088: c7 04 25 00 00 00 00 movl $0x78,0x0
- ffffffffc071b08f: 78 00 00 00
-
- }
- # gdb -q ./oops_tryv1.ko
- Reading symbols from ./oops_tryv1.ko...
- (gdb) list *try_oops_init+0x88
- 0x12f is in try_oops_init (/home/wy/misc/kernel/Linux-Kernel-Debugging/ch7/oops_tryv1/oops_tryv1.c:60).
- 55 * https://www.kernel.org/doc/Documentation/printk-formats.txt
- 56 */
- 57 } else // try writing to NULL
- 58 *(int *)val = 'x';
- 59
- 60 return 0; /* success */
- 61 }
- 62
- 63 static void __exit try_oops_exit(void)
- 64 {
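addr2line converts an address into the corresponding source line number; it can be used for errors in modules as well as errors during system boot.
Syntax: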
addr2line -e vmlinux -p -f
-e Set the input file name (default is a.out)
-p Make the output easier to read for humans
-f Show function names
- # addr2line -e ./oops_tryv1.o -p -f 0x88
- try_oops_init at <>/oops_tryv1/oops_tryv1.c:58
-
- // view the source at line 58
- # vim oops_tryv1.c
- ...
- 57 } else // try writing to NULL
- 58 *(int *)val = 'x';
- 59
- 60 return 0;
- ...
Kernel helper scripts
- $ objdump -d <...>/linux-5.10.60/vmlinux | <...>/linux-5.10.60/scripts/checkstack.pl
-
-
- $ /linux-5.10.60/scripts/decode_stacktrace.sh
- Usage:
- <...>/linux-5.10.60/scripts/decode_stacktrace.sh -r <release> | <vmlinux> [base path] [modules path]
- # /kernel/linux-5.15/scripts/decodecode < dmesg_oops_buginworkq.txt
- [ 380.853996] Code: 29 4c 63 04 25 00 00 00 00 b9 32 00 00 00 48 c7 c2 78 41 22 c0 48 c7 c6 67 40 22 c0 48 c7 c7 72 40 22 c0 e8 28 90 fd cd eb 0b <c7> 04 25 00 00 00 00 78 00 00 00 48 8d 65 f0 31 c0 5b 41 5c 5d c3
- All code
- ========
- 0: 29 4c 63 04 sub %ecx,0x4(%rbx,%riz,2)
- 4: 25 00 00 00 00 and $0x0,%eax
- 9: b9 32 00 00 00 mov $0x32,%ecx
- e: 48 c7 c2 78 41 22 c0 mov $0xffffffffc0224178,%rdx
- 15: 48 c7 c6 67 40 22 c0 mov $0xffffffffc0224067,%rsi
- 1c: 48 c7 c7 72 40 22 c0 mov $0xffffffffc0224072,%rdi
- 23: e8 28 90 fd cd call 0xffffffffcdfd9050
- 28: eb 0b jmp 0x35
- 2a:* c7 04 25 00 00 00 00 movl $0x78,0x0 <-- trapping instruction
- 31: 78 00 00 00
- 35: 48 8d 65 f0 lea -0x10(%rbp),%rsp
- 39: 31 c0 xor %eax,%eax
- 3b: 5b pop %rbx
- 3c: 41 5c pop %r12
- 3e: 5d pop %rbp
- 3f: c3 ret
-
- Code starting with the faulting instruction
- ===========================================
- 0: c7 04 25 00 00 00 00 movl $0x78,0x0
- 7: 78 00 00 00
- b: 48 8d 65 f0 lea -0x10(%rbp),%rsp
- f: 31 c0 xor %eax,%eax
- 11: 5b pop %rbx
- 12: 41 5c pop %r12
- 14: 5d pop %rbp
- 15: c3 ret
The disassembly pinpoints the instruction that trapped:
* c7 04 25 00 00 00 00 movl $0x78,0x0 <-- trapping instruction
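The offset decodecode prints next to the trapping instruction (2a) can be cross-checked by hand. The sketch below looks for the byte wrapped in angle brackets in a Code: line and reports its index. `trapping_byte_offset` is a hypothetical helper, and the sample string assumes the faulting byte is marked as `<c7>`, which is how the kernel normally flags it and is consistent with the decodecode output above.

```python
def trapping_byte_offset(code_line):
    """Return (index, value) of the <xx>-marked byte in a 'Code:' line, or None."""
    hex_part = code_line.split("Code:", 1)[1]
    for index, token in enumerate(hex_part.split()):
        if token.startswith("<") and token.endswith(">"):
            return index, int(token[1:-1], 16)
    return None


if __name__ == "__main__":
    # Assumed form of the Code: line fed to decodecode, with the <c7> marker.
    sample = (
        "Code: 29 4c 63 04 25 00 00 00 00 b9 32 00 00 00 48 c7 c2 78 41 22 c0 "
        "48 c7 c6 67 40 22 c0 48 c7 c7 72 40 22 c0 e8 28 90 fd cd eb 0b <c7> "
        "04 25 00 00 00 00 78 00 00 00 48 8d 65 f0 31 c0 5b 41 5c 5d c3"
    )
    index, value = trapping_byte_offset(sample)
    print(f"trapping byte 0x{value:02x} at offset 0x{index:x}")
```

The reported offset, 0x2a, is the same position decodecode marks with an asterisk.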
usage: faddr2line [--list] <object file> <func+offset> <func+offset>...
First check whether the kernel configuration enables KASLR (CONFIG_RANDOMIZE_BASE):
- # grep CONFIG_RANDOMIZE_BASE /boot/config-5.15.0-47-generic
-
- CONFIG_RANDOMIZE_BASE=y
- # <>/scripts/faddr2line ./oops_tryv1.ko try_oops_init+0x88
- try_oops_init+0x88/0x100:
- try_oops_init at <...>/oops_tryv1/oops_tryv1.c:58
This pinpoints line 58.
If you instead see an error like the following:
- # ./oops_tryv1.ko try_oops_init+0xdb
- bad symbol size: base: 0x0000000000000000 end: 0x0000000000000000
see: [PATCH] scripts/faddr2line: Fix overlapping text section failures - Josh Poimboeuf
faddr2line needs to be updated; the 5.19 kernel already contains the fix.
Full oops output
- [ 2620.950912] oops_tryv1:try_oops_init():37: Lets Oops!
- Now attempting to write something to the NULL address 0x0000000000000000
- [ 2620.950919] BUG: kernel NULL pointer dereference, address: 0000000000000000
- [ 2620.950922] #PF: supervisor write access in kernel mode
- [ 2620.950923] #PF: error_code(0x0002) - not-present page
- [ 2620.950925] PGD 0 P4D 0
- [ 2620.950927] Oops: 0002 [#1] SMP PTI
- [ 2620.950929] CPU: 7 PID: 4959 Comm: insmod Tainted: G OE 5.13.0-52-generic #59-Ubuntu
- [ 2620.950931] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/22/2020
- [ 2620.950932] RIP: 0010:try_oops_init+0x88/0x1000 [oops_tryv1]
- [ 2620.950937] Code: 29 4c 63 04 25 00 00 00 00 b9 32 00 00 00 48 c7 c2 30 c1 7e c0 48 c7 c6 67 c0 7e c0 48 c7 c7 72 c0 7e c0 e8 df 9f 85 f7 eb 0b <c7> 04 25 00 00 00 00 78 00 00 00 c9 31 c0 c3 00 00 00 00 00 00 00
- [ 2620.950938] RSP: 0018:ffffad3c8578bc78 EFLAGS: 00010246
- [ 2620.950940] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
- [ 2620.950941] RDX: 0000000000000000 RSI: ffff912c741a0980 RDI: ffff912c741a0980
- [ 2620.950942] RBP: ffffad3c8578bc78 R08: 0000000000000000 R09: ffffad3c8578ba68
- [ 2620.950943] R10: ffffad3c8578ba60 R11: ffff912c7fec83e8 R12: ffffffffc0781000
- [ 2620.950943] R13: ffff912b450a6110 R14: 0000000000000000 R15: ffffffffc07ed000
- [ 2620.950945] FS: 00007f526a3a1b80(0000) GS:ffff912c74180000(0000) knlGS:0000000000000000
- [ 2620.950946] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
- [ 2620.950947] CR2: 0000000000000000 CR3: 0000000111048003 CR4: 00000000003706e0
- [ 2620.950969] Call Trace:
- [ 2620.950971]
- [ 2620.950974] do_one_initcall+0x48/0x1d0
- [ 2620.950978] ? kmem_cache_alloc_trace+0xfb/0x240
- [ 2620.950984] do_init_module+0x52/0x270
- [ 2620.950987] load_module+0xa8f/0xb10
- [ 2620.950989] __do_sys_finit_module+0xc2/0x120
- [ 2620.950992] __x64_sys_finit_module+0x18/0x20
- [ 2620.950994] do_syscall_64+0x61/0xb0
- [ 2620.950997] ? fput+0x13/0x20
- [ 2620.950999] ? ksys_mmap_pgoff+0x135/0x260
- [ 2620.951000] ? exit_to_user_mode_prepare+0x37/0xb0
- [ 2620.951002] ? syscall_exit_to_user_mode+0x27/0x50
- [ 2620.951004] ? __x64_sys_mmap+0x33/0x40
- [ 2620.951005] ? do_syscall_64+0x6e/0xb0
- [ 2620.951007] entry_SYSCALL_64_after_hwframe+0x44/0xae
- [ 2620.951009] RIP: 0033:0x7f526a4c470d
- [ 2620.951012] Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d f3 66 0f 00 f7 d8 64 89 01 48
- [ 2620.951013] RSP: 002b:00007ffd87e48ef8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
- [ 2620.951014] RAX: ffffffffffffffda RBX: 0000558a1da0b7c0 RCX: 00007f526a4c470d
- [ 2620.951015] RDX: 0000000000000000 RSI: 0000558a1c698c02 RDI: 0000000000000003
- [ 2620.951016] RBP: 0000000000000000 R08: 0000000000000000 R09: 00007f526a5bbc60
- [ 2620.951017] R10: 0000000000000003 R11: 0000000000000246 R12: 0000558a1c698c02
- [ 2620.951018] R13: 0000558a1da0b760 R14: 00007ffd87e49148 R15: 0000558a1da0b8d0
- [ 2620.951019]
- [ 2620.951020] Modules linked in: oops_tryv1(OE+) kcsan_datarace(OE) xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo nft_counter xt_addrtype nft_compat nf_tables libcrc32c nfnetlink br_netfilter bridge stp llc overlay vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock binfmt_misc nls_iso8859_1 intel_rapl_msr intel_rapl_common snd_ens1371 snd_ac97_codec gameport ac97_bus snd_pcm crct10dif_pclmul ghash_clmulni_intel snd_seq_midi snd_seq_midi_event snd_rawmidi aesni_intel snd_seq crypto_simd cryptd snd_seq_device snd_timer snd rapl vmw_balloon soundcore joydev input_leds serio_raw vmw_vmci mac_hid sch_fq_codel vmwgfx ttm drm_kms_helper cec rc_core fb_sys_fops syscopyarea sysfillrect sysimgblt msr parport_pc ppdev lp parport drm ip_tables x_tables autofs4 hid_generic mptspi usbhid mptscsih mptbase ahci crc32_pclmul psmouse hid e1000 libahci scsi_transport_spi pata_acpi i2c_piix4
- [ 2620.951056] CR2: 0000000000000000
- [ 2620.951058] ---[ end trace eabb70a32207bb48 ]---
- [ 2620.951059] RIP: 0010:try_oops_init+0x88/0x1000 [oops_tryv1]
- [ 2620.951061] Code: 29 4c 63 04 25 00 00 00 00 b9 32 00 00 00 48 c7 c2 30 c1 7e c0 48 c7 c6 67 c0 7e c0 48 c7 c7 72 c0 7e c0 e8 df 9f 85 f7 eb 0b <c7> 04 25 00 00 00 00 78 00 00 00 c9 31 c0 c3 00 00 00 00 00 00 00
- [ 2620.951062] RSP: 0018:ffffad3c8578bc78 EFLAGS: 00010246
- [ 2620.951063] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
- [ 2620.951064] RDX: 0000000000000000 RSI: ffff912c741a0980 RDI: ffff912c741a0980
- [ 2620.951065] RBP: ffffad3c8578bc78 R08: 0000000000000000 R09: ffffad3c8578ba68
- [ 2620.951066] R10: ffffad3c8578ba60 R11: ffff912c7fec83e8 R12: ffffffffc0781000
- [ 2620.951067] R13: ffff912b450a6110 R14: 0000000000000000 R15: ffffffffc07ed000
- [ 2620.951068] FS: 00007f526a3a1b80(0000) GS:ffff912c74180000(0000) knlGS:0000000000000000
- [ 2620.951069] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
- [ 2620.951070] CR2: 0000000000000000 CR3: 0000000111048003 CR4: 00000000003706e0