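Platform: Linux/ARMv7
Segmentation faults are a common occurrence in programming. To understand fully what the hardware and the operating system do when one occurs, we work through the following core questions:
To the ARM Linux kernel, an application's segmentation fault is at its core an "exception". The ARM architecture defines the following exception types, and the exception vector table holds the handler entry for each of them: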

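When a segmentation fault occurs, control eventually reaches the handler for the data abort entry in the exception vector table.
What the hardware does when an ARM exception is taken is described in detail in section B1.8.3 of the ARMv7 manual. In summary:
Detail: how does the CPU jump to the exception handler?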
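The CPU core needs to know the address of the exception vector table. Different chip platforms can configure this differently; ARM Linux normally places it at 0xFFFF0000. See section B1.8.1 of the ARMv7 manual: the V bit of the CP15 SCTLR register selects between the low (0x00000000) and high (0xFFFF0000) vector addresses.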
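To make the address selection concrete, here is a minimal sketch in C. The helper names `vector_base` and `dabt_vector_addr` are hypothetical, and the sketch assumes an ARMv7-A core where only SCTLR.V (bit 13) matters; it ignores the VBAR remapping that the Security Extensions add.

```c
#include <stdint.h>

/* SCTLR.V is bit 13 of the CP15 System Control Register on ARMv7-A. */
#define SCTLR_V (UINT32_C(1) << 13)

/* Byte offset of the data abort entry within the vector table
 * (reset 0x00, undef 0x04, svc 0x08, pabt 0x0c, dabt 0x10, ...). */
#define VECTOR_OFFSET_DABT UINT32_C(0x10)

/* Hypothetical helper: vector table base implied by an SCTLR value.
 * Assumption: no VBAR remap, so V=0 means 0x00000000. */
static inline uint32_t vector_base(uint32_t sctlr)
{
    return (sctlr & SCTLR_V) ? UINT32_C(0xFFFF0000) : UINT32_C(0x00000000);
}

/* Address the core fetches from when a data abort is taken. */
static inline uint32_t dabt_vector_addr(uint32_t sctlr)
{
    return vector_base(sctlr) + VECTOR_OFFSET_DABT;
}
```

With high vectors enabled, as on ARM Linux, `dabt_vector_addr(SCTLR_V)` evaluates to 0xFFFF0010, which is where the `W(b) vector_dabt` entry of the vector table sits.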
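From step 5 of "hardware logic when a segmentation fault occurs" above, control passes to OS software. Anyone familiar with operating systems knows that the most important step when an exception is taken is saving the hardware context. ARM and x86 differ here: x86 hardware does more of the work, whereas ARM, possibly for power reasons, keeps the hardware simple. Beyond banking the return address in LR and the old CPSR in SPSR, saving the context is left entirely to software. The hardware context (that is, the registers) is saved on the kernel stack (ARM gives each mode its own banked stack pointer, and Linux follows this convention).
5.1 Exception vector table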
- arch/arm/kernel/entry-armv.S
-
- .L__vectors_start:
- W(b) vector_rst
- W(b) vector_und
- W(ldr) pc, .L__vectors_start + 0x1000
- W(b) vector_pabt
- W(b) vector_dabt
- W(b) vector_addrexcptn
- W(b) vector_irq
- W(b) vector_fiq
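As the assembly shows, the table defines a handler for each exception listed in the ARMv7 manual. The segmentation fault discussed here jumps to vector_dabt, which is defined through the vector_stub macro: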
- /*
- * Data abort dispatcher
- * Enter in ABT mode, spsr = USR CPSR, lr = USR PC
- */
- vector_stub dabt, ABT_MODE, 8
-
- .long __dabt_usr @ 0 (USR_26 / USR_32)
- .long __dabt_invalid @ 1 (FIQ_26 / FIQ_32)
- .long __dabt_invalid @ 2 (IRQ_26 / IRQ_32)
- .long __dabt_svc @ 3 (SVC_26 / SVC_32)
- .long __dabt_invalid @ 4
- .long __dabt_invalid @ 5
- .long __dabt_invalid @ 6
- .long __dabt_invalid @ 7
- .long __dabt_invalid @ 8
- .long __dabt_invalid @ 9
- .long __dabt_invalid @ a
- .long __dabt_invalid @ b
- .long __dabt_invalid @ c
- .long __dabt_invalid @ d
- .long __dabt_invalid @ e
- .long __dabt_invalid @ f
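The vector_stub macro defines vector_dabt, and the function body is followed by 16 table entries. Only entries 0 and 3, __dabt_usr and __dabt_svc, are valid: a data abort taken in user space jumps to __dabt_usr, and one taken in kernel space jumps to __dabt_svc. To see how this jump is implemented, we need to look at the vector_stub macro: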
- /*
- * Vector stubs.
- *
- * This code is copied to 0xffff1000 so we can use branches in the
- * vectors, rather than ldr's. Note that this code must not exceed
- * a page size.
- *
- * Common stub entry macro:
- * Enter in IRQ mode, spsr = SVC/USR CPSR, lr = SVC/USR PC
- *
- * SP points to a minimal amount of processor-private memory, the address
- * of which is copied into r0 for the mode specific abort handler.
- */
- .macro vector_stub, name, mode, correction=0
- .align 5
-
- vector_\name:
- .if \correction
- sub lr, lr, #\correction
- .endif
-
- @
- @ Save r0, lr_<exception> (parent PC) and spsr_<exception>
- @ (parent CPSR)
- @
- stmia sp, {r0, lr} @ save r0, lr
- mrs lr, spsr
- str lr, [sp, #8] @ save spsr
-
- @
- @ Prepare for SVC32 mode. IRQs remain disabled.
- @
- mrs r0, cpsr
- eor r0, r0, #(\mode ^ SVC_MODE | PSR_ISETSTATE)
- msr spsr_cxsf, r0
-
- @
- @ the branch table must immediately follow this code
- @
-
- // lr holds the saved SPSR; its low 4 bits give the mode before the exception (0: USR, 3: SVC)
- and lr, lr, #0x0f
- THUMB( adr r0, 1f )
- THUMB( ldr lr, [r0, lr, lsl #2] )
-
- // pass sp (the small per-mode stack that now holds r0, lr and spsr) to the handler in r0
- mov r0, sp
-
- // lr = [pc + lr * 4]: load the handler address from the branch table, indexed by the mode value; from user space this is __dabt_usr
- ARM( ldr lr, [pc, lr, lsl #2] )
- movs pc, lr @ branch to handler in SVC mode
- ENDPROC(vector_\name)
-
- .align 2
- @ handler addresses follow this label
- 1:
- .endm
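The three things the stub does (correct lr, switch the mode bits to SVC, index the branch table by the previous mode) can be modelled in plain C. This is a sketch only: the function and enum names are hypothetical, and it assumes an ARM-state kernel, so the PSR_ISETSTATE term in the eor is taken to be 0.

```c
#include <stdint.h>

#define MODE_MASK 0x1fu
#define USR_MODE  0x10u
#define SVC_MODE  0x13u
#define ABT_MODE  0x17u

/* Stand-ins for the handler addresses stored after vector_stub. */
enum dabt_handler { DABT_INVALID = 0, DABT_USR, DABT_SVC };

/* 16-entry branch table: only index 0 (USR) and 3 (SVC) are valid,
 * every other slot defaults to DABT_INVALID (value 0). */
static const enum dabt_handler dabt_table[16] = {
    [0] = DABT_USR,
    [3] = DABT_SVC,
};

/* "sub lr, lr, #8": for a data abort, lr_abt is the faulting
 * instruction's address plus 8. */
static inline uint32_t faulting_pc(uint32_t lr_abt)
{
    return lr_abt - 8;
}

/* "eor r0, r0, #(mode ^ SVC_MODE)": flips exactly the bits in which
 * the current mode and SVC mode differ, leaving all other CPSR bits. */
static inline uint32_t to_svc_mode(uint32_t cpsr, uint32_t mode)
{
    return cpsr ^ (mode ^ SVC_MODE);
}

/* "and lr, lr, #0x0f" followed by the table load: the low four bits
 * of the saved SPSR select the handler. */
static inline enum dabt_handler dabt_dispatch(uint32_t spsr)
{
    return dabt_table[spsr & 0x0f];
}
```

USR mode is 0x10 and SVC mode is 0x13, so masking with 0x0f gives indexes 0 and 3, which is why those are the only populated slots in the table that follows `vector_stub dabt, ABT_MODE, 8`.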
5.2 __dabt_usr
- __dabt_usr:
-
- // save the hardware context, i.e. the registers
- usr_entry uaccess=0
- kuser_cmpxchg_check
-
- // pass sp in r2, i.e. the struct pt_regs * third argument of do_DataAbort
- mov r2, sp
-
- // dabt_helper calls CPU_DABORT_HANDLER, i.e. v7_early_abort
- dabt_helper
- b ret_from_exception
- UNWIND(.fnend )
- ENDPROC(__dabt_usr)
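The frame that usr_entry builds on the kernel stack, and that sp points to when it is passed along in r2, can be pictured as an array of 18 words. The sketch below is an assumption-laden model rather than the kernel's definition: the `_sketch` names are hypothetical, the layout follows my reading of the 32-bit ARM `struct pt_regs` (r0 to r15, then cpsr, then orig_r0), and only the three values the stub saved are filled in.

```c
#include <stdint.h>

/* Assumed layout: uregs[0..15] = r0..pc, [16] = cpsr, [17] = orig_r0. */
struct pt_regs_sketch {
    uint32_t uregs[18];
};

enum {
    REG_SP = 13,
    REG_LR = 14,
    REG_PC = 15,
    REG_CPSR = 16,
    REG_ORIG_R0 = 17,
};

/* Copy what vector_stub left on the per-mode stack ({r0, lr, spsr})
 * into the frame: the corrected lr becomes the saved pc, spsr becomes
 * the saved cpsr, and orig_r0 is set to -1. */
static inline void sketch_fill_frame(struct pt_regs_sketch *regs,
                                     const uint32_t stub[3])
{
    regs->uregs[0] = stub[0];
    regs->uregs[REG_PC] = stub[1];
    regs->uregs[REG_CPSR] = stub[2];
    regs->uregs[REG_ORIG_R0] = UINT32_MAX;
}

/* Same test the user_mode() check relies on: the low four mode bits
 * of the saved cpsr are zero only for USR mode. */
static inline int sketch_user_mode(const struct pt_regs_sketch *regs)
{
    return (regs->uregs[REG_CPSR] & 0x0f) == 0;
}
```

This is the structure that do_DataAbort later receives as its `regs` argument and inspects with `user_mode(regs)`.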
5.3 v7_early_abort
- #include <linux/linkage.h>
- #include <asm/assembler.h>
- /*
- * Function: v7_early_abort
- *
- * Params : r2 = pt_regs
- * : r4 = aborted context pc
- * : r5 = aborted context psr
- *
- * Returns : r4 - r11, r13 preserved
- *
- * Purpose : obtain information about current aborted instruction.
- */
- .align 5
- ENTRY(v7_early_abort)
-
- // on ARMv7, CP15 (the system control coprocessor) registers c5 and c6 hold the FSR and FAR
- mrc p15, 0, r1, c5, c0, 0 @ get FSR
- mrc p15, 0, r0, c6, c0, 0 @ get FAR
- uaccess_disable ip @ disable userspace access
-
- b do_DataAbort
- ENDPROC(v7_early_abort)
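The FSR (fault status register) records information about the failed memory access, including the domain and the type of the access; the FAR (fault address register) records the virtual address whose access failed. With r0 (fault address), r1 (FSR) and r2 (pt_regs) set up, execution branches to do_DataAbort.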
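As an illustration of what the FSR value carries, the following sketch decodes the fields of the short-descriptor format. The bit positions are the ones I understand the ARMv7 manual and the kernel's `fsr_fs()` helper to use (status in bits [3:0] plus bit 10, domain in bits [7:4], write flag in bit 11); `fsr_domain` and `fsr_is_write` are hypothetical helper names.

```c
#include <stdint.h>

#define FSR_WRITE (1u << 11)   /* set when the aborted access was a write */
#define FSR_FS4   (1u << 10)   /* bit 4 of the fault status */
#define FSR_FS3_0 0x0fu        /* bits 3..0 of the fault status */

/* Fault status: FS[3:0] from bits 3..0, FS[4] from bit 10.
 * This value indexes the fault handler table. */
static inline unsigned int fsr_fs(uint32_t fsr)
{
    return (fsr & FSR_FS3_0) | ((fsr & FSR_FS4) >> 6);
}

/* Domain of the faulting access, bits 7..4. */
static inline unsigned int fsr_domain(uint32_t fsr)
{
    return (fsr >> 4) & 0x0f;
}

/* Non-zero if the access that aborted was a write. */
static inline int fsr_is_write(uint32_t fsr)
{
    return (fsr & FSR_WRITE) != 0;
}
```

For example, an FSR of 0x805 decodes to status 5 with the write bit set, that is, a write that hit a section translation fault.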
5.4 do_DataAbort
- /*
- * Dispatch a data abort to the relevant handler.
- */
- asmlinkage void __exception
- do_DataAbort(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
- {
- struct thread_info *thread = current_thread_info();
- const struct fsr_info *inf = fsr_info + fsr_fs(fsr);
- struct siginfo info;
-
- if (!user_mode(regs)) {
- thread->cpu_excp++;
- if (thread->cpu_excp == 1) {
- thread->regs_on_excp = (void *)regs;
- aee_excp_regs = (void *)regs;
- }
- /*
- * NoteXXX: The data abort exception may happen twice
- * when calling probe_kernel_address() in which.
- * __copy_from_user_inatomic() is used and the
- * fixup table lookup may be performed.
- * Check if the nested panic happens via
- * (cpu_excp >= 3).
- */
- if (thread->cpu_excp >= 3)
- aee_stop_nested_panic(regs);
- }
-
- if (!inf->fn(addr, fsr & ~FSR_LNX_PF, regs)) {
- if (!user_mode(regs))
- thread->cpu_excp--;
- return;
- }
-
- pr_alert("Unhandled fault: %s (0x%03x) at 0x%08lx\n",
- inf->name, fsr, addr);
- show_pte(current->mm, addr);
-
- info.si_signo = inf->sig;
- info.si_errno = 0;
- info.si_code = inf->code;
- info.si_addr = (void __user *)addr;
- arm_notify_die("", regs, &info, fsr, 0);
- }
The FSR value is used to look up an fsr_info structure, which describes how one fault status is handled. The structure is defined as follows:
- struct fsr_info {
- int (*fn)(unsigned long addr, unsigned int fsr, struct pt_regs *regs);
- int sig;
- int code;
- const char *name;
- };
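name is the name of the fault status, sig is the signal the Linux kernel sends when the fault cannot be resolved, code is the si_code reported with that signal, and fn is the function that tries to resolve the fault. The table itself is defined as follows: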
- static struct fsr_info fsr_info[] = {
- /*
- * The following are the standard ARMv3 and ARMv4 aborts. ARMv5
- * defines these to be "precise" aborts.
- */
- { do_bad, SIGSEGV, 0, "vector exception" },
- { do_bad, SIGBUS, BUS_ADRALN, "alignment exception" },
- { do_bad, SIGKILL, 0, "terminal exception" },
- { do_bad, SIGBUS, BUS_ADRALN, "alignment exception" },
- { do_bad, SIGBUS, 0, "external abort on linefetch" },
- { do_translation_fault, SIGSEGV, SEGV_MAPERR, "section translation fault" },
- { do_bad, SIGBUS, 0, "external abort on linefetch" },
- { do_page_fault, SIGSEGV, SEGV_MAPERR, "page translation fault" },
- { do_bad, SIGBUS, 0, "external abort on non-linefetch" },
- { do_bad, SIGSEGV, SEGV_ACCERR, "section domain fault" },
- { do_bad, SIGBUS, 0, "external abort on non-linefetch" },
- { do_bad, SIGSEGV, SEGV_ACCERR, "page domain fault" },
- { do_bad, SIGBUS, 0, "external abort on translation" },
- { do_sect_fault, SIGSEGV, SEGV_ACCERR, "section permission fault" },
- { do_bad, SIGBUS, 0, "external abort on translation" },
- { do_page_fault, SIGSEGV, SEGV_ACCERR, "page permission fault" },
- ...
- /*
For the segmentation fault used as the example in this article, the handler is do_translation_fault, which in the end sends the SIGSEGV signal to the process.
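The table-driven dispatch can be tried out in user space with a small simulation. Everything prefixed `sim_` is hypothetical, including the "mapped" address range and the stand-in handlers. The sketch assumes, as in do_DataAbort above, that a handler returning non-zero makes the dispatcher raise the table's signal, while the page-fault style handlers raise SIGSEGV themselves and return 0. It indexes with the low four status bits only.

```c
#include <signal.h>

/* Hypothetical range standing in for the addresses a process has mapped. */
#define SIM_MAPPED_START 0x00010000ul
#define SIM_MAPPED_END   0x00020000ul

static int sim_pending_signal;   /* signal queued for the process, 0 if none */

struct sim_fsr_info {
    int (*fn)(unsigned long addr, unsigned int fsr);
    int sig;
    const char *name;
};

/* Stand-in for do_bad: never resolves the fault. */
static int sim_do_bad(unsigned long addr, unsigned int fsr)
{
    (void)addr; (void)fsr;
    return 1;
}

/* Stand-in for the translation/page fault handlers: a fault inside the
 * mapped range is "fixed"; otherwise the handler itself queues SIGSEGV. */
static int sim_page_fault(unsigned long addr, unsigned int fsr)
{
    (void)fsr;
    if (addr < SIM_MAPPED_START || addr >= SIM_MAPPED_END)
        sim_pending_signal = SIGSEGV;
    return 0;
}

/* Stand-in for do_sect_fault: always ends in SIGSEGV. */
static int sim_sect_fault(unsigned long addr, unsigned int fsr)
{
    (void)addr; (void)fsr;
    sim_pending_signal = SIGSEGV;
    return 0;
}

/* Same order as the first 16 fsr_info entries shown above. */
static const struct sim_fsr_info sim_table[16] = {
    { sim_do_bad,     SIGSEGV, "vector exception" },
    { sim_do_bad,     SIGBUS,  "alignment exception" },
    { sim_do_bad,     SIGKILL, "terminal exception" },
    { sim_do_bad,     SIGBUS,  "alignment exception" },
    { sim_do_bad,     SIGBUS,  "external abort on linefetch" },
    { sim_page_fault, SIGSEGV, "section translation fault" },
    { sim_do_bad,     SIGBUS,  "external abort on linefetch" },
    { sim_page_fault, SIGSEGV, "page translation fault" },
    { sim_do_bad,     SIGBUS,  "external abort on non-linefetch" },
    { sim_do_bad,     SIGSEGV, "section domain fault" },
    { sim_do_bad,     SIGBUS,  "external abort on non-linefetch" },
    { sim_do_bad,     SIGSEGV, "page domain fault" },
    { sim_do_bad,     SIGBUS,  "external abort on translation" },
    { sim_sect_fault, SIGSEGV, "section permission fault" },
    { sim_do_bad,     SIGBUS,  "external abort on translation" },
    { sim_page_fault, SIGSEGV, "page permission fault" },
};

/* Simplified do_DataAbort: returns the signal queued, or 0 if the
 * fault was resolved. */
static int sim_data_abort(unsigned long addr, unsigned int fsr)
{
    const struct sim_fsr_info *inf = &sim_table[fsr & 0x0f];

    sim_pending_signal = 0;
    if (inf->fn(addr, fsr))
        sim_pending_signal = inf->sig;
    return sim_pending_signal;
}
```

Calling `sim_data_abort(0x0, 0x805)` models a write through a NULL pointer: status 5 selects the section translation fault entry, the address lies outside the simulated mapping, and SIGSEGV is the result.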
References: