• [Tsinghua University] Vulnerability Discovery: State-Aware Fuzzing with StateFuzz


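    Dr. Bodong Zhao (赵博栋), Prof. Chao Zhang (张超), Institute for Network Sciences and Cyberspace (INSC), Tsinghua University
    This article introduces StateFuzz, a method for fuzzing Linux drivers. Dr. Bodong Zhao presented it at an InForSec meeting, and the work was published at USENIX Security under the title "StateFuzz: System Call-Based State-Aware Linux Driver Fuzzing". The article covers the core method only; it does not present the evaluation data or the outlook for future experiments.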

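    Preface:

    Fuzzing is currently the mainstream approach to vulnerability discovery. In recent years it has uncovered a large number of previously unknown vulnerabilities and has drawn wide attention from industry and academia. Gray-box fuzzing that uses code coverage as its evolution metric has been studied extensively and has produced many improved variants. However, there is a gap between code coverage and vulnerabilities: raising code coverage does not necessarily help find latent security vulnerabilities. The authors propose StateFuzz (USENIX Security '22), a state-aware fuzzing method that introduces program state as an evolution metric. Experimental results show that the method is effective, and it found dozens of previously unknown vulnerabilities in Linux and Android drivers. This talk discusses the approach.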

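    Background

    Vulnerabilities: a major security threat in cyberspace

    Major incidents: the Ukraine power grid outage, Stuxnet, WannaCry, Heartbleed (websites), Aurora (Google)

    Vulnerabilities: the entry point for cyberattacks

    Guidance component (vulnerability), warhead component (exploit), control component (malicious code)

    The "kill chain" proposed by the US defense contractor Lockheed Martin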

    • Reconnaissance: target reconnaissance (vulnerability discovery)
      Research, identification, and selection of targets.

    • Weaponization: building the weapon (exploit development)
      Pairing remote access malware with an exploit into a
      deliverable payload (e.g. Adobe PDF and Microsoft Office files).

    • Delivery: delivering the weapon (active/passive)
      Transmission of the weapon to the target (e.g. via email attachments, websites, or USB drives).

    • Exploitation: the weapon takes effect (vulnerability triggering and hijacking)
      Once delivered, the weapon's code is triggered, exploiting vulnerable applications or systems.

    • Installation: persistence (malicious code)
      The weapon installs a backdoor on a target's system, allowing persistent access.

    • Command & Control: remote control (botnet)
      An outside server communicates with the weapons, providing "hands on keyboard" access inside the target's network.

    • Actions on Objective: final action (data theft / destruction / pivoting)
      The attacker works to achieve the objective of the intrusion, which can include exfiltration or destruction of data, or intrusion of another target.

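    Vulnerabilities and exploits

    Vulnerability discovery and exploit generation are both, in essence, input-space search problems.
    Input sample space -> vulnerability PoC sample space -> target (software, hardware, network)
    Vulnerability example: CVE-2009-4270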

    int outprintf(const char *fmt, ...)
    {
      int count; char buf[1024]; va_list args;
      va_start(args, fmt);
      count = vsprintf(buf, fmt, args);  // unbounded write into buf[1024]
      outwrite(buf, count);              // print out
      va_end(args);
      return count;
    }
    int main(int argc, char *argv[])
    {
      const char *arg;
      while ((arg = *argv++) != 0) {
        switch (arg[0]) {
          case '-': {
            switch (arg[1]) {
              case 0:
              default:
                // attacker-controlled argument reaches the %s format
                outprintf("unknown switch %s\n", arg);
            }
          }
          default: ...
        }
        ...
    

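    The call count = vsprintf(buf, fmt, args); places no limit on the number of bytes copied into the 1024-byte buffer, which causes a stack buffer overflow.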
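
    The sketch below shows one way the copy could be bounded, assuming the remedy is simply to switch from vsprintf to vsnprintf. The helper name format_bounded is hypothetical and this is not the actual patch for CVE-2009-4270.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper (not from the original code): formats into a
 * caller-supplied buffer of `cap` bytes and returns how many bytes were
 * actually stored, excluding the terminating NUL. */
int format_bounded(char *buf, size_t cap, const char *fmt, ...)
{
    va_list args;
    int needed;

    if (cap == 0)
        return 0;

    va_start(args, fmt);
    /* vsnprintf writes at most cap-1 bytes plus a terminating NUL. */
    needed = vsnprintf(buf, cap, fmt, args);
    va_end(args);

    if (needed < 0) {           /* encoding error: leave an empty string */
        buf[0] = '\0';
        return 0;
    }
    /* On truncation vsnprintf returns the length it would have written,
     * so clamp it to what really fits. */
    return needed >= (int)cap ? (int)cap - 1 : needed;
}
```

    With an 8-byte buffer, an over-long switch name is truncated instead of overwriting the stack, and the returned count can be passed on to the output routine safely.
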
    Vulnerability trigger conditions:

    • Path constraints
    • Vulnerability constraints

    Ways to discover vulnerabilities:

    • Symbolic execution
    • Fuzzing (testing)

    Coverage-guided fuzzing is better at satisfying path constraints.

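    Overview of vulnerability discovery techniques

    History of vulnerability discovery techniques

    Stage 1 (1960s-1970s): manual review (experience-dependent, does not scale) -> source code auditing, reverse engineering, heuristic rules
    Stage 2 (1970s-1990s): rule-based scanning (high false-positive rate, poor scalability) -> static analysis, symbolic execution, model checking
    Stage 3 (1990s-2013): dynamic testing (high false-negative rate, low coverage) -> random malformed test cases that simulate attacker input
    Stage 4 (2013-2023): intelligent discovery (intelligent evolution) -> knowledge- and data-driven, genetic/evolutionary algorithms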

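    Second-generation approach: rule-based scanning, static analysis (SAST)

    • Static scanning based on heuristic rules

    Pros: fast
    Cons: high false-positive rate; cannot produce PoC verification scripts
    Bottleneck: undecidability (Rice's theorem)

    Third-generation approach: dynamic testing (DAST, IAST)

    • Vulnerability discovery based on dynamic information

    Pros: low false-positive rate
    Cons: low coverage, high false-negative rate
    Industrial products: OWASP, Burp Suite, Veracode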

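    Third-generation approach: fuzzing

    • Fuzzing

    Generate/mutate test cases, test, check, repeat...
    Generator/Mutator -> inputs -> monitor (target program) -> security violation? -> bugs

    • Scientific question / challenge:

    In an infinite input space, how can the finite set of vulnerability-triggering samples be found efficiently?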

    Fuzzing 1: Generation-based

    Generate test cases from a model or template (e.g. grammar, specification)

    Pros: valid inputs, more code coverage
    Cons: hard to set up, requires input knowledge (human effort)
    Industry tools: Peach, beSTORM

    Fuzzing 2: Mutation-based

    Mutate existing test cases to generate new test cases

    Pros: easy to set up, no prior knowledge required
    Cons: invalid inputs, limited code coverage (checksums, magic numbers, etc.)
    Industry use: Google OSS-Fuzz, Microsoft Project OneFuzz
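
    As a rough illustration of the mutation step, the sketch below flips a single random bit of a seed. The function name mutate_bitflip and the small xorshift generator are assumptions made for this example; real fuzzers combine many more mutation operators.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Small xorshift32 PRNG so the sketch is deterministic for a given seed.
 * The state must be non-zero. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Hypothetical mutator: copies `seed` into `out` and flips exactly one
 * randomly chosen bit. Returns the index of the changed byte, or
 * (size_t)-1 if the input is empty. */
size_t mutate_bitflip(const uint8_t *seed, size_t len, uint8_t *out,
                      uint32_t *rng)
{
    if (len == 0)
        return (size_t)-1;

    memcpy(out, seed, len);
    uint32_t r = xorshift32(rng);
    size_t byte = (r >> 3) % len;            /* which byte to touch */
    out[byte] ^= (uint8_t)(1u << (r & 7));   /* which bit inside it */
    return byte;
}
```

    A single flipped bit usually breaks a checksum or magic number, which is exactly why purely mutation-based fuzzers tend to produce many invalid inputs.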

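    Fourth-generation approach: intelligent fuzzing

    Directions currently explored in academia:
    Breadth: supporting different types of target software,
    i.e. applying fuzzing systems to more kinds of targets.
    Depth: improving seed generation, mutation, and testing efficiency,
    mainly by optimizing the seed mutation and seed selection stages.

    Provide good initial seed test cases -> select a seed from the seed pool -> mutate the seed -> allocate energy (number of mutations) -> new test cases -> test execution (coverage tracking / security monitoring)
    The core idea is survival of the fittest, realized as a genetic algorithm driven by coverage tracking: a test case that increases coverage (i.e. evolves) is kept as a seed and added to the seed pool.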
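
    The seed-retention rule can be sketched as follows, assuming coverage is recorded as a small byte map with one slot per code location. MAP_SIZE, coverage_t and merge_coverage are illustrative names, not taken from any particular fuzzer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAP_SIZE 64   /* assumed size of the coverage map */

/* Everything the seed pool has covered so far. */
typedef struct {
    uint8_t seen[MAP_SIZE];
} coverage_t;

/* Merges the coverage of one execution into the global map.
 * Returns true if the run hit at least one slot never seen before,
 * i.e. the test case "evolved" and should be kept as a seed. */
bool merge_coverage(coverage_t *global, const uint8_t run[MAP_SIZE])
{
    bool new_bits = false;
    for (size_t i = 0; i < MAP_SIZE; i++) {
        if (run[i] && !global->seen[i]) {
            global->seen[i] = 1;
            new_bits = true;
        }
    }
    return new_bits;
}
```

    A fuzzing loop would call merge_coverage after every execution and push the input into the seed pool only when it returns true.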

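    Vulnerability discovery research of the VUL337 group

    Breadth:

    • Firmware/hardware: IoT, chips, BIOS/TEE
    • Kernel/drivers: Windows, macOS, Linux
    • System software: browsers, hypervisors, SGX/TEE applications
    • User-space software: libraries, binaries, GUI programs
    • Blockchain: symbolic execution, smart contracts, DeFi
    • Network devices/protocols: network services, 5G, routers, connected vehicles

    Depth:

    • Seed generation > automatic/intelligent input format recognition
    • Seed ranking and selection > automatic/intelligent program semantics understanding
    • Seed mutation
    • Test performance optimization > parallelization, hardware cooperation
    • Evolution signal tracking > precise, lightweight
    • Evolution strategy > code coverage, state guidance
    • Security violation detection > directed discovery, bottleneck breaking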

    State-aware fuzzing (USENIX Security 2022)

    Code Coverage - Limitation

    • Example:maze game

    most code can be explored easily
    no guidance to trigger the bug
    State:values of maze[y][x]

    while (true) {
      ox = x; oy = y;            // remember the previous position

      switch (input[i]) {        // one input byte selects one move
        case 'W': y--; break;
        case 'S': y++; break;
        case 'A': x--; break;
        case 'D': x++; break;
      }
      if (maze[y][x] == '#') { Bug(); }
      // If target is blocked, do not advance.
      if (maze[y][x] != ' ') { x = ox; y = oy; }
    }
    
    • Another Example:DNN testing

    most (Python) code can be explored easily
    State:output of neurons(activated or not)

    StateFuzz: State-aware Fuzzing

    • Intuition: guide fuzzers to explore more program states
      Program state: a combination of all register values and memory values.
      Problem: how can such an enormous combination be tracked?
    • Need to answer 3 questions

    Q1: What are appropriate program states? (How should a concrete program state be defined?)
    Q2: How to recognize and track program states?
    Q3: How to guide fuzzers to explore program states?

    Q1:What are program states?

    • Values of all memory and registers?

    the number of such states is overwhelmingly large
    hard to track in practice

    • Manual annotation:

    human efforts needed

    • Protocol status code:

    not always available

    • Using variables to represent states is very common,
      so variables can serve as the program state.
    • Ideally, a state is a combination of all program variables (including memory and register values)

    state explosion!

    • Practically, states persist across interaction boundaries: they are read by one interaction and
      written by another interaction. Such state variables

    have a long lifetime
    can be updated (i.e., state transition) by users
    can affect the program's control flow or memory access
    Ex: FTP server program
    User -> PASS packet / USER packet -> FTP server

    int ftpUSER(PFTPCONTEXT context,const char *params);
    int ftpPASS(PFTPCONTEXT context,const char *params);
    

    Ex: the variable context->Access is shared by the PASS and LIST requests

    int ftpLIST(PFTPCONTEXT context, const char *params) {
      /* LIST reads the state variable: refuse if nobody has logged in. */
      if (context->Access == FTP_ACCESS_NOT_LOGGED_IN)
        return sendstring(context, error530);
      ...
    }

    int ftpPASS(PFTPCONTEXT context, const char *params) {
      ...
      /* PASS writes the state variable (a state transition). */
      if (strcasecmp(temptext, "admin") == 0) {
        context->Access = FTP_ACCESS_FULL;
      }
    }

    Q2: How to track states?

    • Step 1: Recognize state variables (variables shared by different user actions)

      step 1.1: recognize user actions

      • interfaces that can be accessed by users

      step 1.2: recognize variables accessed by actions

      • read/write variables

      step 1.3: intersect the actions' variable sets

      • read by one action and written by another action
    • Example: the maze game

      variables read by action 'W': LVMap['W'] = {y}
      variables written by action 'S': SVMap['S'] = {y}
      state variable set: V = V ∪ (LVMap['W'] ∩ SVMap['S'])

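    while (true) {
      ox = x; oy = y;            // remember the previous position

      switch (input[i]) {        // each case is one user action
        case 'W': y--; break;    // reads and writes y
        case 'S': y++; break;    // reads and writes y
        case 'A': x--; break;    // reads and writes x
        case 'D': x++; break;    // reads and writes x
      }
      if (maze[y][x] == '#') { Bug(); }
      // If target is blocked, do not advance.
      if (maze[y][x] != ' ') { x = ox; y = oy; }
    }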
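
    Step 1.3 can be sketched with bit masks, assuming each variable is assigned one bit and each action comes with the set of variables it reads and the set it writes. The types and names below (action_access_t, state_variables, VAR_X, VAR_Y) are hypothetical, and requiring the two actions to be different is an assumption based on the wording "read by one action and written by another".

```c
#include <stddef.h>
#include <stdint.h>

/* Each variable is one bit in a mask; the numbering is an assumption. */
enum { VAR_X = 1u << 0, VAR_Y = 1u << 1 };

/* Per-action access sets: `loads` = variables read, `stores` = written. */
typedef struct {
    uint32_t loads;
    uint32_t stores;
} action_access_t;

/* V = union over all ordered pairs (a, b), a != b, of
 * loads[a] & stores[b]: variables read by one action and written by
 * another one. */
uint32_t state_variables(const action_access_t *acts, size_t n)
{
    uint32_t v = 0;
    for (size_t a = 0; a < n; a++)
        for (size_t b = 0; b < n; b++)
            if (a != b)
                v |= acts[a].loads & acts[b].stores;
    return v;
}
```

    For the maze, the four actions W, S, A and D yield V = {x, y}, because y is read by W and written by S, and x is read by A and written by D.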
    
    • Step2:Calculate and track states(a combination of all state variables)

    how?

    Recall:How does AFL track code coverage?

    • coverage = combinations of code blocks
    • But number of combinations is too large.

    Intuition: state coverage = combinations of state variables' values.

    Analyze the value ranges of each state variable

    • e.g. (MIN,0), [0,4], (4,10], (10,MAX)
      Track which value range a variable falls into, rather than each individual value.
      We identify value ranges by solving the constraints of condition statements.
      The value set of each state variable is too large to track value by value; doing so would cause edge explosion.

    State variables are combined according to whether they affect the same program control flow.
    The combination of two relevant state variables' values is tracked when
    both variables affect the same control-flow path or memory access.

    if (x<0)
      ...
    else if(x<=4)
      ...
    else 
      ...
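
    The branch conditions above split x into the ranges (MIN,0), [0,4] and (4,MAX). A minimal sketch of mapping a concrete value to its range, and of combining the ranges of two related variables into one state id, might look like this. X_BOUNDS, range_index and pair_state_id are hypothetical names; the encoding of pairs is an assumption, not the scheme used by StateFuzz.

```c
#include <stddef.h>
#include <stdint.h>

/* Inclusive upper bounds of every range except the last, derived from the
 * conditions `x < 0` and `x <= 4`: (MIN,0), [0,4], (4,MAX). */
static const int32_t X_BOUNDS[] = { -1, 4 };

/* Returns the index of the range that `value` falls into. */
size_t range_index(int32_t value, const int32_t *bounds, size_t n_bounds)
{
    size_t i = 0;
    while (i < n_bounds && value > bounds[i])
        i++;
    return i;
}

/* Hypothetical combined id for two related state variables: one slot per
 * (range of a, range of b) pair, so the feedback space grows with the
 * number of range pairs rather than with raw value pairs. */
size_t pair_state_id(size_t range_a, size_t range_b, size_t n_ranges_b)
{
    return range_a * n_ranges_b + range_b;
}
```

    Values 0 and 4 land in the same range, so a fuzzer tracking ranges treats them as the same state, while 5 counts as a new one.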
    

    Q3: How to explore program states?

    Existing genetic algorithms use code coverage as feedback (Check Feedback). StateFuzz additionally uses the value ranges of state variables as a metric for the genetic algorithm.

    • Based on existing genetic algorithm

      which currently relies only on code coverage feedback

      • Our solution: 3-dimension feedback mechanism
    • A test case is interesting if it

    discovers new code
    discovers new value ranges of state variables
    discovers new extremum values of state variables
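
    The three criteria can be sketched as a single check for one state variable. The structure feedback_t, the constants N_EDGES and N_RANGES, and the function is_interesting are assumptions made for illustration; the actual bookkeeping in StateFuzz is not described in this article.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define N_EDGES  8   /* assumed number of code coverage slots */
#define N_RANGES 4   /* assumed number of value ranges of the variable */

/* Everything observed so far for one state variable. */
typedef struct {
    bool    edge_seen[N_EDGES];
    bool    range_seen[N_RANGES];
    bool    has_value;           /* false until the first observation */
    int32_t min_seen, max_seen;  /* extremum values observed so far */
} feedback_t;

/* Records one execution and reports whether it is "interesting":
 * new code, a new value range, or a new extremum value. */
bool is_interesting(feedback_t *fb, size_t edge, size_t range, int32_t value)
{
    bool interesting = false;

    if (edge < N_EDGES && !fb->edge_seen[edge]) {
        fb->edge_seen[edge] = true;       /* dimension 1: new code */
        interesting = true;
    }
    if (range < N_RANGES && !fb->range_seen[range]) {
        fb->range_seen[range] = true;     /* dimension 2: new value range */
        interesting = true;
    }
    if (!fb->has_value || value < fb->min_seen) {
        fb->min_seen = value;             /* dimension 3: new minimum */
        interesting = true;
    }
    if (!fb->has_value || value > fb->max_seen) {
        fb->max_seen = value;             /* dimension 3: new maximum */
        interesting = true;
    }
    fb->has_value = true;
    return interesting;
}
```

    An input that repeats an already seen edge, range and value is discarded, while one that only pushes the variable to a new maximum is still kept as a seed.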

    StateFuzz: Implementation

    1. Kernel source code -> Program State Recognition (static analysis -> state-variable list -> static symbolic execution -> extract constraints to obtain state-variable value ranges)
    2. Instrumentation (state-variable tracking instrumentation & code coverage instrumentation -> instrumented kernel)
    3. Fuzzing Loop (seed preservation, decided by the instrumentation feedback -> seed selection -> mutation)

    Implementation details

    • State Recognition

      DIFUZE(for program action recognition)
      CRIX(for building call graph)
      Clang Static Analyzer(for static symbolic execution)

    • Instrumentation

      LLVM Sancov
      SVF

    • Fuzzing loop

      Syzkaller

  • Original article: https://blog.csdn.net/qq_43332010/article/details/133818762