• eBPF-4: Understanding lost events (lost_event) in perf_map


    Reference (original article):

    • http://blog.itaysk.com/2020/04/20/ebpf-lost-events

    This post is largely a translation of the article linked above.

    I. What are eBPF lost events

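    As we know, BPF programs run in kernel space, and bcc offers several ways for kernel space to exchange data with user space, for example:

    1. bpf_trace_printk(), whose output is read from the kernel's trace_pipe
    2. BPF_PERF_OUTPUT(), which sends events to user space through a perf ring buffer

    The official bcc documentation recommends the second approach, mainly because the first one has several limitations: 3 args max, 1 %s only, and trace_pipe is globally shared.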

    So we first need to understand the ring buffer mechanism of perf, the analysis tool that existed before BPF; BPF reuses the same mechanism.

    Ring buffer:

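    A ring buffer is a contiguous memory region that a producer and a consumer can write to and read from at the same time. It is called a ring because when the writer reaches the end of the region, it wraps around and continues from the beginning. What happens when the buffer is full depends on its mode: in overwrite mode the writer overwrites data that has not been read yet, while in non-overwrite mode the new write is rejected, which is the case that produces lost events (see the !rb->overwrite check in Section 4).

    This makes the meaning of a lost event easy to see: the consumer falls behind, the producer catches up with it, and events that could not be stored in time are lost before the consumer ever reads them.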
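    The difference between the two full-buffer behaviours can be illustrated with a toy Python model. `ToyRing` and `demo` are hypothetical names, and the model stores whole items in slots rather than bytes, so it is a conceptual sketch, not perf's implementation: with `overwrite=True` the oldest unread item is silently replaced, while with `overwrite=False` the new item is rejected and counted in `lost`.

```python
class ToyRing:
    """Toy fixed-capacity ring buffer (item slots, not perf's byte layout)."""

    def __init__(self, capacity, overwrite=False):
        self.slots = [None] * capacity
        self.capacity = capacity
        self.head = 0            # number of items written so far (producer position)
        self.tail = 0            # number of items consumed so far (consumer position)
        self.overwrite = overwrite
        self.lost = 0            # items rejected because the ring was full

    def write(self, item):
        if self.head - self.tail == self.capacity:   # ring is full
            if not self.overwrite:
                self.lost += 1                       # non-overwrite: drop the new item
                return False
            self.tail += 1                           # overwrite: discard the oldest unread item
        self.slots[self.head % self.capacity] = item # index wraps around to the start
        self.head += 1
        return True

    def read(self):
        if self.tail == self.head:
            return None                              # nothing left to consume
        item = self.slots[self.tail % self.capacity]
        self.tail += 1
        return item


def demo(overwrite):
    """Write 6 items into a 4-slot ring before the consumer runs, then drain it."""
    ring = ToyRing(capacity=4, overwrite=overwrite)
    for i in range(6):
        ring.write(i)
    drained = []
    while (item := ring.read()) is not None:
        drained.append(item)
    return drained, ring.lost


print(demo(overwrite=False))  # ([0, 1, 2, 3], 2)
print(demo(overwrite=True))   # ([2, 3, 4, 5], 0)
```

    In both runs two events are never read, but only the non-overwrite ring reports them through its `lost` counter; the overwrite ring keeps the newest four items and its counter stays at 0.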

    II. Tracing the code

    To understand this better, let's trace how bcc uses the perf ring buffer from start to finish.

    When writing bcc programs with the Python frontend, there are basically two steps:

    1. Initialize the buffer using the BPF.open_perf_buffer method
    2. Start receiving events using the BPF.perf_buffer_poll method

    Section 1 - BPF.open_perf_buffer

    1. The user program opens a perf buffer with the BPF.open_perf_buffer method (in bcc it is called on a perf event array table, e.g. b["events"].open_perf_buffer(...)), which accepts a lost_cb callback (a function pointer): (source)

      def open_perf_buffer(self, callback, page_cnt=8, lost_cb=None)

    2. BPF.open_perf_buffer ends up creating a “reader” using the perf_reader_new C function. A “reader” is a bcc construct that facilitates reading from a ring buffer. Appendix 1 walks through this code path.

    3. The perf_reader_new C function saves the callback in the newly created reader: (source)

      reader->lost_cb = lost_cb;

    So the callback must be passed in right at the start; it is then stored in the newly created reader, which is what reads from the ring buffer.
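    The idea of Section 1, capture the callbacks once at creation time and call them later from the read path, can be sketched in Python as follows. `ToyPerfReader`, `make_reader`, and `on_lost` are hypothetical names standing in for bcc's C `struct perf_reader` and `perf_reader_new`. The fallback branch mirrors the "Possibly lost N samples" warning bcc prints when no lost callback is set; treat its exact wording as an assumption.

```python
import sys


class ToyPerfReader:
    """Hypothetical stand-in for bcc's struct perf_reader (illustration only)."""

    def __init__(self, raw_cb, lost_cb, cb_cookie):
        self.raw_cb = raw_cb        # called for every regular sample
        self.lost_cb = lost_cb      # may be None, like lost_cb=None in open_perf_buffer
        self.cb_cookie = cb_cookie  # opaque value handed back to the callbacks

    def on_lost(self, lost):
        """Called from the read path when a lost-events record is found."""
        if self.lost_cb is not None:
            self.lost_cb(self.cb_cookie, lost)
        else:
            # assumed fallback: just warn when no lost callback was registered
            print(f"Possibly lost {lost} samples", file=sys.stderr)


def make_reader(raw_cb, lost_cb=None, cb_cookie=None):
    """Hypothetical analogue of perf_reader_new: remember the callbacks for later."""
    return ToyPerfReader(raw_cb, lost_cb, cb_cookie)


seen = []
reader = make_reader(raw_cb=lambda cookie, data: None,
                     lost_cb=lambda cookie, lost: seen.append((cookie, lost)),
                     cb_cookie="cpu0")
reader.on_lost(3)
print(seen)  # [('cpu0', 3)]
```

    The reader never decides what to do about lost events itself; it only forwards the count to whatever was registered at creation time, which is why lost_cb has to be supplied up front.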

    Section 2 - BPF.perf_buffer_poll

    1. The user program calls the BPF.perf_buffer_poll method to start receiving events. It uses bcc's C function perf_reader_poll to read from the previously created “reader”: (source)

      lib.perf_reader_poll(len(readers), readers, timeout)

    2. perf_reader_poll invokes the read function on every reader: (source)

      perf_reader_event_read(readers[i]);

    3. perf_reader_event_read reads an event. If its type is PERF_RECORD_LOST, it calls our lost events callback: (source)

      if (e->type == PERF_RECORD_LOST) {
        ...
        reader->lost_cb(reader->cb_cookie, lost);
        ...
      }

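    So when an event read from the ring buffer has type PERF_RECORD_LOST, the callback we registered earlier is invoked. But we never submitted an event of this type ourselves, so where does it come from?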
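    The dispatch in step 3 can be approximated in Python with the standard struct module. This is a sketch, not bcc's code: it assumes little-endian records, the perf_event_header layout (u32 type, u16 misc, u16 size), the PERF_RECORD_LOST (2) and PERF_RECORD_SAMPLE (9) values from linux/perf_event.h, and a simplified sample consisting only of a u32 length and raw bytes; bcc's real reader also copies records that wrap around the end of the mmap'ed buffer. `make_sample`, `make_lost`, and `event_read` are hypothetical helpers. The lost count is read right after the header and the id, matching the lost_event struct shown in Section 3.

```python
import struct

PERF_RECORD_LOST = 2              # values from enum perf_event_type (linux/perf_event.h)
PERF_RECORD_SAMPLE = 9
HEADER = struct.Struct("<IHH")    # perf_event_header: u32 type, u16 misc, u16 size
LOST_BODY = struct.Struct("<QQ")  # u64 id, u64 lost
RAW_SIZE = struct.Struct("<I")    # u32 length of the raw sample data


def make_sample(data):
    """Simplified raw sample: header, u32 length, bytes (no 8-byte padding)."""
    size = HEADER.size + RAW_SIZE.size + len(data)
    return HEADER.pack(PERF_RECORD_SAMPLE, 0, size) + RAW_SIZE.pack(len(data)) + data


def make_lost(event_id, lost):
    """PERF_RECORD_LOST record: header, id, lost."""
    size = HEADER.size + LOST_BODY.size
    return HEADER.pack(PERF_RECORD_LOST, 0, size) + LOST_BODY.pack(event_id, lost)


def event_read(buf, raw_cb, lost_cb):
    """Walk a stream of records and dispatch each one on its header type."""
    offset = 0
    while offset < len(buf):
        rtype, _misc, size = HEADER.unpack_from(buf, offset)
        if size < HEADER.size:
            raise ValueError("corrupt record size")
        body = offset + HEADER.size
        if rtype == PERF_RECORD_LOST:
            _id, lost = LOST_BODY.unpack_from(buf, body)
            lost_cb(lost)                          # our lost events callback
        elif rtype == PERF_RECORD_SAMPLE:
            (n,) = RAW_SIZE.unpack_from(buf, body)
            start = body + RAW_SIZE.size
            raw_cb(bytes(buf[start:start + n]))    # the regular event callback
        # other record types are skipped in this sketch
        offset += size                             # advance by the declared size


samples, losses = [], []
stream = make_sample(b"a") + make_lost(event_id=7, lost=5) + make_sample(b"bc")
event_read(stream, samples.append, losses.append)
print(samples, losses)  # [b'a', b'bc'] [5]
```

    The lost record sits in the same stream as ordinary samples, so the reader needs nothing special to find it: it simply branches on the header type.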

    Section 3 - perf_submit

    To submit events from our eBPF (C) program, we are instructed to initialize a “table” using the BPF_PERF_OUTPUT macro, and then call the perf_submit bcc C helper function on it.

    1. The user eBPF program uses BPF_PERF_OUTPUT to define a struct. The created struct holds a pointer to the perf_submit function. (source)

    2. The user eBPF program calls table.perf_submit() to submit an event.

    3. table.perf_submit() is turned into a call to the bpf_perf_event_output helper, which ends up calling the perf_event_output function of the perf subsystem to write into the ring buffer. Appendix 2 walks through this code path.

    4. perf_event_output calls the perf_output_begin function before it actually submits an event. (source)

    5. The perf_output_begin kernel function is the one that creates the “lost events”: (source)

      struct { struct perf_event_header header; u64 id; u64 lost; } lost_event;

    Before the event itself is written, perf_output_begin checks whether any events were lost and, if so, builds the lost event record:

      if (unlikely(have_lost)) {
        …
        lost_event.header.type = PERF_RECORD_LOST;
      }

    So what is this have_lost indicator, and how does it work? Let's dig in (final stretch, bear with me).

    Section 4 - Tracking the ring buffer’s have_lost indicator

    If we look at the perf_output_begin function from the kernel’s perf ring buffer implementation:

    1. The have_lost variable holds the ring buffer's lost field: (source)

      have_lost = local_read(&rb->lost);

    2. If the buffer is configured not to overwrite and there is not enough free space in the ring buffer for the record, we jump to fail: (source)

      if (!rb->overwrite && unlikely(CIRC_SPACE(head, tail, perf_data_size(rb)) < size))
        goto fail;

    3. Under fail, rb->lost is incremented to record that another event was dropped: (source)

      fail:
        local_inc(&rb->lost);

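    Putting it together: when a write finds no room in a non-overwrite buffer, the kernel drops that event and increments rb->lost. The next write that does fit first emits a PERF_RECORD_LOST record carrying the accumulated count (and resets the counter), and when the reader encounters a record of that type, it invokes the corresponding callback.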
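    The producer and consumer sides of Sections 3 and 4 fit in a small Python model. It is a sketch under stated assumptions, not kernel code: a single producer, only the non-overwrite mode, no 8-byte alignment or sample_id trailer, little-endian records, and a power-of-two byte buffer whose head and tail only ever grow. `circ_cnt`/`circ_space` reimplement the CIRC_CNT/CIRC_SPACE macros from linux/circ_buf.h; `ToyPerfRing`, `output`, and `consume_all` are hypothetical names. One detail from the kernel source is worth noting: the lost counter is reported lazily, as a PERF_RECORD_LOST record written just before the next event that fits, and the counter is reset at that point (local_xchg in perf_output_begin).

```python
import struct

PERF_RECORD_LOST = 2              # values from enum perf_event_type
PERF_RECORD_SAMPLE = 9
HEADER = struct.Struct("<IHH")    # perf_event_header: u32 type, u16 misc, u16 size
LOST_BODY = struct.Struct("<QQ")  # u64 id, u64 lost
LOST_SIZE = HEADER.size + LOST_BODY.size


def circ_cnt(head, tail, size):
    """Bytes in use; mirrors CIRC_CNT from linux/circ_buf.h (size is a power of 2)."""
    return (head - tail) & (size - 1)


def circ_space(head, tail, size):
    """Free bytes; mirrors CIRC_SPACE, which always leaves one byte unused."""
    return circ_cnt(tail, head + 1, size)


class ToyPerfRing:
    """Single-producer, non-overwrite ring; head and tail only ever grow."""

    def __init__(self, size):
        assert size & (size - 1) == 0, "size must be a power of two"
        self.buf = bytearray(size)
        self.size = size
        self.head = 0   # producer position
        self.tail = 0   # consumer position (user_page->data_tail in the kernel)
        self.lost = 0   # rb->lost

    def _put(self, data):
        for i, b in enumerate(data):
            self.buf[(self.head + i) & (self.size - 1)] = b   # wrap around
        self.head += len(data)

    def _get(self, start, n):
        return bytes(self.buf[(start + i) & (self.size - 1)] for i in range(n))

    def output(self, event_id, payload):
        """Sketch of perf_output_begin plus writing one sample record."""
        record = HEADER.pack(PERF_RECORD_SAMPLE, 0, HEADER.size + len(payload)) + payload
        size = len(record)
        have_lost = self.lost                  # have_lost = local_read(&rb->lost)
        if have_lost:
            size += LOST_SIZE                  # also reserve room for the lost record
        if circ_space(self.head, self.tail, self.size) < size:
            self.lost += 1                     # fail: local_inc(&rb->lost)
            return False
        if have_lost:
            lost, self.lost = self.lost, 0     # like local_xchg(&rb->lost, 0)
            self._put(HEADER.pack(PERF_RECORD_LOST, 0, LOST_SIZE)
                      + LOST_BODY.pack(event_id, lost))
        self._put(record)                      # the event itself follows the lost record
        return True

    def consume_all(self):
        """Reader side: return (type, body) for each unread record, advance tail."""
        records = []
        while self.tail != self.head:
            rtype, _misc, rsize = HEADER.unpack(self._get(self.tail, HEADER.size))
            records.append((rtype, self._get(self.tail + HEADER.size, rsize - HEADER.size)))
            self.tail += rsize
        return records


ring = ToyPerfRing(64)
results = [ring.output(event_id=1, payload=bytes(8)) for _ in range(5)]
print(results, ring.lost)     # [True, True, True, False, False] 2
first = ring.consume_all()    # three samples, no lost record yet
ring.output(event_id=1, payload=b"ABCDEFGH")
second = ring.consume_all()   # a PERF_RECORD_LOST record carrying 2, then the sample
```

    With 64 bytes and 16-byte records, three events fit (CIRC_SPACE keeps one byte free, so only 15 bytes remain), the next two are dropped, and once the reader has drained the buffer the sixth write is preceded by a lost record reporting 2.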

    The overall flow is as follows:

    [Figure: lost events depiction]

    III. Lost events in gobpf

    How can lost events be handled in gobpf? The author of the original article submitted a pull request for this: https://github.com/iovisor/gobpf/pull/235 (merged).

    The key changes are:

    1. Change the callbackData struct to contain a lost channel in addition to the main channel:

      callbackDataIndex := registerCallback(&callbackData{
          receiverChan,
          lostChan,
      })

    2. Change the signature of the user-facing InitPerfMap function so that it also accepts a channel for lost events:

      func InitPerfMap(table *Table, receiverChan chan []byte, lostChan chan uint64) (*PerfMap, error) {...}

    3. In the call to the lower-level bcc C function bpf_open_perf_buffer, pass the registered lost callback:

      reader, err := C.bpf_open_perf_buffer(
          (C.perf_reader_raw_cb)(unsafe.Pointer(C.rawCallback)),
          (C.perf_reader_lost_cb)(unsafe.Pointer(C.lostCallback)),
          unsafe.Pointer(uintptr(callbackDataIndex)),
          -1, cpuC, pageCntC)
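    Note that gobpf hands C an integer index (callbackDataIndex) as the callback cookie rather than a pointer, most likely because of cgo's pointer-passing rules (C code may not keep Go pointers after a call returns). The Python sketch below mimics that flow with queue.Queue in place of Go channels: callback data is stored in a table, C would only see the integer cookie, and the lost callback looks the entry up and forwards the count. `CallbackData`, `register_callback`, `lookup_callback`, `raw_callback`, and `lost_callback` are hypothetical analogues, not gobpf code, and treating the lost channel as optional is an assumption of this sketch.

```python
import itertools
import queue
import threading
from dataclasses import dataclass
from typing import Optional


@dataclass
class CallbackData:
    receiver_chan: queue.Queue            # stands in for the Go chan []byte
    lost_chan: Optional[queue.Queue]      # stands in for chan uint64; None = not wanted


_registry = {}
_registry_lock = threading.Lock()
_next_index = itertools.count(1)


def register_callback(data):
    """Store data in a table and return an integer cookie for the C side."""
    with _registry_lock:
        index = next(_next_index)
        _registry[index] = data
    return index


def lookup_callback(index):
    with _registry_lock:
        return _registry[index]


def raw_callback(cookie, raw):
    """What the C reader triggers for each sample: forward the bytes."""
    lookup_callback(cookie).receiver_chan.put(raw)


def lost_callback(cookie, lost):
    """What the C reader triggers for PERF_RECORD_LOST: forward the count."""
    data = lookup_callback(cookie)
    if data.lost_chan is not None:
        data.lost_chan.put(lost)


events, lost_events = queue.Queue(), queue.Queue()
cookie = register_callback(CallbackData(events, lost_events))
raw_callback(cookie, b"event")   # as if the C reader found a sample
lost_callback(cookie, 4)         # as if it found a PERF_RECORD_LOST record
# events now holds b"event" and lost_events holds 4
```

    The user side then only has to read from two channels (here, queues): one for event payloads and one for lost counts.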

    IV. Appendix

    Further reading: https://www.kernel.org/doc/Documentation/circular-buffers.txt

    Appendix 1 - from BPF.open_perf_buffer to perf_reader_new

    1. BPF.open_perf_buffer method is calling into bcc’s C function lib.bpf_open_perf_buffer (source)
    2. bpf_open_perf_buffer function is creating a reader using perf_reader_new function (source)

    Appendix 2 - from bpf_perf_event_output to perf_event_output

    1. table.perf_submit() function is converted to bpf_perf_event_output() (source)
    2. bpf_perf_event_output is implemented by BPF_FUNC_perf_event_output (source)
    3. BPF_FUNC_perf_event_output is an eBPF helper (source)
    4. BPF_FUNC_perf_event_output creates the bpf_perf_event_output prototype, bpf_perf_event_output_proto (source)
    5. bpf_perf_event_output_proto points to the bpf_perf_event_output function (source)
    6. The bpf_perf_event_output function calls the perf_event_output function (source)
  • Original post: https://blog.csdn.net/weixin_43988498/article/details/125993233