• Grafana open-sources Beyla, an eBPF collector


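    eBPF is developing rapidly and has made a big impact in observability. Grafana recently released an eBPF collector of its own that can collect a service's RED metrics. This post is a quick hands-on introduction to give readers a general idea of what it does.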

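    For eBPF basics, see my earlier article "eBPF Hello world". In principle, eBPF can capture information about the requests a service receives, such as QPS, latency, and success rate. This data is essential for application-level monitoring, and Grafana Beyla was built for exactly this purpose.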

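    To try collecting a service's RED (Rate, Errors, Duration) metrics with Beyla, you first need a service. I used the answer forum ( https://answer.flashcat.cloud ), but you can also write a simple HTTP service yourself, for example: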

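    package main

    import (
        "log"
        "net/http"
        "strconv"
        "time"
    )

    // handleRequest answers every request with an empty body. Two optional
    // query parameters control the response:
    //   status: HTTP status code to return (default 200)
    //   delay:  how long to wait before responding, e.g. 1s or 200ms
    func handleRequest(rw http.ResponseWriter, req *http.Request) {
        status := http.StatusOK
        for k, v := range req.URL.Query() {
            if len(v) == 0 {
                continue
            }
            switch k {
            case "status":
                // WriteHeader panics on codes outside 100..999, so ignore those.
                if s, err := strconv.Atoi(v[0]); err == nil && s >= 100 && s <= 999 {
                    status = s
                }
            case "delay":
                if d, err := time.ParseDuration(v[0]); err == nil {
                    time.Sleep(d)
                }
            }
        }
        rw.WriteHeader(status)
    }

    func main() {
        // ListenAndServe only returns on failure, e.g. when the port is taken.
        log.Fatal(http.ListenAndServe(":8080", http.HandlerFunc(handleRequest)))
    }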

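    Save the code above as server.go and start it with go run server.go (this assumes you have a Go development environment on your machine). The service accepts two query parameters: status sets the HTTP status code to return, and delay sets how long to wait before responding. For example: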

    curl -v "http://localhost:8080/foo?status=404"
    

    The command above returns a 404 status code. To delay the response by 1 second instead:

    curl -v "http://localhost:8080/foo?delay=1s"
    

    Next, we can use Beyla to collect this service's RED metrics.

    Download Beyla

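    I have a Go development environment on my machine, so I installed Beyla directly with go install. You can also download the binary package from Beyla's release page and unpack it.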

    go install github.com/grafana/beyla/cmd/beyla@latest
    

    Run Beyla

    Run Beyla with the following command:

    $ BEYLA_PROMETHEUS_PORT=8999 PRINT_TRACES=true OPEN_PORT=8080 sudo -E beyla
    

    Or run it directly as root, which is what I did:

    $ BEYLA_PROMETHEUS_PORT=8999 PRINT_TRACES=true OPEN_PORT=8080 beyla
    

    A quick explanation of these parameters:

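    • BEYLA_PROMETHEUS_PORT: the port Beyla listens on; the metrics are exposed through this port
    • PRINT_TRACES: whether to print trace logs
    • OPEN_PORT: the port that the target service listens on. It is 8080 here because the Go server above listens on 8080, and so does the answer forum on my machine. If the program you want to monitor listens on a different port, substitute your own.

    View the metrics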

    Once Beyla is running, you can view the metrics with curl:

    curl http://localhost:8999/metrics
    

    The response looks like this:

    # HELP http_client_duration_seconds duration of HTTP service calls from the client side, in seconds
    # TYPE http_client_duration_seconds histogram
    http_client_duration_seconds_bucket{http_method="GET",http_status_code="200",service_name="answer",le="0"} 0
    http_client_duration_seconds_bucket{http_method="GET",http_status_code="200",service_name="answer",le="0.005"} 0
    http_client_duration_seconds_bucket{http_method="GET",http_status_code="200",service_name="answer",le="0.01"} 0
    http_client_duration_seconds_bucket{http_method="GET",http_status_code="200",service_name="answer",le="0.025"} 0
    http_client_duration_seconds_bucket{http_method="GET",http_status_code="200",service_name="answer",le="0.05"} 0
    http_client_duration_seconds_bucket{http_method="GET",http_status_code="200",service_name="answer",le="0.075"} 0
    http_client_duration_seconds_bucket{http_method="GET",http_status_code="200",service_name="answer",le="0.1"} 0
    http_client_duration_seconds_bucket{http_method="GET",http_status_code="200",service_name="answer",le="0.25"} 0
    http_client_duration_seconds_bucket{http_method="GET",http_status_code="200",service_name="answer",le="0.5"} 0
    http_client_duration_seconds_bucket{http_method="GET",http_status_code="200",service_name="answer",le="0.75"} 0
    http_client_duration_seconds_bucket{http_method="GET",http_status_code="200",service_name="answer",le="1"} 0
    http_client_duration_seconds_bucket{http_method="GET",http_status_code="200",service_name="answer",le="2.5"} 1
    http_client_duration_seconds_bucket{http_method="GET",http_status_code="200",service_name="answer",le="5"} 1
    http_client_duration_seconds_bucket{http_method="GET",http_status_code="200",service_name="answer",le="7.5"} 1
    http_client_duration_seconds_bucket{http_method="GET",http_status_code="200",service_name="answer",le="10"} 1
    http_client_duration_seconds_bucket{http_method="GET",http_status_code="200",service_name="answer",le="+Inf"} 1
    http_client_duration_seconds_sum{http_method="GET",http_status_code="200",service_name="answer"} 1.668771575
    http_client_duration_seconds_count{http_method="GET",http_status_code="200",service_name="answer"} 1
    # HELP http_client_request_size_bytes size, in bytes, of the HTTP request body as sent from the client side
    # TYPE http_client_request_size_bytes histogram
    http_client_request_size_bytes_bucket{http_method="GET",http_status_code="200",service_name="answer",le="0"} 1
    http_client_request_size_bytes_bucket{http_method="GET",http_status_code="200",service_name="answer",le="32"} 1
    http_client_request_size_bytes_bucket{http_method="GET",http_status_code="200",service_name="answer",le="64"} 1
    http_client_request_size_bytes_bucket{http_method="GET",http_status_code="200",service_name="answer",le="128"} 1
    http_client_request_size_bytes_bucket{http_method="GET",http_status_code="200",service_name="answer",le="256"} 1
    http_client_request_size_bytes_bucket{http_method="GET",http_status_code="200",service_name="answer",le="512"} 1
    http_client_request_size_bytes_bucket{http_method="GET",http_status_code="200",service_name="answer",le="1024"} 1
    http_client_request_size_bytes_bucket{http_method="GET",http_status_code="200",service_name="answer",le="2048"} 1
    http_client_request_size_bytes_bucket{http_method="GET",http_status_code="200",service_name="answer",le="4096"} 1
    http_client_request_size_bytes_bucket{http_method="GET",http_status_code="200",service_name="answer",le="8192"} 1
    http_client_request_size_bytes_bucket{http_method="GET",http_status_code="200",service_name="answer",le="+Inf"} 1
    http_client_request_size_bytes_sum{http_method="GET",http_status_code="200",service_name="answer"} 0
    http_client_request_size_bytes_count{http_method="GET",http_status_code="200",service_name="answer"} 1
    # HELP http_server_duration_seconds duration of HTTP service calls from the server side, in seconds
    # TYPE http_server_duration_seconds histogram
    http_server_duration_seconds_bucket{http_method="GET",http_status_code="200",service_name="answer",le="0"} 0
    http_server_duration_seconds_bucket{http_method="GET",http_status_code="200",service_name="answer",le="0.005"} 201
    http_server_duration_seconds_bucket{http_method="GET",http_status_code="200",service_name="answer",le="0.01"} 789
    http_server_duration_seconds_bucket{http_method="GET",http_status_code="200",service_name="answer",le="0.025"} 799
    http_server_duration_seconds_bucket{http_method="GET",http_status_code="200",service_name="answer",le="0.05"} 799
    http_server_duration_seconds_bucket{http_method="GET",http_status_code="200",service_name="answer",le="0.075"} 799
    http_server_duration_seconds_bucket{http_method="GET",http_status_code="200",service_name="answer",le="0.1"} 799
    http_server_duration_seconds_bucket{http_method="GET",http_status_code="200",service_name="answer",le="0.25"} 799
    http_server_duration_seconds_bucket{http_method="GET",http_status_code="200",service_name="answer",le="0.5"} 799
    http_server_duration_seconds_bucket{http_method="GET",http_status_code="200",service_name="answer",le="0.75"} 799
    http_server_duration_seconds_bucket{http_method="GET",http_status_code="200",service_name="answer",le="1"} 799
    http_server_duration_seconds_bucket{http_method="GET",http_status_code="200",service_name="answer",le="2.5"} 800
    http_server_duration_seconds_bucket{http_method="GET",http_status_code="200",service_name="answer",le="5"} 800
    http_server_duration_seconds_bucket{http_method="GET",http_status_code="200",service_name="answer",le="7.5"} 800
    http_server_duration_seconds_bucket{http_method="GET",http_status_code="200",service_name="answer",le="10"} 800
    http_server_duration_seconds_bucket{http_method="GET",http_status_code="200",service_name="answer",le="+Inf"} 800
    http_server_duration_seconds_sum{http_method="GET",http_status_code="200",service_name="answer"} 5.752096697000003
    http_server_duration_seconds_count{http_method="GET",http_status_code="200",service_name="answer"} 800
    http_server_duration_seconds_bucket{http_method="GET",http_status_code="302",service_name="answer",le="0"} 0
    http_server_duration_seconds_bucket{http_method="GET",http_status_code="302",service_name="answer",le="0.005"} 1
    http_server_duration_seconds_bucket{http_method="GET",http_status_code="302",service_name="answer",le="0.01"} 1
    http_server_duration_seconds_bucket{http_method="GET",http_status_code="302",service_name="answer",le="0.025"} 1
    http_server_duration_seconds_bucket{http_method="GET",http_status_code="302",service_name="answer",le="0.05"} 1
    http_server_duration_seconds_bucket{http_method="GET",http_status_code="302",service_name="answer",le="0.075"} 1
    http_server_duration_seconds_bucket{http_method="GET",http_status_code="302",service_name="answer",le="0.1"} 1
    http_server_duration_seconds_bucket{http_method="GET",http_status_code="302",service_name="answer",le="0.25"} 1
    http_server_duration_seconds_bucket{http_method="GET",http_status_code="302",service_name="answer",le="0.5"} 1
    http_server_duration_seconds_bucket{http_method="GET",http_status_code="302",service_name="answer",le="0.75"} 1
    http_server_duration_seconds_bucket{http_method="GET",http_status_code="302",service_name="answer",le="1"} 1
    http_server_duration_seconds_bucket{http_method="GET",http_status_code="302",service_name="answer",le="2.5"} 1
    http_server_duration_seconds_bucket{http_method="GET",http_status_code="302",service_name="answer",le="5"} 1
    http_server_duration_seconds_bucket{http_method="GET",http_status_code="302",service_name="answer",le="7.5"} 1
    http_server_duration_seconds_bucket{http_method="GET",http_status_code="302",service_name="answer",le="10"} 1
    http_server_duration_seconds_bucket{http_method="GET",http_status_code="302",service_name="answer",le="+Inf"} 1
    http_server_duration_seconds_sum{http_method="GET",http_status_code="302",service_name="answer"} 0.001523002
    http_server_duration_seconds_count{http_method="GET",http_status_code="302",service_name="answer"} 1
    # HELP http_server_request_size_bytes size, in bytes, of the HTTP request body as received at the server side
    # TYPE http_server_request_size_bytes histogram
    http_server_request_size_bytes_bucket{http_method="GET",http_status_code="200",service_name="answer",le="0"} 800
    http_server_request_size_bytes_bucket{http_method="GET",http_status_code="200",service_name="answer",le="32"} 800
    http_server_request_size_bytes_bucket{http_method="GET",http_status_code="200",service_name="answer",le="64"} 800
    http_server_request_size_bytes_bucket{http_method="GET",http_status_code="200",service_name="answer",le="128"} 800
    http_server_request_size_bytes_bucket{http_method="GET",http_status_code="200",service_name="answer",le="256"} 800
    http_server_request_size_bytes_bucket{http_method="GET",http_status_code="200",service_name="answer",le="512"} 800
    http_server_request_size_bytes_bucket{http_method="GET",http_status_code="200",service_name="answer",le="1024"} 800
    http_server_request_size_bytes_bucket{http_method="GET",http_status_code="200",service_name="answer",le="2048"} 800
    http_server_request_size_bytes_bucket{http_method="GET",http_status_code="200",service_name="answer",le="4096"} 800
    http_server_request_size_bytes_bucket{http_method="GET",http_status_code="200",service_name="answer",le="8192"} 800
    http_server_request_size_bytes_bucket{http_method="GET",http_status_code="200",service_name="answer",le="+Inf"} 800
    http_server_request_size_bytes_sum{http_method="GET",http_status_code="200",service_name="answer"} 0
    http_server_request_size_bytes_count{http_method="GET",http_status_code="200",service_name="answer"} 800
    http_server_request_size_bytes_bucket{http_method="GET",http_status_code="302",service_name="answer",le="0"} 1
    http_server_request_size_bytes_bucket{http_method="GET",http_status_code="302",service_name="answer",le="32"} 1
    http_server_request_size_bytes_bucket{http_method="GET",http_status_code="302",service_name="answer",le="64"} 1
    http_server_request_size_bytes_bucket{http_method="GET",http_status_code="302",service_name="answer",le="128"} 1
    http_server_request_size_bytes_bucket{http_method="GET",http_status_code="302",service_name="answer",le="256"} 1
    http_server_request_size_bytes_bucket{http_method="GET",http_status_code="302",service_name="answer",le="512"} 1
    http_server_request_size_bytes_bucket{http_method="GET",http_status_code="302",service_name="answer",le="1024"} 1
    http_server_request_size_bytes_bucket{http_method="GET",http_status_code="302",service_name="answer",le="2048"} 1
    http_server_request_size_bytes_bucket{http_method="GET",http_status_code="302",service_name="answer",le="4096"} 1
    http_server_request_size_bytes_bucket{http_method="GET",http_status_code="302",service_name="answer",le="8192"} 1
    http_server_request_size_bytes_bucket{http_method="GET",http_status_code="302",service_name="answer",le="+Inf"} 1
    http_server_request_size_bytes_sum{http_method="GET",http_status_code="302",service_name="answer"} 0
    http_server_request_size_bytes_count{http_method="GET",http_status_code="302",service_name="answer"} 1
    # HELP promhttp_metric_handler_errors_total Total number of internal errors encountered by the promhttp metric handler.
    # TYPE promhttp_metric_handler_errors_total counter
    promhttp_metric_handler_errors_total{cause="encoding"} 0
    promhttp_metric_handler_errors_total{cause="gathering"} 0
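
    To see how this raw output maps to the "Duration" part of RED, here is a minimal Go sketch that reads the _sum and _count series of http_server_duration_seconds and computes the mean server-side latency per status code. The embedded sample is a trimmed excerpt of the output above. parseSamples and meanLatency are illustrative names of my own, and the parser assumes the simple one-line "series value" layout shown here (no timestamps), so treat it as a reading aid rather than a full parser for the exposition format.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// sample is a trimmed excerpt of the metrics output shown in the article.
const sample = `# TYPE http_server_duration_seconds histogram
http_server_duration_seconds_sum{http_method="GET",http_status_code="200",service_name="answer"} 5.752096697000003
http_server_duration_seconds_count{http_method="GET",http_status_code="200",service_name="answer"} 800
http_server_duration_seconds_sum{http_method="GET",http_status_code="302",service_name="answer"} 0.001523002
http_server_duration_seconds_count{http_method="GET",http_status_code="302",service_name="answer"} 1
`

// parseSamples maps each series (metric name plus label set) to its value.
// Comment lines and lines that do not end in a number are skipped.
func parseSamples(text string) map[string]float64 {
	out := make(map[string]float64)
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		i := strings.LastIndex(line, " ")
		if i < 0 {
			continue
		}
		v, err := strconv.ParseFloat(line[i+1:], 64)
		if err != nil {
			continue
		}
		out[line[:i]] = v
	}
	return out
}

// meanLatency returns sum/count in seconds for one status code,
// or false when no requests with that status were recorded.
func meanLatency(samples map[string]float64, status string) (float64, bool) {
	labels := fmt.Sprintf(`{http_method="GET",http_status_code=%q,service_name="answer"}`, status)
	sum := samples["http_server_duration_seconds_sum"+labels]
	count := samples["http_server_duration_seconds_count"+labels]
	if count == 0 {
		return 0, false
	}
	return sum / count, true
}

func main() {
	samples := parseSamples(sample)
	for _, status := range []string{"200", "302"} {
		if mean, ok := meanLatency(samples, status); ok {
			fmt.Printf("status %s: mean latency %.2f ms\n", status, mean*1000)
		}
	}
}
```

    For the status-200 series this is 5.752096697 s spread over 800 requests, roughly 7.2 ms per request. In practice a time-series database would do this calculation over a time window rather than over the lifetime totals.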

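    These metrics can now be scraped by a collector such as vmagent, categraf, or prometheus, stored, and then visualized and analyzed in Grafana. Regular readers of this account should already be familiar with that workflow, so I won't repeat it here. Beyla ships with a Grafana dashboard that you can import for testing: https://github.com/grafana/beyla/tree/main/grafana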

    Closing thoughts

    Beyla is not very stable yet and many features are still unfinished, but it is ready for early experimentation. Building out a full observability stack takes real effort. If you are planning to build one, feel free to talk to us; we offer consulting and commercial products in this area. For details, see:

    Flashcat (快猫星云): https://flashcat.cloud/

  • Original article: https://blog.csdn.net/n9ecommunity/article/details/133358037