• Presto查询慢SQL原因排查


    线上有一个很简单的查询语句,近似如下:

    1. select time,name,add_u,active_u,avgtime,……
    2. from app.content_day where pid = '-1' and pid_sid = 'all' and os_type = 'all' and time between '2022-07-23' and '2022-07-23'
    3. order by add_user desc limit 50

    只查一张表,底层是HIVE表,但是耗时需要5秒左右。

    首先想到是explain看下执行过程:

    - Output[time,……] - TopN[50 by (add_user DESC_NULLS_LAST)] => [add_user:integer,……] - LocalExchange[SINGLE] () => [……] - RemoteStreamingExchange[GATHER] => [……] - TopNPartial[50 by (add_user DESC_NULLS_LAST)] => [……] - ScanFilterProject[table = TableHandle {connectorId='hive', connectorHandle='HiveTableHandle{schemaName=app, tableName=content_day, analyzePartitionValues=Optional.empty}', layout='Optional[app.content_day{domains={pid=[ [[-1]] ], pid_sid=[ [[all]] ], time=[ [[2022-07-23]] ], os_type=[ [[all]] ]}}]'}, filterPredicate = (((os_type) = (VARCHAR all)) AND ((pid) = (VARCHAR -1))) AND (((pid_sid) = (VARCHAR all)) AND ((time) = (VARCHAR 2022-07-23)))] => [……] Estimates: {rows: 148346 (19.59MB), cpu: 30036647.00, memory: 0.00, network: 0.00}/{rows: 85 (11.45kB), cpu: 60073294.00, memory: 0.00, network: 0.00}/{rows: 85 (11.45kB), cpu: 60085014.38, memory: 0.00, network: 0.00} LAYOUT: app.content_day{domains={……:int:8:REGULAR

     直观看到cpu这个数值特别大,rows14万行,似乎还可以,很难得出结论。

    还有另外一个分析执行计划命令,会更加详细。

    EXPLAIN ANALYZE + SQL

    Fragment 1 [SINGLE] CPU: 1.54ms, Scheduled: 1.77ms, Input: 50 rows (4.33kB); per task: avg.: 50.00 std.dev.: 0.00, Output: 50 rows (4.33kB) Output layout: [……] Output partitioning: SINGLE [] Stage Execution Strategy: UNGROUPED_EXECUTION - TopN[50 by (add_user DESC_NULLS_LAST)] => [……r] CPU: 0.00ns (0.00%), Scheduled: 0.00ns (0.00%), Output: 50 rows (4.33kB) Input avg.: 50.00 rows, Input std.dev.: 0.00% - LocalExchange[SINGLE] () => [……] CPU: 0.00ns (0.00%), Scheduled: 0.00ns (0.00%), Output: 50 rows (4.33kB) Input avg.: 3.13 rows, Input std.dev.: 387.30% - RemoteSource[2] => [……] CPU: 0.00ns (0.00%), Scheduled: 0.00ns (0.00%), Output: 50 rows (4.33kB) Input avg.: 3.13 rows, Input std.dev.: 387.30%

    Fragment 2 [SOURCE] CPU: 4.68s, Scheduled: 5.09s, Input: 2973242 rows (105.38MB); per task: avg.: 1342041.50 std.dev.: 1342041.50, Output: 50 rows (4.33kB) Output layout: [……] Output partitioning: SINGLE [] Stage Execution Strategy: UNGROUPED_EXECUTION - TopNPartial[50 by (add_user DESC_NULLS_LAST)] => [……] CPU: 9.00ms (0.19%), Scheduled: 9.00ms (0.16%), Output: 50 rows (4.33kB) Input avg.: 314.00 rows, Input std.dev.: 100.00% - ScanFilterProject[table = TableHandle {connectorId='hive', connectorHandle='HiveTableHandle{schemaName=app, tableName=map_day, analyzePartitionValues=Optional.empty}', layout='Optional[app...domains={pid=[ [[-1]] ], pid_sid=[ [[all]] ], time=[ [[2022-07-23]] ], os_type=[ [[all]] ]}}]'}, grouped = false, filterPredicate = (((os_type) = (VARCHAR all)) AND ((pid) = (VARCHAR -1))) AND (((pid_sid) = (VARCHAR all)) AND ((time) = (VARCHAR 2022-07-23)))] => [……] CPU: 4.67s (99.81%), Scheduled: 5.52s (99.84%), Output: 628 rows (53.55kB) Input avg.: 1483350 rows, Input std.dev.: 100.00% LAYOUT: ……REGULAR Input: 2684083 rows (107.24MB), Filtered: 99.98%

    基本可以坐实一件事儿,这个慢SQL需要扫描297万行数据。

    进入Presto的Dashboard,也能看到一些具体信息:

    SQL执行时出现明显的峰值

     点开查询执行的详细信息发现输入行数也是2.97百万。

    Resource Utilization Summary
    CPU Time    4.73s
    Scheduled Time    5.10s
    Blocked Time    1.45m
    Input Rows    2.97M
    Input Data    107.24MB
    Raw Input Rows    2.97M
    Raw Input Data    25.52MB
    Peak User Memory    0B
    Peak Total Memory    2.93MB
    Memory Pool    general
    Cumulative User Memory    0 seconds
    Output Rows    50.0
    Output Data    4.33kB
    Written Output Rows    0.00
    Written Output Logical Data Size    0B
    Written Output Physical Data Size    0B
    Timeline
    Parallelism    
    0.89
    Scheduled Time/s    
    0.96
    Input Rows/s    
    506K

    Input Bytes/s    
    20.2MB
    Memory Utilization    
    0B

    回到这个HIVE表本身,表中全量数据297万,问题在这些数据每日全量更新,没有按日分片,导致全表扫描。

  • 相关阅读:
    11 函数的极值和最大最小值
    组里来了一个实习生,一行代码引发了一个惨案
    香港服务器一定比美国服务器好吗?
    张家界四日研学旅行
    Leetcode171. Excel 表列序号
    手拉手一起学HTML(下)——表格标签和列表标签,表单标签
    机器学习(一)
    SpringBoot日志文件
    如何实现企业软件的“超级 App 化”?
    VSCode的C/C++开发 ===> Windows
  • 原文地址:https://blog.csdn.net/feng_zi0yhv/article/details/125999960