• explain 实战-----查看hive sql执行计划


    目录

    1.join/left join/full join 语句会过滤关联字段 null 的值吗?

    (1)join 

    (2) left join /full join

    2.group by 分组语句会进行排序吗?


    1.join/left join/full join 语句会过滤关联字段 null 的值吗?

    (1)join 

    sql:

    1. explain
    2. select
    3. a.into_id,
    4. a.into_bus_id,
    5. b.customer_type as customer_type
    6. from
    7. --进件表
    8. (select * from dp_ods.o_fk_eagle_jsd_intopieces_s
    9. where etl_date = '2023-09-06' and substr(into_time ,1,4) >= '2019' ) a
    10. join
    11. (select * from dp_ods.o_hyd_jsd_loan_order_s where etl_date = '2023-09-06') b
    12. on a.into_id=b.loan_id
    1. STAGE DEPENDENCIES:
    2. Stage-1 is a root stage
    3. Stage-0 depends on stages: Stage-1
    4. STAGE PLANS:
    5. Stage: Stage-1
    6. Map Reduce
    7. Map Operator Tree:
    8. TableScan
    9. alias: o_fk_eagle_jsd_intopieces_s
    10. Statistics: Num rows: 36940508 Data size: 360575316992 Basic stats: COMPLETE Column stats: NONE
    11. Filter Operator
    12. predicate: ((substr(into_time, 1, 4) >= '2019') and into_id is not null) (type: boolean)
    13. Statistics: Num rows: 12313502 Data size: 120191765823 Basic stats: COMPLETE Column stats: NONE
    14. Select Operator
    15. expressions: into_id (type: string), into_bus_id (type: string)
    16. outputColumnNames: _col0, _col1
    17. Statistics: Num rows: 12313502 Data size: 120191765823 Basic stats: COMPLETE Column stats: NONE
    18. Reduce Output Operator
    19. key expressions: _col0 (type: string)
    20. sort order: +
    21. Map-reduce partition columns: _col0 (type: string)
    22. Statistics: Num rows: 12313502 Data size: 120191765823 Basic stats: COMPLETE Column stats: NONE
    23. value expressions: _col1 (type: string)
    24. TableScan
    25. alias: o_hyd_jsd_loan_order_s
    26. Statistics: Num rows: 44174590 Data size: 84039429353 Basic stats: COMPLETE Column stats: NONE
    27. Filter Operator
    28. predicate: loan_id is not null (type: boolean)
    29. Statistics: Num rows: 44174590 Data size: 84039429353 Basic stats: COMPLETE Column stats: NONE
    30. Select Operator
    31. expressions: loan_id (type: string), customer_type (type: int)
    32. outputColumnNames: _col0, _col1
    33. Statistics: Num rows: 44174590 Data size: 84039429353 Basic stats: COMPLETE Column stats: NONE
    34. Reduce Output Operator
    35. key expressions: _col0 (type: string)
    36. sort order: +
    37. Map-reduce partition columns: _col0 (type: string)
    38. Statistics: Num rows: 44174590 Data size: 84039429353 Basic stats: COMPLETE Column stats: NONE
    39. value expressions: _col1 (type: int)
    40. Reduce Operator Tree:
    41. Join Operator
    42. condition map:
    43. Inner Join 0 to 1
    44. keys:
    45. 0 _col0 (type: string)
    46. 1 _col0 (type: string)
    47. outputColumnNames: _col0, _col1, _col3
    48. Statistics: Num rows: 48592050 Data size: 92443374291 Basic stats: COMPLETE Column stats: NONE
    49. Select Operator
    50. expressions: _col0 (type: string), _col1 (type: string), _col3 (type: int)
    51. outputColumnNames: _col0, _col1, _col2
    52. Statistics: Num rows: 48592050 Data size: 92443374291 Basic stats: COMPLETE Column stats: NONE
    53. File Output Operator
    54. compressed: false
    55. Statistics: Num rows: 48592050 Data size: 92443374291 Basic stats: COMPLETE Column stats: NONE
    56. table:
    57. input format: org.apache.hadoop.mapred.TextInputFormat
    58. output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
    59. serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
    60. Stage: Stage-0
    61. Fetch Operator
    62. limit: -1
    63. Processor Tree:
    64. ListSink

    可以看到filter 顾虑部分有比原sql多了关联字段不为空的判断

    (2) left join /full join

    没有相关的过滤空值操作

    2.group by 分组语句会进行排序吗?

    1. explain
    2. select date(create_time),max(actual_amount) max_actual_amount
    3. from dp_ods.o_hyd_jsd_loan_order_s
    4. where etl_date = '2023-09-06'
    5. group by date(create_time)
    1. STAGE DEPENDENCIES:
    2. Stage-1 is a root stage
    3. Stage-2 depends on stages: Stage-1
    4. Stage-0 depends on stages: Stage-2
    5. STAGE PLANS:
    6. Stage: Stage-1
    7. Map Reduce
    8. Map Operator Tree:
    9. TableScan
    10. alias: o_hyd_jsd_loan_order_s
    11. Statistics: Num rows: 44174590 Data size: 84039429353 Basic stats: COMPLETE Column stats: NONE
    12. Select Operator
    13. expressions: CAST( create_time AS DATE) (type: date), actual_amount (type: double)
    14. outputColumnNames: _col0, _col1
    15. Statistics: Num rows: 44174590 Data size: 84039429353 Basic stats: COMPLETE Column stats: NONE
    16. Group By Operator
    17. aggregations: max(_col1)
    18. keys: _col0 (type: date)
    19. mode: hash
    20. outputColumnNames: _col0, _col1
    21. Statistics: Num rows: 44174590 Data size: 84039429353 Basic stats: COMPLETE Column stats: NONE
    22. Reduce Output Operator
    23. key expressions: _col0 (type: date)
    24. sort order: +
    25. Map-reduce partition columns: rand() (type: double)
    26. Statistics: Num rows: 44174590 Data size: 84039429353 Basic stats: COMPLETE Column stats: NONE
    27. value expressions: _col1 (type: double)
    28. Reduce Operator Tree:
    29. Group By Operator
    30. aggregations: max(VALUE._col0)
    31. keys: KEY._col0 (type: date)
    32. mode: partials
    33. outputColumnNames: _col0, _col1
    34. Statistics: Num rows: 44174590 Data size: 84039429353 Basic stats: COMPLETE Column stats: NONE
    35. File Output Operator
    36. compressed: false
    37. table:
    38. input format: org.apache.hadoop.mapred.SequenceFileInputFormat
    39. output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
    40. serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
    41. Stage: Stage-2
    42. Map Reduce
    43. Map Operator Tree:
    44. TableScan
    45. Reduce Output Operator
    46. key expressions: _col0 (type: date)
    47. sort order: +
    48. Map-reduce partition columns: _col0 (type: date)
    49. Statistics: Num rows: 44174590 Data size: 84039429353 Basic stats: COMPLETE Column stats: NONE
    50. value expressions: _col1 (type: double)
    51. Reduce Operator Tree:
    52. Group By Operator
    53. aggregations: max(VALUE._col0)
    54. keys: KEY._col0 (type: date)
    55. mode: final
    56. outputColumnNames: _col0, _col1
    57. Statistics: Num rows: 22087295 Data size: 42019714676 Basic stats: COMPLETE Column stats: NONE
    58. File Output Operator
    59. compressed: false
    60. Statistics: Num rows: 22087295 Data size: 42019714676 Basic stats: COMPLETE Column stats: NONE
    61. table:
    62. input format: org.apache.hadoop.mapred.TextInputFormat
    63. output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
    64. serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
    65. Stage: Stage-0
    66. Fetch Operator
    67. limit: -1
    68. Processor Tree:
    69. ListSink

    可以看到group by 字段是进行了正序排序的,查看sql执行结果也能看到。

  • 相关阅读:
    算法day39|62,63
    设计模式——2_A 访问者(Visitor)
    linux shell中 if else以及大于、小于、等于逻辑表达式
    一种考虑时空关联的深度学习短时交通流预测方法
    SpringCach
    061:vue+openlayer画点、线、圆、多边形( 代码示例 )
    AI智慧安防智能监控平台如何做到健身房智能视频监控?
    Rust个人学习之包&模块
    机器学习之算法优化(一)
    函数重入、函数重载、函数重写自己理解
  • 原文地址:https://blog.csdn.net/BD_fuhong/article/details/132737291