• sql优化实操


    以下是同事人员发过来的sql,让我帮他优化一下,clickhouse的语句,需要替换成在自己的数据库上执行,dth_diagnosis_all 这个表的数据量90217505 ,千万级的数据量,说是内存不足,sql执行不了。

    1. select sum(case when re.diagnosis_code = 'E10' then 1 else 0 end) as E10Count,
    2. sum(case when re.diagnosis_code = 'E11' then 1 else 0 end) as E11Count,
    3. sum(case when re.diagnosis_code != 'E10' and re.diagnosis_code != 'E11' then 1 else 0 end) as otherCount
    4. from (select t1.id_no as id_no, left(t1.diagnosis_code,3) as diagnosis_code
    5. from (select t2.id_no, t2.org_code, t2.event_type_code, t2.org_treatment_no,
    6. max(t2.register_date) as register_date,t2.diagnosis_code
    7. from dth.dth_diagnosis_all t2
    8. where t2.register_date <= #{initYear}
    9. and left(t2.diagnosis_code,3) in('E10','E11','E12','E13','E14')
    10. group by t2.id_no,t2.org_code,t2.event_type_code,t2.org_treatment_no,t2.diagnosis_code)t3,
    11. dth.dth_diagnosis_all t1
    12. where t1.id_no = t3.id_no
    13. and t1.org_code = t3.org_code and t1.diagnosis_code = t3.diagnosis_code
    14. and t1.event_type_code = t3.event_type_code
    15. and t1.org_treatment_no = t3.org_treatment_no
    16. and t1.register_date = t3.register_date
    17. and t1.org_province = #{province}
    18. and t1.org_city = #{city}
    19. )re left join (select * from dth_ehr_pir_all
    20. where birth_date <= #{initYear}
    21. and register_province = #{province}
    22. and register_city = #{city}
    23. and diabetes_flag != '0'
    24. and (death_date = ''
    25. or death_date > #{initYear})) t
    26. on t.id_no = re.id_no
    27. where t.birth_date <= #{initYear}
    28. and t.register_province = #{province}
    29. and t.register_city = #{city}
    30. and t.diabetes_flag != '0'
    31. and (t.death_date = '' or t.death_date > #{initYear})
    32. 参数:initYear(查询年份1231,如2029-12-31), #{province},查询省份:(山东省), #{city} :查询城市

    两个子查询的数据量

     以下为优化后的sql,看这个sql子查询t1和t3,其实是同一张表,然后对同一张表做连接查询,对t3表做了where条件过滤(这个不用改),之后连接查询之后又对t1表进行了where条件过滤,如下:       

    t1.org_province = #{province}
    and t1.org_city = #{city}

    为啥不提前提到t1子查询里面去呢,

    再看一下join条件中

    and t1.diagnosis_code = t3.diagnosis_code

    and t1.register_date = t3.register_date,

    说明t3和t1连接条件是一样的,又是同一张表,t3中也对这两个列做了where限制,那么子查询t1中也可以加入这两个条件,于是把一下四个条件提到t1的子查询中去,大大降低了临时表占用的内存空间,

    t1.org_province = #{province}
    and t1.org_city = #{city}

    and register_date <= #{initYear}
    and left(diagnosis_code,3) in('E10','E11','E12','E13','E14')

    1. (
    2. select *
    3. from dth.dth_diagnosis_all
    4. where
    5. org_province = #{province} and org_city = #{city}
    6. and register_date <= #{initYear}
    7. and left(diagnosis_code,3) in('E10','E11','E12','E13','E14')
    8. ) t1

    然后就是针对 t 表做优化,子查询中已经对 t 表做了where条件过滤,连接查询后面又对 t 表进行单独的过滤,这个完全没必要再加同样的条件了。

    之后再看最外层的

       sum(case when re.diagnosis_code = 'E10' then 1 else 0 end) as E10Count,
       sum(case when re.diagnosis_code = 'E11' then 1 else 0 end) as E11Count,
       sum(case when re.diagnosis_code != 'E10' and re.diagnosis_code != 'E11' then 1 else 0 end) as otherCount

    这个一看就是对条件进行统计个数,可以改成效率更高的count

    改下如下:

        count(case when re.diagnosis_code = 'E10' then '1' else null end) as E10Count,
        count(case when re.diagnosis_code = 'E11' then '1' else null end) as E11Count,
        count(case when re.diagnosis_code != 'E10' and re.diagnosis_code != 'E11' then '1' else null end) as otherCount

    至此改完之后的结果

    1. select
    2. count(case when re.diagnosis_code = 'E10' then '1' else null end) as E10Count,
    3. count(case when re.diagnosis_code = 'E11' then '1' else null end) as E11Count,
    4. count(case when re.diagnosis_code != 'E10' and re.diagnosis_code != 'E11' then '1' else null end) as otherCount
    5. from
    6. (
    7. select
    8. t1.id_no as id_no,
    9. left(t1.diagnosis_code,3) as diagnosis_code
    10. from
    11. (
    12. select
    13. t2.id_no, t2.org_code, t2.event_type_code, t2.org_treatment_no,
    14. max(t2.register_date) as register_date,
    15. t2.diagnosis_code
    16. from dth.dth_diagnosis_all t2
    17. where
    18. t2.register_date <= #{initYear}
    19. and left(t2.diagnosis_code,3) in('E10','E11','E12','E13','E14')
    20. group by
    21. t2.id_no,t2.org_code,t2.event_type_code,t2.org_treatment_no,t2.diagnosis_code
    22. ) t3
    23. inner join
    24. (
    25. select *
    26. from dth.dth_diagnosis_all
    27. where
    28. org_province = #{province} and org_city = #{city}
    29. and register_date <= #{initYear}
    30. and left(diagnosis_code,3) in('E10','E11','E12','E13','E14')
    31. ) t1
    32. on
    33. t1.id_no = t3.id_no
    34. and t1.org_code = t3.org_code
    35. and t1.diagnosis_code = t3.diagnosis_code
    36. and t1.event_type_code = t3.event_type_code
    37. and t1.org_treatment_no = t3.org_treatment_no
    38. and t1.register_date = t3.register_date
    39. ) re
    40. left join
    41. (
    42. select *
    43. from dth_ehr_pir_all
    44. where
    45. birth_date <= #{initYear}
    46. and register_province = #{province}
    47. and register_city = #{city}
    48. and diabetes_flag != '0'
    49. and (death_date = '' or death_date > #{initYear})
    50. ) t
    51. on t.id_no = re.id_no;

    sql能跑出来了,不会报出内存不足了

  • 相关阅读:
    单例模式之懒汉式和饿汉式
    国产麒麟v10系统下打包electron+vue程序,报错unknown output format set
    LCP 51.烹饪料理
    Vue3常用笔记
    Springboot整合JWT完成验证登录
    react--编程式导航、antd的使用
    为什么OpenAPI是未来企业数字化转型的决定性因素
    服装工业新消费·PLM
    虹科分享|硬件加密U盘|居家办公的网络安全:远程员工可以采取的步骤
    【LeetCode回溯算法#08】递增子序列,巩固回溯算法中的去重问题
  • 原文地址:https://blog.csdn.net/weixin_43084715/article/details/125423526