优化group By查询很慢的问题建议使用DISTINCT字段做分组的查询优化

1、前言问题描述

前端发生查询一张表大约250W+的数据常量表，每次全量查询，一点击就全量查询，导致的查询很慢，刚开始，我想的是前端先做下限制，不要每一次弹窗就去全表查询，后续我抽时间把他们重构放在Redis上面，可是前端有意见，所以后端做优化，那就只能加班进行SQL的优化，其他的工作影响了进度，那么在这里记录下是如何使用呢SQL优化的解决，

select * from 表 group By 表字段；

表如果字段很多，100W+的时候呢，查询也还凑乎，可是同步数据到250W+的时候数据量，发现查询满了下来，大约10秒+这就哟点坑啦！

2、原来的初始方案：

主要是直接的进行分组查询，毕竟业务数据上是一对多的问题，所以在这里也是直接的进行分组了，如果是全表分组的话，会根据数据量的大小大致查询很慢！


 @Override
    @DataSource(DataSourceType.HW_BUSINESS)
    public IPage selectByGroupBySourceName(String dsSourceName, Integer dsSourceType, Integer current, Integer size) {
 
        Page page = new Page<>(current, size);
        LambdaQueryWrapper lqw = new LambdaQueryWrapper<>();
 
        if (StringUtils.isNotBlank(dsSourceName)) {
            lqw.like(HwDatasource::getDsSourceName, dsSourceName);
        }
        if (dsSourceType != null && dsSourceType != 0) {
            lqw.eq(HwDatasource::getDsSourceType, dsSourceType);
        }
        //group BY  ds_source_name  出现汉字重复现象
        lqw.last(" group BY  ds_source_type ");
        return hwDatasourceMapper.selectPage(page, lqw);
    }

简单地优化：


-- 添加索引
alter  table  hw_business.hw_datasource  add  index  idx_ds_source_type(ds_source_type);

表结构是这样的


 
CREATE TABLE `hw_datasource` (
  `id` bigint NOT NULL,
  `ds_code` int DEFAULT NULL COMMENT '数据源编码',
  `ds_source_type` int DEFAULT NULL COMMENT '数据源类型',
  `ds_source_name` varchar(200) COLLATE utf8mb4_general_ci DEFAULT NULL COMMENT '数据源名称',
  `ds_news_columns` varchar(200) COLLATE utf8mb4_general_ci DEFAULT NULL COMMENT '新闻栏目名称',
  `update_time` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT '更新时间',
  `create_time` timestamp NULL DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间',
  `collect_rate` int DEFAULT '0' COMMENT '采集频率（单位秒）',
  PRIMARY KEY (`id`),
  KEY `ds_code_index` (`ds_code`) USING BTREE,
  KEY `idx_ds_source_type` (`ds_source_type`),
  KEY `idx_ds_source_type_name` (`ds_source_type`,`ds_source_name`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci COMMENT='数据源信息表';

交互也是有问题，没有查询呢每一次调用都是全表查询，感觉真的很鸡肋和有问题的交互！初步解决方案是交互调整下，不要每一次都要全表查询，无意义的。只要指定条件那么这个条件字段就会生效，有索引会很快！但是貌似没人认可，必须优化！

3、优化的方案

思路：

主要是取消原来的语句查询，因为group By 默认是原来的Mybatis-plus的分页查询，导致分页很慢！同时group By 分组也不是用来全表分组查询的！

不能借助Mybatis-plus的底层解析SQL运行，只能手写SQL进行优化，也就是手写分页，自己查询俩句SQL，原本的Mybatis-plus会直接的解析成一句SQL：


SELECT COUNT(1) FROM ( 
SELECT id, ds_code, ds_source_type,
 ds_source_name, ds_news_columns, update_time, create_time 
FROM hw_business.hw_datasource 
GROUP BY ds_source_type ) TOTAL

此时需要自己手写俩句SQL，

1、是分页自己进行查询，

2、分组查询转换

group BY ds_source_type; 改成 DISTINCT ds_source_type

本来的SQL改造如下


 
explain
select count(DISTINCT ds_source_type) from hw_business.hw_datasource  ;
select DISTINCT ds_source_type ,ds_source_name 
 from hw_business.hw_datasource  limit   1,100;
 
 
SELECT DISTINCT ds_source_type , ds_source_name
 from  hw_business.hw_datasource ht limit 100, 100;
 
SELECT DISTINCT ds_source_type , ds_source_name ,ds_news_columns
 from hw_business.hw_datasource ht
 WHERE ht.ds_source_type = 500030400 and
ht.ds_source_name like CONCAT('%','微信','%') limit 0, 100

自定义分页源码自己实现如下，替换掉分页的功能

自定义实现分页：这个分页逻辑也是一个面试问题呢。


 @Override
    @DataSource(DataSourceType.HW_BUSINESS)
    public IPage selectByGroupBySourceName(String dsSourceName, Integer dsSourceType, Integer current, Integer size) {
 
        Page page = new Page<>(current, size);
        HwDatasourceVo hwDatasourceVo = new HwDatasourceVo();
        page.setCurrent(current);
        page.setSize(size);
        hwDatasourceVo.setDsSourceName(dsSourceName);
        hwDatasourceVo.setDsSourceType(dsSourceType);
        hwDatasourceVo.setCurrent(current);
        hwDatasourceVo.setEndOffset(size);
        hwDatasourceVo.setStartOffset(current);
        if (current > 1) {
            int startOffset = (current - 1) * size;
            hwDatasourceVo.setStartOffset(startOffset);
            hwDatasourceVo.setEndOffset(size);
        }
        if (current == 1) {
            hwDatasourceVo.setStartOffset(0);
        }
        List hwDatasourceList = hwDatasourceMapper.selectHwDatasourcList(hwDatasourceVo);
        Long count = hwDatasourceMapper.selectHwDatasourcCount(hwDatasourceVo);
        page.setRecords(hwDatasourceList);
        page.setTotal(count);
        return page;
    }

Mybatis对应的映射的xml

自定义的SQL如下：


 
 
 "1.0" encoding="UTF-8"?>
mapper PUBLIC "-//mybatis.org//DTD Mapper 3.0//EN" "http://mybatis.org/dtd/mybatis-3-mapper.dtd">
<mapper namespace="com.dao.mapper.business.HwDatasourceMapper">
    <sql id="BaseQueryWhere">
        <where>
            <if test="dsSourceType != null and dsSourceType != '0'.toString">
                and    ht.ds_source_type = #{dsSourceType}
            if>
            <if test="dsSourceName != null">
                and    ht.ds_source_name  like  CONCAT('%',#{dsSourceName},'%')
            if>
 
        where>
    sql>
 
    <select id="selectHwDatasourcCount" resultType="Long"
            parameterType="com.api.vo.middleground.HwDatasourceVo">
        SELECT COUNT(DISTINCT ds_source_type) from  hw_datasource ht
        <include refid="BaseQueryWhere"/>
    select>
 
    <select id="selectHwDatasourcList" resultType="com.dao.entity.business.HwDatasource"
            parameterType="com.api.vo.middleground.HwDatasourceVo">
        SELECT  DISTINCT ds_source_type  ,   ds_source_name  ,ds_news_columns  from  hw_datasource ht
        <include refid="BaseQueryWhere"/>
        <if test="startOffset != null and endOffset != null">
            limit #{startOffset}, #{endOffset}
        if>
    select>
mapper>

BaseMapper的接口定义如下：


    /**
     * @param dsNewsColumns
     * @return
     */
    @Select(" SELECT COUNT( DISTINCT ds_source_type ) id,ds_code,ds_source_type,ds_source_name,ds_news_columns,update_time,create_time FROM hw_datasource WHERE ds_source_name = #{dsNewsColumns} ")
    HwDatasource selectListOnlyOne(@Param("dsNewsColumns") String dsNewsColumns);
//
//    @Select(" SELECT COUNT( DISTINCT ds_source_type ) id,ds_code,ds_source_type,ds_source_name,ds_news_columns,update_time,create_time FROM hw_datasource WHERE ds_source_name like  CONCAT('%',#{dsNewsColumns},'%')  ")
//    HwDatasource selectListByDsSourceName(@Param("dsSourceName") String dsSourceName,@Param("startOffset") String dsSourceName);
//
//    @Select(" SELECT COUNT( DISTINCT ds_source_type ) id,ds_code,ds_source_type,ds_source_name,ds_news_columns,update_time,create_time FROM hw_datasource WHERE ds_source_type = #{dsSourceType} ")
//    HwDatasource selectListByDsSourceType(@Param("dsSourceType") String dsSourceType);
//
//    @Select(" SELECT COUNT( DISTINCT ds_source_type ) id,ds_code,ds_source_type,ds_source_name,ds_news_columns,update_time,create_time FROM hw_datasource WHERE ds_source_name    CONCAT('%',#{dsNewsColumns},'%')  and ds_source_type = #{dsSourceType}  ")
//    HwDatasource selectListByDsSourceTypeAndDsSourceName(@Param("dsSourceName") String dsSourceName,@Param("dsSourceType") String dsSourceType);
 
 
    Long selectHwDatasourcCount(HwDatasourceVo hwDatasourceVo);
 
    List selectHwDatasourcList(HwDatasourceVo hwDatasourceVo);

相关阅读:
立体相机标定
 深度学习基础-1
uniapp的app苹果应用商店上架最简教程
 第1关：ZooKeeper初体验
 Node 中的 Buffer 的理解？应用场景？
Python中class内置方法__init__与__new__作用与区别探究
 Linux系统编程之进程间通信(IPC)
Ansible 的脚本 --- playbook 剧本
 自学WEB后端02-基于Express框架完成一个交互留言板!
精简5800三维程序
原文地址：https://blog.csdn.net/m0_59252007/article/details/127594617