• clickhouse分组排序,行号,取特定数量数据


    目前应用很多需求设计对数据分组并去特定数量的数据;
    clickhouse 新版本增加了row_number(),rank() 函数,可以直接对分组数据添加行号;下面是记录了这两个函数的基本使用;另外用 groupArray方式也大概实现了添加行号(官网上有说 groupArrayLast 可以实现自动排序的功能),但是实测这函数没法使用,所以暂不记录;

    1、源数据

    select 'a' as name, 25 as age,165 as height union all
    select 'b' as name, 21 as age,182 as height union all
    select 'a' as name, 21 as age,187 as height union all
    select 'a' as name, 25 as age,158 as height union all
    select 'b' as name, 22 as age,168 as height
    
    • 1
    • 2
    • 3
    • 4
    • 5

    2、生成数组

    2.1 groupArray 分组合并为数组

    groupArray 会把同类型的值合并为数组,并过滤NULL值数据;格式groupArray(max_size)(fields)

    -- 不限制分组数量
    select name,groupArray(age) from 
    (select 'a' as name, 25 as age,165 as height union all 
     select 'b' as name, 21 as age,182 as height union all 
     select 'a' as name, 21 as age,187 as height union all 
     select 'a' as name, 25 as age,158 as height union all 
     select 'b' as name, 22 as age,168 as height
    ) a group by name;
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    namegroupArray(age)
    b[21,22]
    a[25,21,25]
    -- 限制分组数量为1
    -- 如果先对内部数据排序再分组可拿取age最大的一条
    -- arrayStringConcat 将数据用特定字符合并
    select name,groupArray(1)(age),arrayStringConcat(groupArray(1)(age),'') from 
    (select * from 
     (select 'a' as name, 25 as age,165 as height union all 
      select 'b' as name, 21 as age,182 as height union all 
      select 'a' as name, 21 as age,187 as height union all 
      select 'a' as name, 25 as age,158 as height union all 
      select 'b' as name, 22 as age,168 as height
     ) a  
     order by age desc 
    ) r group by name;
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
    • 11
    • 12
    • 13
    namegroupArray(1)(age)arrayStringConcat(groupArray(1)(age),‘’)
    b[22]22
    a[25]25
    2.2 arrayEnumerate 标记数据
    select name,groupArray(age) as values,arrayEnumerate(values) as indexs from 
    (select 'a' as name, 25 as age,165 as height union all 
     select 'b' as name, 21 as age,182 as height union all 
     select 'a' as name, 21 as age,187 as height union all 
     select 'a' as name, 25 as age,158 as height union all 
     select 'b' as name, 22 as age,168 as height
    ) a group by name;
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    namevaluesindexs
    b[21,22][1,2]
    a[25,21,25][1,2,3]

    3、rank()、row_number()

    3.1 说明

    https://clickhouse.com/docs/en/sql-reference/window-functions

    3.2 使用
    --rank()
    select name,age,rank() over(partition by name order by age asc ) from 
    (select 'a' as name, 25 as age,165 as height union all 
     select 'b' as name, 21 as age,182 as height union all 
     select 'a' as name, 21 as age,187 as height union all 
     select 'a' as name, 25 as age,158 as height union all 
     select 'b' as name, 22 as age,168 as height
    ) a group by name,age;
    
    -- row_number()
    select name,age,row_number() over(partition by name order by age asc ) from 
    (select 'a' as name, 25 as age,165 as height union all 
     select 'b' as name, 21 as age,182 as height union all 
     select 'a' as name, 21 as age,187 as height union all 
     select 'a' as name, 25 as age,158 as height union all 
     select 'b' as name, 22 as age,168 as height
    ) a group by name,age;
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
    • 11
    • 12
    • 13
    • 14
    • 15
    • 16
    • 17
    nameagerank() OVER (PARTITION BY name ORDER BY age ASC)
    a211
    a252
    b211
    b222

    PS. 行号已经标明,后续需要取多少数据设置行号条件即可

  • 相关阅读:
    外包公司干了2个月,技术倒退两年...
    【算法】数组中出现次数超过一半的数字
    windows使用supervisor-win部署flask项目
    Vue 之 解决v-html生成的元素不能触发@click等vue事件和CSS样式不生效的方法
    AI人工智能进阶-BERT/Transformer/LSTM/RNN原理与代码
    Tensorflow2.4实现RepVGG
    信息系统项目管理师必背核心考点(六十一)项目组合概念
    【解题报告】CF练一下题 | 难度CF2500左右
    咬文嚼图式的介绍二叉树、B树/B-树
    PythonStudy6
  • 原文地址:https://blog.csdn.net/qq_44708990/article/details/133175698