• 间隔连续问题


    输入数据:

    CREATE TABLE lianxu(
    uid int,
    dt string)
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ‘|’
    LINES TERMINATED BY ‘\n’;

    id dt
    1001|2021-12-12
    1002|2021-12-12
    1001|2021-12-13
    1001|2021-12-14
    1001|2021-12-16
    1002|2021-12-16
    1001|2021-12-19
    1002|2021-12-17
    1001|2021-12-20
    代码:

    思路:分组

    1)将上一行时间数据下移
    select
    uid,
    dt,
    lag(dt,1,‘1970-01-01’) over (distribute by uid sort by dt asc) as lag_dt
    from lianxu; t1

    2)将当前行时间减去上一行时间数据
    select
    uid,
    dt,
    datediff(dt,lag_dt) as diff_date
    from (
    select
    uid,
    dt,
    lag(dt,1,‘1970-01-01’) over (distribute by uid sort by dt asc) as lag_dt
    from lianxu
    )t1; t2

    uid dt diff_date
    1001 2021-12-12 18973
    1001 2021-12-13 1
    1001 2021-12-14 1
    1001 2021-12-16 2
    1001 2021-12-19 3
    1001 2021-12-20 1
    1002 2021-12-12 18973
    1002 2021-12-16 4
    1002 2021-12-17 1

    3)按照用户分组,同时按照时间排序,计算从第一行到当前行大于2的数据的总条数(sum(if(flag>2,1,0)))

    select
    uid,
    dt,
    diff_date,
    sum(if (diff_date > 2,1,0)) over (distribute by uid sort by dt asc) as groupid
    from (
    select
    uid,
    dt,
    datediff(dt,lag_dt) as diff_date
    from (
    select
    uid,
    dt,
    lag(dt,1,‘1970-01-01’) over (distribute by uid sort by dt asc) as lag_dt
    from lianxu
    )t1
    )t2; t3

    uid dt diff_date groupid
    1001 2021-12-12 18973 1
    1001 2021-12-13 1 1
    1001 2021-12-14 1 1
    1001 2021-12-16 2 1
    1001 2021-12-19 3 2
    1001 2021-12-20 1 2
    1002 2021-12-12 18973 1
    1002 2021-12-16 4 2
    1002 2021-12-17 1 2

    4)按照用户和flag分组,求最大时间减去最小时间并加上1
    select
    uid,
    groupid,
    datediff(max(dt),min(dt)) + 1 as days
    from (
    select
    uid,
    dt,
    diff_date,
    sum(if (diff_date > 2,1,0)) over (distribute by uid sort by dt asc) as groupid
    from (
    select
    uid,
    dt,
    datediff(dt,lag_dt) as diff_date
    from (
    select
    uid,
    dt,
    lag(dt,1,‘1970-01-01’) over (distribute by uid sort by dt asc) as lag_dt
    from lianxu
    )t1
    )t2
    )t3
    group by uid,groupid; t4

    uid groupid days
    1001 1 5
    1001 2 2
    1002 1 1
    1002 2 2

    5)取连续登录天数的最大值
    select
    uid,
    max(days) as max_dt
    from (
    select
    uid,
    groupid,
    datediff(max(dt),min(dt)) + 1 as days
    from (
    select
    uid,
    dt,
    diff_date,
    sum(if (diff_date > 2,1,0)) over (distribute by uid sort by dt asc) as groupid
    from (
    select
    uid,
    dt,
    datediff(dt,lag_dt) as diff_date
    from (
    select
    uid,
    dt,
    lag(dt,1,‘1970-01-01’) over (distribute by uid sort by dt asc) as lag_dt
    from lianxu
    )t1
    )t2
    )t3
    group by uid,groupid
    )t4
    group by uid;

    结果:
    OK
    uid max_dt
    1001 5
    1002 2
    Time taken: 3.842 seconds, Fetched: 2 row(s)

  • 相关阅读:
    MySQL Server 和 MySQL Workbench安装
    selenium框架操作stealth.min.js文件隐藏浏览器指纹特征
    百趣土壤非靶标代谢组学文献分享,来自Microbiome的灵感
    你所不知道的WMS
    测试行业3年经验,从大厂裸辞后,面试阿里、字节全都一面挂,被面试官说我的水平还不如应届生
    浅析ES6六种声明变量的方法
    使用rpm重新安装包
    10驾校科目一考试系统——窗口交互
    SpringBoot篇之集成Jedis、Lettuce、Redisson
    Leetcode算法二叉树—117. 填充每个节点的下一个右侧节点指针 II(层序遍历/队列)
  • 原文地址:https://blog.csdn.net/a2011480169/article/details/126131629