• hive统计页面停留时间


    1、背景:通过业务埋点数据,统计用户在页面的停留时间

    样例数据,样例数据存入表tmp,

    有如下字段用户uid、动作时间戳time、页面名称pn、动作名称action

    1. SELECT '12345' AS uid, '1695613731020' AS time, '搜索' AS pn, 'click' AS action
    2. UNION ALL
    3. SELECT '12345' AS uid, '1695613732021' AS time, '搜索' AS pn, 'click' AS action
    4. UNION ALL
    5. SELECT '12345' AS uid, '1695613734024' AS time, '搜索' AS pn, 'click' AS action
    6. UNION ALL
    7. SELECT '12345' AS uid, '1695613737036' AS time, '列表' AS pn, 'click' AS action
    8. UNION ALL
    9. SELECT '12345' AS uid, '1695613738037' AS time, '列表' AS pn, 'click' AS action
    10. UNION ALL
    11. SELECT '12345' AS uid, '1695613740040' AS time, '列表' AS pn, 'click' AS action
    uidtimepnaction
    123451695613731020搜索click
    123451695613732021搜索click
    123451695613734024搜索click
    123451695613737036列表click
    123451695613738037列表click
    123451695613740040列表click

    思路:以用户维度按时间进行升序排列,通过lag函数找到上一个时间动作last_pn

    1. SELECT uid, time, pn, row_number() OVER (PARTITION BY uid ORDER BY time DESC) AS rn
    2. , lag(pn, 1) OVER (PARTITION BY uid ORDER BY time ASC) AS last_pn
    3. FROM tmp

    rn排序的作用是找到最后一个动作

    uidtimepnrnlast_pn
    123451695613731020搜索6
    123451695613732021搜索5搜索
    123451695613734024搜索4搜索
    123451695613737036列表3搜索
    123451695613738037列表2列表
    123451695613740040列表1列表

    然后将发生页面变化的节点进行标记,

    1. SELECT *, if(pn <> nvl(last_pn, '空') OR rn = 1, 1, 0) AS label
    2. FROM (
    3. SELECT uid, time, pn, row_number() OVER (PARTITION BY uid ORDER BY time DESC) AS rn
    4. , lag(pn, 1) OVER (PARTITION BY uid ORDER BY time ASC) AS last_pn
    5. FROM tmp
    6. ) t
    uidtimepnrnlast_pnlabel
    123451695613731020搜索61
    123451695613732021搜索5搜索0
    123451695613734024搜索4搜索0
    123451695613737036列表3搜索1
    123451695613738037列表2列表0
    123451695613740040列表1列表1

    之后统计停留时间就可以只看label =1的日志之间的时间差即可,全部代码如下,

    1. WITH tmp AS (
    2. SELECT '12345' AS uid, '1695613731020' AS time, '搜索' AS pn, 'click' AS action
    3. UNION ALL
    4. SELECT '12345' AS uid, '1695613732021' AS time, '搜索' AS pn, 'click' AS action
    5. UNION ALL
    6. SELECT '12345' AS uid, '1695613734024' AS time, '搜索' AS pn, 'click' AS action
    7. UNION ALL
    8. SELECT '12345' AS uid, '1695613737036' AS time, '列表' AS pn, 'click' AS action
    9. UNION ALL
    10. SELECT '12345' AS uid, '1695613738037' AS time, '列表' AS pn, 'click' AS action
    11. UNION ALL
    12. SELECT '12345' AS uid, '1695613740040' AS time, '列表' AS pn, 'click' AS action
    13. )
    14. SELECT *
    15. FROM (
    16. SELECT uid, pn, time, CAST((lead(time, 1) OVER (PARTITION BY uid ORDER BY time ASC) - time) / 1000 AS BIGINT) AS stay_time
    17. FROM (
    18. SELECT *
    19. , if(pn <> nvl(last_pn, '空')
    20. OR rn = 1, 1, 0) AS label
    21. FROM (
    22. SELECT uid, time, pn, row_number() OVER (PARTITION BY uid ORDER BY time DESC) AS rn
    23. , lag(pn, 1) OVER (PARTITION BY uid ORDER BY time ASC) AS last_pn
    24. FROM tmp
    25. ) t
    26. ) tt
    27. WHERE label = 1
    28. ) ttt
    29. WHERE stay_time IS NOT NULL

    最终统计结果如下

    uidpntimestay_time
    12345搜索16956137310206
    12345列表16956137370363
  • 相关阅读:
    FPGA中应用LVDS信号
    【Linux私房菜】第五期 —— 定时任务与磁盘
    笔试刷题Day—8
    为什么nginx用户使用的nginx不能监听80端口?
    大数据学习笔记1.3 Linux用户操作
    Android DEX相关,ART加载OAT文件
    校园一卡通管理信息系统的设计与实现
    甲氨蝶呤修饰PEG-PLGA纳米粒MTX-PLGA|载紫杉醇PTX的PLGA纳米粒PLGA-SS-PTX(定制)
    1704. 判断字符串的两半是否相似
    uniapp-vue3-oadmin手机后台实例|vite5.x+uniapp多端仿ios管理系统
  • 原文地址:https://blog.csdn.net/u012735708/article/details/133688621