• HIVE/SQL 实现同一列数据累加和累乘


    一、累加

    hive sql 实现同一列数据的累加,相信大家都会,这里就不过多解释了,贴一个例子结束:

    1. with base_data as (
    2. select 1 as fee, '20220101' as dt
    3. union all select 2 as fee, '20220101' as dt
    4. union all select 3 as fee, '20220102' as dt
    5. union all select 4 as fee, '20220102' as dt
    6. union all select 5 as fee, '20220103' as dt
    7. union all select 6 as fee, '20220103' as dt
    8. union all select 7 as fee, '20220104' as dt
    9. union all select 8 as fee, '20220105' as dt
    10. union all select 9 as fee, '20220105' as dt
    11. )
    12. select sum(fee) as fee from base_data

    二、累乘

    仔细一想,hive 好像没有直接对同一列求累乘的函数,这里需要用到高中的数学知识——对数,一起来回忆一下:

    1. 对数定义

    2.对数的性质

    以上是对数的一些运算性质,其中我重点圈出了两个性质,这将是我们使用 hive sql 实现同一列数据累乘的关键

    1)左边的红框中,两个底数(a)相同的对数相加 = 以a为底(N*M)的对数,其中(N*M)就是我们想要的计算结果,应该如何获取(N*M)呢?

    2)看右边红框的性质,我们可以利用这个性质获取(N*M)

    映射到 hive ,可以将同一列的相乘转为同一列的对数相加,在求真数即可;

    具体做法:

    1)先将该列每一个值转为以10为底的对数(底数可随意),再对该列求sum,最后就得到以10为底【该列所有值相加】的结果为真数的对数。记为结果A(利用hive的log()函数)

    2)再对A取真数(利用hive的power()函数)

    用到的 log() 和 power() 两个函数不了解的可以自行百度一下

    写个例子实战一下:

    1. with base_data as (
    2. select 1 as fee, '20220101' as dt
    3. union all select 2 as fee, '20220101' as dt
    4. union all select 3 as fee, '20220102' as dt
    5. union all select 4 as fee, '20220102' as dt
    6. union all select 5 as fee, '20220103' as dt
    7. union all select 6 as fee, '20220103' as dt
    8. union all select 7 as fee, '20220104' as dt
    9. union all select 8 as fee, '20220105' as dt
    10. union all select 9 as fee, '20220105' as dt
    11. )
    12. select sum(fee) as `单列累加`,
    13. power(10, sum(log(10, fee))) as `单列累乘`
    14. from base_data

    结果:

    可以发现,理论上来说 1*2*3*...*9 = 362880,但是为什么结果会是 362879.9999999994呢?

    这是因为在累乘过程中,由于进行了log转换,存在较小精度损失;在真正使用时一般会用round()进行四舍五入处理;

    再看:

    1. with base_data as (
    2. select 1 as fee, '20220101' as dt
    3. union all select 2 as fee, '20220101' as dt
    4. union all select 3 as fee, '20220102' as dt
    5. union all select 4 as fee, '20220102' as dt
    6. union all select 5 as fee, '20220103' as dt
    7. union all select 6 as fee, '20220103' as dt
    8. union all select 7 as fee, '20220104' as dt
    9. union all select 8 as fee, '20220105' as dt
    10. union all select 9 as fee, '20220105' as dt
    11. )
    12. select sum(fee) as `单列累加`,
    13. power(10, sum(log(10, fee))) as `单列累乘`,
    14. round(power(10, sum(log(10, fee)))) as `单列累乘-精度处理`
    15. from base_data

    结果:

     至此,DONE

  • 相关阅读:
    dp练习2
    强化学习 DQN 经验回放 是什么
    Dubbo2.7源码详解
    恶劣条件下GNSS定位的鲁棒统计
    《大话设计模式》学习总结
    云原生|kubernetes |来给生活比个椰---多容器之部署WordPress(多种方式部署)
    PN532开发指南(uart)
    MySQL:函数
    路由与交换技术-17-生成树协议配置
    注意!各国政府纷纷出台的AI安全监管措施,主要集中在六方面
  • 原文地址:https://blog.csdn.net/weixin_43161811/article/details/126859985