• hive和presto的求数组长度函数区别及注意事项


    1、任务

    获取邮箱字符串’@'后字符串 ,求长度

    2、hive & spark-sql 求数组长度的函数 size

    1. hive & spark-sql 求数组长度的函数 size
    2. select size(split(email, '@')),split(email, '@'),split(email, '@')[0],split(email, '@')[1]
    3. FROM
    4. (select "jack@126.com" as email union select "tom@126.com.cn" as email) tb_mid;
    5. select size(split(email, '@')),split(email, '@'),split(email, '@')[0],split(email, '@')[1]
    6. FROM
    7. (select 'jack@126.com' as email union select 'tom@126.com.cn' as email) tb_mid;
    8. 2 ["tom","126.com.cn"] tom 126.com.cn
    9. 2 ["jack","126.com"] jack 126.com
    10. Time taken: 0.723 seconds, Fetched 2 row(s)

    3、presto  求数组长度的函数 cardinality

    1. presto 求数组长度的函数 cardinality
    2. select cardinality(split(email, '@')),split(email, '@'),split(email, '@')[1],split(email, '@')[2]
    3. FROM
    4. (select 'jack@126.com' as email union select 'tom@126.com.cn' as email) tb_mid;
    5. _col0 | _col1 | _col2 | _col3
    6. -------+-------------------+-------+------------
    7. 2 | [tom, 126.com.cn] | tom | 126.com.cn
    8. 2 | [jack, 126.com] | jack | 126.com
    9. (2 rows)
    10. select cardinality(split(email, '@')),split(email, '@'),split(email, '@')[1],split(email, '@')[2]
    11. FROM
    12. (select "jack@126.com" as email union select "tom@126.com.cn" as email) tb_mid;
    13. Query 20231019_070945_20009_n9u2s failed: line 3:9: Column 'jack@126.com' cannot be resolved
    14. select cardinality(split(email, '@')),split(email, '@'),split(email, '@')[1],split(email, '@')[2]
    15. FROM
    16. (select "jack@126.com" as email union select "tom@126.com.cn" as email) tb_mid

    4、注意事项

    1)、在计算数组长度的时候,hive和presto的函数不同
      其中hive的size函数默认数组的下标从0开始
      presto的cardinality函数默认数组的下标从1开始

    2)、presto 不支持双引号 ,而hive 既支持单引号,也支持双引号

    1. presto> SELECT
    2. -> email,
    3. -> (case when cardinality(split(email, '@')) = 2 then split(email, '@')[1] else '' end ) as email_suffix
    4. -> FROM
    5. -> (select "jack@126.com" as email union select "tom@126.com.cn" as email) tb_mid;
    6. Query 20231016_070153_17958_p9f2s failed: line 5:9: Column 'jack@126.com' cannot be resolved
    7. SELECT
    8. email,
    9. (case when cardinality(split(email, '@')) = 2 then split(email, '@')[1] else '' end ) as email_suffix
    10. FROM
    11. (select "jack@126.com" as email union select "tom@126.com.cn" as email) tb_mid

  • 相关阅读:
    【软考软件评测师】第二十章 计算机组成与体系结构(CPU指令系统)
    专业五月考自测
    【Sentinel】Sentinel与gateway的限流算法
    ORB-SLAM2 ---- Frame::UndistortKeyPoints函数
    【MicroPython ESP32】NVS数据非易失性存储示例讲解说明
    工业交换机的“自适应”是什么意思?
    微信小程序Day4笔记
    java实现pdf转word
    下载Ubantu镜像文件、创建虚拟机以及ubantu安装详细教程
    Android JNI笔记
  • 原文地址:https://blog.csdn.net/helloxiaozhe/article/details/133927849