今天开始为期一个多月的 HQL 练习,共 55 道 HQL 题,大概每天两道,从初级函数到中级函数。这次的练习不再是基础的 join 那种通用 SQL 语法了,而是引入了更多 Hive 的函数(单行函数、窗口函数等)。
我会把 HQL 中函数和语法的一些注意事项写在每一题下面的 "知识点" 中,方便上课复习。同样这博客估计没人看,如果谁实在需要建表语句给我留言就行。
- SELECT sku_id from(
- SELECT sku_id,
- rank() OVER(ORDER BY order_sum desc) rk
- from(
- SELECT sku_id,sum(sku_num) order_sum
- FROM order_detail
- GROUP BY sku_id
- ORDER BY order_sum desc
- LIMIT 2
- )as t1
- )as t2
- WHERE rk=2;
知识点:
- select distinct sku_id, sku_num,rk from(
- ...
- );
- -- 尽管查询结果中 sku_id 字段的值可能重复,但是不能通 select distinct 来对单个属性去重
- sku_id sku_num rk
- 1 2 1
- 1 3 2
order_detail_id | order_id | sku_id | create_date | price | sku_num |
---|---|---|---|---|---|
1 | 1 | 1 | 2021-09-27 | 2000.00 | 2 |
2 | 1 | 3 | 2021-09-27 | 5000.00 | 5 |
3 | 2 | 4 | 2021-09-28 | 6000.00 | 9 |
4 | 2 | 5 | 2021-09-28 | 500.00 | 33 |
- -- 1.1 2021年销售总量小于100的商品
- select sku_id, sum(sku_num) order_sum
- from order_detail
- where year(create_date)=2021
- group by sku_id
- having order_sum<100;
- -- 1.2 上架时间小于 30 天的商品
- select sku_id,name from sku_info
- where datediff('2022-01-10',from_date)>30;
- -- join 两个子表
- select t1.sku_id,name from (
- select sku_id, sum(sku_num) order_sum
- from order_detail
- where year(create_date)=2021
- group by sku_id
- having order_sum<100
- )t1 join (
- select sku_id,name from sku_info
- where datediff('2022-01-10',from_date)>30
- )t2 on t1.sku_id = t2.sku_id;
知识点:
uer_id | ip_address | login_ts | logout_ts |
---|---|---|---|
101 | 180.149.130.161 | 2021-09-21 08:00:00 | 2021-09-27 08:30:00 |
101 | 180.149.130.161 | 2021-09-27 08:00:00 | 2021-09-27 08:30:00 |
101 | 180.149.130.161 | 2021-09-28 09:00:00 | 2021-09-28 09:10:00 |
101 | 180.149.130.161 | 2021-09-29 13:30:00 | 2021-09-29 13:50:00 |
思路1:每天有多少人是首日登录就有多少新增用户。查询出每个用户的首日登录时间,然后按照日期分组聚合就得到了每日新增用户。而不是去考虑开窗(我是这么想的)
思路2:开窗也可以实现,用 row_numer 对每个用户的登录时间进行排名(group by user_id),然后根据登录时间进行分区将该天 row_number=1 的值(说明是首次登录)进行聚合。
- -- 查询用户首次登录的日期
- select user_id,min(date_format(login_ts,'yyyy-MM-dd')) first_login_date
- from user_login_detail
- group by user_id;
- -- 按照日期分组得到每天的新增用户
- select first_login_date,count(*) from(
- select user_id,min(date_format(login_ts,'yyyy-MM-dd')) first_login_date
- from user_login_detail
- group by user_id
- )t1
- group by first_login_date;
注意:怎么把 login_ts (格式:2021-09-21 08:00:00)这种时间字符串指定的字段取出来?
我是这么实现的:
select concat_ws('-',string(year(date_format(login_ts,'yyyy-MM-dd HH:mm:ss'))),string(month(date_format(login_ts,'yyyy-MM-dd HH:mm:ss'))),string(day(date_format(login_ts,'yyyy-MM-dd HH:mm:ss')))),
标准:
select date_format(login_ts,'yyyy-MM-dd') from user_login_detail;
- select dt,sum(`if`(rk=1,1,0)) new_user_nums from(
- select user_id,
- date_format(login_ts,'yyyy-MM-dd') dt,
- row_number() over (partition by user_id order by login_ts) rk
- from user_login_detail
- )t1
- group by dt
- having new_user_nums>0;
需求:从用户登录明细表(user_login_detail)和订单信息表(order_info)中查询每个用户的注册日期(首次登录日期)、总登录次数,以及2021年的登录次数、订单数和订单总额。
思路:无脑 join 没有什么难度
order_info:
序号 | 编号 | 日期 | 金额 |
---|---|---|---|
1 | 101 | 2021-09-27 | 29000.00 |
2 | 101 | 2021-09-28 | 70500.00 |
3 | 101 | 2021-09-29 | 43300.00 |
4 | 101 | 2021-09-30 | 860.00 |
user_login_detail:
- -- 用户首日登录日期
- select user_id,min(date_format(login_ts,'yyyy-MM-dd')) register_date
- from user_login_detail
- group by user_id;
注意:能 group by 就 group by 不然 join 之后报错。
- -- 用户累积登录次数
- select user_id,size(collect_set(date_format(login_ts,'yyyy-MM-dd'))) total_login_count
- from user_login_detail
- group by user_id;
知识点: 利用 collect_set() 把登录日期收集到一个集合里,正好做了去重,就不用担心用户一天登录多次的情况了。
- -- 用户2021登录次数
- select user_id,size(collect_set(date_format(login_ts,'yyyy-MM-dd'))) login_count_2021
- from user_login_detail
- where year(date_format(login_ts,'yyyy-MM-dd'))=2021
- group by user_id;
- -- 用户2021年下单次数和下单金额
- select user_id,count(order_id) order_count_2021,sum(total_amount) order_amount_2021
- from order_info
- where year(create_date)=2021
- group by user_id,year(create_date);
- select t1.user_id,register_date,total_login_count,login_count_2021,order_count_2021,order_amount_2021 from(
- select user_id,min(date_format(login_ts,'yyyy-MM-dd')) register_date from user_login_detail group by user_id
- )t1 join (
- select user_id,size(collect_set(date_format(login_ts,'yyyy-MM-dd'))) total_login_count
- from user_login_detail
- group by user_id
- )t2 on t1.user_id=t2.user_id
- join (
- select user_id,size(collect_set(date_format(login_ts,'yyyy-MM-dd'))) login_count_2021
- from user_login_detail
- where year(date_format(login_ts,'yyyy-MM-dd'))=2021
- group by user_id
- )t3 on t1.user_id=t3.user_id
- join (
- select user_id,count(order_id) order_count_2021,sum(total_amount) order_amount_2021
- from order_info
- where year(create_date)=2021
- group by user_id,year(create_date)
- )t4 on t1.user_id=t4.user_id;
需求:请向所有用户推荐其朋友收藏但是自己未收藏的商品,从好友关系表(friendship_info)和收藏表(favor_info)中查询出应向哪位用户推荐哪些商品。
firendship_info:
user1_id | user2_id |
---|---|
101 | 1010 |
101 | 108 |
101 | 106 |
101 | 104 |
favor_info:
user_id | sku_id | create_date |
---|---|---|
101 | 3 | 2021-09-23 |
101 | 12 | 2021-09-23 |
101 | 6 | 2021-09-25 |
101 | 10 | 2021-09-21 |
思路:
- -- 查询所有用户的好友
- select user1_id user_id,user2_id friend_id from friendship_info
- union
- select user2_id,user1_id from friendship_info;
知识点:
- -- join得到用户好友收藏的商品
- select user1_id user_id,user2_id friend_id from friendship_info
- union
- select user2_id,user1_id from friendship_info
- join favor_info firend_favor
- on user2_id=firend_favor.user_id;
- select distinct t1.user_id,firend_favor.sku_id
- from (
- select user1_id user_id,user2_id friend_id from friendship_info
- union
- select user2_id,user1_id from friendship_info
- )t1
- join favor_info firend_favor
- on t1.friend_id=firend_favor.user_id
- left join favor_info user_favor
- on t1.user_id=user_favor.user_id and firend_favor.sku_id=user_favor.sku_id
- where user_favor.sku_id is null;
需求:从订单信息表(order_info)和用户信息表(user_info)中,分别统计每天男性和女性用户的订单总金额,如果当天男性或者女性没有购物,则统计结果为0。
order_info:
user_info:
编号 | 性别 | 出生日期 |
---|---|---|
101 | 男 | 1990-01-01 |
102 | 女 | 1991-02-01 |
103 | 女 | 1992-03-01 |
104 | 男 | 1993-04-01 |
- select t2.gender,t1.create_date,t1.total_amount
- from order_info t1
- join user_info t2 on t1.user_id=t2.user_id
我们没有必要查询用户的 id 信息,只需要性别(后面我们需要根据性别过滤)、创建订单的日期(后面我们需要根据日期分组)和订单总额(我们需要根据不同性别统计每天的订单总额)即可。
- select coalesce(t3.create_date,t4.create_date),`if`(t3.total_amount_male is null,0,t3.total_amount_male),`if`(t4.total_amount_female is null ,0,t4.total_amount_female) from(
- select create_date,sum(total_amount) total_amount_male from(
- select t2.gender,t1.create_date,t1.total_amount
- from order_info t1
- join user_info t2 on t1.user_id=t2.user_id
- )t1
- where gender='男'
- group by create_date
- )t3 full join (
- select create_date,sum(total_amount) total_amount_female from(
- select t2.gender,t1.create_date,t1.total_amount
- from order_info t1
- join user_info t2 on t1.user_id=t2.user_id
- )t2
- where gender='女'
- group by create_date
- )t4 on t3.create_date=t4.create_date
知识点:
思路1是我自己实现的一种方式,思路2是答案,不得不说还是这种写法高级:
- select create_date,
- cast(sum(`if`(gender='男',total_amount,0)) as decimal(16,2)) total_amount_male,
- cast(sum(`if`(gender='女',total_amount,0)) as decimal(16,2)) total_amount_female
- from order_info oi
- join user_info ui on oi.user_id=ui.user_id
- group by create_date;
知识点:
cast(expr as
decimal(精度,标度):比如 decimal(16,2)表示一个十进制数,其中16是总的数字数量(精度),而2是小数点后的数字数量(标度)