文章目录
已知有表记录了每个品牌的活动开始日期和结束日期,每个品牌可以有多个活动。请编写一个SQL查询合并在同一个品牌举行的所有重叠的活动,如果两个活动至少有一天相同,那他们就是重叠的。
样例数据:
目标结果:
解释:
有两个品牌,分别为小米和华为。
小米:
华为:
我们首先按照brand分组,根据start_date、end_date 升序排列,按照start_date 进行了升序排列,所以当前行的start_date一定晚于前一行的start_date,对当前行的start_date 和截止到上一行的最大end_date进行比较,如果当前行的start_date 小于等于截止到前一行最大end_date 代表有交叉,可以合并,否则代表不可合并。判断出是否可以合并之后,具体操作合并就转化成类似连续问题。
维度 | 评分 |
---|---|
题目难度 | ⭐️⭐️⭐️⭐️⭐️ |
题目清晰度 | ⭐️⭐️⭐️⭐️⭐️ |
业务常见度 | ⭐️⭐️⭐️⭐️ |
1、使用max()函数开窗,获得截止到当前行之前的活动最后日期
这里我们使用max(),按照brand进行分组,然后根据start_date和end_date进行排序,利用聚合函数开窗,开窗函数内有排序则聚合到当前行的特性,进行处理。其中我们这边需要聚合到当前行的上一行。
执行SQL
- select
- brand,
- start_date,
- end_date,
- max(end_date) over (partition by brand order by start_date asc,end_date asc rows between unbounded preceding and 1 preceding) as max_end_date
- from t1_brand_event;
2、根据当前行的start_day与max_end_date进行比较,得出是否可以合并标记
执行SQL
- select brand,
- start_date,
- end_date,
- max_end_date,
- if(start_date <= max_end_date, 0, 1) as is_merge --0:合并,1:不合并
- from (select brand,
- start_date,
- end_date,
- max(end_date)
- over (partition by brand order by start_date asc,end_date asc rows between unbounded preceding and 1 preceding) as max_end_date
- from t1_brand_event) t
3、连续问题,使用sum()over()进行分组
执行SQL
- select brand,
- start_date,
- end_date,
- max_end_date,
- is_merge,
- sum(is_merge) over (partition by brand order by start_date asc,end_date asc) as group_id
- from (select brand,
- start_date,
- end_date,
- max_end_date,
- if(start_date <= max_end_date, 0, 1) as is_merge --0:合并,1:不合并
- from (select brand,
- start_date,
- end_date,
- max(end_date)
- over (partition by brand order by start_date asc,end_date asc rows between unbounded preceding and 1 preceding) as max_end_date
- from t1_brand_event) t) tt;
4、完成合并,得到最终结果
取每个组内的start_day 的最小值作为活动开始日期,end_day的最大值作为活动结束日期,得到最终结果。
注意分组条件为:brand+group_id
执行SQL
- select brand,
- min(start_date) as start_date,
- max(end_date) as end_date
- from (select brand,
- start_date,
- end_date,
- max_end_date,
- is_merge,
- sum(is_merge) over (partition by brand order by start_date asc,end_date asc) as group_id
- from (select brand,
- start_date,
- end_date,
- max_end_date,
- if(start_date <= max_end_date, 0, 1) as is_merge --0:合并,1:不合并
- from (select brand,
- start_date,
- end_date,
- max(end_date)
- over (partition by brand order by start_date asc,end_date asc rows between unbounded preceding and 1 preceding) as max_end_date
- from t1_brand_event) t) tt) ttt
- group by brand, group_id; --注意这里的分组,有group_id
- --建表语句
- CREATE TABLE IF NOT EXISTS t1_brand_event (
- brand STRING, --品牌
- start_date STRING, -- 营销活动开始日期
- end_date STRING -- 营销活动结束日期
- );
- --数据插入
- insert into t1_brand_event(brand, start_date, end_date) values
- ('小米','2024-01-13','2024-01-20'),
- ('小米','2024-01-14','2024-01-17'),
- ('小米','2024-01-14','2024-01-16'),
- ('小米','2024-01-18','2024-01-25'),
- ('小米','2024-01-20','2024-01-26'),
- ('华为','2024-11-09','2024-12-07'),
- ('华为','2024-12-09','2024-12-23'),
- ('华为','2024-12-13','2024-12-17'),
- ('华为','2024-12-20','2024-12-24'),
- ('华为','2024-12-25','2024-12-30'),
- ('OPPO','2023-12-01','2024-01-30'),
- ('OPPO','2023-12-31','2024-06-30');