SQL架构
表 Accounts:
+---------------+---------+ | Column Name | Type | +---------------+---------+ | id | int | | name | varchar | +---------------+---------+ id 是该表主键. 该表包含账户 id 和账户的用户名.
表 Logins:
+---------------+---------+ | Column Name | Type | +---------------+---------+ | id | int | | login_date | date | +---------------+---------+ 该表无主键, 可能包含重复项. 该表包含登录用户的账户 id 和登录日期. 用户也许一天内登录多次.
写一个 SQL 查询, 找到活跃用户的 id 和 name.
活跃用户是指那些至少连续 5 天登录账户的用户.
返回的结果表按照 id 排序.
结果表格式如下例所示:
Accounts 表: +----+----------+ | id | name | +----+----------+ | 1 | Winston | | 7 | Jonathan | +----+----------+ Logins 表: +----+------------+ | id | login_date | +----+------------+ | 7 | 2020-05-30 | | 1 | 2020-05-30 | | 7 | 2020-05-31 | | 7 | 2020-06-01 | | 7 | 2020-06-02 | | 7 | 2020-06-02 | | 7 | 2020-06-03 | | 1 | 2020-06-07 | | 7 | 2020-06-10 | +----+------------+ Result 表: +----+----------+ | id | name | +----+----------+ | 7 | Jonathan | +----+----------+ id = 1 的用户 Winston 仅仅在不同的 2 天内登录了 2 次, 所以, Winston 不是活跃用户. id = 7 的用户 Jonathon 在不同的 6 天内登录了 7 次, , 6 天中有 5 天是连续的, 所以, Jonathan 是活跃用户.
- select
- distinct ss1.id, a.name
- from
- (
- select
- id,count(distinct login_date) c
- from
- (
- select
- id,login_date,date_sub(login_date,interval dense_rank() over(partition by id order by login_date) day) drank
- from
- Logins
- ) s1
- group by id,drank
- having c >=5
- ) ss1 left join Accounts a
- on ss1.id = a.id
- order by id
- select
- ss1.id, a.name
- from
- (
- select
- distinct id
- from
- (
- select
- id,login_date,date_sub(login_date,interval dense_rank() over(partition by id order by login_date) day) drank
- from
- Logins
- ) s1
- group by id,drank
- having count(distinct login_date) >=5
- ) ss1 left join Accounts a
- on ss1.id = a.id
- order by id
笔记:
活跃 用户 去重
一定要用 date_sub
不可直接 日期(login_date) - dense_rank
日期直接这么减法会出错滴