【问题】
I want to get best 3 day of users between “2014-07-01” and “2014-08-01”
Could someone help me? I’ve been stuck here for 3 days.
In real score table entries are 10:00 to 22:00 and 1 entries for each hour.
Total of 12 entry for each day and each player (sometimes it could be less 1 or 2).
This is the output I’m trying to get:
- ID | User_ID | Username | Sum(Score) | Date
- \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
- 1| 1| Xxx | 52| 2014-07-01
- 2| 1| Xxx | 143| 2014-07-02
- 3| 2| Yyy | 63| 2014-07-01
- ...
Scoretable:
- ID | User_ID | Score | Datetime
- \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
- 1| 1| 35| 2014-07-0111:00:00
- 2| 1| 17| 2014-07-0112:00:00
- 3| 2| 36| 2014-07-0111:00:00
- 4| 2| 27| 2014-07-0112:00:00
- 5| 1| 66| 2014-07-0211:00:00
- 6| 1| 77| 2014-07-0212:00:00
- 7| 2| 93| 2014-07-0212:00:00
- ...
Usertable :
- ID | Username
- \-\-\-\-\-\-\-\-\-\-\-\-\-\-
- 1| Xxx
- 2| Yyy
- 3| Zzz
- ...
别人的回答:
I think you need to aggregate first by date, and then choose the first three usingrow_number(). To do the aggregation:
- selects.user_id,sum(s.datetime,'day')astheday,sum(score)asscore,
- row_number()over(partitionbys.user_id orderbysum(score)desc)asseqnum
- fromscores s
- groupbys.user_id;
- To get the rest of the information, use this as a subquery or CTE:
- selectu.*,s.score
- from(selects.user_id,sum(s.datetime,'day')astheday,sum(s.score)asscore,
- row_number()over(partitionbys.user_id orderbysum(s.score)desc)asseqnum
- fromscores s
- groupbys.user_id
- )s join
- users u
- ons.user_id =u.users_id
- whereseqnum <=3
- orderbyu.user_id,s.score desc;
【回答】
这是个比较典型的组内计算,解决思路很清晰:
1. 将数据按 User_ID 分成多个组,每个组是一个用户的全部数据。
2. 组内运算,将每组数据按日期再分组,并汇总出每日的总分。
3. 组内运算,在每组数据中求得总分前三名的记录。
4. 将所有数据合并。
上述思路虽然清晰,但用 SQL 却很难表达组内运算,所以你“Stuck on Sql query”。这种情况建议采用 SPL 来辅助解决,SPL 可以方便地表达组内运算,代码如下:
A | |
1 | =dataSource.query("select s.ID ID,s.User_ID User_ID,u.Username Username,s.Score Score,s.Datetime Datetime from Scoretable s join Usertable u on s.User_ID = u.ID") |
2 | =A1.group(User_ID) |
3 | =A2.(~.groups(User_ID,Username,date(Datetime):day; sum(Score):sumScore)) |
4 | =A3.(~.top(-3;sumScore)) |
5 | =A4.conj() |
简单解释一下:
1. 上面的“~”代表的是“每组数据”。
2.A2 中的的分组不用聚合,所以用 group 函数。
3.A3 中的分组需要聚合,所以用 groups 函数。
4. 上述代码和 JAVA 或报表很容易集成,
更多内容请参看:【集算器简化 SQL 式计算之组内运算】