MySQL Practice on Nowcoder (牛客网), Part 3

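Problem: The operations team wants to analyze user activity and post counts by school and gender. For each school and each gender, compute the number of users, the average number of active days in the last 30 days, and the average number of posts.

User profile table: user_profile

Active days in the last 30 days: active_days_within_30

Post count: question_cnt

Answer count: answer_cnt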

id device_id gender age university gpa active_days_within_30 question_cnt answer_cnt
1 2138 male 21 北京大学 3.4 7 2 12
2 3214 male NULL 复旦大学 4.0 15 5 25
3 6543 female 20 北京大学 3.2 12 3 30
4 2315 female 23 浙江大学 3.6 5 1 2
5 5432 male 25 山东大学 3.8 20 15 70
6 2131 male 28 山东大学 3.3 15 7 13
7 4321 male 26 复旦大学 3.6 9 6 52

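The first row means: the user with id 1 uses device id 2138, is male, 21 years old, attends 北京大学, has a gpa of 3.4, was active on 7 of the past 30 days, and has 2 posts and 12 answers.

...

The last row means: the user with id 7 uses device id 4321, is male, 26 years old, attends 复旦大学, has a gpa of 3.6, was active on 9 of the past 30 days, and has 6 posts and 52 answers.

Your query should group by gender and university, as in the example below. Keep 1 decimal place, rounding beyond the first: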

gender university user_num avg_active_day avg_question_cnt
male 北京大学 1 7.0 2.0
male 复旦大学 2 12.0 5.5
female 北京大学 1 12.0 3.0
female 浙江大学 1 5.0 1.0
male 山东大学 2 17.5 11.0

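Explanation:

Row 1: 北京大学 has 1 male user, averaging 7 active days and 2 posts.

...

Last row: 山东大学 has 2 male users, averaging 17.5 active days and 11 posts.

Problem breakdown:

Filter condition: none.
Each school and each gender: group by both columns: group by gender, university
Number of users: count(*)
Average active days in the last 30 days: avg(active_days_within_30)
Average post count: avg(question_cnt)

Details:

Rename the output columns with as.
Keep the column order the same as in the example.

Full code: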


select gender, university,
       count(*) as user_num,
       round(avg(active_days_within_30), 1) as avg_active_day,
       round(avg(question_cnt), 1) as avg_question_cnt
from user_profile
group by gender, university;
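To check the grouping against the sample rows, here is a minimal sketch that loads them into an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for MySQL here, and the trimmed-down column list and the rounding to 1 decimal are choices made for this illustration.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; only the columns the query needs are kept.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table user_profile ("
    "device_id int, gender text, university text, "
    "active_days_within_30 int, question_cnt int)"
)
conn.executemany(
    "insert into user_profile values (?, ?, ?, ?, ?)",
    [
        (2138, "male", "北京大学", 7, 2),
        (3214, "male", "复旦大学", 15, 5),
        (6543, "female", "北京大学", 12, 3),
        (2315, "female", "浙江大学", 5, 1),
        (5432, "male", "山东大学", 20, 15),
        (2131, "male", "山东大学", 15, 7),
        (4321, "male", "复旦大学", 9, 6),
    ],
)

rows = conn.execute(
    """
    select gender, university, count(*) as user_num,
           round(avg(active_days_within_30), 1) as avg_active_day,
           round(avg(question_cnt), 1) as avg_question_cnt
    from user_profile
    group by gender, university
    """
).fetchall()

# Key each result row by its (gender, university) group for easy lookup.
stats = {(g, u): (n, days, posts) for g, u, n, days, posts in rows}
print(stats[("male", "山东大学")])  # (2, 17.5, 11.0)
```

Each (gender, university) pair that occurs in the data becomes one group, which is why the sample yields five rows rather than one row per school.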

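SQL19 Group filtering exercise

Problem: The operations team wants to look at the average number of posts and replies per school in order to find low-activity schools to focus on. Return the schools whose average post count is below 5 or whose average reply count is below 20.

Example: user_profile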

id device_id gender age university gpa active_days_within_30 question_cnt answer_cnt
1 2138 male 21 北京大学 3.4 7 2 12
2 3214 male NULL 复旦大学 4.0 15 5 25
3 6543 female 20 北京大学 3.2 12 3 30
4 2315 female 23 浙江大学 3.6 5 1 2
5 5432 male 25 山东大学 3.8 20 15 70
6 2131 male 28 山东大学 3.3 15 7 13
7 4321 female 26 复旦大学 3.6 9 6 52

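The first row means: the user with id 1 uses device id 2138, is male, 21 years old, attends 北京大学, has a gpa of 3.4, was active on 7 of the past 30 days, and has 2 posts and 12 answers.
...
The last row means: the user with id 7 uses device id 4321, is male, 26 years old, attends 复旦大学, has a gpa of 3.6, was active on 9 of the past 30 days, and has 6 posts and 52 answers.

Based on the example, your query should return the following result. Keep 3 decimal places (the judging system also normalizes this automatically), rounding beyond the third: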

university avg_question_cnt avg_answer_cnt
北京大学 2.500 21.000
浙江大学 1.000 2.000

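Explanation: 2 schools have an average post count below 5 or an average reply count below 20.

Users at 北京大学 average 2.500 posts and 21.000 answers.

Users at 浙江大学 average 1.000 posts and 2.000 answers.

Problem breakdown:

Filter condition: schools with an average post count below 5 or an average reply count below 20, i.e. avg(question_cnt)<5 or avg(answer_cnt)<20. A condition on an aggregate result cannot go in where, because where filters rows before they are grouped; it belongs in having, which filters the groups afterwards. MySQL lets having refer to the select aliases, so the renamed columns can be used directly.
Per-school output: the average post and reply counts are computed per school, so group by university

Details:

Rename the output columns with as.
Use having, not where!

Full code: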


select university,
avg(question_cnt) as avg_question_cnt,
avg(answer_cnt) as avg_answer_cnt
from user_profile
group by university
having avg_question_cnt<5 or avg_answer_cnt<20;
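The where/having distinction can be tried out on the sample rows. The following is a minimal sketch using Python's sqlite3 module with an in-memory database; SQLite stands in for MySQL here, and the three-column table is a simplification made for this example. The having clause repeats the aggregate expressions rather than relying on aliases.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; only the columns the query needs are kept.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table user_profile (university text, question_cnt int, answer_cnt int)"
)
conn.executemany(
    "insert into user_profile values (?, ?, ?)",
    [
        ("北京大学", 2, 12),
        ("复旦大学", 5, 25),
        ("北京大学", 3, 30),
        ("浙江大学", 1, 2),
        ("山东大学", 15, 70),
        ("山东大学", 7, 13),
        ("复旦大学", 6, 52),
    ],
)

# An aggregate inside where is rejected: where runs before any groups exist.
try:
    conn.execute(
        "select university from user_profile "
        "where avg(question_cnt) < 5 group by university"
    )
    where_rejected = False
except sqlite3.OperationalError:
    where_rejected = True

# having filters the groups after the averages have been computed.
low_activity = conn.execute(
    """
    select university, avg(question_cnt), avg(answer_cnt)
    from user_profile
    group by university
    having avg(question_cnt) < 5 or avg(answer_cnt) < 20
    order by university
    """
).fetchall()

print(where_rejected)  # True
print(low_activity)    # [('北京大学', 2.5, 21.0), ('浙江大学', 1.0, 2.0)]
```

北京大学 passes on the post-count condition alone (2.5 < 5) even though its average reply count of 21 is above 20, which is what the or in the condition is for.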

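SQL20 Group sorting exercise

Problem: The operations team wants to see the average post count of users at each university, sorted in ascending order of that average. Return the corresponding data.

Example: user_profile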

id device_id gender age university gpa active_days_within_30 question_cnt answer_cnt
1 2138 male 21 北京大学 3.4 7 2 12
2 3214 male NULL 复旦大学 4.0 15 5 25
3 6543 female 20 北京大学 3.2 12 3 30
4 2315 female 23 浙江大学 3.6 5 1 2
5 5432 male 25 山东大学 3.8 20 15 70
6 2131 male 28 山东大学 3.3 15 7 13
7 4321 female 26 复旦大学 3.6 9 6 52

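The first row means: the user with id 1 uses device id 2138, is male, 21 years old, attends 北京大学, has a gpa of 3.4, was active on 7 of the past 30 days, and has 2 posts and 12 answers.
...
The last row means: the user with id 7 uses device id 4321, is male, 26 years old, attends 复旦大学, has a gpa of 3.6, was active on 9 of the past 30 days, and has 6 posts and 52 answers.

Based on the example, your query should return the following result: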

university avg_question_cnt
浙江大学 1.0000
北京大学 2.5000
复旦大学 5.5000
山东大学 11.0000

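Problem breakdown:

Filter condition: none.
Each university: group by university
Average post count: avg(question_cnt)
Ascending sort: order by avg_question_cnt (ascending is the default direction)

Details:

Rename the output column with as.

Full code: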


select university,
avg(question_cnt) as avg_question_cnt
from user_profile
group by university
order by avg_question_cnt;

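SQL25 Find users from 山东大学 or male users

Problem: The operations team wants to see, separately, the device_id, gender, age and gpa of users whose school is 山东大学 and of users who are male. Return the result without deduplication.

Example: user_profile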

id device_id gender age university gpa active_days_within_30 question_cnt answer_cnt
1 2138 male 21 北京大学 3.4 7 2 12
2 3214 male NULL 复旦大学 4 15 5 25
3 6543 female 20 北京大学 3.2 12 3 30
4 2315 female 23 浙江大学 3.6 5 1 2
5 5432 male 25 山东大学 3.8 20 15 70
6 2131 male 28 山东大学 3.3 15 7 13
7 4321 male 26 复旦大学 3.6 9 6 52

Based on the example, your query should return the following result (mind the order: rows for 山东大学 first, then rows for male users):

device_id gender age gpa
5432 male 25 3.8
2131 male 28 3.3
2138 male 21 3.4
3214 male None 4
5432 male 25 3.8
2131 male 28 3.3
4321 male 26 3.6

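Problem breakdown:

Filter conditions: users whose school is 山东大学, and users who are male: university='山东大学', gender='male'.
Separately and without deduplication: joining the two conditions with or returns each matching row only once, and union removes duplicate rows, so neither works. Use union all: query the rows that satisfy condition 1 and the rows that satisfy condition 2 separately, then concatenate the two results and keep the duplicates.

Details:

No deduplication: union all

Full code: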


select device_id, gender, age, gpa
from user_profile
where university='山东大学'
union all
select device_id, gender, age, gpa
from user_profile
where gender='male';
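The difference between or, union and union all shows up directly in the row counts. Below is a minimal sketch on the sample rows using Python's sqlite3 module; the in-memory SQLite database stands in for MySQL, and the helper row_count is a name introduced only for this example.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; only the selected columns plus university are kept.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table user_profile ("
    "device_id int, gender text, age int, university text, gpa real)"
)
conn.executemany(
    "insert into user_profile values (?, ?, ?, ?, ?)",
    [
        (2138, "male", 21, "北京大学", 3.4),
        (3214, "male", None, "复旦大学", 4.0),
        (6543, "female", 20, "北京大学", 3.2),
        (2315, "female", 23, "浙江大学", 3.6),
        (5432, "male", 25, "山东大学", 3.8),
        (2131, "male", 28, "山东大学", 3.3),
        (4321, "male", 26, "复旦大学", 3.6),
    ],
)

cols = "select device_id, gender, age, gpa from user_profile "
by_school = cols + "where university = '山东大学'"
by_gender = cols + "where gender = 'male'"


def row_count(sql):
    """Run a query and return how many rows it produces."""
    return len(conn.execute(sql).fetchall())


or_rows = row_count(cols + "where university = '山东大学' or gender = 'male'")
union_rows = row_count(by_school + " union " + by_gender)
union_all_rows = row_count(by_school + " union all " + by_gender)

print(or_rows, union_rows, union_all_rows)  # 5 5 7
```

Both 山东大学 users are male, so or and union each return 5 rows, while union all keeps the 2 school rows and the 5 male rows for a total of 7, as the expected output requires.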

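SQL22 Average number of answers per school, among users who have answered

The operations team wants to know, for each school, the average number of questions answered by users who have answered at least one. Return the data.

User profile table user_profile: device_id is the terminal id (each user is assumed to have exactly one terminal), gender is the gender, age is the age, university is the user's school, gpa is the user's grade point average, and active_days_within_30 is the number of active days in the last 30 days.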

device_id gender age university gpa active_days_within_30
2138 male 21 北京大学 3.4 7
3214 male NULL 复旦大学 4 15
6543 female 20 北京大学 3.2 12
2315 female 23 浙江大学 3.6 5
5432 male 25 山东大学 3.8 20
2131 male 28 山东大学 3.3 15
4321 male 28 复旦大学 3.6 9

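The first row means: the user uses device id 2138, is male, 21 years old, attends 北京大学, has a gpa of 3.4, and was active on 7 of the past 30 days.

The last row means: the user uses device id 4321, is male, 28 years old, attends 复旦大学, has a gpa of 3.6, and was active on 9 of the past 30 days.

Answer detail table question_practice_detail: question_id is the question number and result is the outcome of the attempt.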

device_id question_id result
2138 111 wrong
3214 112 wrong
3214 113 wrong
6543 111 right
2315 115 right
2315 116 right
2315 117 wrong
5432 118 wrong
5432 112 wrong
2131 114 right
5432 113 wrong

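The first row means: the user with device id 2138 answered question 111 incorrectly.

....

The last row means: the user with device id 5432 answered question 113 incorrectly.

Write a SQL query to find the average number of answers per user for each school. (The average for a school is the total number of answer records from its users divided by the number of distinct users who have answered.) Based on the example, your query should return the following result, with 4 decimal places. Note: the result must be sorted by university in ascending order.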

university avg_answer_cnt
北京大学 1.0000
复旦大学 2.0000
山东大学 2.0000
浙江大学 3.0000

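Problem breakdown:

Filter condition: none.
Each school: group by university
Average number of answers: within each school group, divide the total number of answer records by the number of distinct users who answered: count(question_id) / count(distinct device_id)
Table join: school and answer records live in different tables, so the two tables must be joined on device_id.

Details:

Rename the output column with as.

Full code: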


select u.university,
       count(q.question_id) / count(distinct q.device_id) as avg_answer_cnt
from user_profile u
inner join question_practice_detail q
on u.device_id = q.device_id
group by u.university
order by u.university;
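The ratio of answer records to distinct answering users can be checked on the sample tables. This is a minimal sketch using Python's sqlite3 module with an in-memory database as a stand-in for MySQL. One difference to be aware of: SQLite divides two integers as integers, so the sketch multiplies by 1.0 first, which is not needed in MySQL.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; user_profile is reduced to two columns.
conn = sqlite3.connect(":memory:")
conn.execute("create table user_profile (device_id int, university text)")
conn.execute(
    "create table question_practice_detail ("
    "device_id int, question_id int, result text)"
)
conn.executemany(
    "insert into user_profile values (?, ?)",
    [
        (2138, "北京大学"), (3214, "复旦大学"), (6543, "北京大学"),
        (2315, "浙江大学"), (5432, "山东大学"), (2131, "山东大学"),
        (4321, "复旦大学"),
    ],
)
conn.executemany(
    "insert into question_practice_detail values (?, ?, ?)",
    [
        (2138, 111, "wrong"), (3214, 112, "wrong"), (3214, 113, "wrong"),
        (6543, 111, "right"), (2315, 115, "right"), (2315, 116, "right"),
        (2315, 117, "wrong"), (5432, 118, "wrong"), (5432, 112, "wrong"),
        (2131, 114, "right"), (5432, 113, "wrong"),
    ],
)

rows = conn.execute(
    """
    select u.university,
           count(q.question_id) * 1.0 / count(distinct q.device_id) as avg_answer_cnt
    from user_profile u
    inner join question_practice_detail q on u.device_id = q.device_id
    group by u.university
    order by u.university
    """
).fetchall()

print(rows)
# [('北京大学', 1.0), ('复旦大学', 2.0), ('山东大学', 2.0), ('浙江大学', 3.0)]
```

Device 4321 has no answer records, so the inner join drops it and 复旦大学 is averaged over one user (2 records / 1 user), not two.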

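SQL23 Average number of answers per user for each school and difficulty

Problem: The operations team wants the average number of answers per user, by school and by difficulty, among users who took part in answering. Write SQL to return the data.

User profile table: user_profile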

id device_id gender age university gpa active_days_within_30 question_cnt answer_cnt
1 2138 male 21 北京大学 3.4 7 2 12
2 3214 male NULL 复旦大学 4 15 5 25
3 6543 female 20 北京大学 3.2 12 3 30
4 2315 female 23 浙江大学 3.6 5 1 2
5 5432 male 25 山东大学 3.8 20 15 70
6 2131 male 28 山东大学 3.3 15 7 13
7 4321 male 28 复旦大学 3.6 9 6 52

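The first row means: the user with id 1 uses device id 2138, is male, 21 years old, attends 北京大学, has a gpa of 3.4, was active on 7 of the past 30 days, and has 2 posts and 12 answers.

The last row means: the user with id 7 uses device id 4321, is male, 28 years old, attends 复旦大学, has a gpa of 3.6, was active on 9 of the past 30 days, and has 6 posts and 52 answers.

Question practice detail table: question_practice_detail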

id device_id question_id result
1 2138 111 wrong
2 3214 112 wrong
3 3214 113 wrong
4 6543 111 right
5 2315 115 right
6 2315 116 right
7 2315 117 wrong
8 5432 117 wrong
9 5432 112 wrong
10 2131 113 right
11 5432 113 wrong
12 2315 115 right
13 2315 116 right
14 2315 117 wrong
15 5432 117 wrong
16 5432 112 wrong
17 2131 113 right
18 5432 113 wrong
19 2315 117 wrong
20 5432 117 wrong
21 5432 112 wrong
22 2131 113 right
23 5432 113 wrong

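The first row means: record id 1 shows that the user with device id 2138 answered question 111 incorrectly.

......

The last row means: record id 23 shows that the user with device id 5432 answered question 113 incorrectly.

Table: question_detail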

id question_id difficult_level
1 111 hard
2 112 medium
3 113 easy
4 115 easy
5 116 medium
6 117 easy

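The first row means: question 111 has difficulty hard.

....

The last row means: question 117 has difficulty easy.

Write a SQL query that computes the average number of answers per user for each school and each difficulty. Based on the example, your query should return the following result (keep 4 decimal places, rounding beyond the fourth):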

university difficult_level avg_answer_cnt
北京大学 hard 1.0000
复旦大学 easy 1.0000
复旦大学 medium 1.0000
山东大学 easy 4.5000
山东大学 medium 3.0000
浙江大学 easy 5.0000
浙江大学 medium 2.0000

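Explanation:

Row 1: 北京大学 has two users, device ids 2138 and 6543. Each has exactly one record in question_practice_detail, both on question 111, which question_detail marks as hard. The average for hard questions at 北京大学 is therefore 2/2 = 1.0000.

Rows 2 and 3: 复旦大学 has two users, device ids 3214 and 4321, but only one of them appears in question_practice_detail (device_id=3214 has answered; device_id=4321 has not and is excluded from the calculation). That user has 2 records, one each on questions 112 and 113, whose difficulties are medium and easy. The average for both easy and medium at 复旦大学 is therefore 1 record / 1 user (device_id=3214) = 1.0000.

Rows 4 and 5: 山东大学 has two users, device ids 5432 and 2131, with 12 records in total in question_practice_detail, on questions 112, 113 and 117 with counts 3, 6 and 3. Their difficulties are medium, easy and easy, so there are 9 easy records, answered by both users: 9 / 2 (device_id=5432 or device_id=2131) = 4.5000. There are 3 medium records, all from device_id=5432: 3 / 1 = 3.0000.

Problem breakdown:

Filter condition: none.
Each school: group by university
Each difficulty: group by difficult_level
Average number of answers: total answer records divided by the number of distinct users: count(q.question_id) / count(distinct q.device_id)
The data comes from three tables: join u to q on device_id, and q to ql on question_id.

Details:

Rename the output column with as.
Everything needed for the average comes from question_practice_detail, so keep that table whole in the join: use it as the base table and left join the other two onto it.

Full code: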

select u.university, ql.difficult_level,
       count(q.question_id) / count(distinct q.device_id) as avg_answer_cnt
from question_practice_detail q
# all data for the average comes from question_practice_detail, so it is the base table and the other two are left joined onto it
left join user_profile u
on u.device_id = q.device_id
left join question_detail ql
on q.question_id = ql.question_id
group by u.university, ql.difficult_level;
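The three-table join can be run against the full 23-record sample. The sketch below uses Python's sqlite3 module with an in-memory database as a stand-in for MySQL. It assumes record 4 belongs to device 6543, as the explanation of the first result row implies, and it multiplies by 1.0 because SQLite would otherwise divide the two counts as integers. The order by is added only to make the output order deterministic.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; tables are reduced to the joined columns.
conn = sqlite3.connect(":memory:")
conn.execute("create table user_profile (device_id int, university text)")
conn.execute("create table question_practice_detail (device_id int, question_id int)")
conn.execute("create table question_detail (question_id int, difficult_level text)")
conn.executemany(
    "insert into user_profile values (?, ?)",
    [
        (2138, "北京大学"), (3214, "复旦大学"), (6543, "北京大学"),
        (2315, "浙江大学"), (5432, "山东大学"), (2131, "山东大学"),
        (4321, "复旦大学"),
    ],
)
conn.executemany(
    "insert into question_detail values (?, ?)",
    [(111, "hard"), (112, "medium"), (113, "easy"),
     (115, "easy"), (116, "medium"), (117, "easy")],
)
# Records 1-23 in order; record 4 is assumed to be device 6543.
conn.executemany(
    "insert into question_practice_detail values (?, ?)",
    [
        (2138, 111), (3214, 112), (3214, 113), (6543, 111),
        (2315, 115), (2315, 116), (2315, 117), (5432, 117),
        (5432, 112), (2131, 113), (5432, 113), (2315, 115),
        (2315, 116), (2315, 117), (5432, 117), (5432, 112),
        (2131, 113), (5432, 113), (2315, 117), (5432, 117),
        (5432, 112), (2131, 113), (5432, 113),
    ],
)

rows = conn.execute(
    """
    select u.university, ql.difficult_level,
           count(q.question_id) * 1.0 / count(distinct q.device_id) as avg_answer_cnt
    from question_practice_detail q
    left join user_profile u on u.device_id = q.device_id
    left join question_detail ql on q.question_id = ql.question_id
    group by u.university, ql.difficult_level
    order by u.university, ql.difficult_level
    """
).fetchall()

for row in rows:
    print(row)
```

On this data the seven rows match the expected result table, including 4.5 for 山东大学 easy (9 records over 2 users) and 3.0 for 山东大学 medium (3 records over 1 user).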

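SQL24 Average number of answers per user

Problem: The operations team wants to see the average number of questions answered at each difficulty by 山东大学 users who took part in answering. Return the corresponding data.

User profile table: user_profile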

id device_id gender age university gpa active_days_within_30 question_cnt answer_cnt
1 2138 male 21 北京大学 3.4 7 2 12
2 3214 male NULL 复旦大学 4 15 5 25
3 6543 female 20 北京大学 3.2 12 3 30
4 2315 female 23 浙江大学 3.6 5 1 2
5 5432 male 25 山东大学 3.8 20 15 70
6 2131 male 28 山东大学 3.3 15 7 13
7 4321 male 28 复旦大学 3.6 9 6 52

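The first row means: the user with id 1 uses device id 2138, is male, 21 years old, attends 北京大学, has a gpa of 3.4, was active on 7 of the past 30 days, and has 2 posts and 12 answers.

The last row means: the user with id 7 uses device id 4321, is male, 28 years old, attends 复旦大学, has a gpa of 3.6, was active on 9 of the past 30 days, and has 6 posts and 52 answers.

Question practice detail table: question_practice_detail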

id device_id question_id result
1 2138 111 wrong
2 3214 112 wrong
3 3214 113 wrong
4 6543 111 right
5 2315 115 right
6 2315 116 right
7 2315 117 wrong
8 5432 117 wrong
9 5432 112 wrong
10 2131 113 right
11 5432 113 wrong
12 2315 115 right
13 2315 116 right
14 2315 117 wrong
15 5432 117 wrong
16 5432 112 wrong
17 2131 113 right
18 5432 113 wrong
19 2315 117 wrong
20 5432 117 wrong
21 5432 112 wrong
22 2131 113 right
23 5432 113 wrong

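The first row means: record id 1 shows that the user with device id 2138 answered question 111 incorrectly.

......

The last row means: record id 23 shows that the user with device id 5432 answered question 113 incorrectly.

Table: question_detail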

id question_id difficult_level
1 111 hard
2 112 medium
3 113 easy
4 115 easy
5 116 medium
6 117 easy

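The first row means: question 111 has difficulty hard.

....

The last row means: question 117 has difficulty easy.

Write a SQL query that computes the average number of answers per user at each difficulty for 山东大学. Based on the example, your query should return the following result (keep 4 decimal places, rounding beyond the fourth):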

university difficult_level avg_answer_cnt
山东大学 easy 4.5000
山东大学 medium 3.0000

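山东大学 has two users, device ids 5432 and 2131, with 12 records in total in question_practice_detail, on questions 112, 113 and 117 with counts 3, 6 and 3. Their difficulties are medium, easy and easy, so there are 9 easy records, answered by both users: 9 / 2 (device_id=5432 or device_id=2131) = 4.5000. There are 3 medium records, all from device_id=5432: 3 / 1 = 3.0000.

Problem breakdown:

Filter condition: users from 山东大学: u.university="山东大学".
Each difficulty: group by difficult_level
Average number of answers: total answer records divided by the number of distinct users: count(q.question_id) / count(distinct q.device_id). The data comes from three tables, so they must be joined: u to q on device_id, with the university filter applied, and q to ql on question_id.

Details:

Rename the output column with as.

Full code: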


select u.university, ql.difficult_level,
       count(q.question_id) / count(distinct q.device_id) as avg_answer_cnt
from question_practice_detail q
left join user_profile u
on u.device_id = q.device_id
left join question_detail ql
on q.question_id = ql.question_id
where u.university = '山东大学'
group by u.university, ql.difficult_level;
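The denominator has to count distinct users (device_id), not distinct questions (question_id). On the sample data the two happen to give the same numbers, which makes the mistake easy to miss. The sketch below illustrates this with Python's sqlite3 module and an in-memory SQLite database as a stand-in for MySQL; the helper averages and the extra record inserted at the end are hypothetical and exist only for this illustration.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; only the 山东大学 records are loaded.
conn = sqlite3.connect(":memory:")
conn.execute("create table question_practice_detail (device_id int, question_id int)")
conn.execute("create table question_detail (question_id int, difficult_level text)")
conn.executemany(
    "insert into question_detail values (?, ?)",
    [(112, "medium"), (113, "easy"), (117, "easy")],
)
# The 12 sample records of the two 山东大学 users (device ids 5432 and 2131).
sample = [(5432, 117), (5432, 112), (5432, 113)] * 3 + [(2131, 113)] * 3
conn.executemany("insert into question_practice_detail values (?, ?)", sample)


def averages(distinct_col):
    """Average per difficulty, dividing by the distinct values of distinct_col."""
    sql = (
        "select ql.difficult_level, "
        f"count(q.question_id) * 1.0 / count(distinct q.{distinct_col}) "
        "from question_practice_detail q "
        "join question_detail ql on q.question_id = ql.question_id "
        "group by ql.difficult_level"
    )
    return dict(conn.execute(sql).fetchall())


before = (averages("device_id"), averages("question_id"))
# Hypothetical extra record: device 2131 also answers the medium question 112.
conn.execute("insert into question_practice_detail values (2131, 112)")
after = (averages("device_id"), averages("question_id"))

print(before)  # both dictionaries: easy 4.5, medium 3.0
print(after)   # medium is 2.0 per user but 4.0 per distinct question
```

With the extra record, 4 medium answers are spread over 2 users, so the per-user average is 2.0, while dividing by the single distinct medium question would report 4.0.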

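SQL21 Answer records of 浙江大学 users

Problem: The operations team wants to see the detailed answer records of all users from 浙江大学. Return the corresponding data.

Example: question_practice_detail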

id device_id question_id result
1 2138 111 wrong
2 3214 112 wrong
3 3214 113 wrong
4 6543 114 right
5 2315 115 right
6 2315 116 right
7 2315 117 wrong

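The first row means: record id 1 shows that the user with device id 2138 answered question 111 incorrectly.

....

The last row means: record id 7 shows that the user with device id 2315 answered question 117 incorrectly.

Example: user_profile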

id device_id gender age university gpa active_days_within_30 question_cnt answer_cnt
1 2138 male 21 北京大学 3.4 7 2 12
2 3214 male NULL 复旦大学 4.0 15 5 25
3 6543 female 20 北京大学 3.2 12 3 30
4 2315 female 23 浙江大学 3.6 5 1 2
5 5432 male 25 山东大学 3.8 20 15 70
6 2131 male 28 山东大学 3.3 15 7 13
7 4321 female 26 复旦大学 3.6 9 6 52

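The first row means: the user with id 1 uses device id 2138, is male, 21 years old, attends 北京大学, has a gpa of 3.4, was active on 7 of the past 30 days, and has 2 posts and 12 answers.
...
The last row means: the user with id 7 uses device id 4321, is male, 26 years old, attends 复旦大学, has a gpa of 3.6, was active on 9 of the past 30 days, and has 6 posts and 52 answers.

Based on the example, your query should return the following result, sorted by question_id in ascending order:

device_id question_id result
2315 115 right
2315 116 right
2315 117 wrong

Explanation:

In the sample data only one user is from 浙江大学, so the answer is simply all of that user's answer records.

Problem breakdown:

Filter condition: users from 浙江大学. School information is in the user profile table and answer records are in the practice detail table, so the two tables must be related through device_id. Method 1: inner join the two tables with the condition on up.device_id=qpd.device_id and up.university='浙江大学'. Method 2, used in the code below: select the device_id values of 浙江大学 users in a subquery and filter question_practice_detail by them.

Full code: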


select device_id, question_id, result
from question_practice_detail
where device_id in (select device_id from user_profile where university = '浙江大学')
order by question_id;
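The breakdown mentions an inner join as method 1 without showing it. The following minimal sketch runs both the join form and a subquery form on the sample rows using Python's sqlite3 module; the in-memory SQLite database stands in for MySQL, and the aliases up and qpd follow the names used in the breakdown.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; user_profile is reduced to two columns.
conn = sqlite3.connect(":memory:")
conn.execute("create table user_profile (device_id int, university text)")
conn.execute(
    "create table question_practice_detail ("
    "device_id int, question_id int, result text)"
)
conn.executemany(
    "insert into user_profile values (?, ?)",
    [
        (2138, "北京大学"), (3214, "复旦大学"), (6543, "北京大学"),
        (2315, "浙江大学"), (5432, "山东大学"), (2131, "山东大学"),
        (4321, "复旦大学"),
    ],
)
conn.executemany(
    "insert into question_practice_detail values (?, ?, ?)",
    [
        (2138, 111, "wrong"), (3214, 112, "wrong"), (3214, 113, "wrong"),
        (6543, 114, "right"), (2315, 115, "right"), (2315, 116, "right"),
        (2315, 117, "wrong"),
    ],
)

# Method 1: inner join, with the school filter in the join condition.
join_rows = conn.execute(
    """
    select qpd.device_id, qpd.question_id, qpd.result
    from question_practice_detail qpd
    inner join user_profile up
      on up.device_id = qpd.device_id and up.university = '浙江大学'
    order by qpd.question_id
    """
).fetchall()

# Method 2: subquery; in keeps working if several users attend the school.
subquery_rows = conn.execute(
    """
    select device_id, question_id, result
    from question_practice_detail
    where device_id in (select device_id from user_profile where university = '浙江大学')
    order by question_id
    """
).fetchall()

print(join_rows)
# [(2315, 115, 'right'), (2315, 116, 'right'), (2315, 117, 'wrong')]
```

Both forms return the same three records here. Comparing with = against the subquery only works while the subquery returns a single row, which is why in is the safer choice once more than one user attends the school.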

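SQL29 Average next-day retention rate

Problem: The operations team wants to see the average probability that a user who practiced on a given day comes back to practice again the next day. Return the corresponding data.

Example: question_practice_detail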

id device_id quest_id result date
1 2138 111 wrong 2021-05-03
2 3214 112 wrong 2021-05-09
3 3214 113 wrong 2021-06-15
4 6543 111 right 2021-08-13
5 2315 115 right 2021-08-13
6 2315 116 right 2021-08-14
7 2315 117 wrong 2021-08-15
……

Based on the example, your query should return the following result:

avg_ret
0.3000

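Problem breakdown:

Filter condition: the user comes back the next day.
- Every row in the table can be treated as a "day 1" visit, so we need to construct a "came back on day 2" field. A left join of the table with itself does this.
- Join conditions: ① on q1.device_id=q2.device_id ② and DATEDIFF(q2.date,q1.date)=1
Average probability:
- The number of distinct (device_id, date) pairs on the left side is the denominator, and the number of distinct pairs on the right side that found a match is the numerator; dividing the two gives the average probability.

Details:

Rename the output column with as.
Deduplication: deduplicate by device_id and date, because one user may practice several times in a day.

Full code: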


select count(distinct q2.device_id, q2.date) / count(distinct q1.device_id, q1.date) as avg_ret
from question_practice_detail as q1
left join question_practice_detail as q2
on q1.device_id = q2.device_id
and datediff(q2.date, q1.date) = 1;
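The self-join can be traced on the seven visible sample rows. The sketch below uses Python's sqlite3 module with an in-memory database as a stand-in for MySQL. SQLite has neither DATEDIFF nor a multi-column count(distinct ...), so this version deduplicates in subqueries and compares dates with SQLite's date() function; those substitutions are specific to the sketch. Because the sample table is truncated, the value computed here differs from the expected 0.3000, which comes from the full data set.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; only device_id and date are needed.
conn = sqlite3.connect(":memory:")
conn.execute("create table question_practice_detail (device_id int, date text)")
conn.executemany(
    "insert into question_practice_detail values (?, ?)",
    [
        (2138, "2021-05-03"),
        (3214, "2021-05-09"),
        (3214, "2021-06-15"),
        (6543, "2021-08-13"),
        (2315, "2021-08-13"),
        (2315, "2021-08-14"),
        (2315, "2021-08-15"),
    ],
)

# q1: every distinct (user, day) visit; q2: the same user's visit on the next day.
# count(q2.device_id) skips the nulls produced by unmatched left-join rows.
avg_ret = conn.execute(
    """
    select count(q2.device_id) * 1.0 / count(q1.device_id)
    from (select distinct device_id, date from question_practice_detail) q1
    left join (select distinct device_id, date from question_practice_detail) q2
      on q1.device_id = q2.device_id
     and q2.date = date(q1.date, '+1 day')
    """
).fetchone()[0]

print(round(avg_ret, 4))  # 0.2857
```

Of the 7 distinct user-day visits, only the two by device 2315 on 2021-08-13 and 2021-08-14 are followed by a visit the next day, giving 2/7.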

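SQL26 Count users under 25 and users aged 25 or older

Problem: The operations team wants to split users into two age groups, under 25 and 25 or older, and see the number of users in each group.

Note for this problem: a null age also counts as under 25.

Example: user_profile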

id device_id gender age university gpa active_days_within_30 question_cnt answer_cnt
1 2138 male 21 北京大学 3.4 7 2 12
2 3214 male NULL 复旦大学 4 15 5 25
3 6543 female 20 北京大学 3.2 12 3 30
4 2315 female 23 浙江大学 3.6 5 1 2
5 5432 male 25 山东大学 3.8 20 15 70
6 2131 male 28 山东大学 3.3 15 7 13
7 4321 male 26 复旦大学 3.6 9 6 52

Based on the example, your query should return the following result:

age_cut number
25岁以下 4
25岁及以上 3

Code:


select '25岁以下' as age_cut, count(device_id) as number
from user_profile
where age<25 or age is null
union all
select '25岁及以上' as age_cut, count(device_id) as number
from user_profile
where age>=25;
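As an alternative to two queries glued with union all, the same counts can be produced in one pass by labeling each row with case and grouping on the label. This is a sketch of that variant, not the approach used above; it runs on the sample ages through Python's sqlite3 module, with an in-memory SQLite database standing in for MySQL.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; only device_id and age are needed.
conn = sqlite3.connect(":memory:")
conn.execute("create table user_profile (device_id int, age int)")
conn.executemany(
    "insert into user_profile values (?, ?)",
    [
        (2138, 21), (3214, None), (6543, 20), (2315, 23),
        (5432, 25), (2131, 28), (4321, 26),
    ],
)

# A null age makes "age >= 25" unknown, so the row falls into the else branch,
# which is exactly the "null counts as under 25" rule of the problem.
rows = conn.execute(
    """
    select case when age >= 25 then '25岁及以上' else '25岁以下' end as age_cut,
           count(*) as number
    from user_profile
    group by age_cut
    """
).fetchall()

counts = dict(rows)
print(counts["25岁以下"], counts["25岁及以上"])  # 4 3
```

One difference from the union all version: a group with no users produces no row at all here, whereas each union all branch always returns a row, possibly with a count of 0.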

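SQL27 View user details by age band

Problem: The operations team wants to split users into three age bands, under 20, 20 to 24, and 25 or older, and see the user details for each band. Return the corresponding data. (Note: if the age is null, return 其他.)

Example: user_profile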

id device_id gender age university gpa active_days_within_30 question_cnt answer_cnt
1 2138 male 21 北京大学 3.4 7 2 12
2 3214 male NULL 复旦大学 4 15 5 25
3 6543 female 20 北京大学 3.2 12 3 30
4 2315 female 23 浙江大学 3.6 5 1 2
5 5432 male 25 山东大学 3.8 20 15 70
6 2131 male 28 山东大学 3.3 15 7 13
7 4321 male 26 复旦大学 3.6 9 6 52

Based on the example, your query should return the following result:

device_id gender age_cut
2138 male 20-24岁
3214 male 其他
6543 female 20-24岁
2315 female 20-24岁
5432 male 25岁及以上
2131 male 25岁及以上
4321 male 25岁及以上

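Problem breakdown:

Filter condition: none.
Age bands: this is a conditional on a numeric value. Nested if calls would work, but case when [expr] then [result1] ... else [default] end is more convenient. The branches are tested in order, so the age>=20 branch only sees ages below 25, and a null age fails every comparison and falls through to else.

Details:

Rename the output column with as.

Full code: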


select
    device_id,
    gender,
    case
        when age>=25 then '25岁及以上'
        when age>=20 then '20-24岁'
        when age<20 then '20岁以下'
        else '其他'
    end as age_cut
from user_profile
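To see how the case branches treat each sample age, including the null one, here is a minimal sketch that runs the same expression through Python's sqlite3 module. The in-memory SQLite database stands in for MySQL, and the order by id is added only to make the output order deterministic.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; only the columns the query needs are kept.
conn = sqlite3.connect(":memory:")
conn.execute("create table user_profile (id int, device_id int, gender text, age int)")
conn.executemany(
    "insert into user_profile values (?, ?, ?, ?)",
    [
        (1, 2138, "male", 21),
        (2, 3214, "male", None),
        (3, 6543, "female", 20),
        (4, 2315, "female", 23),
        (5, 5432, "male", 25),
        (6, 2131, "male", 28),
        (7, 4321, "male", 26),
    ],
)

# Branches are evaluated top to bottom; the first one that is true wins.
rows = conn.execute(
    """
    select device_id, gender,
           case
               when age >= 25 then '25岁及以上'
               when age >= 20 then '20-24岁'
               when age < 20 then '20岁以下'
               else '其他'
           end as age_cut
    from user_profile
    order by id
    """
).fetchall()

for row in rows:
    print(row)
```

Device 3214 has a null age, so none of the three comparisons is true and the row is labeled 其他; ages 20, 21 and 23 skip the first branch and stop at the second.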

Original post: https://blog.csdn.net/qq_62799214/article/details/125582247
