目录
第11章 数仓开发之ADS层
11.1 流量主题
11.1.1 各来源流量统计
[atguigu@node001 ~]$ nohup /opt/module/hive/hive-3.1.2/bin/hive &
[atguigu@node001 ~]$ nohup /opt/module/hive/hive-3.1.2/bin/hive --service hiveserver2 &
- [atguigu@node001 hadoop]$ cd /opt/module/hive/hive-3.1.2/
- [atguigu@node001 hive-3.1.2]$ nohup bin/hive &
- [1] 11485
- [atguigu@node001 hive-3.1.2]$ nohup: 忽略输入并把输出追加到"nohup.out"
-
- [atguigu@node001 hive-3.1.2]$ nohup bin/hive --service hiveserver2 &
- [2] 11626
- [atguigu@node001 hive-3.1.2]$ nohup: 忽略输入并把输出追加到"nohup.out"
- [atguigu@node001 hive-3.1.2]$
- [atguigu@node001 ~]$ nohup /opt/module/hive/hive-3.1.2/bin/hive &
- [1] 3815
- [atguigu@node001 ~]$ nohup: 忽略输入并把输出追加到"nohup.out"
-
- [atguigu@node001 ~]$
- [atguigu@node001 ~]$ jpsall
- ================ node001 ================
- 2848 NameNode
- 3634 JobHistoryServer
- 3014 DataNode
- 3815 RunJar
- 3933 Jps
- 3422 NodeManager
- ================ node002 ================
- 2113 NodeManager
- 2531 Jps
- 1989 ResourceManager
- 1783 DataNode
- ================ node003 ================
- 1908 SecondaryNameNode
- 2055 NodeManager
- 2270 Jps
- 1823 DataNode
- [atguigu@node001 ~]$
- [atguigu@node001 ~]$ nohup /opt/module/hive/hive-3.1.2/bin/hive --service hiveserver2 &
- [2] 4038
- [atguigu@node001 ~]$ nohup: 忽略输入并把输出追加到"nohup.out"
-
- [atguigu@node001 ~]$
- [atguigu@node001 ~]$ jpsall
- ================ node001 ================
- 2848 NameNode
- 4225 Jps
- 3634 JobHistoryServer
- 4038 RunJar
- 3014 DataNode
- 3815 RunJar
- 3422 NodeManager
- ================ node002 ================
- 2113 NodeManager
- 1989 ResourceManager
- 1783 DataNode
- 2591 Jps
- ================ node003 ================
- 1908 SecondaryNameNode
- 2055 NodeManager
- 2334 Jps
- 1823 DataNode
- [atguigu@node001 ~]$
11.1.2 路径分析
桑基图
11.1.3 各来源下单统计
11.2 用户主题
11.2.1 用户变动统计
11.2.2 用户留存率
11.2.3 用户新增活跃统计
[2023-09-04 10:04:35] org.apache.hadoop.hive.ql.parse.SemanticException:Failed to get a spark session: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create Spark client for Spark session 1af1be50-bc68-46e3-a27f-ff9bf04c44c0
- [atguigu@node001 ~]$ cd /opt/module/hive/hive-3.1.2/
- [atguigu@node001 hive-3.1.2]$ bin/hive
-
- hive (default)> set hive.spark.client.server.connect.timeout=300000ms;
- 2023-09-04 10:14:03,044 INFO [fd295ca1-5b9b-4f12-bcff-c5db47bee932 main] conf.HiveConf (HiveConf.java:getLogIdVar(5040)) - Using the default value passed in for log id: fd295ca1-5b9b-4f12-bcff-c5db47bee932
- Query returned non-zero code: 1, cause: Cannot modify hive.spark.client.server.connect.timeout at runtime. It is in the list of parameters that can't be modified at runtime or is prefixed by a restricted variable
- 2023-09-04 10:14:03,091 INFO [fd295ca1-5b9b-4f12-bcff-c5db47bee932 main] conf.HiveConf (HiveConf.java:getLogIdVar(5040)) - Using the default value passed in for log id: fd295ca1-5b9b-4f12-bcff-c5db47bee932
- 2023-09-04 10:14:03,091 INFO [fd295ca1-5b9b-4f12-bcff-c5db47bee932 main] session.SessionState (SessionState.java:resetThreadName(452)) - Resetting thread name to main
- hive (default)>
set hive.spark.client.server.connect.timeout=300000ms;
11.2.4 用户行为漏斗分析
11.2.5 新增交易用户统计
11.2.6 各年龄段下单用户数
11.3 商品主题
11.3.1 各分类课程交易统计
11.3.3 各课程交易统计
11.3.4 各课程评价统计
11.3.5 各分类课程试听留存统计
11.3.6 各学科试听留存统计
11.4 交易主题
11.4.1 交易综合统计
11.4.2 各省份交易统计
11.5 考试主题
11.5.1 各试卷平均统计
11.5.2 各课程考试相关指标统计
11.5.3 各试卷成绩分布
11.5.4 各题正确率
11.6 播放主题
11.6.1 各章节视频播放情况统计
11.6.2 各课程视频播放情况统计
11.7 完课主题
11.7.1 各课程完课人数统计
- -- 11.7 完课主题
- -- 11.7.1 各课程完课人数统计
- -- 1)建表语句
- DROP TABLE IF EXISTS ads_complete_complete_user_count_per_course;
- CREATE EXTERNAL TABLE ads_complete_complete_user_count_per_course
- (
- `dt` STRING COMMENT '统计日期',
- `recent_days` BIGINT COMMENT '最近天数,1:最近1天,7:最近7天,30:最近30天',
- `course_id` STRING COMMENT '课程 id',
- `user_count` BIGINT COMMENT '完课人数'
- ) COMMENT '各课程完课人数'
- ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
- LOCATION '/warehouse/edu/ads/ads_complete_complete_user_count_per_course/';
-
-
- -- 2)数据装载
- -- 统计用户对应一个课程完成多少个章节
- insert overwrite table edu2077.ads_complete_complete_user_count_per_course
- select dt,
- recent_days,
- course_id,
- user_count
- from edu2077.ads_complete_complete_user_count_per_course
- union
- select '2022-02-21' dt,
- recent_days,
- course_id,
- count(user_id) order_count
- from (select course_id,
- id,
- user_id,
- max_chapter_complete_date
- from (select course_id,
- user_id,
- max(first_complete_date) max_chapter_complete_date,
- count(first_complete_date) user_chapter_complete_count
- from edu2077.dwd_learn_play_stats_full
- where dt = '2022-02-21'
- group by course_id, user_id
- ) chapter_complete
- left join
- (select id,
- chapter_num
- from edu2077.dim_course_full
- where dt = '2022-02-21') dim_course
- on chapter_complete.course_id = dim_course.id
- where user_chapter_complete_count = chapter_num
- ) course_complete
- lateral view explode(array(1, 7, 30)) tmp as recent_days
- where max_chapter_complete_date >= date_add('2022-02-21', -recent_days + 1)
- group by recent_days, course_id;
-
- select * from ads_complete_complete_user_count_per_course;
-
-
-
- insert overwrite table edu2077.ads_complete_complete_user_count_per_course
- select *
- from edu2077.ads_complete_complete_user_count_per_course
- union
- select '2022-02-21' dt,
- recent_days,
- course_id,
- count(user_id) user_conunt
- from (
- -- 完整的用户数据
- select course_id,
- user_id,
- first_complete_date
- from (
- -- 统计用户对应一个课程完成多少个章节
- select course_id,
- user_id,
- max(first_complete_date) first_complete_date,
- count(first_complete_date) user_complete_nums
- from dwd_learn_play_stats_full
- where dt = '2022-02-21'
- group by course_id, user_id
- ) t1
- left join (
- -- 统计课程一共有多少章节
- select id,
- chapter_num
- from dim_course_full
- where dt = '2022-02-21'
- ) t2
- on t1.course_id = t2.id and user_complete_nums = chapter_num
- ) t3 lateral view explode(array(1, 7, 30)) tmp as recent_days
- where first_complete_date <= '2022-02-21'
- and first_complete_date >= date_sub('2022-02-21', recent_days - 1)
- group by recent_days, course_id;
-
-
- select * from ads_complete_complete_user_count_per_course;
11.7.2 完课综合指标
- -- 11.7.2 完课综合指标
- -- 1)建表语句
- DROP TABLE IF EXISTS ads_complete_complete_stats;
- CREATE EXTERNAL TABLE ads_complete_complete_stats
- (
- `dt` STRING COMMENT '统计日期',
- `recent_days` BIGINT COMMENT '最近天数,1:最近1天,7:最近7天,30:最近30天',
- `user_complete_count` BIGINT COMMENT '完课人数',
- `user_course_complete_count` BIGINT COMMENT '完课人次'
- ) COMMENT '完课综合指标'
- ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
- LOCATION '/warehouse/edu/ads/ads_complete_complete_stats/';
-
-
- -- 2)数据装载
- insert overwrite table edu2077.ads_complete_complete_stats
- select dt,
- recent_days,
- user_complete_count,
- user_course_complete_count
- from edu2077.ads_complete_complete_stats
- union
- select '2022-02-21' dt,
- recent_days,
- count(distinct user_id) user_complete_count,
- count(*) user_course_complete_count
- from (select course_id,
- id,
- user_id,
- max_chapter_complete_date
- from (select course_id,
- user_id,
- max(first_complete_date) max_chapter_complete_date,
- count(first_complete_date) user_chapter_complete_count
- from edu2077.dwd_learn_play_stats_full
- where dt = '2022-02-21'
- group by course_id, user_id
- ) chapter_complete
- left join
- (select id,
- chapter_num
- from edu2077.dim_course_full
- where dt = '2022-02-21') dim_course
- on chapter_complete.course_id = dim_course.id
- where user_chapter_complete_count = chapter_num
- ) course_complete
- lateral view explode(array(1, 7, 30)) tmp as recent_days
- where max_chapter_complete_date >= date_add('2022-02-21', -recent_days + 1)
- group by recent_days;
-
-
-
- select * from ads_complete_complete_stats;
-
- insert overwrite table edu2077.ads_complete_complete_stats
- select dt,
- recent_days,
- user_complete_count,
- user_course_complete_count
- from edu2077.ads_complete_complete_stats
- union
- select '2022-02-21' dt,
- recent_days,
- count(distinct user_id) user_complete_count,
- count(*) user_course_complete_count
- from (
- -- 完整的用户数据
- select course_id,
- user_id,
- first_complete_date
- from (
- -- 统计用户对应一个课程完成多少个章节
- select course_id,
- user_id,
- max(first_complete_date) first_complete_date,
- count(first_complete_date) user_complete_nums
- from dwd_learn_play_stats_full
- where dt = '2022-02-21'
- group by course_id, user_id
- ) t1
- left join (
- -- 统计课程一共有多少章节
- select id,
- chapter_num
- from dim_course_full
- where dt = '2022-02-21'
- ) t2
- on t1.course_id = t2.id and user_complete_nums = chapter_num
- ) t3 lateral view explode(array(1, 7, 30)) tmp as recent_days
- where first_complete_date <= '2022-02-21'
- and first_complete_date >= date_sub('2022-02-21', recent_days - 1)
- group by recent_days;
-
- select * from ads_complete_complete_stats;
11.7.3 各个课程人均完成章节数
- -- 11.7.3 各个课程人均完成章节数
- -- 1)建表语句
- DROP TABLE IF EXISTS ads_complete_complete_chapter_count_per_course;
- CREATE EXTERNAL TABLE ads_complete_complete_chapter_count_per_course
- (
- `dt` STRING COMMENT '统计日期',
- `recent_days` BIGINT COMMENT '最近天数,1:最近1天,7:最近7天,30:最近30天',
- `course_id` STRING COMMENT '课程 id',
- `complete_chapter_count` DECIMAL(16, 2) COMMENT '用户平均完成章节数'
- ) COMMENT '各课程人均完成章节视频数'
- ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
- LOCATION '/warehouse/edu/ads/ads_complete_complete_chapter_count_per_course/';
-
-
- -- 2)数据装载
- insert overwrite table edu2077.ads_complete_complete_chapter_count_per_course
- select dt,
- recent_days,
- course_id,
- complete_chapter_count
- from edu2077.ads_complete_complete_chapter_count_per_course
- union
- select '2022-02-21' dt,
- recent_days,
- course_id,
- count(first_complete_date) / count(distinct user_id) complete_chapter_count
- from edu2077.dwd_learn_play_stats_full
- lateral view explode(array(1, 7, 30)) tmp as recent_days
- where dt = '2022-02-21'
- and first_complete_date is not null
- and first_complete_date >= date_add('2022-02-21', -recent_days + 1)
- group by recent_days, course_id;
-
-
-
- select * from ads_complete_complete_chapter_count_per_course;
-
- insert overwrite table edu2077.ads_complete_complete_chapter_count_per_course
- select *
- from edu2077.ads_complete_complete_chapter_count_per_course
- union
- select '2022-02-21' dt,
- recent_days,
- course_id,
- cast(count(first_complete_date) / count(distinct user_id) as decimal(16, 2)) complete_chapter_count
- from (
- -- 统计用户对应一个课程完成多少个章节
- -- 完成的章节数
- select user_id,
- course_id,
- max(first_complete_date) first_complete_date,
- count(first_complete_date) user_complete_nums
- from dwd_learn_play_stats_full
- where dt = '2022-02-21'
- group by user_id, course_id
- ) t1 lateral view explode(array(1, 7, 30)) tmp as recent_days
- group by recent_days, course_id;
-
- select * from ads_complete_complete_chapter_count_per_course;
11.8 数据装载脚本