• 数据仓库搭建ADS层


    本篇只是ADS层,其他内容请关注我的博客!在<项目>专栏里!!!

    本篇文章参考尚硅谷大数据项目写成!

    目录

    搭建ADS层

    一、设备主题

    1.1活跃设备数(日、周、月)

    1.2 每日新增设备

    1.3留存率

    1.4沉默用户数

    1.5本周回流用户数

    1.6流失用户数

    1.7最近连续三周活跃用户数

    1.8最近七天内连续三天活跃用户数

    二、会员主题

    2.1会员信息

    2.2漏斗分析

    三、商品主题

    3.1商品主题

    3.2商品营销排名

    3.3商品收藏排名

    3.4商品加入购物车排名

    3.5商品退款率排名(30天)

    3.6商品差评率

    四、营销主题

    4.1下单数目统计

    4.2支付信息统计

    五、地区主题

    六、ADS层数据导入脚本


    搭建ADS层

    ADS层不涉及建模,建表根据具体需求而定。

    一、设备主题

    1.1活跃设备数(日、周、月)

    需求定义:

    日活:当日活跃的设备数

    周活:当周活跃的设备数

    月活:当月活跃的设备数

    1)建表语句

    1. create external table ads_uv_count(
    2.     `dt` string COMMENT '统计日期',
    3.     `day_count` bigint COMMENT '当日用户数量',
    4.     `wk_count` bigint COMMENT '当周用户数量',
    5.     `mn_count` bigint COMMENT '当月用户数量',
    6.     `is_weekend` string COMMENT 'Y,N是否是周末,用于得到本周最终结果',
    7.     `is_monthend` string COMMENT 'Y,N是否是月末,用于得到本月最终结果'
    8. ) COMMENT '活跃设备数'
    9. row format delimited fields terminated by '\t'
    10. location '/warehouse/gmall/ads/ads_uv_count/';

    2)导入数据

    1. insert into table ads_uv_count
    2. select
    3. '2022-05-20' dt,
    4. daycount.ct,
    5. wkcount.ct,
    6. mncount.ct,
    7. if(date_add(next_day('2022-05-20','MO'),-1)='2022-05-20','Y','N') ,
    8. if(last_day('2022-05-20')='2022-05-20','Y','N')
    9. from
    10. (
    11. select
    12. '2022-05-20' dt,
    13. count(*) ct
    14. from dwt_uv_topic
    15. where login_date_last='2022-05-20'
    16. )daycount join
    17. (
    18. select
    19. '2022-05-20' dt,
    20. count (*) ct
    21. from dwt_uv_topic
    22. where login_date_last>=date_add(next_day('2022-05-20','MO'),-7)
    23. and login_date_last<= date_add(next_day('2022-05-20','MO'),-1)
    24. ) wkcount on daycount.dt=wkcount.dt
    25. join
    26. (
    27. select
    28. '2022-05-20' dt,
    29. count (*) ct
    30. from dwt_uv_topic
    31. where date_format(login_date_last,'yyyy-MM')=date_format('2022-05-20','yyyy-MM')
    32. )mncount on daycount.dt=mncount.dt;

    3)查询导入结果

    select * from ads_uv_count;

    1.2 每日新增设备

    1)建表语句

    1. create external table ads_new_mid_count
    2. (
    3. `create_date` string comment '创建时间' ,
    4. `new_mid_count` BIGINT comment '新增设备数量'
    5. ) COMMENT '每日新增设备数量'
    6. row format delimited fields terminated by '\t'
    7. location '/warehouse/gmall/ads/ads_new_mid_count/';

    2)导入数据

    1. insert into table ads_new_mid_count
    2. select
    3. '2022-05-20',
    4. count(*)
    5. from dwt_uv_topic
    6. where login_date_first='2022-05-20';

    3)查询导入结果

    select * from ads_new_mid_count;

    1.3留存率

    1)建表语句

    1. create external table ads_user_retention_day_rate
    2. (
    3. `stat_date` string comment '统计日期',
    4. `create_date` string comment '设备新增日期',
    5. `retention_day` int comment '截止当前日期留存天数',
    6. `retention_count` bigint comment '留存数量',
    7. `new_mid_count` bigint comment '设备新增数量',
    8. `retention_ratio` decimal(16,2) comment '留存率'
    9. ) COMMENT '留存率'
    10. row format delimited fields terminated by '\t'
    11. location '/warehouse/gmall/ads/ads_user_retention_day_rate/';

    2)导入数据

    1. insert into table ads_user_retention_day_rate
    2. select
    3. '2022-05-20',
    4. date_add('2022-05-20',-1),
    5. 1,--留存天数
    6. sum(if(login_date_first=date_add('2022-05-20',-1) and login_date_last='2022-05-20',1,0)),
    7. sum(if(login_date_first=date_add('2022-05-20',-1),1,0)),
    8. sum(if(login_date_first=date_add('2022-05-20',-1) and login_date_last='2022-05-20',1,0))/sum(if(login_date_first=date_add('2022-05-20',-1),1,0))*100
    9. from dwt_uv_topic
    10. union all
    11. select
    12. '2022-05-20',
    13. date_add('2022-05-20',-2),
    14. 2,
    15. sum(if(login_date_first=date_add('2022-05-20',-2) and login_date_last='2022-05-20',1,0)),
    16. sum(if(login_date_first=date_add('2022-05-20',-2),1,0)),
    17. sum(if(login_date_first=date_add('2022-05-20',-2) and login_date_last='2022-05-20',1,0))/sum(if(login_date_first=date_add('2022-05-20',-2),1,0))*100
    18. from dwt_uv_topic
    19. union all
    20. select
    21. '2022-05-20',
    22. date_add('2022-05-20',-3),
    23. 3,
    24. sum(if(login_date_first=date_add('2022-05-20',-3) and login_date_last='2022-05-20',1,0)),
    25. sum(if(login_date_first=date_add('2022-05-20',-3),1,0)),
    26. sum(if(login_date_first=date_add('2022-05-20',-3) and login_date_last='2022-05-20',1,0))/sum(if(login_date_first=date_add('2022-05-20',-3),1,0))*100
    27. from dwt_uv_topic;

    3)查询导入结果

    select * from ads_user_retention_day_rate;

    1.4沉默用户数

    1)建表语句

    1. create external table ads_silent_count(
    2. `dt` string COMMENT '统计日期',
    3. `silent_count` bigint COMMENT '沉默设备数'
    4. ) COMMENT '沉默用户数'
    5. row format delimited fields terminated by '\t'
    6. location '/warehouse/gmall/ads/ads_silent_count';

    2)导入数据(2022-05-20)

    1. insert into table ads_silent_count
    2. select
    3. '2022-05-20',
    4. count(*)
    5. from dwt_uv_topic
    6. where login_date_first=login_date_last
    7. and login_date_last<=date_add('2022-05-20',-7);

    3)查询导入结果

    select * from ads_silent_count;

    1.5本周回流用户数

    需求定义:

    本周回流用户:上周未活跃,本周活跃的设备,且不是本周新增设备

    1)建表语句

    1. create external table ads_back_count(
    2. `dt` string COMMENT '统计日期',
    3. `wk_dt` string COMMENT '统计日期所在周',
    4. `wastage_count` bigint COMMENT '回流设备数'
    5. ) COMMENT '本周回流用户数'
    6. row format delimited fields terminated by '\t'
    7. location '/warehouse/gmall/ads/ads_back_count';

    2)导入数据

    1. insert into table ads_back_count
    2. select
    3. '2022-05-20',
    4. concat(date_add(next_day('2022-05-20','MO'),-7),'_', date_add(next_day('2022-05-20','MO'),-1)),
    5. count(*)
    6. from
    7. (
    8. select
    9. mid_id
    10. from dwt_uv_topic
    11. where login_date_last>=date_add(next_day('2022-05-20','MO'),-7)
    12. and login_date_last<= date_add(next_day('2022-05-20','MO'),-1)
    13. and login_date_first<date_add(next_day('2022-05-20','MO'),-7)
    14. )current_wk
    15. left join
    16. (
    17. select
    18. mid_id
    19. from dws_uv_detail_daycount
    20. where dt>=date_add(next_day('2022-05-20','MO'),-7*2)
    21. and dt<= date_add(next_day('2022-05-20','MO'),-7-1)
    22. group by mid_id
    23. )last_wk
    24. on current_wk.mid_id=last_wk.mid_id
    25. where last_wk.mid_id is null;

    3)查询导入结果

    select * from ads_back_count;

    1.6流失用户数

    需求定义:

    流失用户:最近7天未活跃的设备

    1)建表语句

    1. create external table ads_wastage_count(
    2. `dt` string COMMENT '统计日期',
    3. `wastage_count` bigint COMMENT '流失设备数'
    4. ) COMMENT '流失用户数'
    5. row format delimited fields terminated by '\t'
    6. location '/warehouse/gmall/ads/ads_wastage_count';

    2)导入数据(2022-05-20)

    1. insert into table ads_wastage_count
    2. select
    3. '2022-05-20',
    4. count(*)
    5. from
    6. (
    7. select
    8. mid_id
    9. from dwt_uv_topic
    10. where login_date_last<=date_add('2022-05-20',-7)
    11. group by mid_id
    12. )t1;

    3)查询导入结果

    select * from ads_wastage_count;

    1.7最近连续三周活跃用户数

    1)建表语句

    1. create external table ads_continuity_wk_count(
    2. `dt` string COMMENT '统计日期,一般用结束周周日日期,如果每天计算一次,可用当天日期',
    3. `wk_dt` string COMMENT '持续时间',
    4. `continuity_count` bigint COMMENT '活跃用户数'
    5. ) COMMENT '最近连续三周活跃用户数'
    6. row format delimited fields terminated by '\t'
    7. location '/warehouse/gmall/ads/ads_continuity_wk_count';

    2)导入数据(2022-05-20所在周)

    1. insert into table ads_continuity_wk_count
    2. select
    3. '2022-05-20',
    4. concat(date_add(next_day('2022-05-20','MO'),-7*3),'_',date_add(next_day('2022-05-20','MO'),-1)),
    5. count(*)
    6. from
    7. (
    8. select
    9. mid_id
    10. from
    11. (
    12. select
    13. mid_id
    14. from dws_uv_detail_daycount
    15. where dt>=date_add(next_day('2022-05-20','monday'),-7)
    16. and dt<=date_add(next_day('2022-05-20','monday'),-1)
    17. group by mid_id
    18. union all
    19. select
    20. mid_id
    21. from dws_uv_detail_daycount
    22. where dt>=date_add(next_day('2022-05-20','monday'),-7*2)
    23. and dt<=date_add(next_day('2022-05-20','monday'),-7-1)
    24. group by mid_id
    25. union all
    26. select
    27. mid_id
    28. from dws_uv_detail_daycount
    29. where dt>=date_add(next_day('2022-05-20','monday'),-7*3)
    30. and dt<=date_add(next_day('2022-05-20','monday'),-7*2-1)
    31. group by mid_id
    32. )t1
    33. group by mid_id
    34. having count(*)=3
    35. )t2;

    3)查询导入结果

    select * from ads_continuity_wk_count;

    1.8最近七天内连续三天活跃用户数

    1)建表语句

    1. create external table ads_continuity_uv_count(
    2. `dt` string COMMENT '统计日期',
    3. `wk_dt` string COMMENT '最近7天日期',
    4. `continuity_count` bigint
    5. ) COMMENT '最近七天内连续三天活跃用户数'
    6. row format delimited fields terminated by '\t'
    7. location '/warehouse/gmall/ads/ads_continuity_uv_count';

    2)导入数据

    1. insert into table ads_continuity_uv_count
    2. select
    3. '2022-05-20',
    4. concat(date_add('2022-05-20',-6),'_','2022-05-20'),
    5. count(*)
    6. from
    7. (
    8. select mid_id
    9. from
    10. (
    11. select mid_id
    12. from
    13. (
    14. select
    15. mid_id,
    16. date_sub(dt,rank) date_dif
    17. from
    18. (
    19. select
    20. mid_id,
    21. dt,
    22. rank() over(partition by mid_id order by dt) rank
    23. from dws_uv_detail_daycount
    24. where dt>=date_add('2022-05-20',-6) and dt<='2022-05-20'
    25. )t1
    26. )t2
    27. group by mid_id,date_dif
    28. having count(*)>=3
    29. )t3
    30. group by mid_id
    31. )t4;

    3)查询导入结果

    select * from ads_continuity_uv_count;

    二、会员主题

    2.1会员信息

    1)建表语句

    1. create external table ads_user_topic(
    2. `dt` string COMMENT '统计日期',
    3. `day_users` string COMMENT '活跃会员数',
    4. `day_new_users` string COMMENT '新增会员数',
    5. `day_new_payment_users` string COMMENT '新增消费会员数',
    6. `payment_users` string COMMENT '总付费会员数',
    7. `users` string COMMENT '总会员数',
    8. `day_users2users` decimal(16,2) COMMENT '会员活跃率',
    9. `payment_users2users` decimal(16,2) COMMENT '会员付费率',
    10. `day_new_users2users` decimal(16,2) COMMENT '会员新鲜度'
    11. ) COMMENT '会员信息表'
    12. row format delimited fields terminated by '\t'
    13. location '/warehouse/gmall/ads/ads_user_topic';

    2)导入数据

    1. insert into table ads_user_topic
    2. select
    3. '2022-05-20',
    4. sum(if(login_date_last='2022-05-20',1,0)),
    5. sum(if(login_date_first='2022-05-20',1,0)),
    6. sum(if(payment_date_first='2022-05-20',1,0)),
    7. sum(if(payment_count>0,1,0)),
    8. count(*),
    9. sum(if(login_date_last='2022-05-20',1,0))/count(*),
    10. sum(if(payment_count>0,1,0))/count(*),
    11. sum(if(login_date_first='2022-05-20',1,0))/sum(if(login_date_last='2022-05-20',1,0))
    12. from dwt_user_topic;

    3)查询导入结果

    select * from ads_user_topic;

    2.2漏斗分析

    统计“浏览首页->浏览商品详情页->加入购物车->下单->支付”的转化率

    思路:统计各个行为的人数,然后计算比值。

    1)建表语句

    1. create external table ads_user_action_convert_day(
    2. `dt` string COMMENT '统计日期',
    3. `home_count` bigint COMMENT '浏览首页人数',
    4. `good_detail_count` bigint COMMENT '浏览商品详情页人数',
    5. `home2good_detail_convert_ratio` decimal(16,2) COMMENT '首页到商品详情转化率',
    6. `cart_count` bigint COMMENT '加入购物车的人数',
    7. `good_detail2cart_convert_ratio` decimal(16,2) COMMENT '商品详情页到加入购物车转化率',
    8. `order_count` bigint COMMENT '下单人数',
    9. `cart2order_convert_ratio` decimal(16,2) COMMENT '加入购物车到下单转化率',
    10. `payment_amount` bigint COMMENT '支付人数',
    11. `order2payment_convert_ratio` decimal(16,2) COMMENT '下单到支付的转化率'
    12. ) COMMENT '漏斗分析'
    13. row format delimited fields terminated by '\t'
    14. location '/warehouse/gmall/ads/ads_user_action_convert_day/';

    2)导入数据

    1. with
    2. tmp_uv as
    3. (
    4. select
    5. '2022-05-20' dt,
    6. sum(if(array_contains(pages,'home'),1,0)) home_count,
    7. sum(if(array_contains(pages,'good_detail'),1,0)) good_detail_count
    8. from
    9. (
    10. select
    11. mid_id,
    12. collect_set(page_id) pages
    13. from dwd_page_log
    14. where dt='2022-05-20'
    15. and page_id in ('home','good_detail')
    16. group by mid_id
    17. )tmp
    18. ),
    19. tmp_cop as
    20. (
    21. select
    22. '2022-05-20' dt,
    23. sum(if(cart_count>0,1,0)) cart_count,
    24. sum(if(order_count>0,1,0)) order_count,
    25. sum(if(payment_count>0,1,0)) payment_count
    26. from dws_user_action_daycount
    27. where dt='2022-05-20'
    28. )
    29. insert into table ads_user_action_convert_day
    30. select
    31. tmp_uv.dt,
    32. tmp_uv.home_count,
    33. tmp_uv.good_detail_count,
    34. tmp_uv.good_detail_count/tmp_uv.home_count*100,
    35. tmp_cop.cart_count,
    36. tmp_cop.cart_count/tmp_uv.good_detail_count*100,
    37. tmp_cop.order_count,
    38. tmp_cop.order_count/tmp_cop.cart_count*100,
    39. tmp_cop.payment_count,
    40. tmp_cop.payment_count/tmp_cop.order_count*100
    41. from tmp_uv
    42. join tmp_cop
    43. on tmp_uv.dt=tmp_cop.dt;

    3)查询导入结果

    select * from ads_user_action_convert_day;

    三、商品主题

    3.1商品主题

    1)建表语句

    1. create external table ads_product_info(
    2. `dt` string COMMENT '统计日期',
    3. `sku_num` string COMMENT 'sku个数',
    4. `spu_num` string COMMENT 'spu个数'
    5. ) COMMENT '商品个数信息'
    6. row format delimited fields terminated by '\t'
    7. location '/warehouse/gmall/ads/ads_product_info';

    2)导入数据

    1. insert into table ads_product_info
    2. select
    3. '2022-05-20' dt,
    4. sku_num,
    5. spu_num
    6. from
    7. (
    8. select
    9. '2022-05-20' dt,
    10. count(*) sku_num
    11. from
    12. dwt_sku_topic
    13. ) tmp_sku_num
    14. join
    15. (
    16. select
    17. '2022-05-20' dt,
    18. count(*) spu_num
    19. from
    20. (
    21. select
    22. spu_id
    23. from
    24. dwt_sku_topic
    25. group by
    26. spu_id
    27. ) tmp_spu_id
    28. ) tmp_spu_num
    29. on tmp_sku_num.dt=tmp_spu_num.dt;

    3)查询导入结果

    select * from ads_product_info;

     

    3.2商品营销排名

    1)建表语句

    1. create external table ads_product_sale_topN(
    2. `dt` string COMMENT '统计日期',
    3. `sku_id` string COMMENT '商品ID',
    4. `payment_amount` bigint COMMENT '销量'
    5. ) COMMENT '商品销量排名'
    6. row format delimited fields terminated by '\t'
    7. location '/warehouse/gmall/ads/ads_product_sale_topN';

    2)导入数据

    1. insert into table ads_product_sale_topN
    2. select
    3. '2022-05-20' dt,
    4. sku_id,
    5. payment_amount
    6. from
    7. dws_sku_action_daycount
    8. where
    9. dt='2022-05-20'
    10. order by payment_amount desc
    11. limit 10;

    3)查询导入结果

    select * from ads_product_sale_topN;

    3.3商品收藏排名

    1)建表语句

    1. create external table ads_product_favor_topN(
    2. `dt` string COMMENT '统计日期',
    3. `sku_id` string COMMENT '商品ID',
    4. `favor_count` bigint COMMENT '收藏量'
    5. ) COMMENT '商品收藏排名'
    6. row format delimited fields terminated by '\t'
    7. location '/warehouse/gmall/ads/ads_product_favor_topN';

     2)导入数据

    1. insert into table ads_product_favor_topN
    2. select
    3. '2022-05-20' dt,
    4. sku_id,
    5. favor_count
    6. from
    7. dws_sku_action_daycount
    8. where
    9. dt='2022-05-20'
    10. order by favor_count desc
    11. limit 10;

    3)查询导入结果

    select * from ads_product_favor_topN;

    3.4商品加入购物车排名

    1)建表语句

    1. create external table ads_product_cart_topN(
    2. `dt` string COMMENT '统计日期',
    3. `sku_id` string COMMENT '商品ID',
    4. `cart_count` bigint COMMENT '加入购物车次数'
    5. ) COMMENT '商品加入购物车排名'
    6. row format delimited fields terminated by '\t'
    7. location '/warehouse/gmall/ads/ads_product_cart_topN';

    2)导入数据

    1. insert into table ads_product_cart_topN
    2. select
    3. '2022-05-20' dt,
    4. sku_id,
    5. cart_count
    6. from
    7. dws_sku_action_daycount
    8. where
    9. dt='2022-05-20'
    10. order by cart_count desc
    11. limit 10;

    3)查询导入结果

    select * from ads_product_cart_topN;

    3.5商品退款率排名(30天)

    1)建表语句

    1. create external table ads_product_refund_topN(
    2. `dt` string COMMENT '统计日期',
    3. `sku_id` string COMMENT '商品ID',
    4. `refund_ratio` decimal(16,2) COMMENT '退款率'
    5. ) COMMENT '商品退款率排名'
    6. row format delimited fields terminated by '\t'
    7. location '/warehouse/gmall/ads/ads_product_refund_topN';

    2)导入数据

    1. insert into table ads_product_refund_topN
    2. select
    3. '2022-05-20',
    4. sku_id,
    5. refund_last_30d_count/payment_last_30d_count*100 refund_ratio
    6. from dwt_sku_topic
    7. order by refund_ratio desc
    8. limit 10;

    3)查询导入结果

    select * from ads_product_refund_topN;

    3.6商品差评率

    1)建表语句

    1. create external table ads_appraise_bad_topN(
    2. `dt` string COMMENT '统计日期',
    3. `sku_id` string COMMENT '商品ID',
    4. `appraise_bad_ratio` decimal(16,2) COMMENT '差评率'
    5. ) COMMENT '商品差评率'
    6. row format delimited fields terminated by '\t'
    7. location '/warehouse/gmall/ads/ads_appraise_bad_topN';

    2)导入数据

    1. insert into table ads_appraise_bad_topN
    2. select
    3. '2022-05-20' dt,
    4. sku_id,
    5. appraise_bad_count/(appraise_good_count+appraise_mid_count+appraise_bad_count+appraise_default_count) appraise_bad_ratio
    6. from
    7. dws_sku_action_daycount
    8. where
    9. dt='2022-05-20'
    10. order by appraise_bad_ratio desc
    11. limit 10;

    3)查询导入结果

    select * from ads_appraise_bad_topN;

    四、营销主题

    4.1下单数目统计

    需求分析:统计每日下单数,下单金额及下单用户数。

    1)建表语句

    1. create external table ads_order_daycount(
    2. dt string comment '统计日期',
    3. order_count bigint comment '单日下单笔数',
    4. order_amount bigint comment '单日下单金额',
    5. order_users bigint comment '单日下单用户数'
    6. ) comment '下单数目统计'
    7. row format delimited fields terminated by '\t'
    8. location '/warehouse/gmall/ads/ads_order_daycount';

    2)导入数据

    1. insert into table ads_order_daycount
    2. select
    3. '2022-05-20',
    4. sum(order_count),
    5. sum(order_amount),
    6. sum(if(order_count>0,1,0))
    7. from dws_user_action_daycount
    8. where dt='2022-05-20';

    3)查询导入结果

    select * from ads_order_daycount;

    4.2支付信息统计

    1)建表语句

    1. create external table ads_payment_daycount(
    2. dt string comment '统计日期',
    3. order_count bigint comment '单日支付笔数',
    4. order_amount bigint comment '单日支付金额',
    5. payment_user_count bigint comment '单日支付人数',
    6. payment_sku_count bigint comment '单日支付商品数',
    7. payment_avg_time decimal(16,2) comment '下单到支付的平均时长,取分钟数'
    8. ) comment '支付信息统计'
    9. row format delimited fields terminated by '\t'
    10. location '/warehouse/gmall/ads/ads_payment_daycount';

    2)导入数据

    1. insert into table ads_payment_daycount
    2. select
    3. tmp_payment.dt,
    4. tmp_payment.payment_count,
    5. tmp_payment.payment_amount,
    6. tmp_payment.payment_user_count,
    7. tmp_skucount.payment_sku_count,
    8. tmp_time.payment_avg_time
    9. from
    10. (
    11. select
    12. '2022-05-20' dt,
    13. sum(payment_count) payment_count,
    14. sum(payment_amount) payment_amount,
    15. sum(if(payment_count>0,1,0)) payment_user_count
    16. from dws_user_action_daycount
    17. where dt='2022-05-20'
    18. )tmp_payment
    19. join
    20. (
    21. select
    22. '2022-05-20' dt,
    23. sum(if(payment_count>0,1,0)) payment_sku_count
    24. from dws_sku_action_daycount
    25. where dt='2022-05-20'
    26. )tmp_skucount on tmp_payment.dt=tmp_skucount.dt
    27. join
    28. (
    29. select
    30. '2022-05-20' dt,
    31. sum(unix_timestamp(payment_time)-unix_timestamp(create_time))/count(*)/60 payment_avg_time
    32. from dwd_fact_order_info
    33. where dt='2022-05-20'
    34. and payment_time is not null
    35. )tmp_time on tmp_payment.dt=tmp_time.dt;

    3)查询导入结果

     select * from ads_payment_daycount;

    4.3品牌复购率统计

    1)建表语句

    1. create external table ads_sale_tm_category1_stat_mn
    2. (
    3. tm_id string comment '品牌id',
    4. category1_id string comment '1级品类id ',
    5. category1_name string comment '1级品类名称 ',
    6. buycount bigint comment '购买人数',
    7. buy_twice_last bigint comment '两次以上购买人数',
    8. buy_twice_last_ratio decimal(16,2) comment '单次复购率',
    9. buy_3times_last bigint comment '三次以上购买人数',
    10. buy_3times_last_ratio decimal(16,2) comment '多次复购率',
    11. stat_mn string comment '统计月份',
    12. stat_date string comment '统计日期'
    13. ) COMMENT '品牌复购率统计'
    14. row format delimited fields terminated by '\t'
    15. location '/warehouse/gmall/ads/ads_sale_tm_category1_stat_mn/';

    2)导入数据

    1. with
    2. tmp_order as
    3. (
    4. select
    5. user_id,
    6. order_stats_struct.sku_id sku_id,
    7. order_stats_struct.order_count order_count
    8. from dws_user_action_daycount lateral view explode(order_detail_stats) tmp as order_stats_struct
    9. where date_format(dt,'yyyy-MM')=date_format('2022-05-20','yyyy-MM')
    10. ),
    11. tmp_sku as
    12. (
    13. select
    14. id,
    15. tm_id,
    16. category1_id,
    17. category1_name
    18. from dwd_dim_sku_info
    19. where dt='2022-05-20'
    20. )
    21. insert into table ads_sale_tm_category1_stat_mn
    22. select
    23. tm_id,
    24. category1_id,
    25. category1_name,
    26. sum(if(order_count>=1,1,0)) buycount,
    27. sum(if(order_count>=2,1,0)) buyTwiceLast,
    28. sum(if(order_count>=2,1,0))/sum( if(order_count>=1,1,0)) buyTwiceLastRatio,
    29. sum(if(order_count>=3,1,0)) buy3timeLast ,
    30. sum(if(order_count>=3,1,0))/sum( if(order_count>=1,1,0)) buy3timeLastRatio ,
    31. date_format('2022-05-20' ,'yyyy-MM') stat_mn,
    32. '2022-05-20' stat_date
    33. from
    34. (
    35. select
    36. tmp_order.user_id,
    37. tmp_sku.category1_id,
    38. tmp_sku.category1_name,
    39. tmp_sku.tm_id,
    40. sum(order_count) order_count
    41. from tmp_order
    42. join tmp_sku
    43. on tmp_order.sku_id=tmp_sku.id
    44. group by tmp_order.user_id,tmp_sku.category1_id,tmp_sku.category1_name,tmp_sku.tm_id
    45. )tmp
    46. group by tm_id, category1_id, category1_name;

    3)查询导入结果

     select * from ads_sale_tm_category1_stat_mn;

    五、地区主题

    5.1地区主题信息

    1)建表语句

    1. create external table ads_area_topic(
    2. `dt` string COMMENT '统计日期',
    3. `id` bigint COMMENT '编号',
    4. `province_name` string COMMENT '省份名称',
    5. `area_code` string COMMENT '地区编码',
    6. `iso_code` string COMMENT 'iso编码',
    7. `region_id` string COMMENT '地区ID',
    8. `region_name` string COMMENT '地区名称',
    9. `login_day_count` bigint COMMENT '当天活跃设备数',
    10. `order_day_count` bigint COMMENT '当天下单次数',
    11. `order_day_amount` decimal(16,2) COMMENT '当天下单金额',
    12. `payment_day_count` bigint COMMENT '当天支付次数',
    13. `payment_day_amount` decimal(16,2) COMMENT '当天支付金额'
    14. ) COMMENT '地区主题信息'
    15. row format delimited fields terminated by '\t'
    16. location '/warehouse/gmall/ads/ads_area_topic/';

    2)导入数据

    1. insert into table ads_area_topic
    2. select
    3. '2022-05-20',
    4. id,
    5. province_name,
    6. area_code,
    7. iso_code,
    8. region_id,
    9. region_name,
    10. login_day_count,
    11. order_day_count,
    12. order_day_amount,
    13. payment_day_count,
    14. payment_day_amount
    15. from dwt_area_topic;

    3)查询导入结果

    select * from ads_area_topic;

    六、ADS层数据导入脚本

    vim dwt_to_ads.sh

    在脚本中填写如下内容:

    1. #!/bin/bash
    2. hive=/training/hive/bin/hive
    3. APP=default
    4. # 如果是输入的日期按照取输入日期;如果没输入日期取当前时间的前一天
    5. if [ -n "$1" ] ;then
    6. do_date=$1
    7. else
    8. do_date=`date -d "-1 day" +%F`
    9. fi
    10. sql="
    11. set mapreduce.job.queuename=default;
    12. insert into table ${APP}.ads_uv_count
    13. select
    14. '$do_date' dt,
    15. daycount.ct,
    16. wkcount.ct,
    17. mncount.ct,
    18. if(date_add(next_day('$do_date','MO'),-1)='$do_date','Y','N') ,
    19. if(last_day('$do_date')='$do_date','Y','N')
    20. from
    21. (
    22. select
    23. '$do_date' dt,
    24. count(*) ct
    25. from ${APP}.dwt_uv_topic
    26. where login_date_last='$do_date'
    27. )daycount join
    28. (
    29. select
    30. '$do_date' dt,
    31. count (*) ct
    32. from ${APP}.dwt_uv_topic
    33. where login_date_last>=date_add(next_day('$do_date','MO'),-7)
    34. and login_date_last<= date_add(next_day('$do_date','MO'),-1)
    35. ) wkcount on daycount.dt=wkcount.dt
    36. join
    37. (
    38. select
    39. '$do_date' dt,
    40. count (*) ct
    41. from ${APP}.dwt_uv_topic
    42. where date_format(login_date_last,'yyyy-MM')=date_format('$do_date','yyyy-MM')
    43. )mncount on daycount.dt=mncount.dt;
    44. insert into table ${APP}.ads_new_mid_count
    45. select
    46. login_date_first,
    47. count(*)
    48. from ${APP}.dwt_uv_topic
    49. where login_date_first='$do_date'
    50. group by login_date_first;
    51. insert into table ${APP}.ads_silent_count
    52. select
    53. '$do_date',
    54. count(*)
    55. from ${APP}.dwt_uv_topic
    56. where login_date_first=login_date_last
    57. and login_date_last<=date_add('$do_date',-7);
    58. insert into table ${APP}.ads_back_count
    59. select
    60. '$do_date',
    61. concat(date_add(next_day('$do_date','MO'),-7),'_', date_add(next_day('$do_date','MO'),-1)),
    62. count(*)
    63. from
    64. (
    65. select
    66. mid_id
    67. from ${APP}.dwt_uv_topic
    68. where login_date_last>=date_add(next_day('$do_date','MO'),-7)
    69. and login_date_last<= date_add(next_day('$do_date','MO'),-1)
    70. and login_date_first<date_add(next_day('$do_date','MO'),-7)
    71. )current_wk
    72. left join
    73. (
    74. select
    75. mid_id
    76. from ${APP}.dws_uv_detail_daycount
    77. where dt>=date_add(next_day('$do_date','MO'),-7*2)
    78. and dt<= date_add(next_day('$do_date','MO'),-7-1)
    79. group by mid_id
    80. )last_wk
    81. on current_wk.mid_id=last_wk.mid_id
    82. where last_wk.mid_id is null;
    83. insert into table ${APP}.ads_wastage_count
    84. select
    85. '$do_date',
    86. count(*)
    87. from
    88. (
    89. select
    90. mid_id
    91. from ${APP}.dwt_uv_topic
    92. where login_date_last<=date_add('$do_date',-7)
    93. group by mid_id
    94. )t1;
    95. insert into table ${APP}.ads_user_retention_day_rate
    96. select
    97. '$do_date',--统计日期
    98. date_add('$do_date',-1),--新增日期
    99. 1,--留存天数
    100. sum(if(login_date_first=date_add('$do_date',-1) and login_date_last='$do_date',1,0)),--$do_date的1日留存数
    101. sum(if(login_date_first=date_add('$do_date',-1),1,0)),--$do_date新增
    102. sum(if(login_date_first=date_add('$do_date',-1) and login_date_last='$do_date',1,0))/sum(if(login_date_first=date_add('$do_date',-1),1,0))*100
    103. from ${APP}.dwt_uv_topic
    104. union all
    105. select
    106. '$do_date',--统计日期
    107. date_add('$do_date',-2),--新增日期
    108. 2,--留存天数
    109. sum(if(login_date_first=date_add('$do_date',-2) and login_date_last='$do_date',1,0)),--$do_date的2日留存数
    110. sum(if(login_date_first=date_add('$do_date',-2),1,0)),--$do_date新增
    111. sum(if(login_date_first=date_add('$do_date',-2) and login_date_last='$do_date',1,0))/sum(if(login_date_first=date_add('$do_date',-2),1,0))*100
    112. from ${APP}.dwt_uv_topic
    113. union all
    114. select
    115. '$do_date',--统计日期
    116. date_add('$do_date',-3),--新增日期
    117. 3,--留存天数
    118. sum(if(login_date_first=date_add('$do_date',-3) and login_date_last='$do_date',1,0)),--$do_date的3日留存数
    119. sum(if(login_date_first=date_add('$do_date',-3),1,0)),--$do_date新增
    120. sum(if(login_date_first=date_add('$do_date',-3) and login_date_last='$do_date',1,0))/sum(if(login_date_first=date_add('$do_date',-3),1,0))*100
    121. from ${APP}.dwt_uv_topic;
    122. insert into table ${APP}.ads_continuity_wk_count
    123. select
    124. '$do_date',
    125. concat(date_add(next_day('$do_date','MO'),-7*3),'_',date_add(next_day('$do_date','MO'),-1)),
    126. count(*)
    127. from
    128. (
    129. select
    130. mid_id
    131. from
    132. (
    133. select
    134. mid_id
    135. from ${APP}.dws_uv_detail_daycount
    136. where dt>=date_add(next_day('$do_date','monday'),-7)
    137. and dt<=date_add(next_day('$do_date','monday'),-1)
    138. group by mid_id
    139. union all
    140. select
    141. mid_id
    142. from ${APP}.dws_uv_detail_daycount
    143. where dt>=date_add(next_day('$do_date','monday'),-7*2)
    144. and dt<=date_add(next_day('$do_date','monday'),-7-1)
    145. group by mid_id
    146. union all
    147. select
    148. mid_id
    149. from ${APP}.dws_uv_detail_daycount
    150. where dt>=date_add(next_day('$do_date','monday'),-7*3)
    151. and dt<=date_add(next_day('$do_date','monday'),-7*2-1)
    152. group by mid_id
    153. )t1
    154. group by mid_id
    155. having count(*)=3
    156. )t2;
    157. insert into table ${APP}.ads_continuity_uv_count
    158. select
    159. '$do_date',
    160. concat(date_add('$do_date',-6),'_','$do_date'),
    161. count(*)
    162. from
    163. (
    164. select mid_id
    165. from
    166. (
    167. select mid_id
    168. from
    169. (
    170. select
    171. mid_id,
    172. date_sub(dt,rank) date_dif
    173. from
    174. (
    175. select
    176. mid_id,
    177. dt,
    178. rank() over(partition by mid_id order by dt) rank
    179. from ${APP}.dws_uv_detail_daycount
    180. where dt>=date_add('$do_date',-6) and dt<='$do_date'
    181. )t1
    182. )t2
    183. group by mid_id,date_dif
    184. having count(*)>=3
    185. )t3
    186. group by mid_id
    187. )t4;
    188. insert into table ${APP}.ads_user_topic
    189. select
    190. '$do_date',
    191. sum(if(login_date_last='$do_date',1,0)),
    192. sum(if(login_date_first='$do_date',1,0)),
    193. sum(if(payment_date_first='$do_date',1,0)),
    194. sum(if(payment_count>0,1,0)),
    195. count(*),
    196. sum(if(login_date_last='$do_date',1,0))/count(*),
    197. sum(if(payment_count>0,1,0))/count(*),
    198. sum(if(login_date_first='$do_date',1,0))/sum(if(login_date_last='$do_date',1,0))
    199. from ${APP}.dwt_user_topic;
    200. with
    201. tmp_uv as
    202. (
    203. select
    204. '$do_date' dt,
    205. sum(if(array_contains(pages,'home'),1,0)) home_count,
    206. sum(if(array_contains(pages,'good_detail'),1,0)) good_detail_count
    207. from
    208. (
    209. select
    210. mid_id,
    211. collect_set(page_id) pages
    212. from ${APP}.dwd_page_log
    213. where dt='$do_date'
    214. and page_id in ('home','good_detail')
    215. group by mid_id
    216. )tmp
    217. ),
    218. tmp_cop as
    219. (
    220. select
    221. '$do_date' dt,
    222. sum(if(cart_count>0,1,0)) cart_count,
    223. sum(if(order_count>0,1,0)) order_count,
    224. sum(if(payment_count>0,1,0)) payment_count
    225. from ${APP}.dws_user_action_daycount
    226. where dt='$do_date'
    227. )
    228. insert into table ${APP}.ads_user_action_convert_day
    229. select
    230. tmp_uv.dt,
    231. tmp_uv.home_count,
    232. tmp_uv.good_detail_count,
    233. tmp_uv.good_detail_count/tmp_uv.home_count*100,
    234. tmp_cop.cart_count,
    235. tmp_cop.cart_count/tmp_uv.good_detail_count*100,
    236. tmp_cop.order_count,
    237. tmp_cop.order_count/tmp_cop.cart_count*100,
    238. tmp_cop.payment_count,
    239. tmp_cop.payment_count/tmp_cop.order_count*100
    240. from tmp_uv
    241. join tmp_cop
    242. on tmp_uv.dt=tmp_cop.dt;
    243. insert into table ${APP}.ads_product_info
    244. select
    245. '$do_date' dt,
    246. sku_num,
    247. spu_num
    248. from
    249. (
    250. select
    251. '$do_date' dt,
    252. count(*) sku_num
    253. from
    254. ${APP}.dwt_sku_topic
    255. ) tmp_sku_num
    256. join
    257. (
    258. select
    259. '$do_date' dt,
    260. count(*) spu_num
    261. from
    262. (
    263. select
    264. spu_id
    265. from
    266. ${APP}.dwt_sku_topic
    267. group by
    268. spu_id
    269. ) tmp_spu_id
    270. ) tmp_spu_num
    271. on
    272. tmp_sku_num.dt=tmp_spu_num.dt;
    273. insert into table ${APP}.ads_product_sale_topN
    274. select
    275. '$do_date' dt,
    276. sku_id,
    277. payment_amount
    278. from
    279. ${APP}.dws_sku_action_daycount
    280. where
    281. dt='$do_date'
    282. order by payment_amount desc
    283. limit 10;
    284. insert into table ${APP}.ads_product_favor_topN
    285. select
    286. '$do_date' dt,
    287. sku_id,
    288. favor_count
    289. from
    290. ${APP}.dws_sku_action_daycount
    291. where
    292. dt='$do_date'
    293. order by favor_count desc
    294. limit 10;
    295. insert into table ${APP}.ads_product_cart_topN
    296. select
    297. '$do_date' dt,
    298. sku_id,
    299. cart_count
    300. from
    301. ${APP}.dws_sku_action_daycount
    302. where
    303. dt='$do_date'
    304. order by cart_count desc
    305. limit 10;
    306. insert into table ${APP}.ads_product_refund_topN
    307. select
    308. '$do_date',
    309. sku_id,
    310. refund_last_30d_count/payment_last_30d_count*100 refund_ratio
    311. from ${APP}.dwt_sku_topic
    312. order by refund_ratio desc
    313. limit 10;
    314. insert into table ${APP}.ads_appraise_bad_topN
    315. select
    316. '$do_date' dt,
    317. sku_id,
    318. appraise_bad_count/(appraise_good_count+appraise_mid_count+appraise_bad_count+appraise_default_count) appraise_bad_ratio
    319. from
    320. ${APP}.dws_sku_action_daycount
    321. where
    322. dt='$do_date'
    323. order by appraise_bad_ratio desc
    324. limit 10;
    325. insert into table ${APP}.ads_order_daycount
    326. select
    327. '$do_date',
    328. sum(order_count),
    329. sum(order_amount),
    330. sum(if(order_count>0,1,0))
    331. from ${APP}.dws_user_action_daycount
    332. where dt='$do_date';
    333. insert into table ${APP}.ads_payment_daycount
    334. select
    335. tmp_payment.dt,
    336. tmp_payment.payment_count,
    337. tmp_payment.payment_amount,
    338. tmp_payment.payment_user_count,
    339. tmp_skucount.payment_sku_count,
    340. tmp_time.payment_avg_time
    341. from
    342. (
    343. select
    344. '$do_date' dt,
    345. sum(payment_count) payment_count,
    346. sum(payment_amount) payment_amount,
    347. sum(if(payment_count>0,1,0)) payment_user_count
    348. from ${APP}.dws_user_action_daycount
    349. where dt='$do_date'
    350. )tmp_payment
    351. join
    352. (
    353. select
    354. '$do_date' dt,
    355. sum(if(payment_count>0,1,0)) payment_sku_count
    356. from ${APP}.dws_sku_action_daycount
    357. where dt='$do_date'
    358. )tmp_skucount on tmp_payment.dt=tmp_skucount.dt
    359. join
    360. (
    361. select
    362. '$do_date' dt,
    363. sum(unix_timestamp(payment_time)-unix_timestamp(create_time))/count(*)/60 payment_avg_time
    364. from ${APP}.dwd_fact_order_info
    365. where dt='$do_date'
    366. and payment_time is not null
    367. )tmp_time on tmp_payment.dt=tmp_time.dt;
    368. with
    369. tmp_order as
    370. (
    371. select
    372. user_id,
    373. order_stats_struct.sku_id sku_id,
    374. order_stats_struct.order_count order_count
    375. from ${APP}.dws_user_action_daycount lateral view explode(order_detail_stats) tmp as order_stats_struct
    376. where date_format(dt,'yyyy-MM')=date_format('$do_date','yyyy-MM')
    377. ),
    378. tmp_sku as
    379. (
    380. select
    381. id,
    382. tm_id,
    383. category1_id,
    384. category1_name
    385. from ${APP}.dwd_dim_sku_info
    386. where dt='$do_date'
    387. )
    388. insert into table ${APP}.ads_sale_tm_category1_stat_mn
    389. select
    390. tm_id,
    391. category1_id,
    392. category1_name,
    393. sum(if(order_count>=1,1,0)) buycount,
    394. sum(if(order_count>=2,1,0)) buyTwiceLast,
    395. sum(if(order_count>=2,1,0))/sum( if(order_count>=1,1,0)) buyTwiceLastRatio,
    396. sum(if(order_count>=3,1,0)) buy3timeLast ,
    397. sum(if(order_count>=3,1,0))/sum( if(order_count>=1,1,0)) buy3timeLastRatio ,
    398. date_format('$do_date' ,'yyyy-MM') stat_mn,
    399. '$do_date' stat_date
    400. from
    401. (
    402. select
    403. tmp_order.user_id,
    404. tmp_sku.category1_id,
    405. tmp_sku.category1_name,
    406. tmp_sku.tm_id,
    407. sum(order_count) order_count
    408. from tmp_order
    409. join tmp_sku
    410. on tmp_order.sku_id=tmp_sku.id
    411. group by tmp_order.user_id,tmp_sku.category1_id,tmp_sku.category1_name,tmp_sku.tm_id
    412. )tmp
    413. group by tm_id, category1_id, category1_name;
    414. insert into table ${APP}.ads_area_topic
    415. select
    416. '$do_date',
    417. id,
    418. province_name,
    419. area_code,
    420. iso_code,
    421. region_id,
    422. region_name,
    423. login_day_count,
    424. order_day_count,
    425. order_day_amount,
    426. payment_day_count,
    427. payment_day_amount
    428. from ${APP}.dwt_area_topic;
    429. "
    430. $hive -e "$sql"

    2)增加脚本执行权限 chmod 777 dwt_to_ads.sh

    3)执行脚本导入数据 dwt_to_ads.sh 2022-05-21

    4)查看导入数据

    1. select * from ads_uv_count;
    2. select * from ads_new_mid_count;
    3. select * from ads_silent_count;
    4. select * from ads_back_count';
    5. select * from ads_wastage_count;
    6. select * from ads_user_retention_day_rate;
    7. select * from ads_continuity_wk_count';
    8. select * from ads_continuity_uv_count;
    9. select * from ads_user_topic where dt='2020-06-15';
    10. select * from ads_user_action_convert_day;
    11. select * from ads_product_info;
    12. select * from ads_product_sale_topN;
    13. select * from ads_product_favor_topN;
    14. select * from ads_product_cart_topN;
    15. select * from ads_product_refund_topN;
    16. select * from ads_appraise_bad_topN;
    17. select * from ads_order_daycount;
    18. select * from ads_payment_daycount;
    19. select * from ads_sale_tm_category1_stat_mn;
    20. select * from ads_area_topic';

    完成!!!

    作者水平低,如有错误,恳请指正!谢谢!!!!!

    本篇文章参考尚硅谷大数据项目写成!

  • 相关阅读:
    贪心算法学习——最大数
    SHELL (bash)编程记录
    【SpringBoot】几种常见的数据脱敏方案
    Java设计模式之适配器模式
    spdlog记录日志示例 - 使用sink创建logger
    Android和Linux的开发差异
    Springboot之SpringMVC与MyBatis(二)异步迭代商品管理
    [附源码]计算机毕业设计springboot求职招聘网站
    Django 所带的用户auth_user的坑点,authenticate()校验一直为None,校验与创建所遇到的问题整理与解决
    知识蒸馏IRG算法实战:使用ResNet50蒸馏ResNet18
  • 原文地址:https://blog.csdn.net/qq_55906442/article/details/125008169