• 数据仓库dws层,md,review第1遍,220622,


    DWS

    • 名称:数据服务层 service

    • 功能:按主题划分,形成日统计的宽表,轻度汇总提前聚合操作。

    • 解释:轻度提前聚合说的是先聚合出日的指标,后续可以上卷出周、月、年的指标。

    dws这里,主题终于出现了~~~

    一个主题所需要的指标、维度,可能往往需要多个DWB层的宽表共同参与计算才能得出。甚至还需要之前层如dwd等参与计算。

    轻度、重度 :描述干活的多少  如果都做 叫做重度  只做某一部分叫做轻度
    细粒度、粗粒度:越往下粒度越细
        举个栗子:1、计算每年的销售额  2、计算每天的销售额
        上面这个两个都是根据时间维度计算的 哪个粒度细? 天的粒度细  年粒度粗

    时间维度:年 月 日 时 分 秒

    使用DataGrip在Hive中创建dws层

    注意,对于建库建表操作,需直接使用Hive,因为Presto只是一个数据分析的引擎,其语法不一定支持直接在Hive中建库建表。

    销售收入、平台收入、配送成交额、

    小程序成交额、安卓APP成交额、苹果APP成交额、PC商城成交额、

    订单量、参评单量、差评单量、

    配送单量、退款单量、

    小程序订单量、安卓APP订单量、苹果APP订单量、PC商城订单量

    --共计: 16个指标

    日期、城市、商圈、店铺、品牌、商品大类、商品中类、商品小类

    --共计:  8个维度
    --cube所有组合:  2^8=256个

    注意,其中日期这个维度很特殊,特殊在我们的表就是根据日期分区的,分区的字段是day天。
    而dws这一层我们需要统计的也是按day统计,日统计宽表嘛
    这也就意味着一个分区就是一天。

    知识点10:DWS层搭建--销售主题宽表--step1--字段抽取

     

    表关系

    一切的前提是,先了解原始数据的结构和关系。

    对于销售主题宽表来说,其当中的指标和维度字段分别来源于DWB层:订单明细宽表店铺明细宽表商品明细宽表

    比如商圈、店铺等维度来自于店铺明细宽表;大中小分类来自于商品明细宽表;而成交额等指标需要依赖订单明细宽表。

     --以订单为准,以goods_id关联商品,以store_id关联店铺
    select *
    from dwb_order_detail o
        left join dwb_goods_detail g on o.goods_id = g.id
        left join dwb_shop_detail s on o.store_id = s.id;

    字段抽取

    关联之后,字段非常多,但是并不意味着每一个字段都是销售主题宽表统计需要的;

    因此需要根据销售主题宽表的计算指标和维度,把相关的字段抽取出来

    知识点11:DWS层搭建--销售主题宽表--step2--row_number去重(可选)

    使用row_number分组去重的时候需要注意:

    1、对于城市、商圈、店铺等维度的成交额计算,根据订单order_amount汇总求和即可;

    2、而对于品牌、大类、中类、小类等维度成交额计算,需要根据goods_id计算。

    --上述表的

    数据中,如果计算不同品牌的成交额,就不能再根据订单金额相加了
    --而是必须根据每个订单中,这个品牌的金额进行计算
    --因为订单中可以有不同的商品品牌。

    row_number() over(partition by order_id) as order_rn,
    row_number() over(partition by order_id,g.brand_id) as brand_rn,
    row_number() over(partition by order_id,g.max_class_name) as maxclass_rn,
    row_number() over(partition by order_id,g.max_class_name,g.mid_class_name) as midclass_rn,
    row_number() over(partition by order_id,g.max_class_name,g.mid_class_name,g.min_class_name) as minclass_rn,

    --下面分组加入goods_id
    row_number() over(partition by order_id,g.brand_id,o.goods_id) as brand_goods_rn,
    row_number() over(partition by order_id,g.max_class_name,o.goods_id) as maxclass_goods_rn,
    row_number() over(partition by order_id,g.max_class_name,g.mid_class_name,o.goods_id) as midclass_goods_rn,
    row_number() over(partition by order_id,g.max_class_name,g.mid_class_name,g.min_class_name,o.goods_id) as minclass_goods_rn

    知识点12:DWS层搭建--销售主题宽表--step3--grouping sets分组

    • 使用CTE表达式针对上面抽取字段、分组去重的结果进行引导

      with  temp as (
          select 抽取字段、row_number去重)   

      注意,到这一步为止,temp表的数据已经和之前的我们创建的简易模型、复杂模型差不多了。

      后面的技术主要就是case when+grouoing判断。

    • 根据业务需求进行维度组合,使用grouping sets进行分组。

    日期
    日期+城市
    日期+城市+商圈
    日期+城市+商圈+店铺
    日期+品牌
    日期+大类
    日期+大类+中类
    日期+大类+中类+小类

    with  temp as (
        select 抽取字段、row_number去重)

    select
        xxxxx
    from temp
    group by
        grouping sets(
            create_date, --日期
            (create_date,city_id,city_name),--日期+城市
            (create_date,city_id,city_name,trade_area_id,trade_area_name),--日期+城市+商圈
            (create_date,city_id,city_name,trade_area_id,trade_area_name,store_id,store_name), --日期+城市+商圈+店铺
            (create_date,brand_id,brand_name),--日期+品牌
            (create_date,max_class_id,max_class_name),--日期+大类
            (create_date,max_class_id,max_class_name,mid_class_id,mid_class_name),--日期+大类+中类
            (create_date,max_class_id,max_class_name,mid_class_id,mid_class_name,min_class_id,min_class_name)--日期+大类+中类+小类
        );

    知识点13:DWS层搭建--销售主题宽表--step4--维度字段判断

    提示:可以根据待插入的目标表yp_dws.dws_sale_daycount的字段顺序,把结果返回。

    1. case when grouping(city_id) = 0 --如果分组中包含city_id 则grouping为0 那么就返回city_id
    2. then city_id
    3. else null end as city_id ,
    4. case when grouping(city_id) = 0
    5. then city_name
    6. else null end as city_name ,
    7. case when grouping(trade_area_id) = 0--商圈
    8. then trade_area_id
    9. else null end as trade_area_id ,
    10. case when grouping(trade_area_id) = 0
    11. then trade_area_name
    12. else null end as trade_area_name ,
    13. case when grouping(store_id) = 0 --店铺
    14. then store_id
    15. else null end as store_id ,
    16. case when grouping(store_id) = 0
    17. then store_name
    18. else null end as store_name ,
    19. case when grouping(brand_id) = 0 --品牌
    20. then brand_id
    21. else null end as brand_id ,
    22. case when grouping(brand_id) = 0
    23. then brand_name
    24. else null end as brand_name ,
    25. case when grouping(max_class_id) = 0 --大类
    26. then max_class_id
    27. else null end as max_class_id ,
    28. case when grouping(max_class_id) = 0
    29. then max_class_name
    30. else null end as max_class_name ,
    31. case when grouping(mid_class_id) = 0 --中类
    32. then mid_class_id
    33. else null end as mid_class_id ,
    34. case when grouping(mid_class_id) = 0
    35. then mid_class_name
    36. else null end as mid_class_name ,
    37. case when grouping(min_class_id) = 0--小类
    38. then min_class_id
    39. else null end as min_class_id ,
    40. case when grouping(min_class_id) = 0
    41. then min_class_name
    42. else null end as min_class_name ,
    43. case when grouping(store_id,store_name) = 0 --分组类型
    44. then 'store'
    45. when grouping(trade_area_id ,trade_area_name) = 0
    46. then 'trade_area'
    47. when grouping (city_id,city_name) = 0
    48. then 'city'
    49. when grouping (brand_id,brand_name) = 0
    50. then 'brand'
    51. when grouping (min_class_id,min_class_name) = 0
    52. then 'min_class'
    53. when grouping (mid_class_id,mid_class_name) = 0
    54. then 'mid_class'
    55. when grouping (max_class_id,max_class_name) = 0
    56. then 'max_class'
    57. when grouping (create_date) = 0
    58. --注意:在使用grouping sets的时候 为了可以快速便捷的区分每条数据是根据谁进行的分组计算,
    59. --可以有意识的在表中添加类似分组ID 或者分组type这样的字段
    60. --使用 1 2 3 4 或者具体的字段类标识 数据属于哪一个分组的
    61. --如何精准识别啊?
    62. 笔记上面的分组判断就不严谨。
    63. --如何严谨 精准的识别8个分组
    64. (dt),
    65. (dt,city_id,city_name),
    66. (dt,city_id,city_name,area_id,area_name),
    67. (dt,city_id,city_name,area_id,area_name,store_id,store_name),
    68. (dt,brand_id,brand_name),
    69. (dt,max_class_id,max_class_name),
    70. (dt,max_class_id,max_class_name,mid_class_id,mid_class_name)
    71. (dt,max_class_id,max_class_name,mid_class_id,mid_class_name,min_class_id,min_class_name)
    72. --
    73. grouping(dt,city_id ,area_id,store_id,brand_id,max_class_id,mid_class_id,min_class_id)
    74. --十进制转二进制
    75. https://tool.lu/hexconvert/

     

    知识点14:DWS层搭建--销售主题宽表--step5--销售收入统计

    1. --指标计算 注意每个指标都对应着8个分组维度的计算
    2. --1、销售收入指标 sale_amt
    3. case when grouping(store_id,store_name) =0 --如果分组中包含店铺,则分组为:日期+城市+商圈+店铺
    4. then sum(if( order_rn = 1 and store_id is not null ,order_amount,0)) --只有分组中标号为1的(去重),店铺不为空的才参与计算
    5. --then sum(if( order_rn = 1 and store_id is not null ,coalesce(order_amount,0),0))
    6. --使用coalesce函数更加成熟
    7. --第一个问题:如果业务数据bug 导致null出现 后续sum求和会不会出错?
    8. sum(80+20+80+null)
    9. sum(80+20+80+0)
    10. when grouping (trade_area_id ,trade_area_name) = 0 --日期+城市+商圈
    11. then sum(if( order_rn = 1 and trade_area_id is not null ,order_amount,0))
    12. when grouping (city_id,city_name) = 0 --日期+城市
    13. then sum(if( order_rn = 1 and city_id is not null,order_amount,0))
    14. when grouping (brand_id,brand_name) = 0 --日期+品牌
    15. then sum(if(brand_goods_rn = 1 and brand_id is not null,total_price,0))
    16. when grouping (min_class_id,min_class_name) = 0 --日期+大类+中类+小类
    17. then sum(if(minclass_goods_rn = 1 and min_class_id is not null ,total_price,0))
    18. when grouping (mid_class_id,mid_class_name) = 0 --日期+大类+中类
    19. then sum(if(midclass_goods_rn = 1 and mid_class_id is not null,total_price,0))
    20. when grouping (max_class_id,max_class_name) = 0 ----日期+大类
    21. then sum(if(maxclass_goods_rn = 1 and max_class_id is not null ,total_price,0))
    22. when grouping (create_date) = 0 --日期
    23. then sum(if(order_rn=1 and create_date is not null,order_amount,0))
    24. else null end as sale_amt,
    25. --提示
    26. 计算 日期 城市 商圈 店铺 使用orderr_maount
    27. 计算 品牌 大类 中类 小类 使用total_price
    28. --为什么 详细见课堂画图。
    29. --对于重复的数据 之前使用row_number标记之后 在sum求和之前 使用if进行判断 为1的保留

    知识点15:DWS层搭建--销售主题宽表--step6--金额指标统计

     

    1. --2、平台收入 plat_amt
    2. case when grouping(store_id,store_name) =0
    3. then sum(if( order_rn = 1 and store_id is not null ,plat_fee,0))
    4. when grouping (trade_area_id ,trade_area_name) = 0
    5. then sum(if( order_rn = 1 and trade_area_id is not null ,plat_fee,0))
    6. when grouping (city_id,city_name) = 0
    7. then sum(if( order_rn = 1 and city_id is not null,plat_fee,0))
    8. when grouping (brand_id,brand_name) = 0
    9. then null
    10. when grouping (min_class_id,min_class_name) = 0
    11. then null
    12. when grouping (mid_class_id,mid_class_name) = 0
    13. then null
    14. when grouping (max_class_id,max_class_name) = 0
    15. then null
    16. when grouping (create_date) = 0
    17. then sum(if(order_rn=1 and create_date is not null,plat_fee,0))
    18. else null end as plat_amt ,
    19. -- 3、配送成交额 deliver_sale_amt
    20. case when grouping(store_id,store_name) =0
    21. then sum(if( order_rn = 1 and store_id is not null and delievery_id is not null ,dispatcher_money,0))
    22. when grouping (trade_area_id ,trade_area_name) = 0
    23. then sum(if( order_rn = 1 and trade_area_id is not null and delievery_id is not null,dispatcher_money,0))
    24. when grouping (city_id,city_name) = 0
    25. then sum(if( order_rn = 1 and city_id is not null and delievery_id is not null,dispatcher_money,0))
    26. when grouping (brand_id,brand_name) = 0
    27. then null
    28. when grouping (min_class_id,min_class_name) = 0
    29. then null
    30. when grouping (mid_class_id,mid_class_name) = 0
    31. then null
    32. when grouping (max_class_id,max_class_name) = 0
    33. then null
    34. when grouping (create_date) = 0
    35. then sum(if(order_rn=1 and create_date is not null and delievery_id is not null ,dispatcher_money,0))
    36. else null end as deliver_sale_amt ,
    37. -- 4、小程序成交额 mini_app_sale_amt
    38. case when grouping(store_id,store_name) =0
    39. then sum(if( order_rn = 1 and store_id is not null and order_from='miniapp' ,order_amount,0))
    40. when grouping (trade_area_id ,trade_area_name) = 0
    41. then sum(if( order_rn = 1 and trade_area_id is not null and order_from='miniapp',order_amount,0))
    42. when grouping (city_id,city_name) = 0
    43. then sum(if( order_rn = 1 and city_id is not null and order_from='miniapp',order_amount,0))
    44. when grouping (brand_id,brand_name) = 0
    45. then sum(if(brand_goods_rn = 1 and brand_id is not null and order_from='miniapp',total_price,0))
    46. when grouping (min_class_id,min_class_name) = 0
    47. then sum(if(minclass_goods_rn = 1 and min_class_id is not null and order_from='miniapp',total_price,0))
    48. when grouping (mid_class_id,mid_class_name) = 0
    49. then sum(if(midclass_goods_rn = 1 and mid_class_id is not null and order_from='miniapp',total_price,0))
    50. when grouping (max_class_id,max_class_name) = 0
    51. then sum(if(maxclass_goods_rn = 1 and max_class_id is not null and order_from='miniapp',total_price,0))
    52. when grouping (create_date) = 0
    53. then sum(if(order_rn=1 and create_date is not null and order_from='miniapp',order_amount ,0))
    54. else null end as mini_app_sale_amt ,
    55. -- 5、安卓成交额 android_sale_amt
    56. case when grouping(store_id,store_name) =0
    57. then sum(if( order_rn = 1 and store_id is not null and order_from='android' ,order_amount,0))
    58. when grouping (trade_area_id ,trade_area_name) = 0
    59. then sum(if( order_rn = 1 and trade_area_id is not null and order_from='android',order_amount,0))
    60. when grouping (city_id,city_name) = 0
    61. then sum(if( order_rn = 1 and city_id is not null and order_from='android',order_amount,0))
    62. when grouping (brand_id,brand_name) = 0
    63. then sum(if(brand_goods_rn = 1 and brand_id is not null and order_from='android',total_price,0))
    64. when grouping (min_class_id,min_class_name) = 0
    65. then sum(if(minclass_goods_rn = 1 and min_class_id is not null and order_from='android',total_price,0))
    66. when grouping (mid_class_id,mid_class_name) = 0
    67. then sum(if(midclass_goods_rn = 1 and mid_class_id is not null and order_from='android',total_price,0))
    68. when grouping (max_class_id,max_class_name) = 0
    69. then sum(if(maxclass_goods_rn = 1 and max_class_id is not null and order_from='android',total_price,0))
    70. when grouping (create_date) = 0
    71. then sum(if(order_rn=1 and create_date is not null and order_from='android',order_amount ,0))
    72. else null end as android_sale_amt ,
    73. -- 6、苹果成交额 ios_sale_amt
    74. case when grouping(store_id,store_name) =0
    75. then sum(if( order_rn = 1 and store_id is not null and order_from='ios' ,order_amount,0))
    76. when grouping (trade_area_id ,trade_area_name) = 0
    77. then sum(if( order_rn = 1 and trade_area_id is not null and order_from='ios',order_amount,0))
    78. when grouping (city_id,city_name) = 0
    79. then sum(if( order_rn = 1 and city_id is not null and order_from='ios',order_amount,0))
    80. when grouping (brand_id,brand_name) = 0
    81. then sum(if(brand_goods_rn = 1 and brand_id is not null and order_from='ios',total_price,0))
    82. when grouping (min_class_id,min_class_name) = 0
    83. then sum(if(minclass_goods_rn = 1 and min_class_id is not null and order_from='ios',total_price,0))
    84. when grouping (mid_class_id,mid_class_name) = 0
    85. then sum(if(midclass_goods_rn = 1 and mid_class_id is not null and order_from='ios',total_price,0))
    86. when grouping (max_class_id,max_class_name) = 0
    87. then sum(if(maxclass_goods_rn = 1 and max_class_id is not null and order_from='ios',total_price,0))
    88. when grouping (create_date) = 0
    89. then sum(if(order_rn=1 and create_date is not null and order_from='ios',order_amount ,0))
    90. else null end as ios_sale_amt ,
    91. -- 7、pc成交额 pcweb_sale_amt
    92. case when grouping(store_id,store_name) =0
    93. then sum(if( order_rn = 1 and store_id is not null and order_from='pcweb' ,order_amount,0))
    94. when grouping (trade_area_id ,trade_area_name) = 0
    95. then sum(if( order_rn = 1 and trade_area_id is not null and order_from='pcweb',order_amount,0))
    96. when grouping (city_id,city_name) = 0
    97. then sum(if( order_rn = 1 and city_id is not null and order_from='pcweb',order_amount,0))
    98. when grouping (brand_id,brand_name) = 0
    99. then sum(if(brand_goods_rn = 1 and brand_id is not null and order_from='pcweb',total_price,0))
    100. when grouping (min_class_id,min_class_name) = 0
    101. then sum(if(minclass_goods_rn = 1 and min_class_id is not null and order_from='pcweb',total_price,0))
    102. when grouping (mid_class_id,mid_class_name) = 0
    103. then sum(if(midclass_goods_rn = 1 and mid_class_id is not null and order_from='pcweb',total_price,0))
    104. when grouping (max_class_id,max_class_name) = 0
    105. then sum(if(maxclass_goods_rn = 1 and max_class_id is not null and order_from='pcweb',total_price,0))
    106. when grouping (create_date) = 0
    107. then sum(if(order_rn=1 and create_date is not null and order_from='pcweb',order_amount ,0))
    108. else null end as pcweb_sale_amt ,

            

  • 相关阅读:
    Jenkins中强制停止停不下来的job
    2022杭电多校第八场题解
    算法通关村-----图的基本算法
    Chapter 1 - 8. Introduction to Congestion in Storage Networks
    [Python学习篇] Python字典
    Java——聊聊JUC中的锁(synchronized & Lock & ReentrantLock)
    计算机网络4小时速成:数据链路层,功能,封装成帧,透明传输,差错控制,PPP协议,广播信道,以太网,MAC层,设备
    Spring Security 集成 OAuth 2.0 认证(一)
    mysql忘记密码修改
    Win10安装DBeaver连接MySQL8、导入和导出数据库详细教程
  • 原文地址:https://blog.csdn.net/m0_48941160/article/details/125416669