DWS
名称:数据服务层 service
功能:按主题划分,形成日统计的宽表,轻度汇总提前聚合操作。
解释:轻度提前聚合说的是先聚合出日的指标,后续可以上卷出周、月、年的指标。
dws这里,主题终于出现了~~~
一个主题所需要的指标、维度,可能往往需要多个DWB层的宽表共同参与计算才能得出。甚至还需要之前层如dwd等参与计算。
轻度、重度 :描述干活的多少 如果都做 叫做重度 只做某一部分叫做轻度
细粒度、粗粒度:越往下粒度越细
举个栗子:1、计算每年的销售额 2、计算每天的销售额
上面这个两个都是根据时间维度计算的 哪个粒度细? 天的粒度细 年粒度粗
时间维度:年 月 日 时 分 秒
使用DataGrip在Hive中创建dws层
注意,对于建库建表操作,需直接使用Hive,因为Presto只是一个数据分析的引擎,其语法不一定支持直接在Hive中建库建表。
销售收入、平台收入、配送成交额、
小程序成交额、安卓APP成交额、苹果APP成交额、PC商城成交额、
订单量、参评单量、差评单量、
配送单量、退款单量、
小程序订单量、安卓APP订单量、苹果APP订单量、PC商城订单量
--共计: 16个指标
日期、城市、商圈、店铺、品牌、商品大类、商品中类、商品小类
--共计: 8个维度
--cube所有组合: 2^8=256个
注意,其中日期这个维度很特殊,特殊在我们的表就是根据日期分区的,分区的字段是day天。
而dws这一层我们需要统计的也是按day统计,日统计宽表嘛
这也就意味着一个分区就是一天。
知识点10:DWS层搭建--销售主题宽表--step1--字段抽取
表关系
一切的前提是,先了解原始数据的结构和关系。
对于销售主题宽表来说,其当中的指标和维度字段分别来源于DWB层:订单明细宽表、店铺明细宽表、商品明细宽表。
比如商圈、店铺等维度来自于店铺明细宽表;大中小分类来自于商品明细宽表;而成交额等指标需要依赖订单明细宽表。
--以订单为准,以goods_id关联商品,以store_id关联店铺
select *
from dwb_order_detail o
left join dwb_goods_detail g on o.goods_id = g.id
left join dwb_shop_detail s on o.store_id = s.id;
字段抽取
关联之后,字段非常多,但是并不意味着每一个字段都是销售主题宽表统计需要的;
因此需要根据销售主题宽表的计算指标和维度,把相关的字段抽取出来
知识点11:DWS层搭建--销售主题宽表--step2--row_number去重(可选)
使用row_number分组去重的时候需要注意:
1、对于城市、商圈、店铺等维度的成交额计算,根据订单order_amount汇总求和即可;
2、而对于品牌、大类、中类、小类等维度成交额计算,需要根据goods_id计算。
--上述表的
数据中,如果计算不同品牌的成交额,就不能再根据订单金额相加了
--而是必须根据每个订单中,这个品牌的金额进行计算
--因为订单中可以有不同的商品品牌。
row_number() over(partition by order_id) as order_rn,
row_number() over(partition by order_id,g.brand_id) as brand_rn,
row_number() over(partition by order_id,g.max_class_name) as maxclass_rn,
row_number() over(partition by order_id,g.max_class_name,g.mid_class_name) as midclass_rn,
row_number() over(partition by order_id,g.max_class_name,g.mid_class_name,g.min_class_name) as minclass_rn,
--下面分组加入goods_id
row_number() over(partition by order_id,g.brand_id,o.goods_id) as brand_goods_rn,
row_number() over(partition by order_id,g.max_class_name,o.goods_id) as maxclass_goods_rn,
row_number() over(partition by order_id,g.max_class_name,g.mid_class_name,o.goods_id) as midclass_goods_rn,
row_number() over(partition by order_id,g.max_class_name,g.mid_class_name,g.min_class_name,o.goods_id) as minclass_goods_rn
知识点12:DWS层搭建--销售主题宽表--step3--grouping sets分组
使用CTE表达式针对上面抽取字段、分组去重的结果进行引导
with temp as ( select 抽取字段、row_number去重)
注意,到这一步为止,temp表的数据已经和之前的我们创建的简易模型、复杂模型差不多了。
后面的技术主要就是case when+grouoing判断。
根据业务需求进行维度组合,使用grouping sets进行分组。
日期
日期+城市
日期+城市+商圈
日期+城市+商圈+店铺
日期+品牌
日期+大类
日期+大类+中类
日期+大类+中类+小类
with temp as (
select 抽取字段、row_number去重)
select
xxxxx
from temp
group by
grouping sets(
create_date, --日期
(create_date,city_id,city_name),--日期+城市
(create_date,city_id,city_name,trade_area_id,trade_area_name),--日期+城市+商圈
(create_date,city_id,city_name,trade_area_id,trade_area_name,store_id,store_name), --日期+城市+商圈+店铺
(create_date,brand_id,brand_name),--日期+品牌
(create_date,max_class_id,max_class_name),--日期+大类
(create_date,max_class_id,max_class_name,mid_class_id,mid_class_name),--日期+大类+中类
(create_date,max_class_id,max_class_name,mid_class_id,mid_class_name,min_class_id,min_class_name)--日期+大类+中类+小类
);
知识点13:DWS层搭建--销售主题宽表--step4--维度字段判断
提示:可以根据待插入的目标表yp_dws.dws_sale_daycount的字段顺序,把结果返回。
- case when grouping(city_id) = 0 --如果分组中包含city_id 则grouping为0 那么就返回city_id
- then city_id
- else null end as city_id ,
- case when grouping(city_id) = 0
- then city_name
- else null end as city_name ,
- case when grouping(trade_area_id) = 0--商圈
- then trade_area_id
- else null end as trade_area_id ,
- case when grouping(trade_area_id) = 0
- then trade_area_name
- else null end as trade_area_name ,
- case when grouping(store_id) = 0 --店铺
- then store_id
- else null end as store_id ,
- case when grouping(store_id) = 0
- then store_name
- else null end as store_name ,
- case when grouping(brand_id) = 0 --品牌
- then brand_id
- else null end as brand_id ,
- case when grouping(brand_id) = 0
- then brand_name
- else null end as brand_name ,
- case when grouping(max_class_id) = 0 --大类
- then max_class_id
- else null end as max_class_id ,
- case when grouping(max_class_id) = 0
- then max_class_name
- else null end as max_class_name ,
- case when grouping(mid_class_id) = 0 --中类
- then mid_class_id
- else null end as mid_class_id ,
- case when grouping(mid_class_id) = 0
- then mid_class_name
- else null end as mid_class_name ,
- case when grouping(min_class_id) = 0--小类
- then min_class_id
- else null end as min_class_id ,
- case when grouping(min_class_id) = 0
- then min_class_name
- else null end as min_class_name ,
-
- case when grouping(store_id,store_name) = 0 --分组类型
- then 'store'
- when grouping(trade_area_id ,trade_area_name) = 0
- then 'trade_area'
- when grouping (city_id,city_name) = 0
- then 'city'
- when grouping (brand_id,brand_name) = 0
- then 'brand'
- when grouping (min_class_id,min_class_name) = 0
- then 'min_class'
- when grouping (mid_class_id,mid_class_name) = 0
- then 'mid_class'
- when grouping (max_class_id,max_class_name) = 0
- then 'max_class'
- when grouping (create_date) = 0
-
- --注意:在使用grouping sets的时候 为了可以快速便捷的区分每条数据是根据谁进行的分组计算,
- --可以有意识的在表中添加类似分组ID 或者分组type这样的字段
- --使用 1 2 3 4 或者具体的字段类标识 数据属于哪一个分组的
-
- --如何精准识别啊?
- 笔记上面的分组判断就不严谨。
-
- --如何严谨 精准的识别8个分组
- (dt),
- (dt,city_id,city_name),
- (dt,city_id,city_name,area_id,area_name),
- (dt,city_id,city_name,area_id,area_name,store_id,store_name),
- (dt,brand_id,brand_name),
- (dt,max_class_id,max_class_name),
- (dt,max_class_id,max_class_name,mid_class_id,mid_class_name)
- (dt,max_class_id,max_class_name,mid_class_id,mid_class_name,min_class_id,min_class_name)
-
-
- --
- grouping(dt,city_id ,area_id,store_id,brand_id,max_class_id,mid_class_id,min_class_id)
-
- --十进制转二进制
- https://tool.lu/hexconvert/
知识点14:DWS层搭建--销售主题宽表--step5--销售收入统计
- --指标计算 注意每个指标都对应着8个分组维度的计算
- --1、销售收入指标 sale_amt
- case when grouping(store_id,store_name) =0 --如果分组中包含店铺,则分组为:日期+城市+商圈+店铺
- then sum(if( order_rn = 1 and store_id is not null ,order_amount,0)) --只有分组中标号为1的(去重),店铺不为空的才参与计算
- --then sum(if( order_rn = 1 and store_id is not null ,coalesce(order_amount,0),0))
- --使用coalesce函数更加成熟
-
- --第一个问题:如果业务数据bug 导致null出现 后续sum求和会不会出错?
- sum(80+20+80+null)
- sum(80+20+80+0)
-
- when grouping (trade_area_id ,trade_area_name) = 0 --日期+城市+商圈
- then sum(if( order_rn = 1 and trade_area_id is not null ,order_amount,0))
-
- when grouping (city_id,city_name) = 0 --日期+城市
- then sum(if( order_rn = 1 and city_id is not null,order_amount,0))
-
- when grouping (brand_id,brand_name) = 0 --日期+品牌
- then sum(if(brand_goods_rn = 1 and brand_id is not null,total_price,0))
-
- when grouping (min_class_id,min_class_name) = 0 --日期+大类+中类+小类
- then sum(if(minclass_goods_rn = 1 and min_class_id is not null ,total_price,0))
-
- when grouping (mid_class_id,mid_class_name) = 0 --日期+大类+中类
- then sum(if(midclass_goods_rn = 1 and mid_class_id is not null,total_price,0))
-
- when grouping (max_class_id,max_class_name) = 0 ----日期+大类
- then sum(if(maxclass_goods_rn = 1 and max_class_id is not null ,total_price,0))
-
- when grouping (create_date) = 0 --日期
- then sum(if(order_rn=1 and create_date is not null,order_amount,0))
- else null end as sale_amt,
-
-
- --提示
- 计算 日期 城市 商圈 店铺 使用orderr_maount
- 计算 品牌 大类 中类 小类 使用total_price
- --为什么 详细见课堂画图。
-
- --对于重复的数据 之前使用row_number标记之后 在sum求和之前 使用if进行判断 为1的保留
知识点15:DWS层搭建--销售主题宽表--step6--金额指标统计
- --2、平台收入 plat_amt
- case when grouping(store_id,store_name) =0
- then sum(if( order_rn = 1 and store_id is not null ,plat_fee,0))
- when grouping (trade_area_id ,trade_area_name) = 0
- then sum(if( order_rn = 1 and trade_area_id is not null ,plat_fee,0))
- when grouping (city_id,city_name) = 0
- then sum(if( order_rn = 1 and city_id is not null,plat_fee,0))
- when grouping (brand_id,brand_name) = 0
- then null
- when grouping (min_class_id,min_class_name) = 0
- then null
- when grouping (mid_class_id,mid_class_name) = 0
- then null
- when grouping (max_class_id,max_class_name) = 0
- then null
- when grouping (create_date) = 0
- then sum(if(order_rn=1 and create_date is not null,plat_fee,0))
- else null end as plat_amt ,
-
- -- 3、配送成交额 deliver_sale_amt
- case when grouping(store_id,store_name) =0
- then sum(if( order_rn = 1 and store_id is not null and delievery_id is not null ,dispatcher_money,0))
- when grouping (trade_area_id ,trade_area_name) = 0
- then sum(if( order_rn = 1 and trade_area_id is not null and delievery_id is not null,dispatcher_money,0))
- when grouping (city_id,city_name) = 0
- then sum(if( order_rn = 1 and city_id is not null and delievery_id is not null,dispatcher_money,0))
- when grouping (brand_id,brand_name) = 0
- then null
- when grouping (min_class_id,min_class_name) = 0
- then null
- when grouping (mid_class_id,mid_class_name) = 0
- then null
- when grouping (max_class_id,max_class_name) = 0
- then null
- when grouping (create_date) = 0
- then sum(if(order_rn=1 and create_date is not null and delievery_id is not null ,dispatcher_money,0))
- else null end as deliver_sale_amt ,
-
- -- 4、小程序成交额 mini_app_sale_amt
- case when grouping(store_id,store_name) =0
- then sum(if( order_rn = 1 and store_id is not null and order_from='miniapp' ,order_amount,0))
- when grouping (trade_area_id ,trade_area_name) = 0
- then sum(if( order_rn = 1 and trade_area_id is not null and order_from='miniapp',order_amount,0))
- when grouping (city_id,city_name) = 0
- then sum(if( order_rn = 1 and city_id is not null and order_from='miniapp',order_amount,0))
- when grouping (brand_id,brand_name) = 0
- then sum(if(brand_goods_rn = 1 and brand_id is not null and order_from='miniapp',total_price,0))
- when grouping (min_class_id,min_class_name) = 0
- then sum(if(minclass_goods_rn = 1 and min_class_id is not null and order_from='miniapp',total_price,0))
- when grouping (mid_class_id,mid_class_name) = 0
- then sum(if(midclass_goods_rn = 1 and mid_class_id is not null and order_from='miniapp',total_price,0))
- when grouping (max_class_id,max_class_name) = 0
- then sum(if(maxclass_goods_rn = 1 and max_class_id is not null and order_from='miniapp',total_price,0))
- when grouping (create_date) = 0
- then sum(if(order_rn=1 and create_date is not null and order_from='miniapp',order_amount ,0))
- else null end as mini_app_sale_amt ,
-
- -- 5、安卓成交额 android_sale_amt
- case when grouping(store_id,store_name) =0
- then sum(if( order_rn = 1 and store_id is not null and order_from='android' ,order_amount,0))
- when grouping (trade_area_id ,trade_area_name) = 0
- then sum(if( order_rn = 1 and trade_area_id is not null and order_from='android',order_amount,0))
- when grouping (city_id,city_name) = 0
- then sum(if( order_rn = 1 and city_id is not null and order_from='android',order_amount,0))
- when grouping (brand_id,brand_name) = 0
- then sum(if(brand_goods_rn = 1 and brand_id is not null and order_from='android',total_price,0))
- when grouping (min_class_id,min_class_name) = 0
- then sum(if(minclass_goods_rn = 1 and min_class_id is not null and order_from='android',total_price,0))
- when grouping (mid_class_id,mid_class_name) = 0
- then sum(if(midclass_goods_rn = 1 and mid_class_id is not null and order_from='android',total_price,0))
- when grouping (max_class_id,max_class_name) = 0
- then sum(if(maxclass_goods_rn = 1 and max_class_id is not null and order_from='android',total_price,0))
- when grouping (create_date) = 0
- then sum(if(order_rn=1 and create_date is not null and order_from='android',order_amount ,0))
- else null end as android_sale_amt ,
-
- -- 6、苹果成交额 ios_sale_amt
- case when grouping(store_id,store_name) =0
- then sum(if( order_rn = 1 and store_id is not null and order_from='ios' ,order_amount,0))
- when grouping (trade_area_id ,trade_area_name) = 0
- then sum(if( order_rn = 1 and trade_area_id is not null and order_from='ios',order_amount,0))
- when grouping (city_id,city_name) = 0
- then sum(if( order_rn = 1 and city_id is not null and order_from='ios',order_amount,0))
- when grouping (brand_id,brand_name) = 0
- then sum(if(brand_goods_rn = 1 and brand_id is not null and order_from='ios',total_price,0))
- when grouping (min_class_id,min_class_name) = 0
- then sum(if(minclass_goods_rn = 1 and min_class_id is not null and order_from='ios',total_price,0))
- when grouping (mid_class_id,mid_class_name) = 0
- then sum(if(midclass_goods_rn = 1 and mid_class_id is not null and order_from='ios',total_price,0))
- when grouping (max_class_id,max_class_name) = 0
- then sum(if(maxclass_goods_rn = 1 and max_class_id is not null and order_from='ios',total_price,0))
- when grouping (create_date) = 0
- then sum(if(order_rn=1 and create_date is not null and order_from='ios',order_amount ,0))
- else null end as ios_sale_amt ,
-
- -- 7、pc成交额 pcweb_sale_amt
- case when grouping(store_id,store_name) =0
- then sum(if( order_rn = 1 and store_id is not null and order_from='pcweb' ,order_amount,0))
- when grouping (trade_area_id ,trade_area_name) = 0
- then sum(if( order_rn = 1 and trade_area_id is not null and order_from='pcweb',order_amount,0))
- when grouping (city_id,city_name) = 0
- then sum(if( order_rn = 1 and city_id is not null and order_from='pcweb',order_amount,0))
- when grouping (brand_id,brand_name) = 0
- then sum(if(brand_goods_rn = 1 and brand_id is not null and order_from='pcweb',total_price,0))
- when grouping (min_class_id,min_class_name) = 0
- then sum(if(minclass_goods_rn = 1 and min_class_id is not null and order_from='pcweb',total_price,0))
- when grouping (mid_class_id,mid_class_name) = 0
- then sum(if(midclass_goods_rn = 1 and mid_class_id is not null and order_from='pcweb',total_price,0))
- when grouping (max_class_id,max_class_name) = 0
- then sum(if(maxclass_goods_rn = 1 and max_class_id is not null and order_from='pcweb',total_price,0))
- when grouping (create_date) = 0
- then sum(if(order_rn=1 and create_date is not null and order_from='pcweb',order_amount ,0))
- else null end as pcweb_sale_amt ,
-