如有一张表stock_hq,表格中的数据如下所示:
TDATE | SECCODE | BLOCK | AMOUNT | PRICE |
---|---|---|---|---|
20221113 | 000001.sz | A01 | 210 | 10.10 |
20221113 | 000002.sz | A02 | 210 | 9.10 |
20221113 | 000003.sz | A01 | 210 | 8.10 |
20221114 | 000001.sz | A01 | 210 | 10.10 |
20221114 | 000002.sz | A02 | 210 | 9.10 |
20221114 | 000003.sz | A01 | 210 | 8.10 |
20221116 | 000001.sz | A01 | 210 | 10.10 |
20221116 | 000002.sz | A02 | 210 | 9.10 |
20221116 | 000003.sz | A01 | 210 | 8.10 |
20221117 | 000001.sz | A01 | 110 | 10.10 |
20221117 | 000002.sz | A02 | 110 | 9.10 |
20221117 | 000003.sz | A01 | 110 | 9.10 |
20221118 | 000001.sz | A01 | 100 | 10.10 |
20221118 | 000002.sz | A02 | 100 | 10.10 |
20221118 | 000003.sz | A01 | 100 | 10.10 |
排名,如果值相同,则排名并列,下一个排名会出现跳跃,即排名是不连续的。比如有前4个值一样,那么前4行并列第一,第5行的排名则为5.
select seccode,TDATE,AMOUNT,PRICE,RANK() OVER(PARTITION BY SECCODE ORDER BY PRICE) AS RANK_BY_PRICE
from stock_hq sh ;
输出:
seccode | TDATE | AMOUNT | PRICE | RANK_BY_PRICE |
---|---|---|---|---|
000001.sz | 20221118 | 100 | 10.10 | 1 |
000001.sz | 20221117 | 110 | 10.10 | 1 |
000001.sz | 20221113 | 210 | 10.10 | 1 |
000001.sz | 20221114 | 210 | 10.10 | 1 |
000001.sz | 20221116 | 210 | 10.10 | 1 |
000002.sz | 20221117 | 110 | 9.10 | 1 |
000002.sz | 20221113 | 210 | 9.10 | 1 |
000002.sz | 20221114 | 210 | 9.10 | 1 |
000002.sz | 20221116 | 210 | 9.10 | 1 |
000002.sz | 20221118 | 100 | 10.10 | 5 |
000003.sz | 20221113 | 210 | 8.10 | 1 |
000003.sz | 20221114 | 210 | 8.10 | 1 |
000003.sz | 20221116 | 210 | 8.10 | 1 |
000003.sz | 20221117 | 110 | 9.10 | 4 |
000003.sz | 20221118 | 100 | 10.10 | 5 |
由上面的输出结果,我们可以很明显的看出RANK_BY_PRICE的排名是不连续的。
按照排序的顺序输出窗口中的数据的行号信息,不会出现排名相同的情况,且排名是连续的。即使是值相同,排名也会按照其排序顺序递增。
select seccode,TDATE,AMOUNT,PRICE,ROW_NUMBER () OVER(PARTITION BY SECCODE ORDER BY PRICE) AS RANK_BY_PRICE
from stock_hq sh ;
输出:
seccode | TDATE | AMOUNT | PRICE | RANK_BY_PRICE |
---|---|---|---|---|
000001.sz | 20221116 | 210 | 10.10 | 1 |
000001.sz | 20221118 | 100 | 10.10 | 2 |
000001.sz | 20221117 | 110 | 10.10 | 3 |
000001.sz | 20221114 | 210 | 10.10 | 4 |
000001.sz | 20221113 | 210 | 10.10 | 5 |
000002.sz | 20221116 | 210 | 9.10 | 1 |
000002.sz | 20221117 | 110 | 9.10 | 2 |
000002.sz | 20221114 | 210 | 9.10 | 3 |
000002.sz | 20221113 | 210 | 9.10 | 4 |
000002.sz | 20221118 | 100 | 10.10 | 5 |
000003.sz | 20221116 | 210 | 8.10 | 1 |
000003.sz | 20221114 | 210 | 8.10 | 2 |
000003.sz | 20221113 | 210 | 8.10 | 3 |
000003.sz | 20221117 | 110 | 9.10 | 4 |
000003.sz | 20221118 | 100 | 10.10 | 5 |
由上面的输出结果,我们可以很明显的看出RANK_BY_PRICE的值是连续递增的,即使Price的值一样。
按照排序的顺序输出窗口中的数据的排名,值相同时,排名并列,下一个排名是连续递增的,不会出现跳跃情况。即如果前4行的值相同,则前4行并列第1,第5行的排名则为第2。
select seccode,TDATE,AMOUNT,PRICE,DENSE_RANK() OVER(PARTITION BY SECCODE ORDER BY PRICE) AS RANK_BY_PRICE
from stock_hq sh ;
输出:
seccode | TDATE | AMOUNT | PRICE | RANK_BY_PRICE |
---|---|---|---|---|
000001.sz | 20221117 | 110 | 10.10 | 1 |
000001.sz | 20221118 | 100 | 10.10 | 1 |
000001.sz | 20221114 | 210 | 10.10 | 1 |
000001.sz | 20221116 | 210 | 10.10 | 1 |
000001.sz | 20221113 | 210 | 10.10 | 1 |
000002.sz | 20221117 | 110 | 9.10 | 1 |
000002.sz | 20221114 | 210 | 9.10 | 1 |
000002.sz | 20221116 | 210 | 9.10 | 1 |
000002.sz | 20221113 | 210 | 9.10 | 1 |
000002.sz | 20221118 | 100 | 10.10 | 2 |
000003.sz | 20221114 | 210 | 8.10 | 1 |
000003.sz | 20221116 | 210 | 8.10 | 1 |
000003.sz | 20221113 | 210 | 8.10 | 1 |
000003.sz | 20221117 | 110 | 9.10 | 2 |
000003.sz | 20221118 | 100 | 10.10 | 3 |
由上面的输出结果,我们可以很明显的看出RANK_BY_PRICE的值在Price的值相同时是重复的,且是连续递增的。
与 PERCENT 函数类似,PERCENT_RANK 函数为窗口的 ORDER BY 子句所指定列中的值返回秩,以介于 0 和 1 之间的小数形式表示.
计算方法: (RANK - 1)/(N- 1)。
示例:
按照证券代码分组,统计出价格由低到高的PERCENT_RANK。
select seccode,TDATE,AMOUNT,PRICE,RANK() OVER(PARTITION BY SECCODE ORDER BY PRICE) AS RANK_BY_PRICE,
PERCENT_RANK () OVER(PARTITION BY SECCODE ORDER BY PRICE) AS PERCENT_RANK_BY_PRICE
from stock_hq sh ;
输出:
seccode | TDATE | AMOUNT | PRICE | RANK_BY_PRICE | PERCENT_RANK_BY_PRICE |
---|---|---|---|---|---|
000001.sz | 20221118 | 100 | 10.10 | 1 | 0.0 |
000001.sz | 20221114 | 210 | 10.10 | 1 | 0.0 |
000001.sz | 20221116 | 210 | 10.10 | 1 | 0.0 |
000001.sz | 20221117 | 110 | 10.10 | 1 | 0.0 |
000001.sz | 20221113 | 210 | 10.10 | 1 | 0.0 |
000002.sz | 20221114 | 210 | 9.10 | 1 | 0.0 |
000002.sz | 20221116 | 210 | 9.10 | 1 | 0.0 |
000002.sz | 20221117 | 110 | 9.10 | 1 | 0.0 |
000002.sz | 20221113 | 210 | 9.10 | 1 | 0.0 |
000002.sz | 20221118 | 100 | 10.10 | 5 | 1.0 |
000003.sz | 20221114 | 210 | 8.10 | 1 | 0.0 |
000003.sz | 20221116 | 210 | 8.10 | 1 | 0.0 |
000003.sz | 20221113 | 210 | 8.10 | 1 | 0.0 |
000003.sz | 20221117 | 110 | 9.10 | 4 | 0.75 |
000003.sz | 20221118 | 100 | 10.10 | 5 | 1.0 |
由上面的输出结果,我们可以很明显的看出PERCENT_RANK_BY_PRICE的值等于RANK_BY_PRICE的值-1 再除以每个分区窗口的总行数。
如果是按照升序排列:统计出小于等于当前排序值的行数所占当前分区窗口总行数的比例。
如果是按照排序字段的降序:则统计出大于等于当前排序字段的值的行数所占当前分区窗口总行数的比例。
select seccode,TDATE,AMOUNT,PRICE,CUME_DIST () OVER(PARTITION BY SECCODE ORDER BY AMOUNT) AS CUME_DIST_BY_AMOUNT
from stock_hq sh ;
输出:
seccode | TDATE | AMOUNT | PRICE | CUME_DIST_BY_AMOUNT |
---|---|---|---|---|
000001.sz | 20221118 | 100 | 10.10 | 0.2 |
000001.sz | 20221117 | 110 | 10.10 | 0.4 |
000001.sz | 20221113 | 210 | 10.10 | 1.0 |
000001.sz | 20221114 | 210 | 10.10 | 1.0 |
000001.sz | 20221116 | 210 | 10.10 | 1.0 |
000002.sz | 20221118 | 100 | 10.10 | 0.2 |
000002.sz | 20221117 | 110 | 9.10 | 0.4 |
000002.sz | 20221113 | 210 | 9.10 | 1.0 |
000002.sz | 20221114 | 210 | 9.10 | 1.0 |
000002.sz | 20221116 | 210 | 9.10 | 1.0 |
000003.sz | 20221118 | 100 | 10.10 | 0.2 |
000003.sz | 20221117 | 110 | 9.10 | 0.4 |
000003.sz | 20221113 | 210 | 8.10 | 1.0 |
000003.sz | 20221114 | 210 | 8.10 | 1.0 |
000003.sz | 20221116 | 210 | 8.10 | 1.0 |
结果解析:如000001.sz的分区窗口,总行数是5,第一条数据AMOUNT的值是100,小于等于100的行数只有1条(它自己),则CUME_DIST的值未1/5=0.2;
第二条数据AMOUNT的值是110,小于等于110的行数的有2条,则为2/5=0.4;
后续的数据的计算同理,在此就不在赘述。
用于将分组数据按照顺序切分成N片,返回当前切片值。将一个有序的数据集划分为多个桶(bucket),并为每行分配一个适当的桶数(切片值,第几个切片,第几个分区等概念)。它可用于将数据划分为相等的小切片,为每一行分配该小切片的数字序号。
N的值默认为1.
分片特点:尽可能的均匀分布到分片中,如果分片不均匀,则默认增加到前面的分片中。比如5条数据分成2片,则第一个分片3条,第2个分片2条。如果分成3个分片,则第一个分片2条,第2个分片2条,第3个分片1条。
TIPS:: NTILE不支持ROWS BETWEEN,RANGE BETWEEN,
select seccode,TDATE,AMOUNT,PRICE,NTILE(3) OVER(PARTITION BY SECCODE ORDER BY AMOUNT) AS NTILE_BY_AMOUNT
from stock_hq sh ;
输出:
seccode | TDATE | AMOUNT | PRICE | NTILE_BY_AMOUNT |
---|---|---|---|---|
000001.sz | 20221118 | 100 | 10.10 | 1 |
000001.sz | 20221117 | 110 | 10.10 | 1 |
000001.sz | 20221114 | 210 | 10.10 | 2 |
000001.sz | 20221113 | 210 | 10.10 | 2 |
000001.sz | 20221116 | 210 | 10.10 | 3 |
000002.sz | 20221118 | 100 | 10.10 | 1 |
000002.sz | 20221117 | 110 | 9.10 | 1 |
000002.sz | 20221114 | 210 | 9.10 | 2 |
000002.sz | 20221113 | 210 | 9.10 | 2 |
000002.sz | 20221116 | 210 | 9.10 | 3 |
000003.sz | 20221118 | 100 | 10.10 | 1 |
000003.sz | 20221117 | 110 | 9.10 | 1 |
000003.sz | 20221114 | 210 | 8.10 | 2 |
000003.sz | 20221113 | 210 | 8.10 | 2 |
000003.sz | 20221116 | 210 | 8.10 | 3 |