Example: ranking multiple columns with Spark window functions

Suppose we want to select each student's id, Chinese score, Chinese rank, math score, math rank, English score, and English rank.
Window functions can do this.

## Create the table
create table t_window(id string, chinese int, math int, english int);

## Insert data
insert into t_window 
values
('1', 99, 88, 77),
('2', 77, 99, 88), 
('3', 88, 77, 99);

## Query the data; the final result is sorted by student id
select id,
       chinese, row_number() over (order by chinese desc) as chinese_order,
       math,    row_number() over (order by math desc)    as math_order,
       english, row_number() over (order by english desc) as english_order
from t_window
order by id;

The result is as follows:

Student id	Chinese score	Chinese rank	Math score	Math rank	English score	English rank
1	99	1	88	2	77	3
2	77	3	99	1	88	2
3	88	2	77	3	99	1
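
The result table above can be checked without a Spark cluster. The following is a minimal sketch that runs the same query against an in-memory SQLite database through Python's standard `sqlite3` module. SQLite here is only a stand-in for Spark (an assumption of this sketch, not part of the original setup), and it needs SQLite 3.25 or later, the first release with window functions. The column types are adapted to SQLite (`text` instead of `string`).

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Spark SQL (assumption:
# the bundled SQLite is >= 3.25, which introduced window functions).
conn = sqlite3.connect(":memory:")
conn.execute("create table t_window(id text, chinese int, math int, english int)")
conn.executemany(
    "insert into t_window values (?, ?, ?, ?)",
    [("1", 99, 88, 77), ("2", 77, 99, 88), ("3", 88, 77, 99)],
)

# Same query as in the post: one row_number() per subject, each with its own
# ordering, and the final result ordered by student id.
query = """
select id,
       chinese, row_number() over (order by chinese desc) as chinese_order,
       math,    row_number() over (order by math desc)    as math_order,
       english, row_number() over (order by english desc) as english_order
from t_window
order by id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each printed tuple should correspond to one row of the result table. Note that `row_number()` assigns distinct numbers even to equal scores, so with ties the order among the tied rows is not guaranteed; the sample data has no ties.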

Generate the execution plan (for example, by prefixing the query with EXPLAIN):

== Physical Plan ==
*(4) Sort [id#156 ASC NULLS FIRST], true, 0
+- *(4) Project [id#156, chinese#157, chinese_order#145, math#158, math_order#146, english#159, english_order#147]
   +- Window [row_number() windowspecdefinition(english#159 DESC NULLS LAST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS english_order#147], [english#159 DESC NULLS LAST]
      +- *(3) Sort [english#159 DESC NULLS LAST], false, 0
         +- Window [row_number() windowspecdefinition(math#158 DESC NULLS LAST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS math_order#146], [math#158 DESC NULLS LAST]
            +- *(2) Sort [math#158 DESC NULLS LAST], false, 0
               +- Window [row_number() windowspecdefinition(chinese#157 DESC NULLS LAST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS chinese_order#145], [chinese#157 DESC NULLS LAST]
                  +- *(1) Sort [chinese#157 DESC NULLS LAST], false, 0
                     +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [id=#306]
                        +- Scan hive hzz.t_window [id#156, chinese#157, math#158, english#159], HiveTableRelation [`hzz`.`t_window`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [id#156, chinese#157, math#158, english#159], Partition Cols: []]

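Reading the final execution plan from the bottom up:
First, scan hzz.t_window.
Exchange SinglePartition moves all rows into a single partition, because the windows have no partition by and each one needs a total ordering of the data.
(1) Sort [chinese#157 DESC NULLS LAST]: sorts by Chinese score in descending order.
Window [row_number() windowspecdefinition(chinese#157…: generates the Chinese rank.
(2) Sort [math#158 DESC NULLS LAST]: sorts by math score in descending order.
Window [row_number() windowspecdefinition(math#158: generates the math rank.
(3) Sort [english#159 DESC NULLS LAST]: sorts by English score in descending order.
Window [row_number() windowspecdefinition(english#159: generates the English rank.
Project selects the required columns.
Sort [id#156 ASC NULLS FIRST]: globally sorts the result by student id.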
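
To make the plan steps above concrete, here is a rough pure-Python simulation of them on the sample data. It is a sketch of the idea only, not how Spark implements these operators; the helper names `add_row_number` and `simulate_plan` are hypothetical, and NULL handling (NULLS LAST / NULLS FIRST) is left out because the sample data has no NULLs.

```python
# Sample rows from the post; the single list plays the role of the one
# partition produced by Exchange SinglePartition.
rows = [
    {"id": "1", "chinese": 99, "math": 88, "english": 77},
    {"id": "2", "chinese": 77, "math": 99, "english": 88},
    {"id": "3", "chinese": 88, "math": 77, "english": 99},
]


def add_row_number(rows, column, alias):
    """Sort step plus Window step: sort descending by `column`, then number from 1."""
    ordered = sorted(rows, key=lambda r: r[column], reverse=True)
    return [{**r, alias: n} for n, r in enumerate(ordered, start=1)]


def simulate_plan(rows):
    step = rows
    # Three Sort + Window pairs, one per subject, applied one after another.
    for column in ("chinese", "math", "english"):
        step = add_row_number(step, column, column + "_order")
    # Project: keep the output columns in the order of the select list.
    cols = ["id", "chinese", "chinese_order", "math", "math_order",
            "english", "english_order"]
    projected = [tuple(r[c] for c in cols) for r in step]
    # Final Sort: order the result by student id.
    return sorted(projected, key=lambda t: t[0])


result = simulate_plan(rows)
for row in result:
    print(row)
```

The simulation makes the cost visible: every distinct window ordering adds its own full sort of the data, and because there is no partition by, all of those sorts run over the whole data set in one partition.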


Original article: https://blog.csdn.net/houzhizhen/article/details/133357606