Sorting data with Python pandas

This post covers the most common ways to sort data in pandas, mainly sort_index and sort_values.

Sample data:


import pandas as pd
import numpy as np
 
data = {
    'brand':['Python', 'C', 'C++', 'C#', 'Java'],
    'B':[4,6,8,12,10],
    'A':[10,2,5,20,16],
    'D':[6,18,14,6,12],
    'years':[4,1,1,30,30],
    'C':[8,12,18,8,2]
}
index = [9,3,4,5,2]
df = pd.DataFrame(data=data, index=index)
print("df data:\n", df, '\n')

out:


df data:
     A   B   C   D   brand  years
9  10   4   8   6  Python      4
3   2   6  12  18       C      1
4   5   8  18  14     C++      1
5  20  12   8   6      C#     30
2  16  10   2  12    Java     30

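Note: the column order in the output above (A, B, C, D, brand, years) comes from older pandas versions (before 0.23), which sorted the dict keys when building a DataFrame. With pandas 0.23+ on Python 3.6+, columns keep the dict's insertion order (brand, B, A, D, years, C), so on a current install the tables below show the same rows and values but with the columns in that order (except for the axis=1 examples, which reorder the columns explicitly).

Sort by row index:

print("Sorted by row index:\n", df.sort_index(), '\n')

out: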


Sorted by row index:
     A   B   C   D   brand  years
2  16  10   2  12    Java     30
3   2   6  12  18       C      1
4   5   8  18  14     C++      1
5  20  12   8   6      C#     30
9  10   4   8   6  Python      4
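sort_index() only reorders the rows: each label keeps its own row of data, and df itself is left untouched because a new DataFrame is returned. A small check on the same sample data (the name sorted_df is just illustrative):

```python
import pandas as pd

# Rebuild the sample DataFrame from above
data = {
    'brand': ['Python', 'C', 'C++', 'C#', 'Java'],
    'B': [4, 6, 8, 12, 10],
    'A': [10, 2, 5, 20, 16],
    'D': [6, 18, 14, 6, 12],
    'years': [4, 1, 1, 30, 30],
    'C': [8, 12, 18, 8, 2],
}
df = pd.DataFrame(data=data, index=[9, 3, 4, 5, 2])

sorted_df = df.sort_index()        # new DataFrame; df is not modified

print(list(sorted_df.index))       # [2, 3, 4, 5, 9]
print(list(df.index))              # [9, 3, 4, 5, 2]
print(sorted_df.loc[2, 'brand'])   # Java: label 2 still points to the same row
```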

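The ascending parameter controls the sort direction. It defaults to ascending=True, which sorts in ascending order.

Set ascending=False to sort in descending order:

print("Sorted by row index, descending:\n", df.sort_index(ascending=False), '\n')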

out:


Sorted by row index, descending:
     A   B   C   D   brand  years
9  10   4   8   6  Python      4
5  20  12   8   6      C#     30
4   5   8  18  14     C++      1
3   2   6  12  18       C      1
2  16  10   2  12    Java     30
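When the index labels are unique, a descending sort_index() gives the same result as reversing the ascending one. A quick check on the same data (asc and desc are illustrative names):

```python
import pandas as pd

data = {
    'brand': ['Python', 'C', 'C++', 'C#', 'Java'],
    'B': [4, 6, 8, 12, 10],
    'A': [10, 2, 5, 20, 16],
    'D': [6, 18, 14, 6, 12],
    'years': [4, 1, 1, 30, 30],
    'C': [8, 12, 18, 8, 2],
}
df = pd.DataFrame(data=data, index=[9, 3, 4, 5, 2])

asc = df.sort_index()                  # ascending (the default)
desc = df.sort_index(ascending=False)  # descending

print(list(desc.index))                # [9, 5, 4, 3, 2]
# With unique labels, descending order is ascending order reversed
print(desc.equals(asc.iloc[::-1]))     # True
```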

Sort by column name:

Set axis=1 to sort by the column labels instead of the row labels:

print("Sorted by column name:\n", df.sort_index(axis=1), '\n')

out:


Sorted by column name:
     A   B   C   D   brand  years
9  10   4   8   6  Python      4
3   2   6  12  18       C      1
4   5   8  18  14     C++      1
5  20  12   8   6      C#     30
2  16  10   2  12    Java     30
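Column labels are compared as plain strings, character by character by code point, so uppercase letters come before lowercase ones: 'D' sorts before 'brand' because 'D' (68) < 'b' (98). If a case-insensitive order is wanted, sort_index() accepts a key function; this sketch assumes pandas 1.1 or later, where the key parameter exists:

```python
import pandas as pd

data = {
    'brand': ['Python', 'C', 'C++', 'C#', 'Java'],
    'B': [4, 6, 8, 12, 10],
    'A': [10, 2, 5, 20, 16],
    'D': [6, 18, 14, 6, 12],
    'years': [4, 1, 1, 30, 30],
    'C': [8, 12, 18, 8, 2],
}
df = pd.DataFrame(data=data, index=[9, 3, 4, 5, 2])

# Default: code-point order, uppercase before lowercase
print(list(df.sort_index(axis=1).columns))
# ['A', 'B', 'C', 'D', 'brand', 'years']

# Case-insensitive: the labels are lowercased only for comparison (pandas >= 1.1)
ci = df.sort_index(axis=1, key=lambda cols: cols.str.lower())
print(list(ci.columns))
# ['A', 'B', 'brand', 'C', 'D', 'years']
```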

The ascending parameter works here as well:

print("Sorted by column name, descending:\n", df.sort_index(axis=1, ascending=False), '\n')

out:


Sorted by column name, descending:
    years   brand   D   C   B   A
9      4  Python   6   8   4  10
3      1       C  18  12   6   2
4      1     C++  14  18   8   5
5     30      C#   6   8  12  20
2     30    Java  12   2  10  16
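A descending column sort gives the same result as selecting the columns in reverse-sorted order with Python's built-in sorted(), which compares strings the same way. A sketch on the same data (by_pandas and by_python are illustrative names):

```python
import pandas as pd

data = {
    'brand': ['Python', 'C', 'C++', 'C#', 'Java'],
    'B': [4, 6, 8, 12, 10],
    'A': [10, 2, 5, 20, 16],
    'D': [6, 18, 14, 6, 12],
    'years': [4, 1, 1, 30, 30],
    'C': [8, 12, 18, 8, 2],
}
df = pd.DataFrame(data=data, index=[9, 3, 4, 5, 2])

by_pandas = df.sort_index(axis=1, ascending=False)
by_python = df[sorted(df.columns, reverse=True)]  # select columns in reverse-sorted order

print(list(by_pandas.columns))      # ['years', 'brand', 'D', 'C', 'B', 'A']
print(by_pandas.equals(by_python))  # True
```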

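Sort by values:

sort_values() sorts a DataFrame by the values in one or more columns:

1. Sort by a single column

Pass a single column name to sort_values() to sort by that column; use ascending to choose the direction.

print("Sorted by column A:\n", df.sort_values('A'), '\n')

out: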


Sorted by column A:
     A   B   C   D   brand  years
3   2   6  12  18       C      1
4   5   8  18  14     C++      1
9  10   4   8   6  Python      4
2  16  10   2  12    Java     30
5  20  12   8   6      C#     30
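Conceptually, sorting by one column means computing the row positions that would sort that column and then taking the rows in that order. A sketch of the idea with NumPy's argsort (an illustration, not necessarily the code path pandas uses internally):

```python
import numpy as np
import pandas as pd

data = {
    'brand': ['Python', 'C', 'C++', 'C#', 'Java'],
    'B': [4, 6, 8, 12, 10],
    'A': [10, 2, 5, 20, 16],
    'D': [6, 18, 14, 6, 12],
    'years': [4, 1, 1, 30, 30],
    'C': [8, 12, 18, 8, 2],
}
df = pd.DataFrame(data=data, index=[9, 3, 4, 5, 2])

positions = np.argsort(df['A'].to_numpy())  # row positions that sort column A
manual = df.iloc[positions]                  # take the rows in that order

print(positions)                             # [1 2 0 4 3]
print(list(manual.index))                    # [3, 4, 9, 2, 5]
print(manual.equals(df.sort_values('A')))    # True
```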

Set ascending=False to sort in descending order:

print("Sorted by column A, descending:\n", df.sort_values('A', ascending=False), '\n')

out:


Sorted by column A, descending:
     A   B   C   D   brand  years
5  20  12   8   6      C#     30
2  16  10   2  12    Java     30
9  10   4   8   6  Python      4
4   5   8  18  14     C++      1
3   2   6  12  18       C      1
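Column A has no repeated values, so the order above is unambiguous. A column with ties, such as years (two rows with 1 and two with 30), is different: the default kind='quicksort' makes no promise about the order of tied rows. A sketch using kind='mergesort', a stable algorithm that keeps tied rows in their original relative order (per the pandas docs, kind only applies when sorting on a single column):

```python
import pandas as pd

data = {
    'brand': ['Python', 'C', 'C++', 'C#', 'Java'],
    'B': [4, 6, 8, 12, 10],
    'A': [10, 2, 5, 20, 16],
    'D': [6, 18, 14, 6, 12],
    'years': [4, 1, 1, 30, 30],
    'C': [8, 12, 18, 8, 2],
}
df = pd.DataFrame(data=data, index=[9, 3, 4, 5, 2])

# Stable sort: rows with equal 'years' keep their original relative order
out = df.sort_values('years', ascending=False, kind='mergesort')

print(list(out['years']))  # [30, 30, 4, 1, 1]
print(list(out.index))     # [5, 2, 9, 3, 4]
```

Sorting by several columns, as in the next step, is the other way to make the order among ties explicit.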

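2. Sort by multiple columns

Rows are sorted by years in ascending order first; rows with the same years value are then sorted by B in ascending order.

print("Sorted by multiple columns:\n", df.sort_values(['years', 'B']), '\n')

out: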


Sorted by multiple columns:
     A   B   C   D   brand  years
3   2   6  12  18       C      1
4   5   8  18  14     C++      1
9  10   4   8   6  Python      4
2  16  10   2  12    Java     30
5  20  12   8   6      C#     30
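Sorting by ['years', 'B'] is a lexicographic sort: rows are compared on years first, and B only breaks ties. Python's sorted() with a tuple key follows the same rule, which gives an independent way to check the result (manual is an illustrative name):

```python
import pandas as pd

data = {
    'brand': ['Python', 'C', 'C++', 'C#', 'Java'],
    'B': [4, 6, 8, 12, 10],
    'A': [10, 2, 5, 20, 16],
    'D': [6, 18, 14, 6, 12],
    'years': [4, 1, 1, 30, 30],
    'C': [8, 12, 18, 8, 2],
}
df = pd.DataFrame(data=data, index=[9, 3, 4, 5, 2])

# Sort the row labels by the tuple (years, B): years first, B breaks ties
manual = sorted(df.index, key=lambda label: (df.loc[label, 'years'], df.loc[label, 'B']))

print(manual)                                      # [3, 4, 9, 2, 5]
print(list(df.sort_values(['years', 'B']).index))  # [3, 4, 9, 2, 5]
```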

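You can also set the sort direction separately for each column:

Here years is sorted in ascending order and B in descending order.

print("Sorted by multiple columns:\n", df.sort_values(['years', 'B'], ascending=[True, False]), '\n')

out: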


Sorted by multiple columns:
     A   B   C   D   brand  years
4   5   8  18  14     C++      1
3   2   6  12  18       C      1
9  10   4   8   6  Python      4
5  20  12   8   6      C#     30
2  16  10   2  12    Java     30
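For numeric columns, a descending key can be emulated by negating it, so ascending=[True, False] on ['years', 'B'] behaves like sorting on the tuple (years, -B). This negation trick is only a cross-check and only works for numbers; the ascending list works for any column type:

```python
import pandas as pd

data = {
    'brand': ['Python', 'C', 'C++', 'C#', 'Java'],
    'B': [4, 6, 8, 12, 10],
    'A': [10, 2, 5, 20, 16],
    'D': [6, 18, 14, 6, 12],
    'years': [4, 1, 1, 30, 30],
    'C': [8, 12, 18, 8, 2],
}
df = pd.DataFrame(data=data, index=[9, 3, 4, 5, 2])

# years ascending, B descending (negating a numeric key flips its order)
manual = sorted(df.index, key=lambda label: (df.loc[label, 'years'], -df.loc[label, 'B']))
by_pandas = df.sort_values(['years', 'B'], ascending=[True, False])

print(manual)                 # [4, 3, 9, 5, 2]
print(list(by_pandas.index))  # [4, 3, 9, 5, 2]
```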

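Using inplace:

inplace=True sorts the original object directly instead of creating a new one, and the call returns None. The default is inplace=False, which returns a new sorted DataFrame and leaves the original unchanged.


df.sort_values('A', inplace=True)
print("Sorted by column A:\n", df, '\n')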

out:


Sorted by column A:
     A   B   C   D   brand  years
3   2   6  12  18       C      1
4   5   8  18  14     C++      1
9  10   4   8   6  Python      4
2  16  10   2  12    Java     30
5  20  12   8   6      C#     30
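One practical consequence: because inplace=True returns None, writing df = df.sort_values('A', inplace=True) would replace df with None. A sketch contrasting the two styles (new_df and result are illustrative names):

```python
import pandas as pd

data = {
    'brand': ['Python', 'C', 'C++', 'C#', 'Java'],
    'B': [4, 6, 8, 12, 10],
    'A': [10, 2, 5, 20, 16],
    'D': [6, 18, 14, 6, 12],
    'years': [4, 1, 1, 30, 30],
    'C': [8, 12, 18, 8, 2],
}
df = pd.DataFrame(data=data, index=[9, 3, 4, 5, 2])

# Default (inplace=False): a new DataFrame is returned, df keeps its order
new_df = df.sort_values('A')
print(list(df.index), list(new_df.index))  # [9, 3, 4, 5, 2] [3, 4, 9, 2, 5]

# inplace=True: df itself is reordered and the call returns None
result = df.sort_values('A', inplace=True)
print(result)          # None
print(list(df.index))  # [3, 4, 9, 2, 5]
```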

Missing values:

Sorting data that contains NaN values:


data = {
    'brand':['Python', 'C', 'C++', 'C#', 'Java'],
    'B':[4,6,8,np.nan,10],
    'A':[10,2,5,20,16],
    'D':[6,18,14,6,12],
    'years':[4,1,1,30,30],
    'C':[8,12,18,8,2]
}
index = [9,3,4,5,2]
df = pd.DataFrame(data=data, index=index)
print("df data:\n", df, '\n')

out:


df data:
     A     B   C   D   brand  years
9  10   4.0   8   6  Python      4
3   2   6.0  12  18       C      1
4   5   8.0  18  14     C++      1
5  20   NaN   8   6      C#     30
2  16  10.0   2  12    Java     30
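Column B is displayed as 4.0, 6.0 and so on because NaN is a floating-point value: a column of integers that contains np.nan is stored as float64. A quick check on the same data:

```python
import numpy as np
import pandas as pd

data = {
    'brand': ['Python', 'C', 'C++', 'C#', 'Java'],
    'B': [4, 6, 8, np.nan, 10],
    'A': [10, 2, 5, 20, 16],
    'D': [6, 18, 14, 6, 12],
    'years': [4, 1, 1, 30, 30],
    'C': [8, 12, 18, 8, 2],
}
df = pd.DataFrame(data=data, index=[9, 3, 4, 5, 2])

print(df['B'].dtype)                  # float64: the NaN forces the integers to floats
print(df['B'].isna().sum())           # 1 missing value
print(list(df.index[df['B'].isna()])) # [5]: the row label holding the NaN
```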

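Column B contains a NaN value. To place missing values first when sorting by B, pass na_position='first':

print("Sorted by column B:\n", df.sort_values('B', na_position='first'), '\n')

out:


Sorted by column B: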
     A     B   C   D   brand  years
5  20   NaN   8   6      C#     30
9  10   4.0   8   6  Python      4
3   2   6.0  12  18       C      1
4   5   8.0  18  14     C++      1
2  16  10.0   2  12    Java     30

To place missing values last, pass na_position='last' (this is also the default):

print("Sorted by column B:\n", df.sort_values('B', na_position='last'), '\n')

out:


Sorted by column B:
     A     B   C   D   brand  years
9  10   4.0   8   6  Python      4
3   2   6.0  12  18       C      1
4   5   8.0  18  14     C++      1
2  16  10.0   2  12    Java     30
5  20   NaN   8   6      C#     30
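na_position only controls where the missing values go and is applied independently of ascending, so NaN stays last by default even in a descending sort. A check on the same data with the missing B value (desc and desc_first are illustrative names):

```python
import numpy as np
import pandas as pd

data = {
    'brand': ['Python', 'C', 'C++', 'C#', 'Java'],
    'B': [4, 6, 8, np.nan, 10],
    'A': [10, 2, 5, 20, 16],
    'D': [6, 18, 14, 6, 12],
    'years': [4, 1, 1, 30, 30],
    'C': [8, 12, 18, 8, 2],
}
df = pd.DataFrame(data=data, index=[9, 3, 4, 5, 2])

# The default na_position is 'last'
print(df.sort_values('B').equals(df.sort_values('B', na_position='last')))  # True

desc = df.sort_values('B', ascending=False)  # descending, NaN still last
print(list(desc.index))                      # [2, 4, 3, 9, 5]

desc_first = df.sort_values('B', ascending=False, na_position='first')
print(list(desc_first.index))                # [5, 2, 4, 3, 9]
```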

Original article: https://blog.csdn.net/xiadeliang1111/article/details/126831607