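Operation | Syntax |
---|---|
Select a column | df['col'] or df.col |
Select rows with a slice | df[5:10] |
Select rows by index label | df.loc[label] |
Select rows by integer position | df.iloc[loc] |
Filter rows with a boolean expression | df[bool_vec] |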
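Data selection means filtering data according to given conditions. The methods Pandas provides can reproduce Excel-style filtering and handle a wide range of query needs.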
- import pandas as pd
-
- df = pd.DataFrame([['liver','E',89,21,24,64],
- ['Arry','C',36,37,37,57],
- ['Ack','A',57,60,18,84],
- ['Eorge','C',93,96,71,78],
- ['Oah','D',65,49,61,86]
- ],
- columns = ['name','team','Q1','Q2','Q3','Q4'])
-
- # Both of the following return the 'name' column; the result is a Series
- res1 = df['name']
- res2 = df.name
- res3 = df.Q1
- type(res3) # pandas.core.series.Series
- #--------------------------------------------------------------------
- print(res1 == res2)
- # Output:
- # 0 True
- # 1 True
- # 2 True
- # 3 True
- # 4 True
- # Name: name, dtype: bool
- #--------------------------------------------------------------------
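Attribute access (df.col) is a shortcut that only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute or method. The sketch below illustrates this; the columns 'count' and 'total score' are hypothetical and were added only for the demonstration.

```python
import pandas as pd

df = pd.DataFrame(
    [['liver', 'E', 89], ['Arry', 'C', 36], ['Ack', 'A', 57]],
    columns=['name', 'team', 'Q1'],
)

# Hypothetical extra columns, added only to show where attribute access fails.
df['count'] = [1, 2, 3]            # clashes with the DataFrame.count method
df['total score'] = [10, 20, 30]   # contains a space, so not a valid identifier

by_bracket = df['count']    # bracket access always returns the column
by_attribute = df.count     # attribute access returns the method, not the column

print(isinstance(by_bracket, pd.Series))  # True
print(callable(by_attribute))             # True

spaced = df['total score']  # reachable with brackets only
```

Bracket access is therefore the safer default in reusable code.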
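Rows can be selected with slices, just as with a list (positions start at 0). A single integer is not supported, however: df[2] is interpreted as a column label, not a row position. Note that Pandas slicing follows the same logic as Python list slicing, so the right-hand bound is excluded.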
- import pandas as pd
-
- df = pd.DataFrame([['liver','E',89,21,24,64],
- ['Arry','C',36,37,37,57],
- ['Ack','A',57,60,18,84],
- ['Eorge','C',93,96,71,78],
- ['Oah','D',65,49,61,86]
- ],
- columns = ['name','team','Q1','Q2','Q3','Q4'])
-
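- # First two rows
- res1 = df[:2]
- # Last two rows
- res2 = df[3:]
- # All rows (not recommended)
- res3 = df[:]
- # Every second row
- res4 = df[:5:2]
- # Reverse the row order
- res5 = df[::-1]
- # Raises KeyError: 2 is looked up as a column label, not a row position
- res6 = df[2]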
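The end-exclusive rule applies to plain df[start:stop] slices. As a point of comparison, label-based slicing with .loc includes the end label. The following minimal sketch uses a shortened version of the sample data and the default integer index; it also shows that a one-row slice is a workaround for selecting a single row.

```python
import pandas as pd

df = pd.DataFrame(
    [['liver', 'E', 89], ['Arry', 'C', 36], ['Ack', 'A', 57],
     ['Eorge', 'C', 93], ['Oah', 'D', 65]],
    columns=['name', 'team', 'Q1'],
)

positional = df[1:3]     # rows at positions 1 and 2; the end is excluded
by_label = df.loc[1:3]   # rows labelled 1, 2 and 3; the end is included
single = df[2:3]         # a one-row DataFrame instead of the failing df[2]

print(len(positional))   # 2
print(len(by_label))     # 3
print(single.shape)      # (1, 3)
```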
If the brackets contain a list of column names (form: df[['col1','col2',...]]), several columns are selected at once.
- import pandas as pd
-
- df = pd.DataFrame([['liver','E',89,21,24,64],
- ['Arry','C',36,37,37,57],
- ['Ack','A',57,60,18,84],
- ['Eorge','C',93,96,71,78],
- ['Oah','D',65,49,61,86]
- ],
- columns = ['name','team','Q1','Q2','Q3','Q4'])
-
- # Select the 'name' and 'team' columns
- res = df[['name','team']]
-
- # Note the difference: a one-element list (form: df[['col']]) still returns a DataFrame:
- res1 = df[['name']]
- type(res1) # pandas.core.frame.DataFrame
-
- res2 = df['name']
- type(res2) # pandas.core.series.Series
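The Series versus DataFrame distinction also shows up in the dimensions of the result. The sketch below, on a shortened version of the sample data, checks ndim and shape, shows that columns come back in the order they are listed, and shows that a name missing from the DataFrame raises KeyError. The column 'Q9' is a deliberately nonexistent name used for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [['liver', 'E', 89], ['Arry', 'C', 36], ['Ack', 'A', 57]],
    columns=['name', 'team', 'Q1'],
)

as_series = df['Q1']     # one-dimensional
as_frame = df[['Q1']]    # two-dimensional, with a single column

print(as_series.ndim, as_series.shape)  # 1 (3,)
print(as_frame.ndim, as_frame.shape)    # 2 (3, 1)

# Columns are returned in the order given in the list.
reordered = df[['Q1', 'name']]
print(list(reordered.columns))          # ['Q1', 'name']

# 'Q9' does not exist, so the lookup fails with KeyError.
try:
    df[['name', 'Q9']]
    missing_raises = False
except KeyError:
    missing_raises = True
```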
loc: works on labels in the index
Selecting by axis label with .loc: https://blog.csdn.net/Hudas/article/details/123096447?spm=1001.2014.3001.5502
iloc: works on the positions in the index (so it only takes integers)
Selecting by integer position with .iloc: https://blog.csdn.net/Hudas/article/details/123096447?spm=1001.2014.3001.5502
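A minimal sketch of the difference between the two accessors follows. It assumes the 'name' column is moved into the index with set_index (a step not taken in the original examples) so that labels and positions are visibly different.

```python
import pandas as pd

df = pd.DataFrame(
    [['liver', 'E', 89, 21], ['Arry', 'C', 36, 37], ['Ack', 'A', 57, 60],
     ['Eorge', 'C', 93, 96], ['Oah', 'D', 65, 49]],
    columns=['name', 'team', 'Q1', 'Q2'],
)

# Assumption for illustration: use the names as row labels.
by_name = df.set_index('name')

row_label = by_name.loc['Ack']    # row selected by its label
row_pos = by_name.iloc[2]         # the same row selected by its position
cell = by_name.loc['Ack', 'Q1']   # single value: 57

# Label slices include both ends; position slices exclude the end.
block_label = by_name.loc['Arry':'Eorge', 'Q1':'Q2']
block_pos = by_name.iloc[1:4, 1:3]

print(row_label.equals(row_pos))      # True
print(block_label.equals(block_pos))  # True
```

Both accessors take a row selector first and an optional column selector second.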
- import pandas as pd
-
- df = pd.DataFrame([['liver','E',89,21,24,64],
- ['Arry','C',36,37,37,57],
- ['Ack','A',57,60,18,84],
- ['Eorge','C',93,96,71,78],
- ['Oah','D',65,49,61,86]
- ],
- columns = ['name','team','Q1','Q2','Q3','Q4'])
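- # Boolean filtering: df[bool_vec] keeps the rows where bool_vec is True
- # Q1 equals 36
- res1 = df[df['Q1'] == 36]
- # Q1 does not equal 36
- res2 = df[~(df['Q1'] == 36)]
- # name is 'Eorge'
- res3 = df[df['name'] == 'Eorge']
- # Rows where Q1 is greater than Q2
- res4 = df[df.Q1 > df.Q2]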
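Several conditions can be combined into one boolean vector. The sketch below, on the same sample data, uses & (and) and | (or), with each comparison wrapped in parentheses because these operators bind more tightly than == and >. It also uses Series.isin for membership tests and shows that .loc accepts the same boolean vector.

```python
import pandas as pd

df = pd.DataFrame([['liver', 'E', 89, 21, 24, 64],
                   ['Arry', 'C', 36, 37, 37, 57],
                   ['Ack', 'A', 57, 60, 18, 84],
                   ['Eorge', 'C', 93, 96, 71, 78],
                   ['Oah', 'D', 65, 49, 61, 86]],
                  columns=['name', 'team', 'Q1', 'Q2', 'Q3', 'Q4'])

# Team C and Q1 above 50
both = df[(df['team'] == 'C') & (df['Q1'] > 50)]

# Q1 below 40 or Q4 above 85
either = df[(df['Q1'] < 40) | (df['Q4'] > 85)]

# Team is one of 'A' or 'E'
in_teams = df[df['team'].isin(['A', 'E'])]

# .loc accepts the same boolean vector as plain brackets
via_loc = df.loc[df['Q1'] > df['Q2']]

print(list(both['name']))      # ['Eorge']
print(list(either['name']))    # ['Arry', 'Oah']
print(list(in_teams['name']))  # ['liver', 'Ack']
print(list(via_loc['name']))   # ['liver', 'Oah']
```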