Selecting data in pandas

I. Selecting data by index or column name

1. The loc accessor selects data by label and also accepts a boolean array.
2. The iloc accessor selects data by integer position.

II. Basic usage

Define a DataFrame named df:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(1, 100, size=(6, 5)), index=[1, 2, 3, 4, 5, 6], columns=['A', 'B', 'C', 'D', 'E'])

Select a column by name first, which gives a Series, then pick a specific value from it by index label:

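print(df['A'][5])  # value of column 'A' at index label 5, e.g. 64 (the data is random)
df[['B', 'A']] = df[['A', 'B']]  # quick way to swap two columns
The swap can also be written with loc or iloc, but loc aligns on column labels, so it needs the raw values:

df.loc[:, ['B', 'A']] = df[['A', 'B']]  # no swap: columns are aligned by label before assignment
df.loc[:, ['B', 'A']] = df[['A', 'B']].to_numpy()  # swaps: a NumPy array carries no labels
df.iloc[:, [1, 0]] = df[['A', 'B']]  # swaps: iloc assigns by position and ignores labels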
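The alignment behaviour can be checked on a small deterministic frame. This is a minimal sketch; the two-column df and the names aligned and swapped are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})

# Assigning a DataFrame through .loc aligns on column labels first,
# so each column receives its own values and nothing changes.
aligned = df.copy()
aligned.loc[:, ["B", "A"]] = aligned[["A", "B"]]

# Converting the right-hand side to a NumPy array drops the labels,
# so values are assigned purely by position and the columns swap.
swapped = df.copy()
swapped.loc[:, ["B", "A"]] = swapped[["A", "B"]].to_numpy()

print(aligned)
print(swapped)
```

Working on copies keeps the two cases independent, so one assignment does not undo the other.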

Selecting data:

print(df.loc[1:3, 'A':'C'])  # row labels first, then column labels
print()
print(df.iloc[0:3, 0:3])  # the same rows and columns, selected by position
print()
print(df[df['A'] > 50])  # all rows where column 'A' is greater than 50

III. Attribute access

A column can also be accessed as an attribute named after the column. For example, the following returns column 'A':

print(df.A)

IV. Slicing

print(df[:3])  # the first three rows

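The slice syntax is start:stop:step.
start is the first position; if omitted, the slice begins at the first row.
stop is the end position and is itself excluded; in df[::3] below it is omitted, so the slice runs through the last row.
step is the stride; in df[::3] it is 3, so one row is selected and the next step - 1 rows are skipped.
A step of -1 walks from the end of the sequence toward the front, one element at a time.
So s[::-1] starts at the last element of s and moves backwards to the first, producing the sequence in reverse order.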

print(df[::3])  # start at the first row, then take every third row
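The three slice forms can be compared on a frame with known contents. This is a minimal sketch; the single-column df and the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"A": range(10, 16)}, index=[1, 2, 3, 4, 5, 6])

first_three = df[:3]     # positions 0, 1, 2 -> index labels 1, 2, 3
every_third = df[::3]    # positions 0 and 3 -> index labels 1 and 4
reversed_df = df[::-1]   # all rows, last to first

print(first_three.index.tolist())
print(every_third.index.tolist())
print(reversed_df.index.tolist())
```

Note that these slices count positions, not index labels, even though the index here happens to hold integers.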

V. Selecting by label with .loc

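loc accepts a single label, a list or array of labels, a slice of labels, a boolean array, or a callable (a function or method).
When slicing by label, both bounds should exist in the index, and the index should be sorted and free of duplicates.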

print(df.loc[1:3, 'A':'C'])

Both the lower and the upper bound of the slice are included. Here 1 and 3 are index labels, not positions.
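The difference from positional slicing is easiest to see side by side. This is a minimal sketch on a made-up frame whose integer index starts at 1, so labels and positions differ.

```python
import pandas as pd

df = pd.DataFrame({"A": [10, 20, 30, 40]}, index=[1, 2, 3, 4])

by_label = df.loc[1:3]       # labels 1, 2 and 3: both endpoints included
by_position = df.iloc[1:3]   # positions 1 and 2: the stop position is excluded

print(by_label.index.tolist())
print(by_position.index.tolist())
```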

An example with a boolean array:

In [56]: df1.loc['a'] > 0
Out[56]: 
A     True
B    False
C    False
D    False
Name: a, dtype: bool

In [57]: df1.loc[:, df1.loc['a'] > 0]
Out[57]: 
          A
a  0.132003
b  1.130127
c  1.024180
d  0.974466
e  0.545952
f -1.281247

VI. Selecting by position with .iloc

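In pandas, "chained assignment" means assigning through two consecutive indexing operations, for example df['A'][0] = 100.
With chained assignment, the first indexing step may return a copy of part of the DataFrame rather than a view, so the value may be written to a temporary object and never reach the original.
In an iloc slice, the start bound is included and the stop bound is excluded.
iloc accepts an integer, a list or array of integers, a slice of integers, a boolean array, or a callable. For example: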

In [68]: s1 = pd.Series(np.random.randn(5), index=list(range(0, 10, 2)))

In [69]: s1
Out[69]: 
0    0.695775
2    0.341734
4    0.959726
6   -1.110336
8   -0.619976
dtype: float64

In [70]: s1.iloc[:3]
Out[70]: 
0    0.695775
2    0.341734
4    0.959726
dtype: float64

In [71]: s1.iloc[3]
Out[71]: -1.110336102891167
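One way to avoid chained assignment is to select the row and the column in a single indexing call. This is a minimal sketch, assuming a small made-up frame with a default integer index.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# Chained form (two indexing steps, may write to a temporary copy):
#     df["A"][0] = 100
# Single-step form: one .loc call selects row and column together,
# so the assignment reaches df itself.
df.loc[0, "A"] = 100

# The positional equivalent uses .iloc with integer positions.
df.iloc[1, 0] = 200

print(df)
```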

VII. Selecting with a callable

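loc, iloc, and [] also accept a callable, usually a lambda, that takes the DataFrame or Series and returns a valid indexer: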

In [98]: df1 = pd.DataFrame(np.random.randn(6, 4),
   ....:                    index=list('abcdef'),
   ....:                    columns=list('ABCD'))
   ....: 

In [99]: df1
Out[99]: 
          A         B         C         D
a -0.023688  2.410179  1.450520  0.206053
b -0.251905 -2.213588  1.063327  1.266143
c  0.299368 -0.863838  0.408204 -1.048089
d -0.025747 -0.988387  0.094055  1.262731
e  1.289997  0.082423 -0.055758  0.536580
f -0.489682  0.369374 -0.034571 -2.484478

In [100]: df1.loc[lambda df: df['A'] > 0, :]
Out[100]: 
          A         B         C         D
c  0.299368 -0.863838  0.408204 -1.048089
e  1.289997  0.082423 -0.055758  0.536580

In [101]: df1.loc[:, lambda df: ['A', 'B']]
Out[101]: 
          A         B
a -0.023688  2.410179
b -0.251905 -2.213588
c  0.299368 -0.863838
d -0.025747 -0.988387
e  1.289997  0.082423
f -0.489682  0.369374

In [102]: df1.iloc[:, lambda df: [0, 1]]
Out[102]: 
          A         B
a -0.023688  2.410179
b -0.251905 -2.213588
c  0.299368 -0.863838
d -0.025747 -0.988387
e  1.289997  0.082423
f -0.489682  0.369374

In [103]: df1[lambda df: df.columns[0]]
Out[103]: 
a   -0.023688
b   -0.251905
c    0.299368
d   -0.025747
e    1.289997
f   -0.489682
Name: A, dtype: float64

VIII. Combining positional and label-based selection

Either loc or iloc can be used. The idea is to first convert positions into labels (for loc) or labels into positions (for iloc):

In [107]: dfd = pd.DataFrame({'A': [1, 2, 3],
   .....:                     'B': [4, 5, 6]},
   .....:                    index=list('abc'))
   .....: 

In [108]: dfd
Out[108]: 
   A  B
a  1  4
b  2  5
c  3  6

In [109]: dfd.loc[dfd.index[[0, 2]], 'A']
Out[109]: 
a    1
c    3
Name: A, dtype: int64

In [110]: dfd.iloc[[0, 2], dfd.columns.get_loc('A')]
Out[110]: 
a    1
c    3
Name: A, dtype: int64

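Reindexing: reindex
reindex conforms the data to a new set of labels and fills labels missing from the original with NaN. index.intersection returns the labels common to two indexes; selecting with those labels returns only entries that exist, so no NaN is introduced.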
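The contrast can be sketched with a small Series. The names s, labels, reindexed, and present are illustrative and do not come from the original article.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["a", "b", "c"])
labels = ["a", "c", "x"]  # "x" does not exist in s

# reindex keeps every requested label and fills the missing one with NaN.
reindexed = s.reindex(labels)

# Intersecting first keeps only labels that exist, so no NaN appears.
present = s.loc[s.index.intersection(labels)]

print(reindexed)
print(present)
```

Because NaN is a float, reindexed becomes float64, while present keeps the original integer dtype.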

IX. Random sampling with sample

In [122]: s = pd.Series([0, 1, 2, 3, 4, 5])

# When no arguments are passed, returns 1 row (the default).
In [123]: s.sample()
Out[123]: 
4    4
dtype: int64

# One may specify either a number of rows (n):
In [124]: s.sample(n=3)
Out[124]: 
0    0
4    4
1    1
dtype: int64

# Or a fraction of the rows (frac):
In [125]: s.sample(frac=0.5)
Out[125]: 
5    5
3    3
1    1
dtype: int64

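The replace parameter controls whether the same row may be sampled more than once; the default is False.
By default every row has the same probability of being sampled; the weights parameter changes that: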

In [129]: s = pd.Series([0, 1, 2, 3, 4, 5])

In [130]: example_weights = [0, 0, 0.2, 0.2, 0.2, 0.4]

In [131]: s.sample(n=3, weights=example_weights)
Out[131]: 
5    5
4    4
3    3
dtype: int64

# Weights will be re-normalized automatically
In [132]: example_weights2 = [0.5, 0, 0, 0, 0, 0]

In [133]: s.sample(n=1, weights=example_weights2)
Out[133]: 
0    0
dtype: int64

The random_state parameter seeds the random number generator, which makes the sample reproducible.
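Both parameters can be tried on the same Series. This is a minimal sketch; the seed 42 and the variable names are arbitrary choices for illustration.

```python
import pandas as pd

s = pd.Series([0, 1, 2, 3, 4, 5])

# The same seed gives the same rows on every call.
first = s.sample(n=3, random_state=42)
second = s.sample(n=3, random_state=42)

# With replace=True a row may be drawn repeatedly, so n can exceed len(s).
oversampled = s.sample(n=10, replace=True, random_state=42)

print(first.equals(second))
print(len(oversampled))
```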

X. Fast scalar access

The at and iat accessors provide fast access to a single value by label and by position, respectively.

In [151]: s.iat[5]
Out[151]: 5

In [152]: df.at[dates[5], 'A']
Out[152]: 0.1136484096888855

In [153]: df.iat[3, 0]
Out[153]: -0.7067711336300845
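The session above relies on a df and dates defined elsewhere. A self-contained sketch, assuming a made-up 6x4 frame indexed by dates, could look like this:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2000-01-01", periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list("ABCD"))

by_label = df.at[dates[5], "A"]   # row label dates[5], column label "A"
by_position = df.iat[3, 0]        # fourth row, first column

# at can also set a single value in place.
df.at[dates[0], "D"] = -1

print(by_label, by_position)
```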

XI. Boolean indexing

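The operators are | for or, & for and, and ~ for not. Because of operator precedence, each condition must be wrapped in parentheses.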

In [158]: s = pd.Series(range(-3, 4))

In [159]: s
Out[159]: 
0   -3
1   -2
2   -1
3    0
4    1
5    2
6    3
dtype: int64

In [160]: s[s > 0]
Out[160]: 
4    1
5    2
6    3
dtype: int64

In [161]: s[(s < -1) | (s > 0.5)]
Out[161]: 
0   -3
1   -2
4    1
5    2
6    3
dtype: int64

In [162]: s[~(s < 0)]
Out[162]: 
3    0
4    1
5    2
6    3
dtype: int64

List comprehensions and the map method can also be used to build the criterion:

In [164]: df2 = pd.DataFrame({'a': ['one', 'one', 'two', 'three', 'two', 'one', 'six'],
   .....:                     'b': ['x', 'y', 'y', 'x', 'y', 'x', 'x'],
   .....:                     'c': np.random.randn(7)})
   .....: 

# only want 'two' or 'three'
In [165]: criterion = df2['a'].map(lambda x: x.startswith('t'))

In [166]: df2[criterion]
Out[166]: 
       a  b         c
2    two  y  0.041290
3  three  x  0.361719
4    two  y -0.238075

# equivalent but slower
In [167]: df2[[x.startswith('t') for x in df2['a']]]
Out[167]: 
       a  b         c
2    two  y  0.041290
3  three  x  0.361719
4    two  y -0.238075

# Multiple criteria
In [168]: df2[criterion & (df2['b'] == 'x')]
Out[168]: 
       a  b         c
3  three  x  0.361719

XII. The isin method

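isin returns a boolean vector that is True wherever the element is present in the passed list. It is also available on Index objects.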

In [175]: s = pd.Series(np.arange(5), index=np.arange(5)[::-1], dtype='int64')

In [176]: s
Out[176]: 
4    0
3    1
2    2
1    3
0    4
dtype: int64

In [177]: s.isin([2, 4, 6])
Out[177]: 
4    False
3    False
2     True
1    False
0     True
dtype: bool

In [178]: s[s.isin([2, 4, 6])]
Out[178]: 
2    2
0    4
dtype: int64

It also works with a MultiIndex:

In [181]: s_mi = pd.Series(np.arange(6),
   .....:                  index=pd.MultiIndex.from_product([[0, 1], ['a', 'b', 'c']]))
   .....: 

In [182]: s_mi
Out[182]: 
0  a    0
   b    1
   c    2
1  a    3
   b    4
   c    5
dtype: int64

In [183]: s_mi.iloc[s_mi.index.isin([(1, 'a'), (2, 'b'), (0, 'c')])]
Out[183]: 
0  c    2
1  a    3
dtype: int64

In [184]: s_mi.iloc[s_mi.index.isin(['a', 'c', 'e'], level=1)]
Out[184]: 
0  a    0
   c    2
1  a    3
   c    5
dtype: int64

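Applied to a DataFrame, isin returns a boolean DataFrame of the same shape. The values can be matched against every element or, by passing a dict, only against specific columns. For example: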

In [185]: df = pd.DataFrame({'vals': [1, 2, 3, 4], 'ids': ['a', 'b', 'f', 'n'],
   .....:                    'ids2': ['a', 'n', 'c', 'n']})
   .....: 

In [186]: values = ['a', 'b', 1, 3]

In [187]: df.isin(values)
Out[187]: 
    vals    ids   ids2
0   True   True   True
1  False   True  False
2   True  False  False
3  False  False  False

In [188]: values = {'ids': ['a', 'b'], 'vals': [1, 3]}

In [189]: df.isin(values)
Out[189]: 
    vals    ids   ids2
0   True   True  False
1  False   True  False
2   True  False  False
3  False  False  False

In [192]: values = {'ids': ['a', 'b'], 'ids2': ['a', 'c'], 'vals': [1, 3]}

In [193]: row_mask = df.isin(values).all(1)  # 1 means axis=1: combine across the columns, giving one boolean per row

In [194]: df[row_mask]
Out[194]: 
   vals ids ids2
0     1   a    a

XIII. The where method

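Unlike selection with [], where returns a result with the same shape as the original; entries where the condition is False become NaN.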

In [195]: s[s > 0]
Out[195]: 
3    1
2    2
1    3
0    4
dtype: int64

In [196]: s.where(s > 0)
Out[196]: 
4    NaN
3    1.0
2    2.0
1    3.0
0    4.0
dtype: float64

.where() takes a condition and a replacement value as arguments; both may be callables. For example:

In [215]: df3 = pd.DataFrame({'A': [1, 2, 3],
   .....:                     'B': [4, 5, 6],
   .....:                     'C': [7, 8, 9]})
   .....: 

In [216]: df3.where(lambda x: x > 4, lambda x: x + 10)
Out[216]: 
    A   B  C
0  11  14  7
1  12   5  8
2  13   6  9

XIV. The mask method

mask is the boolean inverse of where: it replaces the entries where the condition is True.

In [217]: s.mask(s >= 0)
Out[217]: 
4   NaN
3   NaN
2   NaN
1   NaN
0   NaN
dtype: float64

In [218]: df.mask(df >= 0)
Out[218]: 
                   A         B         C         D
2000-01-01 -2.104139 -1.309525       NaN       NaN
2000-01-02 -0.352480       NaN -1.192319       NaN
2000-01-03 -0.864883       NaN -0.227870       NaN
2000-01-04       NaN -1.222082       NaN -1.233203
2000-01-05       NaN -0.605656 -1.169184       NaN
2000-01-06       NaN -0.948458       NaN -0.684718
2000-01-07 -2.670153 -0.114722       NaN -0.048048
2000-01-08       NaN       NaN -0.048788 -0.808838
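The relation between where and mask can be checked directly. This is a minimal sketch on a made-up Series; the replacement value 0 is an arbitrary choice.

```python
import pandas as pd

s = pd.Series([-2, -1, 0, 1, 2])

kept = s.where(s > 0)        # keep positives, NaN elsewhere
filled = s.where(s > 0, 0)   # keep positives, 0 elsewhere
masked = s.mask(s > 0)       # NaN where the condition is True

# mask(cond) gives the same result as where(~cond).
print(masked.equals(s.where(~(s > 0))))
```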

XV. The query() method

query() selects rows using a string expression.

In [226]: n = 10

In [227]: df = pd.DataFrame(np.random.rand(n, 3), columns=list('abc'))

In [228]: df
Out[228]: 
          a         b         c
0  0.438921  0.118680  0.863670
1  0.138138  0.577363  0.686602
2  0.595307  0.564592  0.520630
3  0.913052  0.926075  0.616184
4  0.078718  0.854477  0.898725
5  0.076404  0.523211  0.591538
6  0.792342  0.216974  0.564056
7  0.397890  0.454131  0.915716
8  0.074315  0.437913  0.019794
9  0.559209  0.502065  0.026437

# pure python
In [229]: df[(df['a'] < df['b']) & (df['b'] < df['c'])]
Out[229]: 
          a         b         c
1  0.138138  0.577363  0.686602
4  0.078718  0.854477  0.898725
5  0.076404  0.523211  0.591538
7  0.397890  0.454131  0.915716

# query
In [230]: df.query('(a < b) & (b < c)')
Out[230]: 
          a         b         c
1  0.138138  0.577363  0.686602
4  0.078718  0.854477  0.898725
5  0.076404  0.523211  0.591538
7  0.397890  0.454131  0.915716

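query also works with a MultiIndex. Unnamed index levels can be referred to by the special names ilevel_0, ilevel_1, and so on: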

df.query('ilevel_0 == "red"')

Inside query, comparing a column with a list of values using ==/!= works similarly to in/not in.
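Both points can be sketched on one small frame. The colour and letter labels and the column contents are made up for illustration.

```python
import pandas as pd

index = pd.MultiIndex.from_product([["red", "green"], ["x", "y"]])
df = pd.DataFrame({"a": ["p", "q", "r", "p"], "b": [1, 2, 3, 4]}, index=index)

# Unnamed MultiIndex levels are addressed as ilevel_0, ilevel_1, ...
red_rows = df.query('ilevel_0 == "red"')

# Comparing a column with a list via == behaves like `in`.
with_eq = df.query('a == ["p", "q"]')
with_in = df.query('a in ["p", "q"]')

print(red_rows)
print(with_eq.equals(with_in))
```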

XVI. Handling duplicates

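duplicated returns a boolean vector that marks duplicate rows.
drop_duplicates removes duplicate rows.
By default the first occurrence is treated as non-duplicate and is therefore kept. The keep parameter controls this with three values: 'first' keeps the first occurrence, 'last' keeps the last occurrence, and False keeps none of the duplicated rows.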

In [294]: df2 = pd.DataFrame({'a': ['one', 'one', 'two', 'two', 'two', 'three', 'four'],
   .....:                     'b': ['x', 'y', 'x', 'y', 'x', 'x', 'x'],
   .....:                     'c': np.random.randn(7)})
   .....: 

In [295]: df2
Out[295]: 
       a  b         c
0    one  x -1.067137
1    one  y  0.309500
2    two  x -0.211056
3    two  y -1.842023
4    two  x -0.390820
5  three  x -1.964475
6   four  x  1.298329

In [296]: df2.duplicated('a')
Out[296]: 
0    False
1     True
2    False
3     True
4     True
5    False
6    False
dtype: bool

In [297]: df2.duplicated('a', keep='last')
Out[297]: 
0     True
1    False
2     True
3     True
4    False
5    False
6    False
dtype: bool

In [298]: df2.duplicated('a', keep=False)
Out[298]: 
0     True
1     True
2     True
3     True
4     True
5    False
6    False
dtype: bool

In [299]: df2.drop_duplicates('a')
Out[299]: 
       a  b         c
0    one  x -1.067137
2    two  x -0.211056
5  three  x -1.964475
6   four  x  1.298329

In [300]: df2.drop_duplicates('a', keep='last')
Out[300]: 
       a  b         c
1    one  y  0.309500
4    two  x -0.390820
5  three  x -1.964475
6   four  x  1.298329

In [301]: df2.drop_duplicates('a', keep=False)
Out[301]: 
       a  b         c
5  three  x -1.964475
6   four  x  1.298329

A list such as ['a', 'b'] can also be passed; those columns are then used together as the subset that identifies duplicates.
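A minimal sketch of the list form, using a shortened, made-up version of df2 with fixed values in column 'c':

```python
import pandas as pd

df2 = pd.DataFrame({
    "a": ["one", "one", "two", "two", "two"],
    "b": ["x", "y", "x", "y", "x"],
    "c": [1, 2, 3, 4, 5],
})

# A row counts as a duplicate only if both 'a' and 'b' match an earlier row.
flags = df2.duplicated(["a", "b"])
unique_rows = df2.drop_duplicates(subset=["a", "b"])

print(flags.tolist())
print(unique_rows)
```

Only the last row repeats an earlier ('a', 'b') pair, so it is the only one flagged and dropped.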

XVII. The get method

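get returns the value for a key, similar to df['col1'], and returns a default value instead of raising an error when the key is missing.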

In [310]: s = pd.Series([1, 2, 3], index=['a', 'b', 'c'])

In [311]: s.get('a')  # equivalent to s['a']
Out[311]: 1

In [312]: s.get('x', default=-1)
Out[312]: -1
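The same method exists on a DataFrame, where the key is a column name. This is a minimal sketch; the frame and the fallback string "n/a" are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})

existing = df.get("col1")                  # same as df["col1"]
missing = df.get("col3")                   # None instead of a KeyError
fallback = df.get("col3", default="n/a")   # explicit default

print(existing.tolist(), missing, fallback)
```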

XVIII. Looking up values with factorize

In [313]: df = pd.DataFrame({'col': ["A", "A", "B", "B"],
   .....:                    'A': [80, 23, np.nan, 22],
   .....:                    'B': [80, 55, 76, 67]})
   .....: 

In [314]: df
Out[314]: 
  col     A   B
0   A  80.0  80
1   A  23.0  55
2   B   NaN  76
3   B  22.0  67

In [315]: idx, cols = pd.factorize(df['col'])

In [316]: df.reindex(cols, axis=1).to_numpy()[np.arange(len(df)), idx]
Out[316]: array([80., 23., 76., 67.])

Let's look at this line in detail:

idx, cols = pd.factorize(df['col'])

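This line calls pandas' factorize function, which factorizes the input sequence (here the df['col'] column). Factorizing maps each concrete value to an integer code and is commonly used for categorical data. Breaking the call down:

pd.factorize(...):

The function takes a sequence (here df['col'], whose contents are ['A', 'A', 'B', 'B']).
Its purpose is to map every unique value in the sequence to an integer.
It returns a tuple of two elements: the first is an array holding the integer code of each element of the original sequence; the second holds the unique values of the original sequence.

idx, cols:

idx (codes): the first element returned by factorize. For the input ['A', 'A', 'B', 'B'], idx is [0, 0, 1, 1]: the first and second elements map to the first unique value ('A'), and the third and fourth map to the second ('B').
cols (unique values): the second element returned by factorize, the unique values of the original sequence. Here cols holds ['A', 'B'].

So this line creates two objects: idx maps each value of the original 'col' column to an integer code, and cols holds the unique labels. The mapping is useful in data analysis because it lets numeric operations work on data that was originally categorical.
The remaining two steps of the code work as follows: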

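Reordering the DataFrame and converting it to a NumPy array

df.reindex(cols, axis=1).to_numpy()

df.reindex(cols, axis=1): conforms the columns of df to cols. Because cols comes from factorizing the 'col' column, it contains 'A' and 'B', so the result keeps only columns 'A' and 'B', in that order.
.to_numpy(): converts the reindexed DataFrame to a NumPy array, so that the next step can use NumPy's advanced indexing.

Selecting elements with advanced indexing

np.arange(len(df)), idx

np.arange(len(df)): produces an array from 0 to len(df) - 1, that is, an array of row positions such as [0, 1, 2, 3].
idx: the codes obtained from factorize, giving the column position for each element of 'col'; for ['A', 'A', 'B', 'B'], idx is [0, 0, 1, 1].
This is NumPy advanced indexing. Pairing np.arange(len(df)) with idx picks one specific column for every row: the value in 'col' ('A' or 'B') decides whether the row's element is taken from column 'A' or column 'B'.
The result is [80., 23., 76., 67.], the elements chosen from column 'A' or 'B' as directed by 'col'.

In short, the code uses the value in 'col' to decide, row by row, whether to take the data from column 'A' or column 'B'.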
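The whole lookup can be run step by step to inspect the intermediate objects. The sketch below reuses the DataFrame from the session above; the names values, rows, and picked are introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col": ["A", "A", "B", "B"],
                   "A": [80, 23, np.nan, 22],
                   "B": [80, 55, 76, 67]})

idx, cols = pd.factorize(df["col"])
print(idx)    # integer code for each row
print(cols)   # unique labels in order of first appearance

values = df.reindex(cols, axis=1).to_numpy()   # shape (4, 2): columns A and B
rows = np.arange(len(df))                      # row positions 0..3
picked = values[rows, idx]                     # one element per row

print(picked)
```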

XIX. Index objects

An Index can be created directly:

In [317]: index = pd.Index(['e', 'd', 'a', 'b'])

In [318]: index
Out[318]: Index(['e', 'd', 'a', 'b'], dtype='object')

In [319]: 'd' in index
Out[319]: True

The dtype parameter sets the type of an Index. A name can also be passed and is stored in the index:

In [331]: index = pd.Index(list(range(5)), name='rows')

In [332]: columns = pd.Index(['A', 'B', 'C'], name='cols')

In [333]: df = pd.DataFrame(np.random.randn(5, 3), index=index, columns=columns)

In [334]: df
Out[334]: 
cols         A         B         C
rows                              
0     1.295989 -1.051694  1.340429
1    -2.366110  0.428241  0.387275
2     0.433306  0.929548  0.278094
3     2.154730 -0.315628  0.264223
4     1.126818  1.132290 -0.353310

In [335]: df['A']
Out[335]: 
rows
0    1.295989
1   -2.366110
2    0.433306
3    2.154730
4    1.126818
Name: A, dtype: float64

Index objects support set operations such as difference, union, and intersection:

In [346]: a = pd.Index(['c', 'b', 'a'])

In [347]: b = pd.Index(['c', 'e', 'd'])

In [348]: a.difference(b)
Out[348]: Index(['a', 'b'], dtype='object')
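The other two operations can be tried on the same pair of indexes. This is a minimal sketch; only the variable names only_in_a, either, and both are new.

```python
import pandas as pd

a = pd.Index(["c", "b", "a"])
b = pd.Index(["c", "e", "d"])

only_in_a = a.difference(b)   # labels in a but not in b
either = a.union(b)           # labels in a or in b
both = a.intersection(b)      # labels in both a and b

print(only_in_a, either, both)
```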

fillna fills missing values with a specified value:

In [355]: idx1 = pd.Index([1, np.nan, 3, 4])

In [356]: idx1
Out[356]: Index([1.0, nan, 3.0, 4.0], dtype='float64')

In [357]: idx1.fillna(2)
Out[357]: Index([1.0, 2.0, 3.0, 4.0], dtype='float64')

In [358]: idx2 = pd.DatetimeIndex([pd.Timestamp('2011-01-01'),
   .....:                          pd.NaT,
   .....:                          pd.Timestamp('2011-01-03')])
   .....: 

In [359]: idx2
Out[359]: DatetimeIndex(['2011-01-01', 'NaT', '2011-01-03'], dtype='datetime64[ns]', freq=None)

In [360]: idx2.fillna(pd.Timestamp('2011-01-02'))
Out[360]: DatetimeIndex(['2011-01-01', '2011-01-02', '2011-01-03'], dtype='datetime64[ns]', freq=None)

set_index() takes a column name or a list of column names and uses those columns as the index:

In [361]: data = pd.DataFrame({'a': ['bar', 'bar', 'foo', 'foo'],
   .....:                      'b': ['one', 'two', 'one', 'two'],
   .....:                      'c': ['z', 'y', 'x', 'w'],
   .....:                      'd': [1., 2., 3, 4]})
   .....: 

In [362]: data
Out[362]: 
     a    b  c    d
0  bar  one  z  1.0
1  bar  two  y  2.0
2  foo  one  x  3.0
3  foo  two  w  4.0

In [363]: indexed1 = data.set_index('c')

In [364]: indexed1
Out[364]: 
     a    b    d
c               
z  bar  one  1.0
y  bar  two  2.0
x  foo  one  3.0
w  foo  two  4.0

In [365]: indexed2 = data.set_index(['a', 'b'])

In [366]: indexed2
Out[366]: 
         c    d
a   b          
bar one  z  1.0
    two  y  2.0
foo one  x  3.0
    two  w  4.0

reset_index() is the inverse of set_index(): it moves the index back into the columns and installs a default integer index.

In [371]: data
Out[371]: 
     a    b  c    d
0  bar  one  z  1.0
1  bar  two  y  2.0
2  foo  one  x  3.0
3  foo  two  w  4.0

In [372]: data.reset_index()
Out[372]: 
   index    a    b  c    d
0      0  bar  one  z  1.0
1      1  bar  two  y  2.0
2      2  foo  one  x  3.0
3      3  foo  two  w  4.0
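A round trip shows that the two methods undo each other. This is a minimal sketch on a trimmed, made-up version of data; the drop=True call is included to show the variant that discards the old index.

```python
import pandas as pd

data = pd.DataFrame({"a": ["bar", "bar", "foo", "foo"],
                     "b": ["one", "two", "one", "two"],
                     "d": [1.0, 2.0, 3.0, 4.0]})

indexed = data.set_index(["a", "b"])   # 'a' and 'b' become a MultiIndex
restored = indexed.reset_index()       # and move back into the columns

# drop=True discards the old index instead of turning it into columns.
dropped = indexed.reset_index(drop=True)

print(restored.equals(data))
print(dropped.columns.tolist())
```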

Original article: https://blog.csdn.net/qq_43814415/article/details/134401368