This post will be updated over time. It collects the pandas DataFrame operations that come up most often in day-to-day work.
Create a DataFrame from a list of rows:
import pandas as pd

data = [['Jack', 10], ['Tom', 12], ['Lucy', 13]]
columns = ['Name', 'Age']
df_by_list = pd.DataFrame(data, columns=columns)
print(df_by_list)
Output:
   Name  Age
0  Jack   10
1   Tom   12
2  Lucy   13
Create a DataFrame from a dict that maps each column name to its values:
data_dict = {
    'Name': ['Jack', 'Tom', 'Lucy'],
    'Age': [10, 12, 13]
}
df_by_dict = pd.DataFrame(data_dict)
print(df_by_dict)
Output:
   Name  Age
0  Jack   10
1   Tom   12
2  Lucy   13
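Both constructors above should produce the same table. A minimal check, assuming only that pandas is installed, uses DataFrame.equals, which compares shape, values and dtypes:

```python
import pandas as pd

# Row-oriented input: each inner list is one row
df_by_list = pd.DataFrame([['Jack', 10], ['Tom', 12], ['Lucy', 13]], columns=['Name', 'Age'])
# Column-oriented input: each dict key becomes a column name
df_by_dict = pd.DataFrame({'Name': ['Jack', 'Tom', 'Lucy'], 'Age': [10, 12, 13]})

# Same values, same column order, same dtypes, same default 0..n-1 index
print(df_by_list.equals(df_by_dict))  # True
```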
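Note: each example below starts from the original three-row DataFrame, i.e. df = pd.DataFrame(data, columns=columns), so the examples can be run independently.

Get a column as a Python list:
names = df['Name'].tolist()
print(names)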
Output:
['Jack', 'Tom', 'Lucy']
Filter rows with boolean conditions combined with &:
result = df[(df['Age'] > 10) & (df['Age'] < 13)]
print(result)
Output:
  Name  Age
1  Tom   12
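The parentheses in that condition are required because & binds more tightly than > and < in Python. For a simple open interval, Series.between could express the same filter; this sketch assumes pandas 1.3 or later, where the inclusive='neither' option is available:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jack', 'Tom', 'Lucy'], 'Age': [10, 12, 13]})

# Each comparison is wrapped in parentheses so & combines two boolean Series
mask = (df['Age'] > 10) & (df['Age'] < 13)

# between with inclusive='neither' selects the open interval (10, 13)
mask_between = df['Age'].between(10, 13, inclusive='neither')

print(mask.equals(mask_between))  # True
print(df[mask_between])
```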
The same filter with query(), which takes the condition as a string:
result = df.query('Age > 10 & Age < 13')
print(result)
Output:
  Name  Age
1  Tom   12
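Inside a query string, and works as well as &, comparisons can be chained, and local variables are referenced with an @ prefix. The sketch below (low and high are illustrative variable names) checks that these spellings select the same rows:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jack', 'Tom', 'Lucy'], 'Age': [10, 12, 13]})

# In query strings & has the precedence of 'and', so no extra parentheses are needed
r1 = df.query('Age > 10 & Age < 13')
r2 = df.query('Age > 10 and Age < 13')
# Chained comparisons are accepted in query strings
r3 = df.query('10 < Age < 13')

# Local Python variables are referenced with @
low, high = 10, 13
r4 = df.query('@low < Age < @high')

print(r1.equals(r2) and r1.equals(r3) and r1.equals(r4))  # True
```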
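Exclude rows whose value appears in a list; inside a query string, the @ prefix refers to a local Python variable:
names = ['Tom', 'Lily', 'Sam']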
result = df.query('Name not in @names')
print(result)
Output:
   Name  Age
0  Jack   10
2  Lucy   13
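The same exclusion can be written without query, using Series.isin and the ~ negation operator. A small sketch comparing the two forms:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jack', 'Tom', 'Lucy'], 'Age': [10, 12, 13]})
names = ['Tom', 'Lily', 'Sam']

# 'not in @names' inside query ...
by_query = df.query('Name not in @names')
# ... corresponds to negating isin with ~
by_isin = df[~df['Name'].isin(names)]

print(by_query.equals(by_isin))  # True
print(by_isin['Name'].tolist())  # ['Jack', 'Lucy']
```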
Add a new column at the end:
df['Gender'] = ['M', 'M', 'F']
print(df)
Output:
   Name  Age Gender
0  Jack   10      M
1   Tom   12      M
2  Lucy   13      F
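Column assignment also accepts a single scalar, which is broadcast to every row, while a list must contain exactly one value per row. In this sketch the Country column and its value are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jack', 'Tom', 'Lucy'], 'Age': [10, 12, 13]})

# A scalar is repeated for every row
df['Country'] = 'CN'  # hypothetical column, for illustration only

# A list whose length differs from the number of rows raises ValueError
try:
    df['Gender'] = ['M', 'F']
except ValueError as exc:
    print('length mismatch:', exc)

print(df.columns.tolist())  # ['Name', 'Age', 'Country']
```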
Insert a column at a specific position (here position 0); insert modifies df in place:
df.insert(0, 'Gender', ['M', 'M', 'F'])
print(df)
Output:
  Gender  Name  Age
0      M  Jack   10
1      M   Tom   12
2      F  Lucy   13
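Unlike plain assignment, insert refuses to add a column whose name already exists, unless allow_duplicates=True is passed. A minimal sketch of that behavior:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jack', 'Tom', 'Lucy'], 'Age': [10, 12, 13]})
ret = df.insert(0, 'Gender', ['M', 'M', 'F'])  # works in place and returns None

# A second insert with the same column name raises ValueError by default
try:
    df.insert(1, 'Gender', ['F', 'F', 'M'])
except ValueError as exc:
    print('insert failed:', exc)

print(df.columns.tolist())  # ['Gender', 'Name', 'Age']
```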
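Append a single row with loc, using len(df.index) as the new row label (this assumes the default 0..n-1 index):
df.loc[len(df.index)] = ('Lily', 20)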
print(df)
Output:
   Name  Age
0  Jack   10
1   Tom   12
2  Lucy   13
3  Lily   20
Note: if the label passed to loc already exists, that row is overwritten instead of a new row being added, e.g.:
df.loc[1] = ('Lily', 20)
print(df)
Output:
   Name  Age
0  Jack   10
1  Lily   20
2  Lucy   13
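Because df.loc[len(df.index)] only appends when the index is exactly 0..n-1, it can silently overwrite a row otherwise. The sketch below uses a hypothetical non-contiguous index (as might be left after dropping rows) and shows df.index.max() + 1 as one possible workaround for an integer index:

```python
import pandas as pd

# Hypothetical index with a gap, e.g. after row 1 was dropped
df = pd.DataFrame({'Name': ['Jack', 'Tom', 'Lucy'], 'Age': [10, 12, 13]}, index=[0, 2, 3])

# len(df.index) is 3, but label 3 already exists, so Lucy's row is overwritten
df.loc[len(df.index)] = ['Lily', 20]
print(len(df))  # 3

# A label one past the current maximum always creates a new row
df.loc[df.index.max() + 1] = ['Sam', 35]
print(df)
```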
Append several rows by concatenating another DataFrame with pd.concat:
data1 = [['Lily', 23], ['Sam', 35]]
columns1 = ['Name', 'Age']
df1 = pd.DataFrame(data1, columns=columns1)
df2 = pd.concat([df, df1], ignore_index=True)
print(df2)
Output:
   Name  Age
0  Jack   10
1   Tom   12
2  Lucy   13
3  Lily   23
4   Sam   35
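Notes:
1. ignore_index=True discards the original index labels and builds a new 0..n-1 index.
2. DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0; use pd.concat instead.
3. concat matches columns by name. If the DataFrames have different columns, the result contains the union of the columns and missing values are filled with NaN, so for a plain row-wise append both DataFrames should have the same column names.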
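The notes on ignore_index and column matching can be checked directly. In this sketch df3 and its City column are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jack', 'Tom', 'Lucy'], 'Age': [10, 12, 13]})
df1 = pd.DataFrame({'Name': ['Lily', 'Sam'], 'Age': [23, 35]})

# Without ignore_index the original labels are kept, so 0 and 1 appear twice
kept = pd.concat([df, df1])
print(kept.index.tolist())  # [0, 1, 2, 0, 1]

# With ignore_index=True the result gets a fresh 0..n-1 index
reset = pd.concat([df, df1], ignore_index=True)
print(reset.index.tolist())  # [0, 1, 2, 3, 4]

# Mismatched columns do not raise: the default outer join fills gaps with NaN
df3 = pd.DataFrame({'Name': ['Ann'], 'City': ['Beijing']})  # hypothetical data
mixed = pd.concat([df, df3], ignore_index=True)
print(mixed)
```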
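Overwrite values with update, which modifies df in place and matches rows by index label and columns by name:
data1 = [['Lily', 23], ['Sam', 35]]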
columns1 = ['Name', 'Age']
new_df = pd.DataFrame(data1, columns=columns1)
df.update(new_df)
print(df)
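Output:
   Name   Age
0  Lily  23.0
1   Sam  35.0
2  Lucy  13.0
Note: rows 0 and 1 were overwritten because new_df has the same index labels, and row 2 was left unchanged. Age appears as float here because update may upcast integer columns, depending on the pandas version.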
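update aligns on index labels and, by default, does not copy NaN values from the other DataFrame. A small sketch with a hypothetical new_df indexed by labels 0 and 2:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Jack', 'Tom', 'Lucy'], 'Age': [10, 12, 13]})

# Hypothetical update source: label 0 gets a new age, label 2 holds NaN
new_df = pd.DataFrame({'Age': [30, np.nan]}, index=[0, 2])

result = df.update(new_df)  # changes df in place
print(result)               # None

# Label 0 is updated, label 1 is absent from new_df, label 2 keeps 13 because NaN is skipped.
# The values may print as ints or floats depending on the pandas version.
print(df['Age'].tolist())
```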
Modify a single value with loc[row_label, column_label]:
df.loc[0, 'Age'] = 25
print(df)
Output:
   Name  Age
0  Jack   25
1   Tom   12
2  Lucy   13
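For a single cell, at is the scalar-only counterpart of loc. The sketch also notes why chained indexing such as df['Age'][0] = 25 is best avoided:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jack', 'Tom', 'Lucy'], 'Age': [10, 12, 13]})

# loc takes the row label and column label in one call
df.loc[0, 'Age'] = 25
# at does the same for exactly one cell
df.at[1, 'Age'] = 30

# Chained indexing like df['Age'][0] = 25 may operate on a temporary copy
# and leave df unchanged, so the single-call forms above are preferred.
print(df['Age'].tolist())  # [25, 30, 13]
```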
Delete the rows that match a condition:
df = df.drop(df[(df['Age'] > 10) & (df['Age'] < 13)].index)
print(df)
Output:
   Name  Age
0  Jack   10
2  Lucy   13
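Dropping the index labels of the matching rows is equivalent to keeping the rows where the condition is False. A quick sketch comparing the two:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jack', 'Tom', 'Lucy'], 'Age': [10, 12, 13]})
mask = (df['Age'] > 10) & (df['Age'] < 13)

# Drop the index labels of the rows that match ...
dropped = df.drop(df[mask].index)
# ... or keep the rows that do not match
kept = df[~mask]

print(dropped.equals(kept))    # True
print(dropped.index.tolist())  # [0, 2] (original labels are kept)
```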
Delete a column:
df = df.drop('Age', axis=1)
print(df)
Output:
   Name
0  Jack
1   Tom
2  Lucy
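Note: in pandas 2.x the full signature is
DataFrame.drop(labels=None, *, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')
axis=0 drops rows and axis=1 drops columns; index= or columns= can be used instead of labels plus axis. drop returns a new DataFrame unless inplace=True.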
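A few calls illustrating the parameters in that signature; Height is a hypothetical column used only to show errors='ignore':

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jack', 'Tom', 'Lucy'], 'Age': [10, 12, 13]})

# Two ways to drop a column
a = df.drop('Age', axis=1)
b = df.drop(columns='Age')

# Two ways to drop a row (axis=0 is the default)
c = df.drop(1)
d = df.drop(index=1)

# errors='ignore' skips labels that do not exist instead of raising KeyError
e = df.drop(columns=['Age', 'Height'], errors='ignore')

# inplace=True changes df itself and returns None
ret = df.drop(columns='Age', inplace=True)

print(a.equals(b), c.equals(d), ret)  # True True None
```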
Iterate over rows with iterrows():
for index, row in df.iterrows():
    # index is the row label, row holds that row's values keyed by column name
    print(index, row['Name'], row['Age'])
Output:
0 Jack 10
1 Tom 12
2 Lucy 13
Note: iterrows() yields (index, row) tuples, where index is the row label and row is a Series containing all values of that row, accessible by column name.
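Because row is a new Series built on each iteration, assigning to it does not change df when the columns have mixed types, as here (the pandas documentation warns that the iterator may return a copy). A small sketch that also checks the tuple structure:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jack', 'Tom', 'Lucy'], 'Age': [10, 12, 13]})

for index, row in df.iterrows():
    # Each item unpacks into (row label, Series indexed by column names)
    assert isinstance(row, pd.Series)
    # This only changes the temporary Series, not df
    row['Age'] = 0

print(df['Age'].tolist())  # [10, 12, 13]
```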