Multiple Ways to Filter Data in Pandas

Boolean indexing on the DataFrame

import pandas as pd

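# Read the CSV and assign column names. header=1 marks the second line of the
# file as the header row (the first line is skipped); names then overrides it.
df = pd.read_csv("data.csv", header=1, names=["index", "id", "url"])
# Drop every row that contains at least one missing value
df.dropna(how="any", inplace=True)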

# Boolean indexing: keep rows whose id matches either value
newdf1 = df[(df.id == 4099963) | (df.id == 5181745)]
print("newdf1:", newdf1)
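
When the list of target values grows, chaining == comparisons with | becomes hard to read. A minimal sketch of the same filter using isin follows; the in-memory DataFrame and the wanted_ids name are assumptions that stand in for data.csv.

```python
import pandas as pd

# Hypothetical sample data standing in for data.csv
df = pd.DataFrame({
    "id": [4099963, 5181745, 1234567],
    "url": ["https://a.example", "http://b.example", "https://c.example"],
})

wanted_ids = [4099963, 5181745]

# isin builds the same boolean mask as (df.id == a) | (df.id == b)
mask = df["id"].isin(wanted_ids)
matched = df[mask]
print(matched)
```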

The query method

# query method: express the filter as a string
newdf2 = df.query("id == 4099963 | id == 5181745")
print("newdf2:", newdf2)
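
A query string can also refer to local Python variables by prefixing them with @, which avoids hard-coding the values. The sketch below assumes a small in-memory DataFrame and a hypothetical wanted_ids list rather than the original data.csv.

```python
import pandas as pd

# Hypothetical sample data standing in for data.csv
df = pd.DataFrame({
    "id": [4099963, 5181745, 1234567],
    "url": ["https://a.example", "http://b.example", "https://c.example"],
})

wanted_ids = [4099963, 5181745]

# @wanted_ids refers to the local Python variable of that name
by_list = df.query("id in @wanted_ids")

# The same filter spelled out with the keyword `or`
by_or = df.query("id == 4099963 or id == 5181745")

print(by_list)
```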

The loc indexer

# loc indexer: label-based selection with a boolean mask
newdf3 = df.loc[(df.id == 4099963) | (df.id == 5181745)]
print("newdf3:", newdf3)
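
Unlike plain bracket indexing, loc accepts a row selector and a column selector together, so rows can be filtered and columns picked in one step. A minimal sketch, assuming a hypothetical in-memory DataFrame:

```python
import pandas as pd

# Hypothetical sample data standing in for data.csv
df = pd.DataFrame({
    "id": [4099963, 5181745, 1234567],
    "url": ["https://a.example", "http://b.example", "https://c.example"],
})

row_mask = (df["id"] == 4099963) | (df["id"] == 5181745)

# First argument filters rows, second argument selects columns by label
matched_urls = df.loc[row_mask, ["url"]]
print(matched_urls)
```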

The iloc indexer

# Select rows and columns by integer position
newdf4 = df.iloc[1:5, :4]
print("newdf4:", newdf4)
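
iloc slices follow Python slice rules: the end position is exclusive, and a slice that runs past the last column is simply truncated. The sketch below illustrates both points on a hypothetical six-row, three-column DataFrame.

```python
import pandas as pd

# Hypothetical sample data: 6 rows, 3 columns
df = pd.DataFrame({
    "index": range(6),
    "id": [10, 11, 12, 13, 14, 15],
    "url": [f"https://example.com/{i}" for i in range(6)],
})

# Rows at positions 1..4 (5 is excluded); :4 asks for up to four columns,
# but only three exist, so all three are returned
subset = df.iloc[1:5, :4]
print(subset)
```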

Non-null values

# Keep rows whose url is not null
newdf5 = df[df.url.notnull()]
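
Because the script has already called df.dropna(how="any"), this filter only changes the result on data that still contains missing values. A minimal sketch on a hypothetical DataFrame with one missing url, showing that notna (an alias of notnull) and dropna(subset=...) select the same rows:

```python
import pandas as pd

# Hypothetical sample data with one missing url
df = pd.DataFrame({
    "id": [1, 2, 3],
    "url": ["https://a.example", None, "http://c.example"],
})

# Boolean mask: True where url is present
by_mask = df[df["url"].notna()]

# Equivalent: drop rows where the url column is missing
by_dropna = df.dropna(subset=["url"])

print(by_mask)
```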

String matching

# str accessor: keep rows whose url contains "https"
newdf6 = df[df.url.str.contains("https")]
print(newdf6)
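
str.contains returns a missing value for rows where the column itself is missing, and such a mask cannot be used directly for indexing. The original script avoids this by dropping nulls first. When nulls are still present, passing na=False is one way to handle them, as in this sketch on hypothetical data (regex=False additionally treats the pattern as a literal string):

```python
import pandas as pd

# Hypothetical sample data with one missing url
df = pd.DataFrame({
    "id": [1, 2, 3],
    "url": ["https://a.example", None, "http://c.example"],
})

# na=False makes missing urls count as "no match";
# regex=False matches the literal text instead of a regular expression
mask = df["url"].str.contains("https", na=False, regex=False)
https_rows = df[mask]
print(https_rows)
```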

apply

# apply: keep rows whose url is longer than 10 characters
newdf7 = df[df.apply(lambda x: len(x["url"]) > 10, axis=1)]
print(newdf7)
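
Row-wise apply calls the Python lambda once per row, which can be slow on large frames. For a simple length check, the str accessor offers a vectorized alternative. The sketch below compares the two on a hypothetical DataFrame without missing values.

```python
import pandas as pd

# Hypothetical sample data without missing values
df = pd.DataFrame({
    "id": [1, 2, 3],
    "url": ["https://a.example", "http://b", "https://c.example/page"],
})

# Row-wise apply, as in the snippet above
by_apply = df[df.apply(lambda row: len(row["url"]) > 10, axis=1)]

# Vectorized alternative: compute the length of every url at once
by_str_len = df[df["url"].str.len() > 10]

print(by_str_len)
```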

References

https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html
http://www.360doc.com/document/23/0624/20/82716111_1086091874.shtml

Original article: https://blog.csdn.net/lilongsy/article/details/134459320