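Hands-on scenario: computing statistics on Categorical columns. Categorical is a special pandas data type: it holds values drawn from a fixed set of categories and stores them internally as integer codes.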
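To make the integer-code idea concrete, here is a minimal sketch on a small made-up Series (the values are illustrative and not taken from the Telco dataset). It shows the category labels next to the integer codes pandas keeps for each row.

```python
import pandas as pd

# Toy data for illustration only; not from the Telco dataset.
contract = pd.Series(["Month-to-month", "Two year", "Month-to-month", "One year"])
cat = contract.astype("category")

# The distinct labels; for strings pandas sorts them lexicographically by default.
categories = list(cat.cat.categories)
# Each row is stored as an integer position into the list of categories.
codes = cat.cat.codes.tolist()

print(categories)  # ['Month-to-month', 'One year', 'Two year']
print(codes)       # [0, 2, 0, 1]
```

Because every row is just a small integer, a column with few distinct values usually takes less memory as a category than as plain strings.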
Let's get right to it!
import pandas as pd

# Read the CSV file
df = pd.read_csv("Telco-Customer-Churn.csv")

# TotalCharges is loaded as strings because some rows contain a single space.
# Convert it to numbers (blanks become NaN), then fill the gaps with the median.
df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")
df["TotalCharges"] = df["TotalCharges"].fillna(df["TotalCharges"].median())

# Cast the three numeric columns to float
number_columns = ["tenure", "MonthlyCharges", "TotalCharges"]
for column in number_columns:
    df[column] = df[column].astype(float)

# Convert every remaining column to the Categorical type
for column in df.columns.difference(number_columns):
    df[column] = pd.Categorical(df[column])

# info() prints its report itself and returns None, so print() adds a trailing "None"
print(df.info())

# Summarize only the categorical columns
print(df.describe(include=["category"]))
Output:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 21 columns):
 #   Column            Non-Null Count  Dtype
---  ------            --------------  -----
 0   customerID        7043 non-null   category
 1   gender            7043 non-null   category
 2   SeniorCitizen     7043 non-null   category
 3   Partner           7043 non-null   category
 4   Dependents        7043 non-null   category
 5   tenure            7043 non-null   float64
 6   PhoneService      7043 non-null   category
 7   MultipleLines     7043 non-null   category
 8   InternetService   7043 non-null   category
 9   OnlineSecurity    7043 non-null   category
 10  OnlineBackup      7043 non-null   category
 11  DeviceProtection  7043 non-null   category
 12  TechSupport       7043 non-null   category
 13  StreamingTV       7043 non-null   category
 14  StreamingMovies   7043 non-null   category
 15  Contract          7043 non-null   category
 16  PaperlessBilling  7043 non-null   category
 17  PaymentMethod     7043 non-null   category
 18  MonthlyCharges    7043 non-null   float64
 19  TotalCharges      7043 non-null   float64
 20  Churn             7043 non-null   category
dtypes: category(18), float64(3)
memory usage: 611.1 KB
None
        customerID gender  SeniorCitizen Partner  ...        Contract PaperlessBilling     PaymentMethod Churn
count         7043   7043           7043    7043  ...            7043             7043              7043  7043
unique        7043      2              2       2  ...               3                2                 4     2
top     0002-ORFBO   Male              0      No  ...  Month-to-month              Yes  Electronic check    No
freq             1   3555           5901    3641  ...            3875             4171              2365  5174

[4 rows x 18 columns]
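The four rows in the summary above are the number of non-null values (`count`), the number of distinct categories (`unique`), the most frequent category (`top`), and how often it occurs (`freq`). As a rough sketch, the same numbers can be cross-checked with `value_counts()`; the Series below is toy data made up for illustration, not the real `Churn` column.

```python
import pandas as pd

# Toy data for illustration only; not the real Churn column.
churn = pd.Series(pd.Categorical(["No", "Yes", "No", "No", "Yes"]))

# describe() on a categorical Series returns count, unique, top and freq.
summary = churn.describe()

# value_counts() gives the frequency of every category.
counts = churn.value_counts()

print(summary["count"], summary["unique"])  # 5 2
print(summary["top"], summary["freq"])      # No 3
print(counts.idxmax(), counts.max())        # No 3
```

In the real output, `customerID` has `unique` equal to `count` and `freq` equal to 1, which is what an identifier column with no repeated values looks like.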
Hands-on practice from a beginner. Keep learning!