The DataFrame has a large number of columns. The code that adds them is as follows:
import pandas as pd
df = pd.DataFrame()
for i in range(1000):
    vlist = []
    for j in range(1000):
        vlist.append(j)
    df['COL_' + str(i)] = vlist  # inserts one new column per iteration
df
It produces this warning:
/tmp/ipykernel_27622/2631638338.py:7: PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling `frame.insert` many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
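The line df['COL_' + str(i)] = vlist is the insert the warning refers to. Repeating it for every column fragments the frame and makes execution slow.
Following the hint, add the column data with pd.concat(axis=1) instead.
Build an intermediate DataFrame for each new column, merge it with df using pd.concat(), and assign the result back to df. This avoids the slow insert and the fragmentation warning: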
df=pd.concat([df,frames], axis=1)
The modified code is as follows:
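import pandas as pd
df = pd.DataFrame()
for i in range(1000):
    vlist = []
    for j in range(1000):
        vlist.append(j)
    # Wrap the new column in its own single-column DataFrame
    frames = pd.DataFrame(pd.Series(vlist), columns=['COL_' + str(i)])
    # Join it column-wise to the frame built so far
    df = pd.concat([df, frames], axis=1)
df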
It runs noticeably faster, and the warning no longer appears.
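The warning text suggests joining all columns at once, so a possible variation is to collect the columns first and call pd.concat a single time after the loop. The following is only a minimal sketch of that idea, not something the original code does: the helper name build_frame and its parameters n_cols and n_rows are made up for illustration.

```python
import pandas as pd


def build_frame(n_cols=1000, n_rows=1000):
    """Hypothetical helper: build every column first, then join them once."""
    columns = []
    for i in range(n_cols):
        vlist = list(range(n_rows))
        # A named Series becomes a column with that name after concat
        columns.append(pd.Series(vlist, name='COL_' + str(i)))
    # One column-wise concat instead of one concat (or insert) per column
    return pd.concat(columns, axis=1)


df = build_frame()
print(df.shape)
```

Because the frame is assembled in one step, no per-column insert happens and df is not rebuilt on every iteration.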