• [Pandas技巧] 多行合并成一行


    7b3984bdb037493097b9439e063bdbac.png
     美图欣赏2022/09/16

    在最近的需求开发中,有如下需求需要进行修改,数据源demo如下所示

    251907c927734d1b941fc2d4e386ee0e.png

    根据字段'material'进行分组,对字段'site'进行合并,内容之间用逗号(,)分隔,再进行去重处理;对字段'LT'取最大值,最终呈现结果如下所示

    1ce115b0909c44eab34246e1b8c65714.png

    具体实现代码如下所示

    1. import pandas as pd
    2. df = pd.DataFrame([['FJZ','A123',123],
    3. ['FOC','A123',456],
    4. ['FJZ','B456',112],
    5. ['FJZ','B456',245],
    6. ['FJZ','B456',110],
    7. ['FOC','C789',202],
    8. ['FOC','C789',205]
    9. ],columns=['site','material','LT'])
    10. # 筛选字段'material'并进行去重处理
    11. merge_data = df[['material']]
    12. merge_data = merge_data.drop_duplicates(subset = ['material'])
    13. # 对'site'和'LT'字段进行处理
    14. df = df.groupby(['material']).agg({'site':[','.join],'LT':max})
    15. # 更换字段栏位名称
    16. new_column = ['site','LT']
    17. df.columns = new_column
    18. # 对字段'site'中的值进行去重处理
    19. def data_deduplication(row):
    20. data_list = row['site'].split(',')
    21. res = ','.join(set(data_list))
    22. return res
    23. df['site'] = df.apply(lambda row:data_deduplication(row), axis=1)
    24. merge_data = pd.merge(merge_data, df, how='left', on=['material'])
    25. # 调整字段顺序
    26. order = ['site','material','LT']
    27. merge_data = merge_data[order]

    df(未进行合并处理) 

    b9e6599dadad49cea82560fbdd394d5a.png

    merge_data 

    b96efd480abf493b9cd35eb6ddd8152f.png

    将上述核心关键代码进行拆分讲解

    1. import pandas as pd
    2. df = pd.DataFrame([['FJZ','A123',123],
    3. ['FOC','A123',456],
    4. ['FJZ','B456',112],
    5. ['FJZ','B456',245],
    6. ['FJZ','B456',110],
    7. ['FOC','C789',202],
    8. ['FOC','C789',205]
    9. ],columns=['site','material','LT'])

    b9e6599dadad49cea82560fbdd394d5a.png

    1. # 对'site'和'LT'字段进行处理
    2. df = df.groupby(['material']).agg({'site':[','.join],'LT':max})

    该段代码可将多行数据合并成一行数据 

    0ae3d2a267c847a090fbcb9007977e58.png

    从上图可以看出df中的字段名发生了变化,我们需要对此进行更换字段名操作,便于后续的理解以及数据处理

    1. # 更换字段栏位名称
    2. new_column = ['site','LT']
    3. df.columns = new_column

    1c4c29fc4f7d4cfa89dbba58fc1d73da.png

    另一种解决方案(更优)

    1. import pandas as pd
    2. df = pd.DataFrame([['FJZ','A123',123],
    3. ['FOC','A123',456],
    4. ['FJZ','B456',112],
    5. ['FJZ','B456',245],
    6. ['FJZ','B456',110],
    7. ['FOC','C789',202],
    8. ['FOC','C789',205]
    9. ],columns=['site','material','LT'])
    10. df_copy = df.copy()
    11. # 筛选字段'material'并进行去重处理
    12. merge_data = df_copy[['material']]
    13. merge_data = merge_data.drop_duplicates(subset = ['material'])
    14. # 定义拼接函数,并对字段进行去重
    15. def concat_func(row):
    16. return pd.Series({
    17. 'site':','.join(map(str,row['site'].unique()))
    18. })
    19. # 对'site'和'LT'字段进行处理
    20. df_copy = df_copy.groupby(df_copy['material']).apply(lambda row:concat_func(row))
    21. merge_data = pd.merge(merge_data, df_copy, how='left', on=['material'])
    22. df_copy = df.copy()
    23. df_copy = df_copy.groupby(df['material']).agg({'LT':max})
    24. merge_data = pd.merge(merge_data, df_copy, how='left', on=['material'])
    25. # 调整字段顺序
    26. order = ['site','material','LT']
    27. merge_data = merge_data[order]

    扩展补充多行合并成一行操作案例

    案例1

    问题:根据字段'id'进行分组,对字段'project'进行合并,内容之间用逗号(,)分隔

    1. import pandas as pd
    2. data = pd.DataFrame({'id':[1,1,2,2],'project':['A','B','C','D']})
    3. #合并数据
    4. merge_data = data.groupby(['id'])['project'].apply(list).to_frame()
    5. merge_data['project'] = merge_data['project'].apply(lambda x:str(x).replace('[','').replace(']',''))

    合并前

    50c44e44e8c044fa992d378ce722a011.png

    合并后

    8e7b12e42d7b41298577de89bbbb91af.png

    案例2

    问题:把多行数据按“姓名”合并,并保留所有信息 

    数据源

    1ac1eed7fac34d8e94b9a0071970a9fd.png

    经过合并后的数据

    0fc9066148eb42649765cfe3d26aa444.png

    具体代码如下所示 

    1. import pandas as pd
    2. df = pd.DataFrame([['Odin','电影','男'],
    3. ['Odin','旅游','男'],
    4. ['Odin','音乐','男'],
    5. ['Lee','篮球','女'],
    6. ['Lee','插花','女'],
    7. ['Lee','瑜伽','女'],
    8. ['Andy','足球','男'],
    9. ['Andy','乒乓球','男'],
    10. ['Summer','阅读','女'],
    11. ['Summer','音乐','女'],
    12. ],columns=['name','hobby','gender'])
    13. # 定义拼接函数,并对字段进行去重
    14. def concat_func(row):
    15. return pd.Series({
    16. 'hobby':','.join(row['hobby'].unique()),
    17. 'gender':','.join(row['gender'].unique())
    18. })
    19. result = df.groupby(df['name']).apply(lambda row:concat_func(row)).reset_index()

    案例3

    问题: 把多行数据按“material”合并,并保留所有信息,其中'site'字段进行去重处理,'usages'字段不进行去重处理

    数据源

    dc91d396cc784ebea5f97aa51b0e49e9.png

    经过合并后的数据

    2ae4a9b807244fc1a50f4e78921150e3.png

    1. import pandas as pd
    2. df = pd.DataFrame([['FJZ','A123',1],
    3. ['FOC','A123',1],
    4. ['FJZ','B456',1],
    5. ['FJZ','B456',2],
    6. ['FJZ','B456',2],
    7. ['FOC','C789',3],
    8. ['FOC','C789',5]
    9. ],columns=['site','material','usages'])
    10. order = ['site','material','usages']
    11. data_new = df[order]
    12. # 定义拼接函数,并对字段进行去重
    13. def concat_func(row):
    14. return pd.Series({
    15. 'site':','.join(map(str,row['site'].unique())),
    16. 'usages':','.join(map(str,row['usages']))
    17. })
    18. data_new = data_new.groupby(data_new['material']).apply(lambda row:concat_func(row)).reset_index()
    19. # 调整字段栏位顺序
    20. order = ['site','material','usages']
    21. data_new = data_new[order]
  • 相关阅读:
    【计算机网络笔记】TCP的拥塞控制机制
    智能制造数字工厂 规划建设方案
    cookie、session、Token究竟区别在哪?如何进行身份认证,保持用户登录状态?
    监督学习和非监督学习, 半监督学习和增强学习
    leetcode日记(32)字符串相乘
    【服务器】ASUS ESC4000-E11 安装系统
    第7章-查找
    【C语言】指针的定义、概念和运用
    Jmeter v5.6.x 使用说明书(简要版)
    云计算考试题
  • 原文地址:https://blog.csdn.net/Hudas/article/details/126891082