• A small tip for reading complex spreadsheet (CSV) data with Python


    About the CSV format

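    Comma-Separated Values (CSV) files store tabular data (numbers and text) as plain text. CSV is sometimes also read as "character-separated values", because the separator does not have to be a comma. "CSV" is not a single, well-defined format, although RFC 4180 gives a commonly used definition.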
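    The following sketch makes the "not a single, well-defined format" point concrete using only the standard csv module. In RFC 4180-style quoting, a quoted field may contain a comma, and a literal quote inside it is written as two quotes. A file that uses another separator needs the `delimiter` parameter. The sample strings are made up for illustration.

```python
import csv
import io

# RFC 4180-style quoting: a quoted field may contain the delimiter,
# and a literal quote inside a quoted field is written as two quotes ("").
rfc_text = 'id,remark\n1,"contains, a comma"\n2,"say ""hi"""\n'
rfc_rows = list(csv.reader(io.StringIO(rfc_text)))
print(rfc_rows)  # [['id', 'remark'], ['1', 'contains, a comma'], ['2', 'say "hi"']]

# The same kind of table separated by semicolons (illustrative sample):
# without delimiter=";" each line would come back as one single field.
semi_text = 'id;remark\n1;plain text\n'
semi_rows = list(csv.reader(io.StringIO(semi_text), delimiter=";"))
print(semi_rows)  # [['id', 'remark'], ['1', 'plain text']]
```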

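    The main reading function defined in Python's csv module:

    csv.reader(csvfile, dialect='excel', **fmtparams)

    Returns a reader object that iterates over the lines of the given CSV file. The optional dialect parameter defines a set of parameters specific to a particular CSV dialect. It may be an instance of a subclass of the Dialect class, or one of the strings returned by list_dialects(). Each row read from the CSV file is returned as a list of strings. No automatic data type conversion is performed unless the QUOTE_NONNUMERIC format option is specified, in which case unquoted fields are converted to floats.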
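    The following minimal sketch shows the behaviour described above. By default every field comes back as a string, while with `quoting=csv.QUOTE_NONNUMERIC` unquoted fields become floats. It uses an in-memory `io.StringIO` with made-up data instead of a real file. For a real file, the csv documentation recommends opening it with `newline=""`.

```python
import csv
import io

# In-memory stand-in for a file (illustrative data, not from the source).
text = '"name",1.5,2\n"price",8.21,3\n'

# Default behaviour: every field is returned as a string
rows = list(csv.reader(io.StringIO(text)))
print(rows)  # [['name', '1.5', '2'], ['price', '8.21', '3']]

# QUOTE_NONNUMERIC: quoted fields stay strings, unquoted fields become floats
rows_num = list(csv.reader(io.StringIO(text), quoting=csv.QUOTE_NONNUMERIC))
print(rows_num)  # [['name', 1.5, 2.0], ['price', 8.21, 3.0]]

# Names of the registered dialects that can be passed as dialect=...
print(csv.list_dialects())
```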

    The CSV file to be processed

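    This file is supplied by an external interface. The software that produces it is quite old, or there are other reasons, so its content is rather scattered, as shown in the figure below: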
    Sample data:

    "","","","","","","","","","","","","","","","","油品销售明细表","","","","","","","","","","","",""
    "","","加油站名称:","","","","广州*********加油站        ","","","","","","","","","","","","","","","","","","","","","",""
    "","从:","","2021-10-01 00:00:00","","","","","","","到:","2022-10-01 23:59:59","","","","","","","","","","","","","","","","",""
    "","流水号","","","","","","交易时间","","","","","油枪号码","","油品名称","","","油品单价","","体积","","交易金额","","起泵码","","止泵码","","","备注"
    "","","","","479171","","","","2021-10-23 16:21:00","","","","","1","","95号 车用汽油(ⅥA)","","","8.21","","58.83","","479.49","","380693.19","","","380752.02",""
    "","","","","","259635","","","","2021-10-23 16:32:00","","","","1","","95号 车用汽油(ⅥA)","","","8.21","","60.90","","493.00","","380752.02","","","380812.92",""
    "","","","","","259636","","","","2021-10-23 16:34:00","","","","1","","95号 车用汽油(ⅥA)","","","8.21","","56.19","","446.95","","380812.92","","","380869.11",""
    "","","","","","479251","","","","2021-10-23 18:30:00","","","","1","","95号 车用汽油(ⅥA)","","","8.21","","70.76","","573.94","","380869.11","","","380939.87",""
    "","","","","","86765","","","","2021-10-23 18:35:00","","","","1","","95号 车用汽油(ⅥA)","","","8.21","","44.03","","361.49","","380939.87","","","380983.90",""
    "","","","","479289","","","","","2021-10-23 20:11:00","","","","1","","95号 车用汽油(ⅥA)","","","8.21","","6.09","","50.00","","380983.90","","","380989.99",""
    "","","","","","86775","","","","2021-10-23 20:30:00","","","","1","","95号 车用汽油(ⅥA)","","","8.21","","49.13","","393.53","","380989.99","","","381039.12",""
    "","","","","479309","","","","","2021-10-23 21:23:00","","","","1","","95号 车用汽油(ⅥA)","","","8.21","","24.36","","200.00","","381039.12","","","381063.48",""
    "","","","","","479413","","","","2021-10-24 03:29:00","","","","1","","95号 车用汽油(ⅥA)","","","8.21","","33.47","","271.29","","381063.48","","","381096.95",""
    "打印时间:2022-10-28","","","","","","","","","","","","","","","","","","","","","","","","","","填表人:","",""
    "","流水号","","","交易时间","","油枪号码","","油品名称","","油品单价","","体积","","交易金额","","起泵码","","止泵码","","","备注"
    "","","","86814","","2021-10-24 09:47:00","","1","","95号 车用汽油(ⅥA)","","8.21","","52.29","","429.30","","381157.85","","","381210.14",""
    "","","259822","","","2021-10-24 09:59:00","","1","","95号 车用汽油(ⅥA)","","8.21","","46.09","","374.90","","381210.14","","","381256.23",""
    

    Parsing the data with Python's csv module

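    Method one parses each row cell by cell with custom rules. This is the approach from my previous post on XLS data, 《Python按单元格读取复杂电子表格(Excel)数据实践》 ("Reading complex spreadsheet (Excel) data cell by cell with Python in practice"). It is quite cumbersome, and once I found method two I dropped it without hesitation.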
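    Two rows of the sample data show why positional parsing is cumbersome here. The same fields sit in different columns from one row to the next, so a hard-coded index that works for one row returns an empty string for the next. The sketch below uses only the standard library. The two rows are copied from the sample and trimmed after the timestamp for brevity.

```python
import csv
import io

# Two transaction rows from the sample data, trimmed after the timestamp.
sample = (
    '"","","","","479171","","","","2021-10-23 16:21:00"\n'
    '"","","","","","259635","","","","2021-10-23 16:32:00"\n'
)
parsed = list(csv.reader(io.StringIO(sample)))

found = []
for row in parsed:
    # Column indices that actually hold a value in this row
    positions = [i for i, v in enumerate(row) if v]
    found.append(positions)
    print(positions, [row[i] for i in positions])
# [4, 8] ['479171', '2021-10-23 16:21:00']
# [5, 9] ['259635', '2021-10-23 16:32:00']

# A fixed index that works for the first row misses the serial number in the second
print(repr(parsed[0][4]), repr(parsed[1][4]))  # '479171' ''
```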

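    Method two extracts only the valid data. In this CSV file each record sits on a single line, with no field spanning rows. The empty cells can therefore be stripped from each row and the remaining values taken directly. The code is very simple: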

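    import csv
    import pandas as pd
    
    dat_row = []  # collected transaction rows
    # Open the file for reading; the csv docs recommend newline="" for files passed to csv.reader.
    # The encoding is left at the system default as in the original; a file exported by
    # Chinese-locale software may need e.g. encoding="gbk" (assumption, not stated in the source).
    with open("油品销售明细202110-202210.CSV", mode="r", newline="") as f:
        # Create a csv.reader instance on the opened file
        reader = csv.reader(f)
        
        # Go through the rows one by one
        for row in reader:
            # Keep only the non-empty cells of this row
            dat_col = [v for v in row if len(v)>0]
            
            # Transaction rows have exactly 9 non-empty cells
            if len(dat_col)==9:
                dat_row.append(dat_col)
    
    # Serial no., transaction time, nozzle no., product name, unit price, volume, amount, start meter, end meter
    cols_list = ['流水号', '交易时间', '油枪号码', '油品名称', '油品单价', '体积', '交易金额', '起泵码', '止泵码']
    df = pd.DataFrame(dat_row, columns=cols_list)
    # utf_8_sig writes a UTF-8 BOM, which helps Excel detect the encoding
    df.to_csv('detail.csv', encoding='utf_8_sig', index=False)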
    

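    Note: the line "dat_col = [v for v in row if len(v)>0]" filters out, row by row, the cells that contain no data. A row is then kept only if exactly nine non-empty values remain, matching the nine transaction columns. The title, station, date-range, header (which also contains 备注) and footer rows have a different count and are skipped. By the same rule, a transaction row with a filled-in 备注 cell or a blank field would also be skipped.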
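    The filtering rule can be checked without the original file or pandas. The following is a minimal sketch, assuming the lines below (copied from the sample data: title, header, one transaction, footer) are representative. `extract_transactions` is a hypothetical helper name. The type conversions at the end are one illustrative choice and are not part of the original code; they are needed because csv.reader returns only strings. As a sanity check, the end meter reading minus the start meter reading reproduces the volume of this row.

```python
import csv
import io
from datetime import datetime

# Hypothetical helper: the filtering idea of method two, applied to any iterable of CSV lines.
def extract_transactions(lines, n_fields=9):
    rows = []
    for row in csv.reader(lines):
        dat_col = [v for v in row if len(v) > 0]  # drop empty cells
        if len(dat_col) == n_fields:              # keep only rows with n_fields values
            rows.append(dat_col)
    return rows

# Title, header, one transaction and footer line from the sample data.
sample = io.StringIO(
    '"","","","","","","","","","","","","","","","","油品销售明细表","","","","","","","","","","","",""\n'
    '"","流水号","","","","","","交易时间","","","","","油枪号码","","油品名称","","","油品单价","","体积","","交易金额","","起泵码","","止泵码","","","备注"\n'
    '"","","","","479171","","","","2021-10-23 16:21:00","","","","","1","","95号 车用汽油(ⅥA)","","","8.21","","58.83","","479.49","","380693.19","","","380752.02",""\n'
    '"打印时间:2022-10-28","","","","","","","","","","","","","","","","","","","","","","","","","","填表人:","",""\n'
)
rows = extract_transactions(sample)
print(len(rows))  # 1

# csv.reader returns strings only, so convert types explicitly (illustrative choice).
serial, ts, nozzle, product, price, volume, amount, start, end = rows[0]
record = {
    "serial": serial,
    "time": datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"),
    "nozzle": int(nozzle),
    "product": product,
    "price": float(price),
    "volume": float(volume),
    "amount": float(amount),
    # End meter minus start meter, rounded to the 2 decimals used in the file
    "pumped": round(float(end) - float(start), 2),
}
print(record["pumped"])  # 58.83
```

    Passing an iterable of lines instead of a file name keeps the helper easy to test. With the real file, it could be called as `extract_transactions(f)` inside the `with open(...)` block.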

    Summary

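    Data files without merged cells (here, no cells spanning rows) are quite simple to parse once a suitable method is chosen. I really like simple methods!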

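    References:

    快乐江小鱼. Python基础 - csv文件格式 (Python basics: the CSV file format). CSDN blog. 2022.08
    肖永威. Python按单元格读取复杂电子表格(Excel)数据实践 (Reading complex spreadsheet (Excel) data cell by cell with Python in practice). CSDN blog. 2022.11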

  • Original article: https://blog.csdn.net/xiaoyw/article/details/128090841