• Reading JSON files with Pandas



    Pandas provides the following functions and methods for working with JSON:

    1. Convert a JSON string to a pandas object.
    read_json([path_or_buf, orient, typ, dtype, ...])

    2. Normalize semi-structured JSON data into a flat table.
    json_normalize(data[, record_path, meta, ...])

    3. Convert the object to a JSON string.
    DataFrame.to_json([path_or_buf, orient, ...])

    4. Create a Table Schema from data.
    build_table_schema(data[, index, ...])

    pandas.read_json

    pandas.read_json(path_or_buf=None,
                     orient=None,
                     typ='frame',
                     dtype=None,
                     convert_axes=None,
                     convert_dates=True,
                     keep_default_dates=True,
                     numpy=False,
                     precise_float=False,
                     date_unit=None,
                     encoding=None,
                     encoding_errors='strict',
                     lines=False,
                     chunksize=None,
                     compression='infer',
                     nrows=None,
                     storage_options=None)
    

    Parameters:

    • path_or_buf: a valid JSON str, path object or file-like object
    • orient: str
    • typ: {'frame', 'series'}, default 'frame'
    • dtype: bool or dict, default None
    • convert_axes: bool, default None
    • convert_dates: bool or list of str, default True
    • keep_default_dates: bool, default True
    • numpy: bool, default False
    • precise_float: bool, default False
    • date_unit: str, default None
    • encoding: str, default 'utf-8'
    • encoding_errors: str, optional, default 'strict'
    • lines: bool, default False. Read the file as one JSON object per line.
    • chunksize: int, optional
    • compression: str or dict, default 'infer'
    • nrows: int, optional
    • storage_options: dict, optional
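As a rough illustration of the `lines` and `nrows` parameters listed above, the sketch below parses line-delimited JSON from memory. The sample records and the variable names are made up for the example, and `io.StringIO` stands in for a real file.

```python
from io import StringIO

import pandas as pd

# Hypothetical JSON Lines content: one JSON object per line.
jsonl = (
    '{"ttery": "[123]", "issue": "20130801-3391"}\n'
    '{"ttery": "[123]", "issue": "20130801-3390"}\n'
    '{"ttery": "[123]", "issue": "20130801-3389"}\n'
)

# lines=True parses each line as a separate record.
df_all = pd.read_json(StringIO(jsonl), lines=True)

# nrows can only be passed together with lines=True; it limits
# how many lines are read.
df_head = pd.read_json(StringIO(jsonl), lines=True, nrows=2)

print(df_all.shape)
print(df_head["issue"].tolist())
```

Without `lines=True`, the same input would be rejected, because three objects in a row do not form a single valid JSON document.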

    Returns: Series or DataFrame
    Example:
    Contents of the JSON file:

    [{"ttery":"[123]","issue":"20130801-3391"},{"ttery":"[123]","issue":"20130801-3390"},{"ttery":"[123]","issue":"20130801-3389"}]
    
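    # -*- coding: utf-8 -*-

    import pandas as pd

    # Open the file explicitly as UTF-8; the with block closes it afterwards.
    with open('ceshi.json', 'r', encoding='utf-8') as file:
        df = pd.read_json(file, orient='records')

    # Write the two columns to an Excel file without the index column.
    df.to_excel('pandas处理ceshi-json.xlsx', index=False, columns=["ttery", "issue"])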
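To inspect what `read_json` produces before exporting anything, the same content can be parsed from memory. This is only a sketch: it reuses the sample JSON shown above, wraps it in `io.StringIO` instead of reading `ceshi.json`, and skips the Excel step so that no Excel writer package is needed.

```python
from io import StringIO

import pandas as pd

# The same content as the sample file, held in memory instead of on disk.
raw = (
    '[{"ttery":"[123]","issue":"20130801-3391"},'
    '{"ttery":"[123]","issue":"20130801-3390"},'
    '{"ttery":"[123]","issue":"20130801-3389"}]'
)

df = pd.read_json(StringIO(raw), orient="records")

print(list(df.columns))     # column names come from the keys of each record
print(len(df))              # one row per JSON object
print(df.loc[0, "ttery"])   # the value is the string "[123]", not a list
```

With `orient="records"`, each object in the top-level array becomes one row, which is why the two keys end up as the two columns passed to `to_excel`.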
    

    pandas.json_normalize

    pandas.json_normalize(data,
                          record_path=None,
                          meta=None,
                          meta_prefix=None,
                          record_prefix=None,
                          errors='raise',
                          sep='.',
                          max_level=None)
    

    Parameters:

    • data: dict or list of dicts
    • record_path: str or list of str, default None
    • meta: list of paths (str or list of str), default None
    • meta_prefix: str, default None
    • record_prefix: str, default None
    • errors: {'raise', 'ignore'}, default 'raise'
    • sep: str, default '.'
    • max_level: int, default None
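The `errors` parameter is easiest to understand with a record that lacks one of the `meta` keys. The sketch below uses made-up data (`team`, `members`) purely for illustration.

```python
import pandas as pd

# Hypothetical records: the second one has no "team" key.
data = [
    {"team": "A", "members": [{"name": "Ann"}, {"name": "Bob"}]},
    {"members": [{"name": "Cid"}]},
]

# With the default errors="raise", the missing "team" key raises KeyError.
try:
    pd.json_normalize(data, record_path="members", meta=["team"])
    raised = False
except KeyError as exc:
    raised = True
    print("KeyError:", exc)

# errors="ignore" fills the missing meta value with NaN instead.
df = pd.json_normalize(
    data, record_path="members", meta=["team"], errors="ignore"
)
print(df)
```

Note that `errors="ignore"` only covers missing `meta` keys; every element of `data` is still expected to contain the `record_path` itself.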

    Returns: frame: DataFrame
    Example:

    data = [
        {"id": 1, "name": {"first": "Coleen", "last": "Volk"}},
        {"name": {"given": "Mark", "family": "Regner"}},
        {"id": 2, "name": "Faye Raker"},
    ]
    pd.json_normalize(data)
    
        id name.first name.last name.given name.family        name
    0  1.0     Coleen      Volk        NaN         NaN         NaN
    1  NaN        NaN       NaN       Mark      Regner         NaN
    2  2.0        NaN       NaN        NaN         NaN  Faye Raker
    
    data = [
        {
            "id": 1,
            "name": "Cole Volk",
            "fitness": {"height": 130, "weight": 60},
        },
        {"name": "Mark Reg", "fitness": {"height": 130, "weight": 60}},
        {
            "id": 2,
            "name": "Faye Raker",
            "fitness": {"height": 130, "weight": 60},
        },
    ]
    pd.json_normalize(data, max_level=0)
    
        id        name                        fitness
    0  1.0   Cole Volk  {'height': 130, 'weight': 60}
    1  NaN    Mark Reg  {'height': 130, 'weight': 60}
    2  2.0  Faye Raker  {'height': 130, 'weight': 60}
    
    data = [
        {
            "id": 1,
            "name": "Cole Volk",
            "fitness": {"height": 130, "weight": 60},
        },
        {"name": "Mark Reg", "fitness": {"height": 130, "weight": 60}},
        {
            "id": 2,
            "name": "Faye Raker",
            "fitness": {"height": 130, "weight": 60},
        },
    ]
    pd.json_normalize(data, max_level=1)
    
        id        name  fitness.height  fitness.weight
    0  1.0   Cole Volk             130              60
    1  NaN    Mark Reg             130              60
    2  2.0  Faye Raker             130              60
    
    data = [
        {
            "state": "Florida",
            "shortname": "FL",
            "info": {"governor": "Rick Scott"},
            "counties": [
                {"name": "Dade", "population": 12345},
                {"name": "Broward", "population": 40000},
                {"name": "Palm Beach", "population": 60000},
            ],
        },
        {
            "state": "Ohio",
            "shortname": "OH",
            "info": {"governor": "John Kasich"},
            "counties": [
                {"name": "Summit", "population": 1234},
                {"name": "Cuyahoga", "population": 1337},
            ],
        },
    ]
    result = pd.json_normalize(
        data, "counties", ["state", "shortname", ["info", "governor"]]
    )
    
             name  population    state shortname info.governor
    0        Dade       12345  Florida        FL    Rick Scott
    1     Broward       40000  Florida        FL    Rick Scott
    2  Palm Beach       60000  Florida        FL    Rick Scott
    3      Summit        1234     Ohio        OH   John Kasich
    4    Cuyahoga        1337     Ohio        OH   John Kasich
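The remaining naming parameters (`record_prefix`, `meta_prefix`, `sep`) only affect column labels. The following sketch applies them to a shortened version of the data above; the prefixes `county_` and `meta_` are arbitrary choices for the example.

```python
import pandas as pd

data = [
    {
        "state": "Florida",
        "info": {"governor": "Rick Scott"},
        "counties": [
            {"name": "Dade", "population": 12345},
            {"name": "Broward", "population": 40000},
        ],
    },
]

df = pd.json_normalize(
    data,
    record_path="counties",
    meta=["state", ["info", "governor"]],
    record_prefix="county_",  # prepended to columns taken from the records
    meta_prefix="meta_",      # prepended to columns taken from the meta paths
    sep="_",                  # joins nested meta keys: info + governor
)
print(list(df.columns))
```

Prefixes are useful when a record field and a meta field share a name, since `json_normalize` refuses to create two columns with the same label.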
    

    DataFrame.to_json

    DataFrame.to_json(path_or_buf=None,
                      orient=None,
                      date_format=None,
                      double_precision=10,
                      force_ascii=True,
                      date_unit='ms',
                      default_handler=None,
                      lines=False,
                      compression='infer',
                      index=True,
                      indent=None,
                      storage_options=None)
    

    Parameters:

    • path_or_buf: str, path object, file-like object, or None, default None
    • orient: str
    • date_format: {None, 'epoch', 'iso'}
    • double_precision: int, default 10
    • force_ascii: bool, default True
    • date_unit: str, default 'ms' (milliseconds)
    • default_handler: callable, default None
    • lines: bool, default False
    • compression: str or dict, default 'infer'
    • index: bool, default True
    • indent: int, optional
    • storage_options: dict, optional
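A small sketch of how `orient`, `lines` and `force_ascii` from the list above interact. The DataFrame contents are invented for the example.

```python
import json

import pandas as pd

df = pd.DataFrame({"city": ["Zürich", "Paris"], "rank": [1, 2]})

# orient="records" emits a list of row objects; force_ascii=False keeps
# non-ASCII characters readable instead of escaping them as \uXXXX.
as_records = df.to_json(orient="records", force_ascii=False)

# lines=True (which requires orient="records") writes one JSON object per line.
as_lines = df.to_json(orient="records", lines=True, force_ascii=False)

# Parse the line-delimited output back, one object per line.
rows = [json.loads(line) for line in as_lines.splitlines()]

print(as_records)
print(rows)
```

Because `path_or_buf` is left as None here, `to_json` returns the JSON as a string instead of writing a file.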

    Returns: None or str
    Example:

    import json

    import pandas as pd

    df = pd.DataFrame(
        [["a", "b"], ["c", "d"]],
        index=["row 1", "row 2"],
        columns=["col 1", "col 2"],
    )
    result = df.to_json(orient="split")
    parsed = json.loads(result)
    # Pretty-print the parsed JSON so the structure is easy to read.
    print(json.dumps(parsed, indent=4))
    
    {
        "columns": [
            "col 1",
            "col 2"
        ],
        "index": [
            "row 1",
            "row 2"
        ],
        "data": [
            [
                "a",
                "b"
            ],
            [
                "c",
                "d"
            ]
        ]
    }
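The "split" layout keeps the index and column labels separate from the data, so it can be read back with the matching `orient`. The round trip below is a sketch using the same DataFrame as above; `io.StringIO` is used only to hand the string to `read_json` as a file-like object.

```python
from io import StringIO

import pandas as pd

df = pd.DataFrame(
    [["a", "b"], ["c", "d"]],
    index=["row 1", "row 2"],
    columns=["col 1", "col 2"],
)

# Serialize with orient="split", then read it back with the same orient.
payload = df.to_json(orient="split")
restored = pd.read_json(StringIO(payload), orient="split")

print(restored)
```

Using a different `orient` when reading than when writing usually produces either an error or a differently shaped table, so the two calls should be kept in sync.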
    

    pandas.io.json.build_table_schema

    pandas.io.json.build_table_schema(data, index=True, primary_key=None, version=True)
    

    Parameters:

    • data: Series, DataFrame
    • index: bool, default True
    • primary_key: bool or None, default None
    • version: bool, default True

    Returns: schema: dict
    Example:

    import pandas as pd
    from pandas.io.json import build_table_schema

    df = pd.DataFrame(
        {'A': [1, 2, 3],
         'B': ['a', 'b', 'c'],
         'C': pd.date_range('2016-01-01', freq='D', periods=3),
        }, index=pd.Index(range(3), name='idx'))
    build_table_schema(df)
    
    {'fields': [{'name': 'idx', 'type': 'integer'}, {'name': 'A', 'type': 'integer'}, {'name': 'B', 'type': 'string'}, {'name': 'C', 'type': 'datetime'}], 'primaryKey': ['idx'], 'pandas_version': '1.4.0'}
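For comparison, the sketch below turns off the `index` and `version` options. The DataFrame is a reduced version of the one above, and the exact `type` reported for string columns may differ between pandas versions.

```python
import pandas as pd
from pandas.io.json import build_table_schema

df = pd.DataFrame(
    {"A": [1, 2, 3], "B": ["a", "b", "c"]},
    index=pd.Index(range(3), name="idx"),
)

# index=False leaves the index out of the fields, so no primaryKey is set;
# version=False omits the "pandas_version" entry.
schema = build_table_schema(df, index=False, version=False)

field_names = [field["name"] for field in schema["fields"]]
print(field_names)
print(sorted(schema.keys()))
```

This form is convenient when the schema describes only the data columns, for example when the index is a plain row counter.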
    
  • Original article: https://blog.csdn.net/weixin_43956958/article/details/125922030