Numpy入门[18]——数组读写

参考：

https://ailearning.apachecn.org/

使用Jupyter进行练习

import numpy as np
1

空格（制表符）分割的文本

假设有这样的一个空白分割的文件：

%%writefile myfile.txt
2.1 2.3 3.2 1.3 3.1
6.1 3.1 4.2 2.3 1.8
1
2
3

Writing myfile.txt
1

为了生成数组，先将数据转化成一个列表组成的列表，再将这个列表转换为数组：

data = []
with open('myfile.txt') as f:
    # 每次读一行
    for  line in f:
        fileds = line.split()
        row_data = [float(x) for x in fileds]
        data.append(row_data)
data = np.array(data)
data
1
2
3
4
5
6
7
8
9

array([[2.1, 2.3, 3.2, 1.3, 3.1],
       [6.1, 3.1, 4.2, 2.3, 1.8]])
1
2

更简便的是使用 loadtxt 方法：

data = np.loadtxt('myfile.txt')
data
1
2

array([[2.1, 2.3, 3.2, 1.3, 3.1],
       [6.1, 3.1, 4.2, 2.3, 1.8]])
1
2

逗号分隔文件

%%writefile myfile.txt
2.1, 2.3, 3.2, 1.3, 3.1
6.1, 3.1, 4.2, 2.3, 1.8
1
2
3

Overwriting myfile.txt
1

对于逗号分隔的文件（通常为.csv格式）,我可以稍微修改之前繁琐的过程，将 split 的参数变成 ','即可。

不过，loadtxt 函数也可以读这样的文件，只需要制定分割符的参数即可：

data = np.loadtxt('myfile.txt', delimiter=',')
data
1
2

array([[2.1, 2.3, 3.2, 1.3, 3.1],
       [6.1, 3.1, 4.2, 2.3, 1.8]])
1
2

loadtxt函数

loadtxt(fname, dtype=<type 'float'>, 
        comments='#', delimiter=None, 
        converters=None, skiprows=0, 
        usecols=None, unpack=False, ndmin=0) 
1
2
3
4

delimiter 就是刚才用到的分隔符参数。

skiprows 参数表示忽略开头的行数，可以用来读写含有标题的文本

%%writefile myfile.txt
X Y Z MAG ANG
2.1 2.3 3.2 1.3 3.1
6.1 3.1 4.2 2.3 1.8
1
2
3
4

Overwriting myfile.txt
1

np.loadtxt('myfile.txt', skiprows=1)
1

array([[2.1, 2.3, 3.2, 1.3, 3.1],
       [6.1, 3.1, 4.2, 2.3, 1.8]])
1
2

此外，有一个功能更为全面的 genfromtxt 函数，能处理更多的情况，但相应的速度和效率会慢一些。

genfromtxt(fname, dtype=<type 'float'>, comments='#', delimiter=None, 
           skiprows=0, skip_header=0, skip_footer=0, converters=None, 
           missing='', missing_values=None, filling_values=None, usecols=None, 
           names=None, excludelist=None, deletechars=None, replace_space='_', 
           autostrip=False, case_sensitive=True, defaultfmt='f%i', unpack=None, 
           usemask=False, loose=True, invalid_raise=True)

1
2
3
4
5
6
7

loadtxt的更多特性

%%writefile myfile.txt
 -- BEGINNING OF THE FILE
% Day, Month, Year, Skip, Power
01, 01, 2000, x876, 13 % wow!
% we don't want have Jan 03rd
04, 01, 2000, xfed, 55
1
2
3
4
5
6

Overwriting myfile.txt
1

data = np.loadtxt('myfile.txt', 
                  skiprows=1,        #忽略第一行
                  dtype=np.int64,    #数组类型
                  delimiter=',',     #逗号分割
                  usecols=(0,1,2,4), #指定使用哪几列数据
                  comments='%'       #百分号为注释符
                 )
data
1
2
3
4
5
6
7
8

array([[   1,    1, 2000,   13],
       [   4,    1, 2000,   55]], dtype=int64)
1
2

loadtxt自定义转换方法

%%writefile myfile.txt
2010-01-01 2.3 3.2
2011-01-01 6.1 3.1
1
2
3

Overwriting myfile.txt
1

假设我们的文本包含日期，我们可以使用 datetime 在 loadtxt 中处理：

import datetime

def date_converter(s):
    return datetime.datetime.strptime(s.decode('ascii'), "%Y-%m-%d")

data = np.loadtxt('myfile.txt',
                  dtype=object, #数据类型为对象
                  converters={0:date_converter,  #第一列使用自定义转换方法
                              1:float,           #第二第三使用浮点数转换
                              2:float})

data
1
2
3
4
5
6
7
8
9
10
11
12

array([[datetime.datetime(2010, 1, 1, 0, 0), 2.3, 3.2],
       [datetime.datetime(2011, 1, 1, 0, 0), 6.1, 3.1]], dtype=object)
1
2

import os
os.remove('myfile.txt')
1
2

读写各种格式的文件

如下表所示：

文件格式	使用的包	函数
txt	numpy	loadtxt, genfromtxt, fromfile, savetxt, tofile
csv	csv	reader, writer
Matlab	scipy.io	loadmat, savemat
hdf	pytables, h5py
NetCDF	netCDF4, scipy.io.netcdf	netCDF4.Dataset, scipy.io.netcdf.netcdf_file
文件格式	使用的包	备注
wav	scipy.io.wavfile	音频文件
jpeg,png,…	PIL, scipy.misc.pilutil	图像文件
fits	pyfits	天文图像

将数组写入文件

savetxt 可以将数组写入文件，参数如下：

savetxt(fname, 
        X, 
        fmt='%.18e', 
        delimiter=' ', 
        newline='\n', 
        header='', 
        footer='', 
        comments='# ')
1
2
3
4
5
6
7
8

示例：

data = np.array([[1,2], 
                 [3,4]])
def output(file):
    with open(file) as f:
        for line in f:
            print(line)
np.savetxt('out1.txt', data)
print("默认输出")
output('out1.txt')
np.savetxt('out2.txt', data, fmt="%d") #保存为整数
print('保存为整数')
output('out2.txt')
np.savetxt('out3.txt', data, fmt="%.2f", delimiter=',') #保存为2位小数的浮点数，用逗号分隔
print('保存为2位小数的浮点数，用逗号分隔')
output('out3.txt')

data2 = np.array([[1+1j,2], 
                 [3,4]])

np.savetxt('out4.txt', data2, fmt="%.2f", delimiter=',') #保存为2位小数的浮点数，用逗号分隔
print('输出复数')
output('out4.txt')

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23

默认输出
1.000000000000000000e+00 2.000000000000000000e+00

3.000000000000000000e+00 4.000000000000000000e+00

保存为整数
1 2

3 4

保存为2位小数的浮点数，用逗号分隔
1.00,2.00

3.00,4.00

输出复数
 (1.00+1.00j), (2.00+0.00j)

 (3.00+0.00j), (4.00+0.00j)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19

移除生成的文件。

import os
os.remove('out1.txt')
os.remove('out2.txt')
os.remove('out3.txt')
os.remove('out4.txt')
1
2
3
4
5

Numpy二进制格式

数组可以储存成二进制格式，单个的数组保存为 .npy 格式，多个数组保存为多个.npy文件组成的 .npz 格式，每个 .npy 文件包含一个数组。

与文本格式不同，二进制格式保存了数组的 shape, dtype 信息，以便完全重构出保存的数组。

保存的方法：

save(file, arr) 保存单个数组，.npy 格式
savez(file, *args, **kwds) 保存多个数组，无压缩的 .npz 格式
savez_compressed(file, *args, **kwds) 保存多个数组，有压缩的 .npz 格式

读取的方法：

load(file, mmap_mode=None) 对于 .npy，返回保存的数组，对于 .npz，返回一个名称-数组对组成的字典。

单个数组的读写

a = np.array([[1.0,2.0], [3.0,4.0]])

fname = 'afile.npy'
np.save(fname, a)
aa = np.load(fname)
aa
1
2
3
4
5
6

array([[1., 2.],
       [3., 4.]])
1
2

# 删除
os.remove('afile.npy')
1
2

二进制与文本大小比较

a = np.arange(10000.)
np.savetxt('a.txt',a)
print('文本文件大小：',os.stat('a.txt').st_size)
np.save('a.npy', a)
print('二进制文件大小：',os.stat('a.npy').st_size)
# 删除生成的文件
os.remove('a.npy')
os.remove('a.txt')
1
2
3
4
5
6
7
8

文本文件大小： 260000
二进制文件大小： 80128
1
2

可以看到，二进制文件大约是文本文件的三分之一。

保存多个数组

a = np.array([[1.0,2.0], 
              [3.0,4.0]])
b = np.arange(1000)
np.savez('data.npz', a=a, b=b)
# 查看包含的文件
!unzip -l data.npz

1
2
3
4
5
6
7

Archive:  data.npz
  Length      Date    Time    Name
---------  ---------- -----   ----
      160  1980/01/01 00:00   a.npy
     4128  1980/01/01 00:00   b.npy
---------                     -------
     4288                     2 files
1
2
3
4
5
6
7

# 载入数据
data = np.load('data.npz')
# 载入后可以像字典一样进行操作
# data['a']
data['b'].shape
1
2
3
4
5

(1000,)
1

压缩文件

print('当数据比较整齐时')
a = np.arange(20000.)
np.savez('a.npz', a=a)
print('文件压缩前大小：',os.stat('a.npz').st_size)
np.savez_compressed('a2.npz', a=a)
print('文件压缩后大小：',os.stat('a2.npz').st_size)
print('压缩倍数：',os.stat('a.npz').st_size/os.stat('a2.npz').st_size)

1
2
3
4
5
6
7
8

当数据比较整齐时
文件压缩前大小： 160256
文件压缩后大小： 26908
压缩倍数： 5.955700906793519
1
2
3
4

print('当数据比较混乱时')
a = np.random.rand(20000)
np.savez('a.npz', a=a)
print('文件压缩前大小：',os.stat('a.npz').st_size)
np.savez_compressed('a2.npz', a=a)
print('文件压缩后大小：',os.stat('a2.npz').st_size)
print('压缩倍数：',os.stat('a.npz').st_size/os.stat('a2.npz').st_size)
1
2
3
4
5
6
7

当数据比较混乱时
文件压缩前大小： 160256
文件压缩后大小： 151109
压缩倍数： 1.0605324633211788
1
2
3
4

os.remove('a.npz')
os.remove('a2.npz')
1
2

相关阅读:
SpringBoot 如何解决跨域问题
 一文搞定Postman（菜鸟必看）
分享几个小技巧教你图片怎么加边框
 typescript学习(二)
Poco库使用:操作Json格式数据
 章节三：RASA Domain介绍
 芯片后端的APR是指什么？
【Docker】docker安装nacos
（2）达梦数据库匹配
 猿创征文｜HCIE-Security Day54：anti-ddos设备防御原理
原文地址：https://blog.csdn.net/weixin_47692652/article/details/128181879