pandas Basics Notes 01 | Joyful Pandas Study

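Contents
- I. Python Basics
  
  Common random generation functions
  Generating a uniform distribution on [a, b]
  Generating a univariate normal distribution with mean $\mu$ and variance $\sigma^2$
  
  NumPy arrays
  Transpose: T
  Concatenation: r_, c_
  
  Reshaping: reshape
  Slicing and indexing: start:end:step
  
  Common functions
  where
  nonzero, argmax, argmin (return indices)
  any, all (test for nonzero / True)
  cumprod (cumulative product), cumsum (cumulative sum), diff (differences)
  
  Statistical functions
  Skipping missing values with the nan* functions
  Covariance and correlation coefficient (cov and corrcoef)
  Broadcasting
  Operations between a scalar and an array
  Operations between 2-D arrays: (m, n) with (m, 1) or (1, n)
  Operations between a 1-D array and a 2-D array
  
  Vector and matrix computations
  Vector inner product: dot
  Vector and matrix norms: np.linalg.norm
  Matrix multiplication: a@b
  
  Ex1: Matrix multiplication with a list comprehension
  Ex2: Updating a matrix
  Ex3: Chi-square statistic
  Ex4: Improving the performance of a matrix computation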
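I. Python Basics
1. List comprehensions and conditional assignment
```
def my_func(x):
    return 2*x

# Apply my_func to each element of range(5)
[my_func(i) for i in range(5)]
# [0, 2, 4, 6, 8]

# Two nested loops
[m+'_'+n for m in ['a','b'] for n in ['c','d']]
# ['a_c', 'a_d', 'b_c', 'b_d']
```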
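The heading also mentions conditional assignment, which the snippet does not show. Below is a minimal sketch, assuming the intended form is Python's conditional expression (a if condition else b). The names value, nums and capped are illustrative and do not come from the original notes.

```python
# Conditional expression: value is 'cat' when the condition holds, else 'dog'
value = 'cat' if 2 > 1 else 'dog'

# Combined with a list comprehension: cap every element above 5 at 5
nums = [1, 2, 3, 4, 5, 6, 7]
capped = [i if i <= 5 else 5 for i in nums]

print(value)   # cat
print(capped)  # [1, 2, 3, 4, 5, 5, 5]
```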
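2. Anonymous functions and the map method
  The map function returns a map object, which can be converted to a list with list.
```
my_func = lambda x: 2*x
[(lambda x: 2*x)(i) for i in range(5)]
# map applies the function lazily; list() materializes the result
list(map(lambda x: 2*x, range(5)))
# [0, 2, 4, 6, 8]
# With several iterables, the lambda takes one argument per iterable
list(map(lambda x, y: str(x)+'_'+y, range(5), list('abcde')))
# ['0_a', '1_b', '2_c', '3_d', '4_e']
```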
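3. zip objects and the enumerate method
- zip packs several iterables into a single iterable of tuples and returns a zip object; passing it to tuple or list yields the packed result.
```
L1, L2, L3 = list('abc'), list('def'), list('hij')
list(zip(L1, L2, L3))
# [('a', 'd', 'h'), ('b', 'e', 'i'), ('c', 'f', 'j')]
tuple(zip(L1, L2, L3))
# (('a', 'd', 'h'), ('b', 'e', 'i'), ('c', 'f', 'j'))
# Iterate over the packed tuples directly
for i, j, k in zip(L1, L2, L3):
    print(i, j, k)
L = list('abcd')
```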
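- enumerate is a special kind of packing: it binds each element to its running index during iteration.
```
# Using enumerate
for index, value in enumerate(L):
    print(index, value)
# The same result using zip
for index, value in zip(range(len(L)), L):
    print(index, value)

# Build a dict that maps the elements of one list to those of another
dict(zip(L1, L2))
# {'a': 'd', 'b': 'e', 'c': 'f'}
```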
Python's * operator can be combined with zip to unpack a zipped result.
```
zipped = list(zip(L1, L2, L3))
# [('a', 'd', 'h'), ('b', 'e', 'i'), ('c', 'f', 'j')]
# Unpack: zip(*zipped) regroups the tuples into the original sequences
list(zip(*zipped))
# [('a', 'b', 'c'), ('d', 'e', 'f'), ('h', 'i', 'j')]
```
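Common random generation functions

Random arrays: np.random
Uniform distribution on [0, 1): np.random.rand
Standard normal random arrays: np.random.randn
Random integer arrays: np.random.randint
Random sampling from a list: np.random.choice
```
import numpy as np

np.random.rand(3)     # 3 random numbers drawn uniformly from [0, 1)
# e.g. array([0.61056105, 0.07458226, 0.3922192 ])
np.random.rand(3,3)   # a 3x3 matrix; each dimension is a separate argument
```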
```
# Standard normal distribution N(0, 1)
np.random.randn(3)
np.random.randn(3,3)
```
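```
# Sampling with replacement is the default; here, draw two items
# without replacement using the given probabilities (which must sum to 1)
my_list = ['a','b','c','d']
np.random.choice(my_list, 2, replace=False, p=[0.1,0.7,0.1,0.1])
# Draw with replacement and return the result as a 3x3 array
np.random.choice(my_list, (3,3))
```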
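```
# permutation returns a random ordering of the list.
# Sampling without replacement with as many draws as the list has
# elements is equivalent to calling permutation.
np.random.permutation(my_list)
```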
```
# A random seed fixes the sequence of random numbers that follows
np.random.seed(0)
np.random.rand()
```
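To make the effect of the seed concrete, the sketch below resets the seed and draws again. It assumes only the legacy np.random interface used in these notes. The names first and second are illustrative.

```python
import numpy as np

np.random.seed(0)
first = np.random.rand(3)

np.random.seed(0)            # reset the generator to the same state
second = np.random.rand(3)

# Both draws start from the same state, so they are identical
print(np.array_equal(first, second))  # True
```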
Generating a uniform distribution on [a, b]
```
# Method 1: rescale the uniform distribution on [0, 1)
a, b = 5, 15
(b-a)*np.random.rand(3)+a
# Method 2: use the library function
np.random.uniform(5, 15, 3)
```
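Generating a univariate normal distribution with mean $\mu$ and variance $\sigma^2$
```
# From the standard normal distribution
sigma, mu = 2.5, 3
mu + np.random.randn(3)*sigma
# With the library function (arguments: mean, standard deviation, size)
np.random.normal(3, 2.5, 3)
```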
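As a rough sanity check that the two approaches describe the same distribution, the sketch below draws a large sample with each and compares the sample mean and standard deviation. The sample size and the names via_randn and via_normal are assumptions made for illustration. Note that the second argument of np.random.normal is the standard deviation, not the variance.

```python
import numpy as np

np.random.seed(0)
mu, sigma, size = 3, 2.5, 100_000

via_randn = mu + np.random.randn(size) * sigma    # shift and scale N(0, 1)
via_normal = np.random.normal(mu, sigma, size)    # scale is the standard deviation

# Both sample means should be close to mu, both sample stds close to sigma
print(via_randn.mean(), via_randn.std())
print(via_normal.mean(), via_normal.std())
```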
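NumPy arrays

Transpose: T
```
np.zeros((2,3)).T   # shape (3, 2)
```
Concatenation: r_, c_
```
# Stack vertically (row-wise): the result has shape (4, 3)
np.r_[np.zeros((2,3)),np.zeros((2,3))]
# Stack horizontally (column-wise): the result has shape (2, 6)
np.c_[np.zeros((2,3)),np.zeros((2,3))]
```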
```
# r_ cannot stack a 1-D array onto a 2-D array: the call raises an error
try:
    np.r_[np.array([0,0]),np.zeros((2,1))]
except Exception as e:
    Err_Msg = e
```
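The call in the try block fails because np.r_ is asked to combine a 1-D array with a 2-D one. The sketch below shows combinations that do work, assuming the 1-D array is meant to act as a column. The names col, side and joined are illustrative.

```python
import numpy as np

col = np.array([0, 0])          # 1-D array of length 2

# c_ treats a 1-D array as a column, so it can sit next to a (2, 3) array
side = np.c_[col, np.zeros((2, 3))]
print(side.shape)               # (2, 4)

# r_ concatenates 1-D arrays end to end
joined = np.r_[col, np.zeros(2)]
print(joined.shape)             # (4,)

# Mixing 1-D and 2-D inputs in r_ raises ValueError
try:
    np.r_[col, np.zeros((2, 1))]
except ValueError as e:
    print(type(e).__name__)     # ValueError
```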
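Reshaping: reshape
```
# np.arange(8) starts at 0 with step 1 by default
target = np.arange(8).reshape(2,4)
# Read and fill row by row (C order)
target.reshape((4,2), order='C')
# Read and fill column by column (Fortran order)
target.reshape((4,2), order='F')
# Convert an n x 1 array into a 1-D array
target = np.ones((3,1))
target.reshape(-1)
```
Because the total size of the array is fixed, one dimension may be left unspecified when calling reshape; pass -1 for it and NumPy infers its length.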
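A short sketch of how -1 is resolved, using an 8-element array chosen for illustration (the names arr, a and b are not from the original notes):

```python
import numpy as np

arr = np.arange(8)
a = arr.reshape(-1, 4)    # 8 elements / 4 columns -> 2 rows
b = arr.reshape(2, -1)    # 8 elements / 2 rows -> 4 columns
print(a.shape, b.shape)   # (2, 4) (2, 4)

# The known dimensions must divide the total size evenly
try:
    arr.reshape(-1, 3)
except ValueError:
    print('8 elements cannot be arranged in rows of 3')
```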

Slicing and indexing: start:end:step
```
target = np.arange(9).reshape(3,3)
# array([[0, 1, 2],
#        [3, 4, 5],
#        [6, 7, 8]])
target[:-1, [0,2]]
# array([[0, 2],
#        [3, 5]])

# np.ix_ with boolean lists
target[np.ix_([True,False,True],[True,False,True])]
# array([[0, 2],
#        [6, 8]])
# Integer and boolean lists can be mixed
target[np.ix_([1,2],[True,False,True])]
# array([[3, 5],
#        [6, 8]])
target.reshape(-1)   # flatten to 1-D
```
*When the array is one-dimensional, boolean indexing can be applied directly, without np.ix_.*
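A minimal sketch of direct boolean indexing on a 1-D array. The names flat and evens are illustrative.

```python
import numpy as np

flat = np.arange(9).reshape(3, 3).reshape(-1)   # 1-D array holding 0..8
evens = flat[flat % 2 == 0]                     # boolean mask applied directly
print(evens)                                    # [0 2 4 6 8]
```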

Common functions

where
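The notes list where without an example. The sketch below assumes the intended function is np.where(condition, x, y), which takes elements from x where the condition is True and from y elsewhere. The array values are illustrative.

```python
import numpy as np

a = np.array([-1, 1, -1, 0])
result = np.where(a > 0, a, 5)   # keep positive entries, replace the rest with 5
print(result)                    # [5 1 5 5]
```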

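nonzero, argmax, argmin (return indices)

nonzero returns the indices of the nonzero elements
argmax returns the index of the largest element
argmin returns the index of the smallest element
```
a = np.array([-2,-5,0,1,3,-1])
np.nonzero(a)   # (array([0, 1, 3, 4, 5]),)
a.argmax()      # 4
a.argmin()      # 1
```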
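any, all (test for nonzero / True)

any returns True when the sequence contains at least one True or nonzero element, and False otherwise

all returns True when every element of the sequence is True or nonzero, and False otherwise
```
a.any()   # True
a.all()   # False, because a contains a 0
```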
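cumprod (cumulative product), cumsum (cumulative sum), diff (differences)
```
a = np.array([1,2,3])
a.cumprod()   # array([1, 2, 6])
a.cumsum()    # array([1, 3, 6])
np.diff(a)    # array([1, 1]): each element minus the previous one
```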
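Statistical functions

max, min, mean, median, std, var, sum, quantile
Of these, quantile is a module-level function, so it is called as np.quantile(target, q) rather than target.quantile(q); the same holds for median
```
target = np.arange(5)
target.max()              # 4
np.quantile(target, 0.5)  # 2.0
```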
Quantiles in NumPy
numpy.quantile(a, q, axis=None, out=None, overwrite_input=False, interpolation='linear', keepdims=False)
Since NumPy 1.22 the interpolation argument has been renamed to method.

The pandas quantile function
DataFrame.quantile(q=0.5, axis=0, numeric_only=True, interpolation='linear')
q: a number or list-like with values between 0 and 1; the default 0.5 gives the median (the second quartile)
axis: the direction of computation, one of {0, 1, 'index', 'columns'}; the default is 0
numeric_only: whether to include only numeric columns; the default became False in pandas 2.0
interpolation: one of {'linear', 'lower', 'higher', 'midpoint', 'nearest'}; the default is linear
```
import pandas as pd
data = pd.DataFrame({'num':[2,4,7,8,9,10]})
print(data['num'].quantile())   # q defaults to 0.5 with linear interpolation: 7.5
print(data['num'].quantile(interpolation="higher"))
# Compute several quantiles at once
data['num'].quantile([0.25,0.5,0.75])
# 0.25    4.75
# 0.50    7.50
# 0.75    8.75
```
Skipping missing values with the nan* functions
```
target = np.array([1,2,np.nan])
target.max()                 # returns nan
# Use the nan* variants to ignore missing values
np.nanmax(target)            # 2.0
np.nanquantile(target, 0.5)  # 1.5
```
Covariance and correlation coefficient (cov and corrcoef)

Covariance:
$\mathrm{cov}(X,Y)=\frac{\sum_{i=1}^n(X_i-\bar{X})(Y_i-\bar{Y})}{n-1}$
Variance (the covariance of a variable with itself):
$D(X)=\mathrm{cov}(X,X)=\frac{\sum_{i=1}^n(X_i-\bar{X})^2}{n-1}$

Correlation coefficient:
$\rho_{XY}=\frac{\mathrm{cov}(X,Y)}{\sqrt{D(X)}\sqrt{D(Y)}}=\frac{\sum_{i=1}^n(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\sum_{i=1}^n(X_i-\bar{X})^2}\sqrt{\sum_{i=1}^n(Y_i-\bar{Y})^2}}$
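```
target1 = np.array([1,3,5,9])
target2 = np.array([1,5,3,-9])
np.cov(target1, target2)        # 2x2 covariance matrix
np.corrcoef(target1, target2)   # 2x2 correlation matrix

target = np.arange(1,10).reshape(3,-1)
# axis=0 aggregates down each column; axis=1 aggregates across each row
target.sum(0)   # column sums: array([12, 15, 18])
target.sum(1)   # row sums:    array([ 6, 15, 24])
```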
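np.cov and np.corrcoef both return 2x2 matrices whose off-diagonal entry is the cross term between the two inputs. The sketch below checks those entries against the textbook definitions, assuming the sample (n-1) denominator that np.cov uses by default. The names x, y, cov_xy and rho are illustrative.

```python
import numpy as np

x = np.array([1, 3, 5, 9])
y = np.array([1, 5, 3, -9])
n = len(x)

# Sample covariance with the n-1 denominator
cov_xy = ((x - x.mean()) * (y - y.mean())).sum() / (n - 1)
# Correlation = covariance / (std_x * std_y); ddof=1 matches the n-1 denominator
rho = cov_xy / (x.std(ddof=1) * y.std(ddof=1))

print(np.isclose(cov_xy, np.cov(x, y)[0, 1]))     # True
print(np.isclose(rho, np.corrcoef(x, y)[0, 1]))   # True
```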
Broadcasting

Operations between a scalar and an array
```
res = 3*np.ones((2,2))+1
# array([[4., 4.],
#        [4., 4.]])

res = 1/res
# array([[0.25, 0.25],
#        [0.25, 0.25]])
```
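Operations between 2-D arrays: (m, n) with (m, 1) or (1, n)

When two arrays have exactly the same shape, the operation is applied element by element. When one of them has shape (m, 1) or (1, n), its length-1 dimension is first expanded to match the other array.
```
res = np.ones((3,2))
res*np.array([[2,3]])
# The second array's first dimension is expanded to 3
res*np.array([[2],[3],[4]])
# The second array's second dimension is expanded to 2
res*np.array([[2]])
# Equivalent to two expansions: the dimensions become 3 and 2
```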
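Operations between a 1-D array and a 2-D array
```
# A 1-D array of length k is treated as a 2-D array of shape (1, k)
np.ones(3)+np.ones((2,3))   # result shape (2, 3)
np.ones(3)+np.ones((2,1))   # result shape (2, 3)
np.ones(1)+np.ones((2,3))   # result shape (2, 3)
```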
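The three sums above all succeed because each 1-D array is first viewed as a single row and then stretched along any length-1 dimension. A small sketch that prints the resulting shapes and adds one incompatible case for contrast (the names r1, r2 and r3 are illustrative):

```python
import numpy as np

r1 = np.ones(3) + np.ones((2, 3))   # (3,) -> (1, 3) -> (2, 3)
r2 = np.ones(3) + np.ones((2, 1))   # (1, 3) and (2, 1) both expand to (2, 3)
r3 = np.ones(1) + np.ones((2, 3))   # (1,) -> (1, 1) -> (2, 3)
print(r1.shape, r2.shape, r3.shape) # (2, 3) (2, 3) (2, 3)

# Trailing lengths that are neither equal nor 1 cannot be broadcast
try:
    np.ones(2) + np.ones((2, 3))
except ValueError:
    print('shapes (2,) and (2, 3) are incompatible')
```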
Vector and matrix computations

Vector inner product: dot
```
a = np.array([1,2,3])
b = np.array([1,3,5])
# Inner product: 1*1 + 2*3 + 3*5
a.dot(b)   # 22
```
Vector and matrix norms: np.linalg.norm
```
matrix_target = np.arange(4).reshape(-1,2)
matrix_target
# array([[0, 1],
#        [2, 3]])
np.linalg.norm(matrix_target, 'fro')
# Possible ord values: 'fro' and 'nuc' (matrices only), inf, -inf,
# 0 (vectors only), -1, 1, 2, -2
np.linalg.norm(matrix_target, np.inf)
np.linalg.norm(matrix_target, 2)
```
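The three norms computed above can be cross-checked by hand. The sketch below uses the standard definitions: the Frobenius norm is the square root of the sum of squared entries, the infinity norm is the largest absolute row sum, and the 2-norm is the largest singular value. The names m, fro_manual, inf_manual and two_manual are illustrative.

```python
import numpy as np

m = np.arange(4).reshape(-1, 2)             # [[0, 1], [2, 3]]

fro_manual = np.sqrt((m ** 2).sum())        # sqrt(0 + 1 + 4 + 9)
inf_manual = np.abs(m).sum(axis=1).max()    # max(0 + 1, 2 + 3) = 5
two_manual = np.linalg.svd(m, compute_uv=False).max()   # largest singular value

print(np.isclose(fro_manual, np.linalg.norm(m, 'fro')))    # True
print(np.isclose(inf_manual, np.linalg.norm(m, np.inf)))   # True
print(np.isclose(two_manual, np.linalg.norm(m, 2)))        # True
```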
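Properties of norms: every norm is non-negative (and zero only for the zero vector or matrix), scales with the absolute value of a scalar factor, and satisfies the triangle inequality $\|x+y\|\le\|x\|+\|y\|$.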

Matrix multiplication: a@b

$[\mathbf{A}_{m\times p}\mathbf{B}_{p\times n}]_{ij} = \sum_{k=1}^p\mathbf{A}_{ik}\mathbf{B}_{kj}$
```
a = np.arange(4).reshape(-1,2)
b = np.arange(-4,0).reshape(-1,2)
# Matrix multiplication
a@b
# array([[ -2,  -1],
#        [-14,  -9]])
```
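Some notes

np.empty()
Returns a new array of the given shape and dtype without initializing its entries, so the contents are arbitrary until they are assigned.
all
Checks whether every element of a tuple or list is truthy. It returns True when no element is 0, an empty string, None or False (and also when the iterable is empty); otherwise it returns False.
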
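Ex1: Matrix multiplication with a list comprehension
The loop below computes the product of M1 and M2; the task is to rewrite it as a list comprehension.
```
M1 = np.random.rand(2,3)
# Random 2x3 matrix with entries uniform on [0, 1)
M2 = np.random.rand(3,4)
# np.empty takes the shape as a single tuple
res = np.empty((M1.shape[0],M2.shape[1]))
for i in range(M1.shape[0]):
    for j in range(M2.shape[1]):
        item = 0
        for k in range(M1.shape[1]):
            item += M1[i][k]*M2[k][j]
        res[i][j] = item
```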
Answer
```
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

M1 = np.random.rand(2,3)
# Random 2x3 matrix with entries uniform on [0, 1)
M2 = np.random.rand(3,4)
# The innermost comprehension sums over k; the outer two iterate over columns j and rows i
res = [[sum([M1[i][k]*M2[k][j] for k in range(M1.shape[1])]) for j in range(M2.shape[1])] for i in range(M1.shape[0])]
```
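One way to check the answer is to compare it with NumPy's own matrix product. The sketch below assumes a fixed seed for reproducibility and uses np.allclose to tolerate floating-point rounding. Neither choice comes from the original notes.

```python
import numpy as np

np.random.seed(0)
M1 = np.random.rand(2, 3)
M2 = np.random.rand(3, 4)

# Same list comprehension as in the answer, split across lines for readability
res = [[sum(M1[i][k] * M2[k][j] for k in range(M1.shape[1]))
        for j in range(M2.shape[1])]
       for i in range(M1.shape[0])]

print(np.allclose(res, M1 @ M2))   # True
```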
Ex2: Updating a matrix

Let $A$ be an $m\times n$ matrix. Build a matrix $B$ by updating every element of $A$ according to $\displaystyle B_{ij}=A_{ij}\sum_{k=1}^n\frac{1}{A_{ik}}$

For example, with the matrix $A$ below, $B_{2,2}=5\times (\frac{1}{4}+\frac{1}{5}+\frac{1}{6})=\frac{37}{12}$

$A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}$
```
A = np.arange(1,10).reshape(3,-1)
# (1/A).sum(1) holds each row's sum of reciprocals; reshaping it to a
# column lets it broadcast across the entries of that row
B = A*(1/A).sum(1).reshape(-1,1)
```
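The worked value for $B_{2,2}$ can be confirmed numerically. A minimal sketch, keeping in mind that the 1-based entry $B_{2,2}$ is B[1, 1] under NumPy's 0-based indexing:

```python
import numpy as np

A = np.arange(1, 10).reshape(3, -1)
B = A * (1 / A).sum(1).reshape(-1, 1)

# 5 * (1/4 + 1/5 + 1/6) = 37/12
print(np.isclose(B[1, 1], 37 / 12))   # True
```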
Ex3: Chi-square statistic

Let $A$ be an $m\times n$ matrix and define $B_{ij}=\frac{(\sum_{i=1}^m A_{ij})\times(\sum_{j=1}^n A_{ij})}{\sum_{i=1}^m\sum_{j=1}^n A_{ij}}$
The chi-square value is
$\chi^2=\sum_{i=1}^m\sum_{j=1}^n\frac{(A_{ij}-B_{ij})^2}{B_{ij}}$
```
# Define the matrix A
np.random.seed(0)
A = np.random.randint(10,20,(8,5))
# Column sums (shape (5,)) times row sums (shape (8, 1)) broadcast to (8, 5)
B = A.sum(0)*A.sum(1).reshape(-1,1)/A.sum()
res = ((A-B)**2/B).sum()
```
Ex4: Improving the performance of a matrix computation

Let $Z$ be an $m\times n$ matrix, and let $B$ and $U$ be $m\times p$ and $p\times n$ matrices, respectively.
```
np.random.seed(0)
m, n, p = 100, 80, 50
B = np.random.randint(0,2,(m,p))
```
Original article: https://blog.csdn.net/m0_52024881/article/details/125966468
