• [Probability Theory and Mathematical Statistics (Graduate Course)] Key Points Summary 9 (Regression Analysis)


    Original article: [Probability Theory and Mathematical Statistics (Graduate Course)] Key Points Summary 9 (Regression Analysis)

    Simple Linear Regression Model

    $$y=\beta_0+\beta_1x+\epsilon,\quad \epsilon\sim N(0,\sigma^2),\quad E(\epsilon)=0,\quad D(\epsilon)=\sigma^2>0\implies E(y)=\beta_0+\beta_1x$$

    Regression equation: $\hat{y}=\hat{\beta}_0+\hat{\beta}_1x$

    Derivation (least squares):
    $$\begin{aligned}
    y_i-E(y_i)&=y_i-(\beta_0+\beta_1x_i)\\
    Q(\beta_0,\beta_1)&=\sum_{i=1}^{n}\bigl(y_i-E(y_i)\bigr)^2=\sum_{i=1}^{n}(y_i-\beta_0-\beta_1x_i)^2\\
    \text{set }\frac{\partial Q(\beta_0,\beta_1)}{\partial\beta_0}&=-2\sum_{i=1}^{n}(y_i-\beta_0-\beta_1x_i)=0\\
    \text{set }\frac{\partial Q(\beta_0,\beta_1)}{\partial\beta_1}&=-2\sum_{i=1}^{n}x_i(y_i-\beta_0-\beta_1x_i)=0
    \end{aligned}$$
    Rearranging gives the normal equations:
    $$\begin{cases}
    n\hat{\beta}_0+n\bar{x}\hat{\beta}_1=n\bar{y} & (1)\\
    n\bar{x}\hat{\beta}_0+\left(\sum\limits_{i=1}^{n}x_i^2\right)\hat{\beta}_1=\sum\limits_{i=1}^{n}x_iy_i & (2)
    \end{cases}$$
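The normal equations are a 2x2 linear system, so they can be solved directly. The sketch below is only an illustration: the function name `solve_normal_equations` and the sample data are assumptions, not part of the original notes. It uses Cramer's rule together with $n\bar{x}=\sum x_i$ and $n\bar{y}=\sum y_i$.

```python
def solve_normal_equations(xs, ys):
    """Solve normal equations (1) and (2) for (b0, b1) with Cramer's rule."""
    n = len(xs)
    sum_x = sum(xs)                      # equals n * x_bar
    sum_y = sum(ys)                      # equals n * y_bar
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    # Coefficient matrix is [[n, sum_x], [sum_x, sum_xx]].
    det = n * sum_xx - sum_x * sum_x
    if det == 0:
        raise ValueError("all x values are equal, so the slope is not identifiable")
    b0 = (sum_y * sum_xx - sum_x * sum_xy) / det
    b1 = (n * sum_xy - sum_x * sum_y) / det
    return b0, b1


# Made-up illustrative data (not from the original notes).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = solve_normal_equations(xs, ys)
print(b0, b1)  # approximately 2.2 and 0.6
```

The determinant equals $nL_{xx}$, so the system has a unique solution exactly when the $x_i$ are not all equal.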

    Solving these equations gives:
    $$\begin{aligned}
    \hat{\beta}_1&=\frac{L_{xy}}{L_{xx}}\\
    \hat{\beta}_0&=\bar{y}-\hat{\beta}_1\bar{x}\\
    L_{xx}&=\sum_{i=1}^{n}(x_i-\bar{x})^2=\sum_{i=1}^{n}x_i^2-n\bar{x}^2=\sum_{i=1}^{n}x_i^2-\frac{1}{n}\left(\sum_{i=1}^{n}x_i\right)^2\\
    L_{yy}&=\sum_{i=1}^{n}(y_i-\bar{y})^2=\sum_{i=1}^{n}y_i^2-n\bar{y}^2=\sum_{i=1}^{n}y_i^2-\frac{1}{n}\left(\sum_{i=1}^{n}y_i\right)^2\\
    L_{xy}&=\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})=\sum_{i=1}^{n}x_iy_i-n\bar{x}\bar{y}=\sum_{i=1}^{n}x_iy_i-\frac{1}{n}\sum_{i=1}^{n}x_i\sum_{i=1}^{n}y_i
    \end{aligned}$$

    When a problem gives the data as sums ($\sum$ form), $L_{xx},L_{yy},L_{xy}$ are usually computed with the rightmost form of the formulas above.
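As a small illustration of working from summed data, the sketch below takes only $n$ and the five sums and applies the rightmost forms of $L_{xx},L_{yy},L_{xy}$. The function name `fit_from_sums` and the numbers are hypothetical, chosen only for the example.

```python
def fit_from_sums(n, sum_x, sum_y, sum_xx, sum_yy, sum_xy):
    """Compute L_xx, L_yy, L_xy and the least squares estimates from sums only."""
    l_xx = sum_xx - sum_x ** 2 / n
    l_yy = sum_yy - sum_y ** 2 / n
    l_xy = sum_xy - sum_x * sum_y / n
    b1 = l_xy / l_xx                      # slope estimate L_xy / L_xx
    b0 = sum_y / n - b1 * sum_x / n       # intercept estimate y_bar - b1 * x_bar
    return {"l_xx": l_xx, "l_yy": l_yy, "l_xy": l_xy, "b0": b0, "b1": b1}


# Hypothetical problem data: n = 5, sum x = 15, sum y = 20,
# sum x^2 = 55, sum y^2 = 86, sum xy = 66.
result = fit_from_sums(5, 15, 20, 55, 86, 66)
print(result)
```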

    Residual Sum of Squares

    $$Q_e=\sum_{i=1}^{n}e_i^2=\sum_{i=1}^{n}(y_i-\hat{y}_i)^2=\sum_{i=1}^{n}(y_i-\hat{\beta}_0-\hat{\beta}_1x_i)^2=L_{yy}-\hat{\beta}_1L_{xy}=L_{yy}-\frac{L_{xy}^2}{L_{xx}}$$

    Theorem: $\frac{Q_e}{\sigma^2}\sim\chi^2(n-2)$
    $$E\left(\frac{Q_e}{\sigma^2}\right)=n-2\implies E\left(\frac{Q_e}{n-2}\right)=\sigma^2\implies\hat{\sigma}^2=\frac{Q_e}{n-2}$$

    That is, $\hat{\sigma}^2=\frac{Q_e}{n-2}$ is an unbiased estimator of $\sigma^2$.
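The two expressions for $Q_e$ (the direct sum of squared residuals and the shortcut $L_{yy}-L_{xy}^2/L_{xx}$) can be checked against each other numerically. The following is a minimal sketch; the function name `residual_variance` and the data are assumptions made for illustration.

```python
def residual_variance(xs, ys):
    """Return (Q_e from residuals, Q_e from the shortcut, sigma^2 estimate)."""
    n = len(xs)
    if n <= 2:
        raise ValueError("need n > 2 so that n - 2 degrees of freedom is positive")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    l_xx = sum((x - x_bar) ** 2 for x in xs)
    l_yy = sum((y - y_bar) ** 2 for y in ys)
    l_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = l_xy / l_xx
    b0 = y_bar - b1 * x_bar
    q_e_direct = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
    q_e_shortcut = l_yy - l_xy ** 2 / l_xx
    sigma2_hat = q_e_shortcut / (n - 2)   # unbiased estimate of sigma^2
    return q_e_direct, q_e_shortcut, sigma2_hat


# Made-up illustrative data.
q_direct, q_shortcut, sigma2_hat = residual_variance([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(q_direct, q_shortcut, sigma2_hat)  # approximately 2.4, 2.4, 0.8
```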

    Properties of the Least Squares Estimators

    The least squares estimators of $\beta_0,\beta_1$ are both unbiased: $E(\hat{\beta}_0)=\beta_0,\quad E(\hat{\beta}_1)=\beta_1$

    $\hat{\beta}_0\sim N\left(\beta_0,\left(\frac{1}{n}+\frac{\bar{x}^2}{L_{xx}}\right)\sigma^2\right)$

    $\hat{\beta}_1\sim N\left(\beta_1,\frac{\sigma^2}{L_{xx}}\right)$

    $Cov(\hat{\beta}_0,\hat{\beta}_1)=-\frac{\bar{x}}{L_{xx}}\sigma^2$

    $\hat{y}_0\sim N\left(\beta_0+\beta_1x_0,\left(\frac{1}{n}+\frac{(x_0-\bar{x})^2}{L_{xx}}\right)\sigma^2\right)$, where $\hat{y}_0=\hat{\beta}_0+\hat{\beta}_1x_0$ is the fitted value at $x=x_0$
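These four results are linked: the variance of $\hat{y}_0$ follows from the variances and covariance of $\hat{\beta}_0,\hat{\beta}_1$. The sketch below checks this numerically for one design. The function names, the $x$ values, and the choice $\sigma^2=1$ are all assumptions made for the example.

```python
def estimator_moments(xs, sigma2):
    """Theoretical D(b0), D(b1) and Cov(b0, b1) for fixed x values."""
    n = len(xs)
    x_bar = sum(xs) / n
    l_xx = sum((x - x_bar) ** 2 for x in xs)
    var_b0 = (1 / n + x_bar ** 2 / l_xx) * sigma2
    var_b1 = sigma2 / l_xx
    cov_b0_b1 = -x_bar / l_xx * sigma2
    return var_b0, var_b1, cov_b0_b1


def fitted_value_variance(xs, sigma2, x0):
    """D(b0 + b1 * x0), computed two ways: from the pieces and in closed form."""
    n = len(xs)
    x_bar = sum(xs) / n
    l_xx = sum((x - x_bar) ** 2 for x in xs)
    var_b0, var_b1, cov_b0_b1 = estimator_moments(xs, sigma2)
    from_pieces = var_b0 + x0 ** 2 * var_b1 + 2 * x0 * cov_b0_b1
    closed_form = (1 / n + (x0 - x_bar) ** 2 / l_xx) * sigma2
    return from_pieces, closed_form


# Illustrative design with sigma^2 = 1 (assumed values).
var_b0, var_b1, cov_b0_b1 = estimator_moments([1, 2, 3, 4, 5], 1.0)
from_pieces, closed_form = fitted_value_variance([1, 2, 3, 4, 5], 1.0, 4)
print(var_b0, var_b1, cov_b0_b1, from_pieces, closed_form)
```

The closed form shows that the fitted value is most precise at $x_0=\bar{x}$ and loses precision as $x_0$ moves away from the center of the data.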

    Significance Tests for the Regression Equation (t, F, r)

    1. State the null and alternative hypotheses (whether the regression is significant comes down to whether the slope is 0):

    $$H_0:\beta_1=0;\quad H_1:\beta_1\neq0$$

    2. Choose the test statistic:
      $$\hat{\beta}_1\sim N\left(\beta_1,\frac{\sigma^2}{L_{xx}}\right)\implies\frac{\hat{\beta}_1-\beta_1}{\sqrt{\sigma^2/L_{xx}}}\sim N(0,1)\xrightarrow{H_0}\frac{\hat{\beta}_1\sqrt{L_{xx}}}{\sigma}\sim N(0,1)$$

      To construct a $t$ test, a $\chi^2$ variable independent of $\hat{\beta}_1$ is also needed. Since $\frac{Q_e}{\sigma^2}\sim\chi^2(n-2)$ and $Q_e$ is independent of $\hat{\beta}_1$:
      $$T=\frac{\frac{\hat{\beta}_1\sqrt{L_{xx}}}{\sigma}}{\sqrt{\frac{Q_e}{\sigma^2}/(n-2)}}\xrightarrow{\hat{\sigma}^2=\frac{Q_e}{n-2}}\frac{\hat{\beta}_1\sqrt{L_{xx}}}{\hat{\sigma}}\sim t(n-2)$$
      For the $F$ test, compute the regression sum of squares and the residual sum of squares (with total sum of squares $S_T^2=L_{yy}$):
      $$\begin{aligned}
      S_R^2&=\sum_{i=1}^{n}(\hat{y}_i-\bar{y})^2=\hat{\beta}_1L_{xy}\\
      S_e^2&=\sum_{i=1}^{n}(y_i-\hat{y}_i)^2=S_T^2-S_R^2=L_{yy}-\hat{\beta}_1L_{xy}\\
      \frac{S_R^2}{\sigma^2}&\sim\chi^2(1)\ \text{(under }H_0\text{)},\quad\frac{S_e^2}{\sigma^2}\sim\chi^2(n-2)\\
      F&=\frac{\frac{S_R^2}{\sigma^2}/1}{\frac{S_e^2}{\sigma^2}/(n-2)}=\frac{(n-2)S_R^2}{S_e^2}\sim F(1,n-2)
      \end{aligned}$$

    3. Rejection region

      Rejection region of the $t$ test: $|T|=\left|\frac{\hat{\beta}_1\sqrt{L_{xx}}}{\hat{\sigma}}\right|\ge t_{\frac{\alpha}{2}}(n-2)$

      Rejection region of the $F$ test: $F\ge F_\alpha(1,n-2)$

    4. Look up the critical value $t_{\frac{\alpha}{2}}(n-2)$ or $F_\alpha(1,n-2)$

    5. Compute $|T|$ or $F$

    6. Draw the conclusion
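The test procedure can be sketched in a few lines. This is an illustration under stated assumptions: the function name `slope_significance` and the data are made up, and the critical values are taken from standard tables ($t_{0.025}(3)\approx3.182$, $F_{0.05}(1,3)\approx10.13$) rather than computed. The example also shows that $F=T^2$ for simple linear regression, so the two tests always agree.

```python
import math


def slope_significance(xs, ys):
    """Return the T and F statistics for H0: beta_1 = 0."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    l_xx = sum((x - x_bar) ** 2 for x in xs)
    l_yy = sum((y - y_bar) ** 2 for y in ys)
    l_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = l_xy / l_xx
    s_r = b1 * l_xy                 # regression sum of squares
    s_e = l_yy - s_r                # residual sum of squares
    if s_e <= 0:
        raise ValueError("perfect fit: the statistics are undefined")
    sigma_hat = math.sqrt(s_e / (n - 2))
    t_stat = b1 * math.sqrt(l_xx) / sigma_hat
    f_stat = (n - 2) * s_r / s_e
    return t_stat, f_stat


# Made-up data with n = 5, so the degrees of freedom are n - 2 = 3.
t_stat, f_stat = slope_significance([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])

T_CRIT = 3.182    # table value t_{0.025}(3), assuming alpha = 0.05
F_CRIT = 10.13    # table value F_{0.05}(1, 3), assuming alpha = 0.05
reject_by_t = abs(t_stat) >= T_CRIT
reject_by_f = f_stat >= F_CRIT
print(t_stat, f_stat, reject_by_t, reject_by_f)
```

For this small sample neither statistic reaches its critical value, so $H_0$ is not rejected at $\alpha=0.05$.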

    Interval Estimation of the Regression Coefficient

    $$\begin{aligned}
    \hat{\beta}_1\sim N\left(\beta_1,\frac{\sigma^2}{L_{xx}}\right)&\implies\frac{\hat{\beta}_1-\beta_1}{\sqrt{\sigma^2/L_{xx}}}\sim N(0,1)\implies\frac{(\hat{\beta}_1-\beta_1)\sqrt{L_{xx}}}{\sigma}\sim N(0,1)\\
    T=\frac{\frac{(\hat{\beta}_1-\beta_1)\sqrt{L_{xx}}}{\sigma}}{\sqrt{\frac{Q_e}{\sigma^2}/(n-2)}}&\xrightarrow{\hat{\sigma}^2=\frac{Q_e}{n-2}}\frac{(\hat{\beta}_1-\beta_1)\sqrt{L_{xx}}}{\hat{\sigma}}\sim t(n-2)
    \end{aligned}$$

    The confidence interval for $\beta_1$ at confidence level $1-\alpha$ is: $\left(\hat{\beta}_1\pm\frac{\hat{\sigma}}{\sqrt{L_{xx}}}t_{\frac{\alpha}{2}}(n-2)\right)$
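A minimal sketch of the interval for $\beta_1$ follows. The function name `slope_confidence_interval` and the data are assumptions, and the critical value $t_{\frac{\alpha}{2}}(n-2)$ is passed in as a table value instead of being computed.

```python
import math


def slope_confidence_interval(xs, ys, t_crit):
    """Return (lower, upper) for beta_1, given the table value t_crit."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    l_xx = sum((x - x_bar) ** 2 for x in xs)
    l_yy = sum((y - y_bar) ** 2 for y in ys)
    l_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = l_xy / l_xx
    q_e = l_yy - b1 * l_xy                      # residual sum of squares
    sigma_hat = math.sqrt(q_e / (n - 2))
    half_width = sigma_hat / math.sqrt(l_xx) * t_crit
    return b1 - half_width, b1 + half_width


# Made-up data; 3.182 is the table value t_{0.025}(3) for a 95% interval.
lower, upper = slope_confidence_interval([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], 3.182)
print(lower, upper)
```

An interval that contains 0, as this one does, leads to the same conclusion as failing to reject $H_0:\beta_1=0$ in the two-sided $t$ test at the same $\alpha$.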

    Estimation

    Let the regression equation be $\hat{y}=\hat{\beta}_0+\hat{\beta}_1x$. For any given $x=x_0$, the mean of $y_0$ is $E(y_0)=\beta_0+\beta_1x_0$, and an unbiased estimator of $E(y_0)$ is $\hat{y}_0=\hat{\beta}_0+\hat{\beta}_1x_0$.

    $\hat{\beta}_0\sim N\left(\beta_0,\left(\frac{1}{n}+\frac{\bar{x}^2}{L_{xx}}\right)\sigma^2\right)$

    $\hat{\beta}_1\sim N\left(\beta_1,\frac{\sigma^2}{L_{xx}}\right)$

    $Cov(\hat{\beta}_0,\hat{\beta}_1)=-\frac{\bar{x}}{L_{xx}}\sigma^2$

    $$\begin{aligned}
    D(\hat{y}_0)&=D(\hat{\beta}_0)+D(\hat{\beta}_1x_0)+2Cov(\hat{\beta}_0,\hat{\beta}_1x_0)\\
    &=\left(\frac{1}{n}+\frac{\bar{x}^2}{L_{xx}}\right)\sigma^2+\frac{x_0^2}{L_{xx}}\sigma^2-\frac{2x_0\bar{x}}{L_{xx}}\sigma^2\\
    &=\left(\frac{1}{n}+\frac{(\bar{x}-x_0)^2}{L_{xx}}\right)\sigma^2
    \end{aligned}$$

    $\hat{y}_0\sim N\left(\beta_0+\beta_1x_0,\left(\frac{1}{n}+\frac{(x_0-\bar{x})^2}{L_{xx}}\right)\sigma^2\right)$
    Hence the confidence interval for $E(y_0)$ at confidence level $1-\alpha$ is:
    $(\hat{y}_0-\delta_0,\hat{y}_0+\delta_0),\quad\delta_0=t_{\frac{\alpha}{2}}(n-2)\,\hat{\sigma}\sqrt{\frac{1}{n}+\frac{(x_0-\bar{x})^2}{L_{xx}}}$
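The interval for the mean response can be sketched as follows. The function name `mean_response_interval`, the data, and the point $x_0=3$ are assumptions for illustration, and the critical value is supplied from a $t$ table.

```python
import math


def mean_response_interval(xs, ys, x0, t_crit):
    """Return (lower, upper) for E(y0) at x = x0, given the table value t_crit."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    l_xx = sum((x - x_bar) ** 2 for x in xs)
    l_yy = sum((y - y_bar) ** 2 for y in ys)
    l_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = l_xy / l_xx
    b0 = y_bar - b1 * x_bar
    sigma_hat = math.sqrt((l_yy - b1 * l_xy) / (n - 2))
    y0_hat = b0 + b1 * x0                        # point estimate of E(y0)
    delta0 = t_crit * sigma_hat * math.sqrt(1 / n + (x0 - x_bar) ** 2 / l_xx)
    return y0_hat - delta0, y0_hat + delta0


# Made-up data; 3.182 is the table value t_{0.025}(3) for a 95% interval.
mean_lower, mean_upper = mean_response_interval([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], 3, 3.182)
print(mean_lower, mean_upper)
```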

    Prediction Interval

    For a new observation $y_0$ at $x=x_0$, independent of the sample:
    $$\begin{aligned}
    y_0-\hat{y}_0&\sim N\left(0,\left[1+\frac{1}{n}+\frac{(x_0-\bar{x})^2}{L_{xx}}\right]\sigma^2\right)\\
    U&=\frac{y_0-\hat{y}_0}{\sigma\sqrt{1+\frac{1}{n}+\frac{(x_0-\bar{x})^2}{L_{xx}}}}\sim N(0,1)\\
    T&=\frac{y_0-\hat{y}_0}{\hat{\sigma}\sqrt{1+\frac{1}{n}+\frac{(x_0-\bar{x})^2}{L_{xx}}}}\sim t(n-2)
    \end{aligned}$$

    Therefore, the prediction interval for $y_0$ at confidence level $1-\alpha$ is
    $(\hat{y}_0-\delta,\hat{y}_0+\delta),\quad\delta=t_{\frac{\alpha}{2}}(n-2)\,\hat{\sigma}\sqrt{1+\frac{1}{n}+\frac{(x_0-\bar{x})^2}{L_{xx}}}$
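A sketch of the prediction interval is given below, under the same kind of assumptions as before: the function name `prediction_interval`, the data, and $x_0=3$ are illustrative, and the critical value comes from a $t$ table. The only change from the interval for $E(y_0)$ is the extra 1 under the square root, which accounts for the noise in the new observation itself.

```python
import math


def prediction_interval(xs, ys, x0, t_crit):
    """Return (lower, upper) for a new observation y0 at x = x0."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    l_xx = sum((x - x_bar) ** 2 for x in xs)
    l_yy = sum((y - y_bar) ** 2 for y in ys)
    l_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = l_xy / l_xx
    b0 = y_bar - b1 * x_bar
    sigma_hat = math.sqrt((l_yy - b1 * l_xy) / (n - 2))
    y0_hat = b0 + b1 * x0
    # The leading 1 is the variance contribution of the new observation's error.
    delta = t_crit * sigma_hat * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / l_xx)
    return y0_hat - delta, y0_hat + delta


# Made-up data; 3.182 is the table value t_{0.025}(3) for a 95% interval.
pred_lower, pred_upper = prediction_interval([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], 3, 3.182)
print(pred_lower, pred_upper)
```

Because of the extra term, the prediction interval is always wider than the confidence interval for $E(y_0)$ at the same $x_0$, and it does not shrink to zero width as $n$ grows.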

    Linearizable Univariate Nonlinear Regression

    [Three figures not recoverable from the source: image20221022153739659.png, image20221022153756446.png, image20221022153810181.png]
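Since the original figures are not available, the specific models they listed are unknown. As one commonly used example of a linearizable model (an assumption, not necessarily one from the figures), the exponential curve $y=ae^{bx}$ with $y>0$ becomes linear after taking logarithms: $\ln y=\ln a+bx$. The sketch below fits it by regressing $\ln y$ on $x$; the function name `fit_exponential` and the data are hypothetical.

```python
import math


def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) by simple linear regression of ln(y) on x."""
    if any(y <= 0 for y in ys):
        raise ValueError("the log transform requires all y values to be positive")
    zs = [math.log(y) for y in ys]              # transformed response z = ln y
    n = len(xs)
    x_bar = sum(xs) / n
    z_bar = sum(zs) / n
    l_xx = sum((x - x_bar) ** 2 for x in xs)
    l_xz = sum((x - x_bar) * (z - z_bar) for x, z in zip(xs, zs))
    b = l_xz / l_xx                             # slope of the linearized model
    ln_a = z_bar - b * x_bar                    # intercept of the linearized model
    return math.exp(ln_a), b


# Noise-free illustrative data generated from a = 2, b = 0.5.
xs = [0, 1, 2, 3]
ys = [2 * math.exp(0.5 * x) for x in xs]
a_hat, b_hat = fit_exponential(xs, ys)
print(a_hat, b_hat)  # approximately 2 and 0.5
```

Note that this minimizes squared errors in $\ln y$ rather than in $y$, so with noisy data the result generally differs from a direct nonlinear least squares fit.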

  • Original article: https://blog.csdn.net/weixin_46334596/article/details/127464134