• 数据挖掘与分析课程笔记(Chapter 20)


    数据挖掘与分析课程笔记

    • 参考教材:Data Mining and Analysis : MOHAMMED J.ZAKI, WAGNER MEIRA JR.

    文章目录

    1. 数据挖掘与分析课程笔记(目录)
    2. 数据挖掘与分析课程笔记(Chapter 1)
    3. 数据挖掘与分析课程笔记(Chapter 2)
    4. 数据挖掘与分析课程笔记(Chapter 5)
    5. 数据挖掘与分析课程笔记(Chapter 7)
    6. 数据挖掘与分析课程笔记(Chapter 14)
    7. 数据挖掘与分析课程笔记(Chapter 15)
    8. 数据挖掘与分析课程笔记(Chapter 20)
    9. 数据挖掘与分析课程笔记(Chapter 21)


    Chapter 20: Linear Discriminant Analysis

    Set up D = { ( x i , y i ) } i = 1 n \mathbf{D}=\{(\mathbf{x}_i,y_i) \}_{i=1}^n D={(xi,yi)}i=1n, 其中 y i = 1 , 2 y_i=1,2 yi=1,2(或 ± 1 \pm 1 ±1 等), D 1 = { x i ∣ y i = 1 } \mathbf{D}_1=\{\mathbf{x}_i|y_i=1 \} D1={xiyi=1} D 2 = { x i ∣ y i = 2 } \mathbf{D}_2=\{\mathbf{x}_i|y_i=2 \} D2={xiyi=2}

    Goal:寻找向量 w ∈ R d \mathbf{w}\in \mathbb{R}^d wRd (代表直线方向)使得 D 1 , D 2 \mathbf{D}_1,\mathbf{D}_2 D1,D2 的“平均值”距离最大且“总方差”最小。

    20.1 Normal LDA

    w ∈ R d , w T w = 1 \mathbf{w} \in \mathbb{R}^d,\mathbf{w}^T\mathbf{w}=1 wRd,wTw=1,则 x i \mathbf{x}_i xi w \mathbf{w} w 方向上的投影为 x i ′ = ( w T x i w T u ) w = a i w , a i = w T x i \mathbf{x}_{i}^{\prime}=\left(\frac{\mathbf{w}^{T} \mathbf{x}_{i}}{\mathbf{w}^{T} \mathbf{u}}\right) \mathbf{w}=a_{i} \mathbf{w},a_{i}=\mathbf{w}^{T} \mathbf{x}_{i} xi=(wTuwTxi)w=aiw,ai=wTxi

    D 1 \mathbf{D}_1 D1 中数据在 w \mathbf{w} w 上的投影平均值为:( ∣ D 1 ∣ = n 1 |\mathbf{D}_1|=n_1 D1=n1
    m 1 : = 1 n 1 ∑ x i ∈ D 1 a i = μ 1 T w m_1:=\frac{1}{n_1}\sum\limits_{\mathbf{x}_i\in \mathbf{D}_1}a_i=\boldsymbol{\mu}_1^T\mathbf{w} m1:=n11xiD1ai=μ1Tw
    投影平均值等于平均值的投影。

    类似地: D 2 \mathbf{D}_2 D2 中数据在 w \mathbf{w} w 上的投影平均值为:
    m 2 : = 1 n 2 ∑ x i ∈ D 2 a i = μ 2 T w m_2:=\frac{1}{n_2}\sum\limits_{\mathbf{x}_i\in \mathbf{D}_2}a_i=\boldsymbol{\mu}_2^T\mathbf{w} m2:=n21xiD2ai=μ2Tw
    目标之一:寻找 w \mathbf{w} w 使得 ( m 1 − m 2 ) 2 (m_1-m_2)^2 (m1m2)2 最大。

    对于 D i \mathbf{D}_i Di,定义:
    s i 2 = ∑ x k ∈ D i ( a k − m i ) 2 s_i^2=\sum\limits_{\mathbf{x}_k\in \mathbf{D}_i}(a_k-m_i)^2 si2=xkDi(akmi)2
    注意: s i 2 = n i σ i 2   ( ∣ D i ∣ = n i ) s_i^2=n_i\sigma^2_i\ (|D_i|=n_i) si2=niσi2 (Di=ni)

    Goal:Fisher LDA目标函数:
    max ⁡ w J ( w ) = ( m 1 − m 2 ) 2 s 1 2 + s 2 2 \max\limits_{\mathbf{w}}J(\mathbf{w})=\frac{(m_1-m_2)^2}{s_1^2+s_2^2} wmaxJ(w)=s12+s22(m1m2)2
    注意: J ( w ) = J ( w 1 , w 2 , ⋯   , w d ) J(\mathbf{w})=J(w_1,w_2,\cdots,w_d) J(w)=J(w1,w2,,wd)
    ( m 1 − m 2 ) 2 = ( w T ( μ 1 − μ 2 ) ) 2 = w T ( ( μ 1 − μ 2 ) ( μ 1 − μ 2 ) T ) w = w T B w

    (m1m2)2=(wT(μ1μ2))2=wT((μ1μ2)(μ1μ2)T)w=wTBw" role="presentation">(m1m2)2=(wT(μ1μ2))2=wT((μ1μ2)(μ1μ2)T)w=wTBw
    (m1m2)2=(wT(μ1μ2))2=wT((μ1μ2)(μ1μ2)T)w=wTBw

    B \mathbf{B} B 被称为类间扩散矩阵

    s 1 2 = ∑ x i ∈ D 1 ( a i − m 1 ) 2 = ∑ x i ∈ D 1 ( w T x i − w T μ 1 ) 2 = ∑ x i ∈ D 1 ( w T ( x i − μ 1 ) ) 2 = w T ( ∑ x i ∈ D 1 ( x i − μ 1 ) ( x i − μ 1 ) T ) w = w T S 1 w

    s12=xiD1(aim1)2=xiD1(wTxiwTμ1)2=xiD1(wT(xiμ1))2=wT(xiD1(xiμ1)(xiμ1)T)w=wTS1w" role="presentation">s12=xiD1(aim1)2=xiD1(wTxiwTμ1)2=xiD1(wT(xiμ1))2=wT(xiD1(xiμ1)(xiμ1)T)w=wTS1w
    s12=xiD1(aim1)2=xiD1(wTxiwTμ1)2=xiD1(wT(xiμ1))2=wT(xiD1(xiμ1)(xiμ1)T)w=wTS1w

    S 1 \mathbf{S}_{1} S1 被称为 D 1 \mathbf{D}_1 D1 的扩散矩阵 S 1 = n 1 Σ 1 \mathbf{S}_{1}=n_1\Sigma_1 S1=n1Σ1

    类似地, s 2 2 = w T S 2 w s_{2}^{2}=\mathbf{w}^{T} \mathbf{S}_{2} \mathbf{w} s22=wTS2w

    S = S 1 + S 2 \mathbf{S}=\mathbf{S}_{1}+\mathbf{S}_{2} S=S1+S2,则
    max ⁡ w J ( w ) = ( m 1 − m 2 ) 2 s 1 2 + s 2 2 = w T B w w T S w \max\limits_{\mathbf{w}}J(\mathbf{w})=\frac{(m_1-m_2)^2}{s_1^2+s_2^2}=\frac{\mathbf{w}^{T} \mathbf{B} \mathbf{w}}{\mathbf{w}^{T} \mathbf{S} \mathbf{w}} wmaxJ(w)=s12+s22(m1m2)2=wTSwwTBw

    注意:
    d d w J ( w ) = 2 B w ( w T S w ) − 2 S w ( w T B w ) ( w T S w ) 2 = 0 \frac{d}{d\mathbf{w}}J(\mathbf{w})=\frac{2\mathbf{B}\mathbf{w}(\mathbf{w}^T\mathbf{S}\mathbf{w})-2\mathbf{S}\mathbf{w}(\mathbf{w}^T\mathbf{B}\mathbf{w})}{(\mathbf{w}^T\mathbf{S}\mathbf{w})^2}=\mathbf{0} dwdJ(w)=(wTSw)22Bw(wTSw)2Sw(wTBw)=0
    即有:
    B w ( w T S w ) = S w ( w T B w ) B w = S w ⋅ w T B w w T S w B w = J ( w ) ⋅ S w ( ∗ )

    Bw(wTSw)=Sw(wTBw)Bw=SwwTBwwTSwBw=J(w)Sw()" role="presentation">Bw(wTSw)=Sw(wTBw)Bw=SwwTBwwTSwBw=J(w)Sw()
    Bw(wTSw)BwBw=Sw(wTBw)=SwwTSwwTBw=J(w)Sw()
    S − 1 \mathbf{S}^{-1} S1 存在,则
    S − 1 B w = J ( w ) ⋅ w \mathbf{S}^{-1}\mathbf{B}\mathbf{w}=J(\mathbf{w})\cdot\mathbf{w} S1Bw=J(w)w
    故要求最大 J ( w ) J(\mathbf{w}) J(w) ,只需 S − 1 B \mathbf{S}^{-1}\mathbf{B} S1B 的最大特征值, w \mathbf{w} w 为其特征向量。

    ☆ 不求特征向量求出 w \mathbf{w} w 的方法

    B = ( μ 1 − μ 2 ) ( μ 1 − μ 2 ) T \mathbf{B}=(\boldsymbol{\mu}_{1}-\boldsymbol{\mu}_{2})(\boldsymbol{\mu}_{1}-\boldsymbol{\mu}_{2})^{T} B=(μ1μ2)(μ1μ2)T 代入 ( ∗ ) (*) ()
    ( μ 1 − μ 2 ) ( μ 1 − μ 2 ) T w = J ( w ) ⋅ S w S − 1 ( μ 1 − μ 2 ) [ ( μ 1 − μ 2 ) T w J ( w ) ] = w

    (μ1μ2)(μ1μ2)Tw=J(w)SwS1(μ1μ2)[(μ1μ2)TwJ(w)]=w" role="presentation">(μ1μ2)(μ1μ2)Tw=J(w)SwS1(μ1μ2)[(μ1μ2)TwJ(w)]=w
    (μ1μ2)(μ1μ2)TwS1(μ1μ2)[J(w)(μ1μ2)Tw]=J(w)Sw=w
    故只需计算 S − 1 ( μ 1 − μ 2 ) \mathbf{S}^{-1}(\boldsymbol{\mu}_{1}-\boldsymbol{\mu}_{2}) S1(μ1μ2),再单位化。

    20.2 Kernel LDA:

    事实1:如果 ( S ϕ − 1 B ϕ ) w = λ w \left(\mathbf{S}_{\phi}^{-1} \mathbf{B}_{\phi}\right) \mathbf{w}=\lambda \mathbf{w} (Sϕ1Bϕ)w=λw,那么 w = ∑ j = 1 n a j ϕ ( x j ) \mathbf{w}=\sum\limits_{j=1}^na_j\phi(\mathbf{x}_j) w=j=1najϕ(xj),证明见讲稿最后两页。

    a = ( a 1 , ⋯   , a n ) T \mathbf{a}=(a_1,\cdots,a_n)^T a=(a1,,an)T 是“事实1”中的向量。

    下面将 max ⁡ w J ( w ) = ( m 1 − m 2 ) 2 s 1 2 + s 2 2 = w T B ϕ w w T S ϕ w \max\limits_{\mathbf{w}}J(\mathbf{w})=\frac{(m_1-m_2)^2}{s_1^2+s_2^2}=\frac{\mathbf{w}^{T} \mathbf{B}_{\phi} \mathbf{w}}{\mathbf{w}^{T} \mathbf{S}_{\phi} \mathbf{w}} wmaxJ(w)=s12+s22(m1m2)2=wTSϕwwTBϕw 的问题转化为 max ⁡ G ( a ) \max G(\mathbf{a}) maxG(a) s.t. 使用 K \mathbf{K} K 能求解。

    注意到:
    m i = w T μ i ϕ = ( ∑ j = 1 n a j ϕ ( x j ) ) T ( 1 n i ∑ x i ∈ D i ϕ ( x k ) ) = 1 n i ∑ j = 1 n ∑ x k ∈ D i a j ϕ ( x j ) T ϕ ( x k ) = 1 n i ∑ j = 1 n ∑ x k ∈ D i a j K ( x j , x k ) = a T m i

    mi=wTμiϕ=(j=1najϕ(xj))T(1nixiDiϕ(xk))=1nij=1nxkDiajϕ(xj)Tϕ(xk)=1nij=1nxkDiajK(xj,xk)=aTmi" role="presentation">mi=wTμiϕ=(j=1najϕ(xj))T(1nixiDiϕ(xk))=1nij=1nxkDiajϕ(xj)Tϕ(xk)=1nij=1nxkDiajK(xj,xk)=aTmi
    mi=wTμiϕ=(j=1najϕ(xj))T(ni1xiDiϕ(xk))=ni1j=1nxkDiajϕ(xj)Tϕ(xk)=ni1j=1nxkDiajK(xj,xk)=aTmi
    其中,
    m i = 1 n i ( ∑ x k ∈ D i K ( x 1 , x k ) ∑ x k ∈ D i K ( x 2 , x k ) ⋮ ∑ x k ∈ D i K ( x n , x k ) ) n × 1 \mathbf{m}_{i}=\frac{1}{n_{i}}\left(
    xkDiK(x1,xk)xkDiK(x2,xk)xkDiK(xn,xk)" role="presentation">xkDiK(x1,xk)xkDiK(x2,xk)xkDiK(xn,xk)
    \right)_{n\times 1}
    mi=ni1 xkDiK(x1,xk)xkDiK(x2,xk)xkDiK(xn,xk) n×1


    ( m 1 − m 2 ) 2 = ( w T μ 1 ϕ − w T μ 2 ϕ ) 2 = ( a T m 1 − a T m 2 ) 2 = a T ( m 1 − m 2 ) ( m 1 − m 2 ) T a = a T M a
    (m1m2)2=(wTμ1ϕwTμ2ϕ)2=(aTm1aTm2)2=aT(m1m2)(m1m2)Ta=aTMa" role="presentation">(m1m2)2=(wTμ1ϕwTμ2ϕ)2=(aTm1aTm2)2=aT(m1m2)(m1m2)Ta=aTMa
    (m1m2)2=(wTμ1ϕwTμ2ϕ)2=(aTm1aTm2)2=aT(m1m2)(m1m2)Ta=aTMa

    M \mathbf{M} M 被称为核类间扩散矩阵)
    s 1 2 = ∑ x i ∈ D 1 ∥ w T ϕ ( x i ) − w T μ 1 ϕ ∥ 2 = ∑ x i ∈ D 1 ∥ w T ϕ ( x i ) ∥ 2 − 2 ∑ x i ∈ D 1 w T ϕ ( x i ) ⋅ w T μ 1 ϕ + ∑ x i ∈ D 1 ∥ w T μ 1 ϕ ∥ 2 = ( ∑ x i ∈ D 1 ∥ ∑ j = 1 n a j ϕ ( x j ) T ϕ ( x i ) ∥ 2 ) − 2 ⋅ n 1 ⋅ ∥ w T μ 1 ϕ ∥ 2 + n 1 ⋅ ∥ w T μ 1 ϕ ∥ 2 = ( ∑ x i ∈ D 1 a T K i K i T a ) − n 1 ⋅ a T m 1 m 1 T a = a T ( ( ∑ x i ∈ D 1 K i K i T ) − n 1 m 1 m 1 T ) a = a T N 1 a
    s12=xiD1wTϕ(xi)wTμ1ϕ2=xiD1wTϕ(xi)22xiD1wTϕ(xi)wTμ1ϕ+xiD1wTμ1ϕ2=(xiD1j=1najϕ(xj)Tϕ(xi)2)2n1wTμ1ϕ2+n1wTμ1ϕ2=(xiD1aTKiKiTa)n1aTm1m1Ta=aT((xiD1KiKiT)n1m1m1T)a=aTN1a" role="presentation">s12=xiD1wTϕ(xi)wTμ1ϕ2=xiD1wTϕ(xi)22xiD1wTϕ(xi)wTμ1ϕ+xiD1wTμ1ϕ2=(xiD1j=1najϕ(xj)Tϕ(xi)2)2n1wTμ1ϕ2+n1wTμ1ϕ2=(xiD1aTKiKiTa)n1aTm1m1Ta=aT((xiD1KiKiT)n1m1m1T)a=aTN1a
    s12=xiD1 wTϕ(xi)wTμ1ϕ 2=xiD1 wTϕ(xi) 22xiD1wTϕ(xi)wTμ1ϕ+xiD1 wTμ1ϕ 2= xiD1 j=1najϕ(xj)Tϕ(xi) 2 2n1 wTμ1ϕ 2+n1 wTμ1ϕ 2=(xiD1aTKiKiTa)n1aTm1m1Ta=aT((xiD1KiKiT)n1m1m1T)a=aTN1a

    类似地,令 N 2 = ( ∑ x i ∈ D 2 K i K i T − n 2 m 2 m 2 T ) \mathbf{N}_2=\left(\sum\limits_{\mathbf{x}_{i} \in \mathbf{D}_{2}} \mathbf{K}_{i} \mathbf{K}_{i}^{T}-n_{2} \mathbf{m}_{2} \mathbf{m}_{2}^{T}\right) N2=(xiD2KiKiTn2m2m2T)

    s 1 2 + s 2 2 = a T ( N 1 + N 2 ) a = a T N a s_1^2+s_2^2=\mathbf{a}^{T} (\mathbf{N}_{1}+\mathbf{N}_{2}) \mathbf{a}=\mathbf{a}^{T}\mathbf{N} \mathbf{a} s12+s22=aT(N1+N2)a=aTNa

    故: J ( w ) = a T M a a T N a : = G ( a ) J(\mathbf{w})=\frac{\mathbf{a}^{T}\mathbf{M} \mathbf{a}}{\mathbf{a}^{T}\mathbf{N} \mathbf{a}}:=G(\mathbf{a}) J(w)=aTNaaTMa:=G(a)

    类似 20.1, M a = λ N a \mathbf{M} \mathbf{a}=\lambda\mathbf{N} \mathbf{a} Ma=λNa

    • N − 1 \mathbf{N} ^{-1} N1 存在, N − 1 M a = λ a \mathbf{N}^{-1} \mathbf{M} \mathbf{a}=\lambda \mathbf{a} N1Ma=λa λ \lambda λ N − 1 M \mathbf{N}^{-1} \mathbf{M} N1M 的最大特征值, a \mathbf{a} a 是相应的特征向量。

    • N − 1 \mathbf{N} ^{-1} N1 不存在,MATLAB 求广义逆

    最后考查 w T w = 1 \mathbf{w}^T\mathbf{w}=1 wTw=1,即

    ( ∑ j = 1 n a j ϕ ( x j ) ) T ( ∑ i = 1 n a i ϕ ( x i ) ) = 1 ∑ j = 1 n ∑ i = 1 n a j a i ϕ ( x j ) T ϕ ( x i ) = 1 ∑ j = 1 n ∑ i = 1 n a j a i K ( x i , x j ) = 1 a T K a = 1

    (j=1najϕ(xj))T(i=1naiϕ(xi))=1j=1ni=1najaiϕ(xj)Tϕ(xi)=1j=1ni=1najaiK(xi,xj)=1aTKa=1" role="presentation">(j=1najϕ(xj))T(i=1naiϕ(xi))=1j=1ni=1najaiϕ(xj)Tϕ(xi)=1j=1ni=1najaiK(xi,xj)=1aTKa=1
    (j=1najϕ(xj))T(i=1naiϕ(xi))j=1ni=1najaiϕ(xj)Tϕ(xi)j=1ni=1najaiK(xi,xj)aTKa=1=1=1=1
    求出 N − 1 M \mathbf{N}^{-1} \mathbf{M} N1M 的特征向量 a \mathbf{a} a 后, a ← a a T K a \mathbf{a}\leftarrow \frac{\mathbf{a}}{\sqrt{\mathbf{a}^T\mathbf{K}\mathbf{a}}} aaTKa a 以保证 w T w = 1 \mathbf{w}^T\mathbf{w}=1 wTw=1


  • 相关阅读:
    阿里云有奖体验:用PolarDB-X搭建一个高可用系统
    浏览器黑暗模式插件
    Python学习路线
    Flutter折腾学习成长记(25)
    pytorch中Tensor(张量)
    【ijkplayer】解码流程梳理(三)
    汽车驾驶3D模拟仿真展示系统更立体直观
    DPU — 功能特性 — 网络系统的硬件卸载
    2021-07-09 springboot 整合shiro(前后端分离解决方案)
    MySQL - 数据库的监控方式
  • 原文地址:https://blog.csdn.net/yyywxk/article/details/127671900