Diffusion Models & CLIP


    Introduction to Diffusion Models

    Generative models
    Here, "generative model" mainly refers to generative models in unsupervised learning, where the main task is for the machine to learn from given samples and then produce something new. For example: show the machine some images and it can generate new images; have it read some poems and it can write poems of its own.


    Forward Process

    Forward Process I

    Given $q(X_0)$ and $q(X_t \mid X_{t-1})$, calculate $q(X_t \mid X_0)$.

    Forward step:
    $$X_t = \sqrt{\alpha_t}\, X_{t-1} + \sqrt{1-\alpha_t}\,\epsilon_t, \quad \text{where } \epsilon_t \sim N(0, I)$$
    Thus, $q(X_t \mid X_{t-1}) = N\left(\sqrt{\alpha_t}\, X_{t-1}, (1-\alpha_t) I\right)$
    (You just have to ensure that every $\alpha$ lies in $(0, 1)$.)
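    A minimal sketch of this forward step in PyTorch (the names `forward_step`, `x_prev`, and `alpha_t` are our own, not from the source):

    ```python
    import torch

    def forward_step(x_prev: torch.Tensor, alpha_t: float) -> torch.Tensor:
        """One noising step: X_t = sqrt(alpha_t) * X_{t-1} + sqrt(1 - alpha_t) * eps_t."""
        eps_t = torch.randn_like(x_prev)  # eps_t ~ N(0, I)
        return alpha_t ** 0.5 * x_prev + (1.0 - alpha_t) ** 0.5 * eps_t
    ```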

    Forward Process II

    $$
    \begin{aligned}
    X_t &= \sqrt{\alpha_t}\, X_{t-1} + \sqrt{1-\alpha_t}\,\epsilon_t \\
    &= \sqrt{\alpha_t}\left(\sqrt{\alpha_{t-1}}\, X_{t-2} + \sqrt{1-\alpha_{t-1}}\,\epsilon_{t-1}\right) + \sqrt{1-\alpha_t}\,\epsilon_t \\
    &= \sqrt{\alpha_t \alpha_{t-1}}\, X_{t-2} + \sqrt{\alpha_t - \alpha_t \alpha_{t-1}}\,\epsilon_{t-1} + \sqrt{1-\alpha_t}\,\epsilon_t
    \end{aligned}
    $$

    Fact: the sum of two independent Gaussian random variables is still Gaussian

    Therefore: $\sqrt{\alpha_t - \alpha_t \alpha_{t-1}}\,\epsilon_{t-1} + \sqrt{1-\alpha_t}\,\epsilon_t \sim N\left(0, \left(\alpha_t - \alpha_t \alpha_{t-1} + 1 - \alpha_t\right) I\right) = N\left(0, \left(1 - \alpha_t \alpha_{t-1}\right) I\right)$
    Let $\alpha_i = 1 - \beta_i$
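    A quick Monte-Carlo sanity check of this merged variance (a sketch; the alpha values are arbitrary):

    ```python
    import numpy as np

    a_t, a_tm1 = 0.9, 0.95  # arbitrary alphas in (0, 1)
    eps_tm1, eps_t = np.random.randn(2, 1_000_000)

    merged = np.sqrt(a_t - a_t * a_tm1) * eps_tm1 + np.sqrt(1.0 - a_t) * eps_t
    print(merged.var(), 1.0 - a_t * a_tm1)  # both approximately 0.145
    ```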

    Forward Process III

    $$X_t = \sqrt{\alpha_t \alpha_{t-1}}\, X_{t-2} + \sqrt{1 - \alpha_t \alpha_{t-1}}\,\epsilon$$

    Iterating this all the way back to $X_0$: $X_t = \sqrt{\alpha_t \alpha_{t-1} \cdots \alpha_1}\, X_0 + \sqrt{1 - \alpha_t \alpha_{t-1} \cdots \alpha_1}\,\epsilon$

    Therefore: $q(X_t \mid X_0) = N\left(\sqrt{\bar{\alpha}_t}\, X_0, (1 - \bar{\alpha}_t) I\right)$, where $\bar{\alpha}_t = \alpha_t \alpha_{t-1} \cdots \alpha_1$
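    This closed form lets us jump from $X_0$ to any $X_t$ in a single step. A sketch (the linear beta schedule below is the common choice from the DDPM paper, not something fixed by this derivation):

    ```python
    import torch

    T = 1000
    betas = torch.linspace(1e-4, 0.02, T)     # common linear schedule (Ho et al., 2020)
    alphas = 1.0 - betas                      # alpha_t = 1 - beta_t
    alpha_bar = torch.cumprod(alphas, dim=0)  # alpha_bar_t = alpha_1 * ... * alpha_t

    def q_sample(x0: torch.Tensor, t: int) -> torch.Tensor:
        """Sample X_t ~ q(X_t | X_0) = N(sqrt(alpha_bar_t) X_0, (1 - alpha_bar_t) I)."""
        eps = torch.randn_like(x0)
        return alpha_bar[t].sqrt() * x0 + (1.0 - alpha_bar[t]).sqrt() * eps
    ```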

    Reverse Process

    Reverse Process I

    Let us use Bayes' theorem (conditioning additionally on $X_0$ makes every factor on the right-hand side a known Gaussian):
    $$q(X_{t-1} \mid X_t) = q(X_{t-1} \mid X_t, X_0) = \frac{q(X_t \mid X_{t-1}, X_0)\, q(X_{t-1} \mid X_0)}{q(X_t \mid X_0)}$$

    Reverse Process II

    We know the following identities hold:
    $$q(X_t \mid X_{t-1}, X_0) = N\left(\sqrt{\alpha_t}\, X_{t-1}, (1-\alpha_t) I\right)$$
    $$q(X_t \mid X_0) = N\left(\sqrt{\bar{\alpha}_t}\, X_0, (1-\bar{\alpha}_t) I\right)$$
    $$q(X_{t-1} \mid X_0) = N\left(\sqrt{\bar{\alpha}_{t-1}}\, X_0, (1-\bar{\alpha}_{t-1}) I\right)$$

    Reverse Process III

    Let us apply these identities to Bayes' theorem:
    $$
    \begin{aligned}
    q(X_{t-1} \mid X_t) &= \frac{q(X_t \mid X_{t-1}, X_0)\, q(X_{t-1} \mid X_0)}{q(X_t \mid X_0)} \\
    &\propto \exp\left(-\frac{1}{2}\left(\frac{\left(X_t - \sqrt{\alpha_t}\, X_{t-1}\right)^2}{1-\alpha_t} + \frac{\left(X_{t-1} - \sqrt{\bar{\alpha}_{t-1}}\, X_0\right)^2}{1-\bar{\alpha}_{t-1}} - \frac{\left(X_t - \sqrt{\bar{\alpha}_t}\, X_0\right)^2}{1-\bar{\alpha}_t}\right)\right) \\
    &= \exp\left(-\frac{1}{2}\left(\left(\frac{\alpha_t}{1-\alpha_t} + \frac{1}{1-\bar{\alpha}_{t-1}}\right) X_{t-1}^2 - \left(\frac{2\sqrt{\alpha_t}}{1-\alpha_t}\, X_t + \frac{2\sqrt{\bar{\alpha}_{t-1}}}{1-\bar{\alpha}_{t-1}}\, X_0\right) X_{t-1} + C(X_t, X_0)\right)\right)
    \end{aligned}
    $$
    Here $C(X_t, X_0)$ collects the terms that do not involve $X_{t-1}$.

    Reverse Process IV

    Find the $\sigma$ and $\mu$ of this normal distribution by comparing against the generic Gaussian form:
    $$\exp\left(-\frac{1}{2}\left(\left(\frac{\alpha_t}{1-\alpha_t} + \frac{1}{1-\bar{\alpha}_{t-1}}\right) X_{t-1}^2 - \left(\frac{2\sqrt{\alpha_t}}{1-\alpha_t}\, X_t + \frac{2\sqrt{\bar{\alpha}_{t-1}}}{1-\bar{\alpha}_{t-1}}\, X_0\right) X_{t-1} + C(X_t, X_0)\right)\right)$$
    $$\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) = \exp\left(-\frac{1}{2}\left(\frac{1}{\sigma^2} x^2 - \frac{2\mu}{\sigma^2} x + \frac{\mu^2}{\sigma^2}\right)\right)$$

    Reverse Process V

    By matching the three terms, we get the solution for $\mu_t, \sigma_t$:
    $$\mu_t = \frac{1}{\sqrt{\alpha_t}}\left(X_t - \frac{1-\alpha_t}{\sqrt{1-\bar{\alpha}_t}}\,\epsilon_t\right), \qquad \sigma_t^2 = \frac{(1-\alpha_t)(1-\bar{\alpha}_{t-1})}{1 - \alpha_t \bar{\alpha}_{t-1}}$$
    $\mu_t, \sigma_t$ are the quantities we need to solve for.
    This is what we should use in the reverse process.
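    Putting $\mu_t$ and $\sigma_t$ together gives one denoising step. A sketch that reuses the schedule from the forward-process snippet and assumes a trained noise predictor `eps_model` (a hypothetical callable):

    ```python
    import torch

    # Reuse the schedule from the forward-process sketch
    T = 1000
    alphas = 1.0 - torch.linspace(1e-4, 0.02, T)
    alpha_bar = torch.cumprod(alphas, dim=0)

    @torch.no_grad()
    def p_sample(eps_model, x_t: torch.Tensor, t: int) -> torch.Tensor:
        """One reverse step: X_{t-1} = mu_t + sigma_t * z, using the formulas above."""
        eps_hat = eps_model(x_t, t)  # network's estimate of the noise eps_t
        mu = (x_t - (1.0 - alphas[t]) / (1.0 - alpha_bar[t]).sqrt() * eps_hat) / alphas[t].sqrt()
        if t == 0:
            return mu  # no noise is added at the final step
        var = (1.0 - alphas[t]) * (1.0 - alpha_bar[t - 1]) / (1.0 - alpha_bar[t])
        return mu + var.sqrt() * torch.randn_like(x_t)
    ```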

    Next: how to train the network $\epsilon_\theta$ that predicts the noise $\epsilon_t$

    Loss Function

    We want the reverse process $p_\theta(X)$ to be as close as possible to the forward process $q(X)$.
    Use the KL divergence as the loss to match the two distributions:
    $$D_{KL}\left(q(X_0) \,\|\, p_\theta(X_0)\right) = \int q(X_0) \log\left(\frac{q(X_0)}{p_\theta(X_0)}\right) dX_0$$
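    For intuition, here is a Monte-Carlo sketch of that integral with two 1-D Gaussians standing in for $q(X_0)$ and $p_\theta(X_0)$ (the parameters are arbitrary placeholders):

    ```python
    import torch

    q = torch.distributions.Normal(0.0, 1.0)  # stands in for q(X_0)
    p = torch.distributions.Normal(0.5, 1.2)  # stands in for p_theta(X_0)

    x0 = q.sample((1_000_000,))                       # X_0 ~ q
    kl_mc = (q.log_prob(x0) - p.log_prob(x0)).mean()  # E_q[log(q / p)]
    print(kl_mc.item(), torch.distributions.kl_divergence(q, p).item())
    ```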

    A commonly used tool:

    The Evidence Lower Bound

    • ELBO (Evidence Lower Bound)
    • Let $p_\theta$ and $q_\phi$ be two distributions; then:
      $$\ln p_\theta(x) \geq \mathbb{E}_{z \sim q_\phi}\left[\ln \frac{p_\theta(x, z)}{q_\phi(z)}\right].$$
    • Step 1: define
      $$L(\phi, \theta; x) := \mathbb{E}_{z \sim q_\phi(\cdot \mid x)}\left[\ln \frac{p_\theta(x, z)}{q_\phi(z \mid x)}\right].$$
    • Step 2:
      $$L(\phi, \theta; x) = \mathbb{E}_{z \sim q_\phi(\cdot \mid x)}\left[\ln p_\theta(x, z)\right] + H\left[q_\phi(z \mid x)\right] = \ln p_\theta(x) - D_{KL}\left(q_\phi(z \mid x) \,\|\, p_\theta(z \mid x)\right).$$
    • Conclusion (many details skipped): one can derive a quadratic lower bound on the KL divergence
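    In practice (following Ho et al., 2020, again with many details skipped), this bound simplifies to a plain noise-prediction MSE. A minimal training-step sketch, reusing the schedule from above and the hypothetical `eps_model`:

    ```python
    import torch
    import torch.nn.functional as F

    T = 1000
    alpha_bar = torch.cumprod(1.0 - torch.linspace(1e-4, 0.02, T), dim=0)

    def training_loss(eps_model, x0: torch.Tensor) -> torch.Tensor:
        """Simplified DDPM objective: recover the noise used to corrupt x0 at a random t."""
        t = torch.randint(0, T, (1,)).item()
        eps = torch.randn_like(x0)
        x_t = alpha_bar[t].sqrt() * x0 + (1.0 - alpha_bar[t]).sqrt() * eps
        return F.mse_loss(eps_model(x_t, t), eps)
    ```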

    P.S. A first look at CLIP.

    CLIP

    A first look at CLIP
    CLIP is a zero-shot visual classification model: the pretrained model transfers well to downstream tasks without any fine-tuning. The authors evaluated it on more than 30 datasets, covering tasks such as OCR, action recognition in video, and geo-localization.
    The pretraining input is text-image pairs: each image comes with a short explanatory caption. The text and the image are each passed through an encoder to obtain vector representations. The text encoder is a Transformer; the image encoder can be either a ResNet or a Vision Transformer, and the authors examined both architectures.
    OpenAI has open-sourced the pretrained models and an API, which can be used directly for inference on downstream tasks:
    https://github.com/openai/CLIP
    https://openai.com/research/clip
    Some projects worth trying:
    https://github.com/yunhao-tech/Course_project/blob/master/Advanced%20Machine%20learning/Final%20project_CLIP.ipynb
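    A minimal zero-shot classification sketch against the open-source CLIP package linked above (`pip install git+https://github.com/openai/CLIP.git`); the image path and label set are placeholders:

    ```python
    import torch
    import clip
    from PIL import Image

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load("ViT-B/32", device=device)

    image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)  # placeholder image
    classes = ["a dog", "a cat", "a car"]  # hypothetical label set
    text = clip.tokenize([f"a photo of {c}" for c in classes]).to(device)

    with torch.no_grad():
        # Cosine-similarity logits between the image and each caption
        logits_per_image, _ = model(image, text)
        probs = logits_per_image.softmax(dim=-1)

    print({c: round(float(p), 3) for c, p in zip(classes, probs[0])})
    ```

    Whichever caption scores highest is the zero-shot prediction; no fine-tuning is involved.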

    When We Talk About Text-To-Image: Diffusion Models

    Background
    Text-to-image generation was the most fiercely contested field of 2022.
    In January 2021, OpenAI announced its first text-to-image model, DALL·E. At the end of December 2021, OpenAI followed up with GLIDE, a model capable of generating more complex and richer images than DALL·E. In April 2022, OpenAI released DALL·E 2, this time confidently claiming it could "generate realistic or artistic images." Just one month later, in May 2022, Google answered with its new model Imagen, beating DALL·E 2 on photorealism.

    From early 2022 onward, new models sprang up like mushrooms after rain, but behind them all lies a single paradigm: the Diffusion Model. Below we introduce this rising star of the image world. It already stands shoulder to shoulder with GANs in the image domain, and its reach may well extend into NLP, eventually becoming yet another general-purpose modeling paradigm.
