Diffusion Models & CLIP

Introduction to Diffusion Models

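Generative models
Here this mainly means generative models in unsupervised learning, where the task is to have the machine learn from given samples and then generate something new. For example: show the machine some images and it can generate new images; let it read some poems and it can write poems of its own.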

Forward Process

Forward Process I

Given $q\left(X_0\right), q\left(X_t \mid X_{t-1}\right)$ , calculate $q\left(X_t \mid X_0\right)$

Forward step
$X_t=\sqrt{\alpha_t} X_{t-1}+\sqrt{1-\alpha_t} \epsilon_t$ , where $\epsilon_t \sim N(0, I)$
Thus, $q\left(X_t \mid X_{t-1}\right)=N\left(\sqrt{\alpha_t} X_{t-1},\left(1-\alpha_t\right) I\right)$
(It is enough to make sure every $\alpha_t$ lies in $(0, 1)$.)
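The forward step can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function name `forward_step`, the 3-value "image" `x0`, and the value `alpha_t=0.99` are hypothetical and not part of the original notes.

```python
import math
import random

def forward_step(x_prev, alpha_t, rng):
    """One noising step: x_t = sqrt(alpha_t) * x_prev + sqrt(1 - alpha_t) * eps."""
    if not 0.0 < alpha_t < 1.0:
        raise ValueError("alpha_t must lie in (0, 1)")
    signal = math.sqrt(alpha_t)        # how much of the previous sample survives
    noise = math.sqrt(1.0 - alpha_t)   # standard deviation of the added noise
    # eps ~ N(0, I): one independent standard normal draw per coordinate
    return [signal * x + noise * rng.gauss(0.0, 1.0) for x in x_prev]

rng = random.Random(0)
x0 = [0.5, -1.0, 2.0]                  # hypothetical 3-pixel "image"
x1 = forward_step(x0, alpha_t=0.99, rng=rng)
```

With `alpha_t` close to 1 the sample changes only slightly per step; the squared coefficients always sum to 1, which is what keeps the later closed form simple.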

Forward Process II

$\begin{aligned} X_t &= \sqrt{\alpha_t} X_{t-1}+\sqrt{1-\alpha_t} \epsilon_t \\ &= \sqrt{\alpha_t}\left(\sqrt{\alpha_{t-1}} X_{t-2}+\sqrt{1-\alpha_{t-1}} \epsilon_{t-1}\right)+\sqrt{1-\alpha_t} \epsilon_t \\ &= \sqrt{\alpha_t \alpha_{t-1}} X_{t-2}+\sqrt{\alpha_t-\alpha_t \alpha_{t-1}} \epsilon_{t-1}+\sqrt{1-\alpha_t} \epsilon_t \end{aligned}$

Fact: the sum of two independent normally distributed random variables is again normally distributed, and their variances add.

Therefore: $\sqrt{\alpha_t-\alpha_t \alpha_{t-1}} \epsilon_{t-1}+\sqrt{1-\alpha_t} \epsilon_t \sim N\left(0,\left(\alpha_t-\alpha_t \alpha_{t-1}+1-\alpha_t\right) I\right)$
Let $\alpha_i=1-\beta_i$
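The variance of the combined noise term simplifies directly, which is what allows the two noise terms to be merged into one. In the sketch below, $\epsilon$ denotes a fresh standard normal variable and $\overset{d}{=}$ means equality in distribution; both are notation introduced here for illustration.

```latex
\begin{aligned}
\left(\alpha_t-\alpha_t \alpha_{t-1}\right)+\left(1-\alpha_t\right) &= 1-\alpha_t \alpha_{t-1}, \\
\sqrt{\alpha_t-\alpha_t \alpha_{t-1}}\, \epsilon_{t-1}+\sqrt{1-\alpha_t}\, \epsilon_t
  &\overset{d}{=} \sqrt{1-\alpha_t \alpha_{t-1}}\, \epsilon, \qquad \epsilon \sim N(0, I).
\end{aligned}
```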

Forward Process III

$X_t=\sqrt{\alpha_t \alpha_{t-1}} X_{t-2}+\sqrt{1-\alpha_t \alpha_{t-1}} \epsilon$

Repeating this substitution all the way down to $X_0$ gives: $X_t=\sqrt{\alpha_t \alpha_{t-1} \ldots \alpha_1} X_0+\sqrt{1-\alpha_t \alpha_{t-1} \ldots \alpha_1} \epsilon$

Therefore: $q\left(X_t \mid X_0\right)=N\left(\sqrt{\overline{\alpha_t}} X_0,\left(1-\overline{\alpha_t}\right) I\right)$ , where $\overline{\alpha_t}=\alpha_t \alpha_{t-1} \ldots \alpha_1$
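The closed form means $X_t$ can be drawn from $X_0$ in one shot, without looping over the intermediate steps. A minimal sketch follows, assuming a linear schedule for $\beta_i$ from `1e-4` to `0.02` over `T = 1000` steps; that schedule, and the names `alpha_bars` and `sample_xt`, are assumptions made for illustration and do not come from the notes.

```python
import math
import random

def alpha_bars(alphas):
    """Cumulative products: alpha_bar_t = alpha_1 * ... * alpha_t."""
    out, running = [], 1.0
    for a in alphas:
        running *= a
        out.append(running)
    return out

def sample_xt(x0, alpha_bar_t, rng):
    """Draw X_t ~ N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) I) directly from x0."""
    s = math.sqrt(alpha_bar_t)
    n = math.sqrt(1.0 - alpha_bar_t)
    return [s * x + n * rng.gauss(0.0, 1.0) for x in x0]

# Hypothetical linear beta schedule (an assumption, not from the text).
T = 1000
betas = [1e-4 + (0.02 - 1e-4) * i / (T - 1) for i in range(T)]
alphas = [1.0 - b for b in betas]
abar = alpha_bars(alphas)

rng = random.Random(0)
x_t = sample_xt([0.5, -1.0, 2.0], abar[T - 1], rng)
```

Because every $\alpha_i$ is below 1, $\overline{\alpha_t}$ shrinks toward 0 as $t$ grows, so $X_t$ approaches pure noise $N(0, I)$.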

Reverse Process

Reverse Process I

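Let us use Bayes' theorem. With $X_0$ given, the reverse conditional $q\left(X_{t-1} \mid X_t\right)$ is treated as $q\left(X_{t-1} \mid X_t, X_0\right)$: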
$q\left(X_{t-1} \mid X_t\right)=q\left(X_{t-1} \mid X_t, X_0\right)=\frac{q\left(X_t \mid X_{t-1}, X_0\right) q\left(X_{t-1} \mid X_0\right)}{q\left(X_t \mid X_0\right)}$

Reverse Process II

We know these identities are true

$\begin{aligned} q(X_t \mid X_{t-1}, X_0) &= N\left(\sqrt{\alpha_t} X_{t-1},\left(1-\alpha_t\right) I\right) \\ q(X_t \mid X_0) &= N\left(\sqrt{\bar{\alpha}_t} X_0,\left(1-\bar{\alpha}_t\right) I\right) \\ q(X_{t-1} \mid X_0) &= N\left(\sqrt{\bar{\alpha}_{t-1}} X_0,\left(1-\bar{\alpha}_{t-1}\right) I\right) \end{aligned}$

Reverse Process III

Let us substitute these densities into Bayes' theorem. Normalizing constants are dropped, so the result holds up to proportionality:

$\begin{aligned} q(X_{t-1} \mid X_t) &= \frac{q(X_t \mid X_{t-1}, X_0)\, q(X_{t-1} \mid X_0)}{q(X_t \mid X_0)} \\ &\propto \exp\left(-\frac{1}{2}\left(\frac{(X_t-\sqrt{\alpha_t} X_{t-1})^2}{1-\alpha_t}+\frac{(X_{t-1}-\sqrt{\bar{\alpha}_{t-1}} X_0)^2}{1-\bar{\alpha}_{t-1}}-\frac{(X_t-\sqrt{\bar{\alpha}_t} X_0)^2}{1-\bar{\alpha}_t}\right)\right) \\ &= \exp\left(-\frac{1}{2}\left(\left(\frac{\alpha_t}{1-\alpha_t}+\frac{1}{1-\bar{\alpha}_{t-1}}\right) X_{t-1}^2-\left(\frac{2 \sqrt{\alpha_t}}{1-\alpha_t} X_t+\frac{2 \sqrt{\bar{\alpha}_{t-1}}}{1-\bar{\alpha}_{t-1}} X_0\right) X_{t-1}+C(X_t, X_0)\right)\right) \end{aligned}$

Reverse Process IV

Find $\sigma$ and $\mu$ for the normal distribution

$\begin{aligned} &\exp\left(-\frac{1}{2}\left(\left(\frac{\alpha_t}{1-\alpha_t}+\frac{1}{1-\bar{\alpha}_{t-1}}\right) X_{t-1}^2-\left(\frac{2 \sqrt{\alpha_t}}{1-\alpha_t} X_t+\frac{2 \sqrt{\bar{\alpha}_{t-1}}}{1-\bar{\alpha}_{t-1}} X_0\right) X_{t-1}+C(X_t, X_0)\right)\right) \\ &\exp\left(-\frac{(x-\mu)^2}{2 \sigma^2}\right)=\exp\left(-\frac{1}{2}\left(\frac{1}{\sigma^2} x^2-\frac{2 \mu}{\sigma^2} x+\frac{\mu^2}{\sigma^2}\right)\right) \end{aligned}$
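Reading off the coefficients of $X_{t-1}^2$ and $X_{t-1}$, with $x = X_{t-1}$, makes the comparison explicit. The lines below are a worked sketch of that matching; the last line expresses $\mu_t$ in terms of $X_t$ and $X_0$, before any substitution for $X_0$.

```latex
\begin{aligned}
\frac{1}{\sigma_t^2} &= \frac{\alpha_t}{1-\alpha_t}+\frac{1}{1-\bar{\alpha}_{t-1}}
  = \frac{1-\alpha_t \bar{\alpha}_{t-1}}{(1-\alpha_t)(1-\bar{\alpha}_{t-1})}, \\
\frac{2 \mu_t}{\sigma_t^2} &= \frac{2 \sqrt{\alpha_t}}{1-\alpha_t} X_t+\frac{2 \sqrt{\bar{\alpha}_{t-1}}}{1-\bar{\alpha}_{t-1}} X_0, \\
\mu_t &= \frac{\sqrt{\alpha_t}\,(1-\bar{\alpha}_{t-1})\, X_t+\sqrt{\bar{\alpha}_{t-1}}\,(1-\alpha_t)\, X_0}{1-\alpha_t \bar{\alpha}_{t-1}}.
\end{aligned}
```

The constant term $C(X_t, X_0)$ does not affect $\mu_t$ or $\sigma_t$; it only contributes to the normalization.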

Reverse Process V

By matching the three terms, we get the solution for $\mu_t, \sigma_t$

$\begin{aligned} \mu_t &= \frac{1}{\sqrt{\alpha_t}}\left(X_t-\frac{1-\alpha_t}{\sqrt{1-\bar{\alpha}_t}} \epsilon_t\right) \\ \sigma_t^2 &= \frac{(1-\alpha_t)(1-\bar{\alpha}_{t-1})}{1-\alpha_t \bar{\alpha}_{t-1}} \end{aligned}$
$\mu_t, \sigma_t$ are the quantities we set out to solve for.
This is what we should use in the reverse process.
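The $\epsilon_t$ form of $\mu_t$ comes from writing $X_0 = \left(X_t-\sqrt{1-\bar{\alpha}_t}\, \epsilon_t\right) / \sqrt{\bar{\alpha}_t}$, where $\epsilon_t$ is the noise that produced $X_t$ from $X_0$ in the closed-form forward process. The sketch below checks numerically, for scalars, that the mean written with $X_0$ and the mean written with $\epsilon_t$ agree. The function names and the test values are hypothetical, chosen only for the check.

```python
import math

def posterior_params(x_t, x0, alpha_t, abar_prev):
    """Mean and variance of q(X_{t-1} | X_t, X_0) for scalar inputs."""
    abar_t = alpha_t * abar_prev
    var = (1.0 - alpha_t) * (1.0 - abar_prev) / (1.0 - abar_t)
    mean = (math.sqrt(alpha_t) * (1.0 - abar_prev) * x_t
            + math.sqrt(abar_prev) * (1.0 - alpha_t) * x0) / (1.0 - abar_t)
    return mean, var

def mean_from_noise(x_t, eps, alpha_t, abar_prev):
    """The same mean, written in terms of the noise eps instead of x0."""
    abar_t = alpha_t * abar_prev
    return (x_t - (1.0 - alpha_t) / math.sqrt(1.0 - abar_t) * eps) / math.sqrt(alpha_t)

# Hypothetical numbers chosen only for the check.
alpha_t, abar_prev, x0, eps = 0.98, 0.60, 1.5, -0.3
abar_t = alpha_t * abar_prev
x_t = math.sqrt(abar_t) * x0 + math.sqrt(1.0 - abar_t) * eps  # closed-form forward sample

mean_a, var = posterior_params(x_t, x0, alpha_t, abar_prev)
mean_b = mean_from_noise(x_t, eps, alpha_t, abar_prev)
```

The two means agree up to floating-point error, and the variance is strictly smaller than the forward-step variance $1-\alpha_t$.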

Next: how to train the encoder that predicts $\epsilon_t$

Loss Function

We want the reverse process $\boldsymbol{p}_\theta(\boldsymbol{X})$ to be as close as possible to the forward process $\boldsymbol{q}(\boldsymbol{X})$.
Use the KL divergence as the loss to match the two distributions:
$D\left(q\left(X_0\right) \| p_\theta\left(X_0\right)\right)=\int q\left(X_0\right) \log \left(\frac{q\left(X_0\right)}{p_\theta\left(X_0\right)}\right) d X_0$
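For Gaussians the KL integral has a closed form, which makes it easy to experiment with. The sketch below uses one-dimensional Gaussians purely for illustration (the function name `kl_gauss` and the numbers are hypothetical). It shows that the divergence is zero for identical distributions and, when the variances are equal, reduces to a quadratic in the difference of the means.

```python
import math

def kl_gauss(mu_q, var_q, mu_p, var_p):
    """KL( N(mu_q, var_q) || N(mu_p, var_p) ) for one-dimensional Gaussians."""
    return 0.5 * (math.log(var_p / var_q)
                  + (var_q + (mu_q - mu_p) ** 2) / var_p
                  - 1.0)

same = kl_gauss(0.0, 1.0, 0.0, 1.0)           # identical distributions
shifted = kl_gauss(0.3, 0.5, -0.1, 0.5)       # equal variances, different means
quadratic = (0.3 - (-0.1)) ** 2 / (2 * 0.5)   # (mu_q - mu_p)^2 / (2 * var)

# KL is not symmetric: swapping the arguments changes the value.
forward_kl = kl_gauss(0.0, 1.0, 0.0, 4.0)
reverse_kl = kl_gauss(0.0, 4.0, 0.0, 1.0)
```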

A commonly used tool:

The Evidence Lower Bound
- $\mathrm{ELBO}$ (Evidence Lower Bound)
- Let $p_\theta$ and $q_\phi$ be two distributions; then:
  $\ln p_\theta(x) \geq \mathbb{E}_{z \sim q_\phi}\left[\ln \frac{p_\theta(x, z)}{q_\phi(z)}\right] .$
- Step 1: define
  $L(\phi, \theta ; x):=\mathbb{E}_{z \sim q_\phi(\cdot \mid x)}\left[\ln \frac{p_\theta(x, z)}{q_\phi(z \mid x)}\right] .$
- Step 2:
  $\begin{aligned} L(\phi, \theta; x) &= \mathbb{E}_{z \sim q_\phi(\cdot \mid x)}\left[\ln p_\theta(x, z)\right]+H\left[q_\phi(z \mid x)\right] \\ &= \ln p_\theta(x)-D_{KL}\left(q_\phi(z \mid x) \,\|\, p_\theta(z \mid x)\right) . \end{aligned}$
- Conclusion (many details skipped): since the KL term in Step 2 is non-negative, $L$ is a lower bound on $\ln p_\theta(x)$, and from it one can derive a quadratic bound for the KL divergence loss.
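The identity in Step 2 can be checked numerically on a toy example. The sketch below assumes a single fixed observation $x$ and a binary latent $z \in \{0, 1\}$; the joint probabilities and the choice of $q$ are hypothetical numbers picked for illustration.

```python
import math

# Hypothetical joint p(x, z) for one fixed observation x and z in {0, 1}.
p_joint = {0: 0.12, 1: 0.28}                       # p(x, z=0), p(x, z=1)
p_x = sum(p_joint.values())                        # evidence p(x)
p_post = {z: p_joint[z] / p_x for z in p_joint}    # true posterior p(z | x)

q = {0: 0.5, 1: 0.5}                               # an arbitrary approximate posterior q(z | x)

# ELBO: E_q[ ln p(x, z) - ln q(z | x) ]
elbo = sum(q[z] * math.log(p_joint[z] / q[z]) for z in q)
# KL( q(z | x) || p(z | x) )
kl = sum(q[z] * math.log(q[z] / p_post[z]) for z in q)

# With q equal to the true posterior the KL term vanishes and the bound is tight.
elbo_tight = sum(p_post[z] * math.log(p_joint[z] / p_post[z]) for z in p_post)
```

Here `elbo + kl` equals `math.log(p_x)` up to rounding, so the gap between the ELBO and the log evidence is exactly the KL divergence between the approximate and the true posterior.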
P.S. I also took a first look at CLIP.

CLIP

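A first look at CLIP
CLIP is a zero-shot visual classification model: without any fine-tuning, the pretrained model transfers well to downstream tasks. The authors tested it on more than 30 datasets, covering tasks such as OCR, action detection in video, and geo-localization.
The pretraining input consists of text-image pairs: each image comes with a short descriptive sentence. The text and the image are each passed through an encoder to obtain vector representations. The text encoder is a Transformer; the image encoder can be either a ResNet or a Vision Transformer, and the authors examined both architectures.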
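Once both encoders produce vectors, zero-shot classification amounts to comparing one image vector against the text vectors of the candidate labels. The sketch below illustrates that comparison with cosine similarity followed by a softmax. It does not call the CLIP library: the 3-dimensional embeddings are hand-written stand-ins for encoder outputs, and the function name `zero_shot_probs` and the `scale=100.0` factor are assumptions made for illustration.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def zero_shot_probs(image_vec, text_vecs, scale=100.0):
    """Softmax over scaled cosine similarities between one image and several texts."""
    img = normalize(image_vec)
    logits = [scale * sum(a * b for a, b in zip(img, normalize(t))) for t in text_vecs]
    m = max(logits)                          # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 3-d embeddings standing in for real encoder outputs.
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]
text_vecs = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
image_vec = [0.8, 0.3, 0.1]

probs = zero_shot_probs(image_vec, text_vecs)
best = labels[probs.index(max(probs))]       # label whose text vector is closest
```

No classifier head is trained here: changing the set of labels only means encoding different sentences, which is what makes the approach zero-shot.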

The pretrained models and an API have been open-sourced and can be used directly for inference on downstream tasks:
https://github.com/openai/CLIP
https://openai.com/research/clip
A project worth trying:
https://github.com/yunhao-tech/Course_project/blob/master/Advanced%20Machine%20learning/Final%20project_CLIP.ipynb

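When We Talk About Text-To-Image: Diffusion Model

Background
Text-to-image generation: the most fiercely competitive field of 2022
In January 2021, the well-known AI company OpenAI announced its first text-to-image model, DALL·E. At the end of December 2021, OpenAI followed with the GLIDE model, which can generate more complex and richer images than DALL·E. In April 2022, OpenAI proposed yet another model, DALL·E 2, this time confidently stating that it "can generate realistic or artistic images". Just one month later, in May 2022, Google, not to be outdone, published its new model Imagen, which beat DALL·E 2 on photorealism.

Since early 2022, new models have been springing up one after another, but behind all of them is a single model paradigm: the Diffusion Model. Below we introduce this rising star of the image world. In image generation it already stands shoulder to shoulder with GANs, and its role may extend further into NLP, eventually making it a general-purpose model paradigm.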
Original article: https://blog.csdn.net/zzrh2018/article/details/134454431
