• ③单细胞学习-pbmc的Seurat 流程


    目录

    1,数据读取

    2,线粒体基因查看

    3,数据标准化

    4,识别高变基因

    5,进行数据归一化

    6,进行线性降维

    7,确定细胞簇

    8,UMAP/tSNE降维(保存pbmc_tutorial.rds)

    补充:降维复现镜像

    9,分析差异基因

    10,可视化基因

    11,定义细胞类

    1,数据读取

    Analysis, visualization, and integration of Visium HD spatial datasets with Seurat • Seurat (satijalab.org)

    1. rm(list = ls())
    2. library(dplyr)
    3. library(Seurat)
    4. library(patchwork)
    5. # Load the PBMC dataset :注意需要将文件解压后才能读取处理
    6. #解压后包括三个文件barcodes.tsv;genes.tsv ;matrix.mtx
    7. pbmc.data <- Read10X(data.dir = "D:/2024年5月30日pbmc流程/pbmc3k_filtered_gene_bc_matrices")
    8. # #用原始数据(非规范化数据)初始化Seurat对象
    9. pbmc <- CreateSeuratObject(counts = pbmc.data,
    10. project = "pbmc3k",
    11. min.cells = 3, # min.cell:每个feature至少在多少个细胞中表达(feature=gene)
    12. min.features = 200) # min.features:每个细胞中至少有多少个feature被检测到
    13. pbmc
    2,线粒体基因查看
    1. # The [[ operator can add columns to object metadata. This is a great place to stash QC stats
    2. pbmc[["percent.mt"]] <- PercentageFeatureSet(pbmc, pattern = "^MT-")#计算线粒体比例
    3. #We filter cells that have unique feature counts over 2,500 or less than 200
    4. #We filter cells that have >5% mitochondrial counts
    5. VlnPlot(pbmc, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3)
    6. #功能散点图通常用于可视化功能之间的关系
    7. #用于对象计算的任何内容,即对象元数据中的列、PC分数等
    8. plot1 <- FeatureScatter(pbmc, feature1 = "nCount_RNA", feature2 = "percent.mt")
    9. plot2 <- FeatureScatter(pbmc, feature1 = "nCount_RNA", feature2 = "nFeature_RNA")
    10. plot1 + plot2

    过滤

    1. #过滤
    2. pbmc <- subset(pbmc, subset = nFeature_RNA > 200 & nFeature_RNA < 2500 & percent.mt < 5)
    3. pbmc
    3,数据标准化
    1. #数据标准化:
    2. pbmc <- NormalizeData(pbmc, normalization.method = "LogNormalize", scale.factor = 10000)
    3. #pbmc <- NormalizeData(pbmc)#可以使用替代
    4,识别高变基因
    1. #识别高变基因
    2. pbmc <- FindVariableFeatures(pbmc, selection.method = "vst", nfeatures = 2000)#返回2000个高变基因
    3. # Identify the 10 most highly variable genes
    4. top10 <- head(VariableFeatures(pbmc), 10)
    5. top10
    6. # plot variable features with and without labels
    7. plot1 <- VariableFeaturePlot(pbmc)
    8. #plot2 <- LabelPoints(plot = plot1, points = top10, repel = TRUE)
    9. #这里关于绘制散点图注释关于重叠的参数设置可以为0
    10. plot2 <- LabelPoints(plot = plot1, points = top10, repel = 0)
    11. plot1 + plot2

    5,进行数据归一化
    1. all.genes <- rownames(pbmc)
    2. pbmc <- ScaleData(pbmc, features = all.genes)

    注意:设置参数features是因为ScaleData默认处理前面鉴定的差异基因。这一步怎么做都不会影响到后续pca和聚类,但是会影响做热图。Scale并不会影响降维的结构,也就是说不影响数据的分布。

    6,进行线性降维
    1. #进行线性降维
    2. pbmc <- RunPCA(pbmc, features = VariableFeatures(object = pbmc))
    3. #检查及可视化PCA结果
    4. print(pbmc[["pca"]], dims = 1:5, nfeatures = 5)
    5. VizDimLoadings(pbmc, dims = 1:2, reduction = "pca")
    6. DimPlot(pbmc, reduction = "pca") + NoLegend()
    7. DimHeatmap(pbmc, dims = 1, cells = 500, balanced = TRUE)

    7,确定细胞簇
    1. #确定单细胞降维维度(簇数)
    2. ElbowPlot(pbmc)#用肘线图确定曲线折线处

    1. pbmc <- FindNeighbors(pbmc, dims = 1:10)
    2. pbmc <- FindClusters(pbmc, resolution = 0.5)
    3. ## 查看前5分析细胞聚类数ID
    4. head(Idents(pbmc), 5)
    5. ## 查看每个类别多少个细胞
    6. table(pbmc@meta.data$seurat_clusters)#每个类别细胞越来越少
    8,UMAP/tSNE降维(保存pbmc_tutorial.rds)
    1. pbmc <- RunUMAP(pbmc, dims = 1:10)
    2. DimPlot(pbmc, reduction = "umap")
    3. #保存过滤及质控降维后的数据集(这里可以不用添加保存路径)
    4. saveRDS(pbmc, file = "pbmc_tutorial.rds")#https://www.jianshu.com/p/aca662db800e

    补充:降维复现镜像

    为什么你画的Seurat包PCA图与别人的方向不一致?-腾讯云开发者社区-腾讯云 (tencent.com)

    使用随机的是这个函数,随机参数为maxit

    maxit:maximum number of iterations

    但是发现这个函数最后使用的C/C++代码…

    除了RunPCA函数之外,Seurat包中使用了随机种子的还有RunTSNE函数,默认为seed.use = 1,RunUMAP,默认为seed.use = 42,这两个函数再使用RunUMAP时回遇到画出来的图不一致,RunTSNE倒是没有遇见过很明显不一样的时候。

    总结

    挖到最后,发现还是有点说不通,没给找到一个合理的解释。

    总之,如果你发现自己在使用Seurat包重复某一文章或者别人的教程还是官网的示例时,发现自己画出来的图与原有的方向呈镜像或者上下颠倒,可以试着改一下这个随机种子


    9,分析差异基因
    1. rm(list = ls())
    2. library(dplyr)
    3. library(Seurat)
    4. library(patchwork)
    5. pbmc <- readRDS("pbmc_tutorial.rds")
    6. #DimPlot(pbmc, reduction = "umap")##降维可视化优点镜像
    7. #寻找差异表达基因
    8. # find all markers of cluster 2
    9. cluster2.markers <- FindMarkers(pbmc, ident.1 = 2)
    10. head(cluster2.markers, n = 5)
    11. # find all markers distinguishing cluster 5 from clusters 0 and 3
    12. cluster5.markers <- FindMarkers(pbmc, ident.1 = 5, ident.2 = c(0, 3))
    13. head(cluster5.markers, n = 5)
    14. pbmc.markers <- FindAllMarkers(pbmc, only.pos = TRUE)
    15. pbmc.markers %>%
    16. group_by(cluster) %>%
    17. dplyr::filter(avg_log2FC > 1)
    18. #Seurat可以通过参数test.use设定检验差异表达的方法
    19. cluster0.markers <- FindMarkers(pbmc, ident.1 = 0, logfc.threshold = 0.25,
    20. test.use = "roc", only.pos = TRUE)
    10,可视化基因
    1. #可视化标记基因:VlnPlot基因表达分布;FeaturePlot在tSNE 中展示
    2. VlnPlot(pbmc, features = c("MS4A1", "CD79A"))
    3. # you can plot raw counts as well
    4. VlnPlot(pbmc, features = c("NKG7", "PF4"), slot = "counts", log = TRUE)
    5. #降维展示
    6. FeaturePlot(pbmc, features = c("MS4A1", "GNLY", "CD3E", "CD14", "FCER1A",
    7. "FCGR3A", "LYZ", "PPBP","CD8A"))
    8. #DoHeatmap为指定的细胞和基因花表达热图
    9. pbmc.markers %>%
    10. group_by(cluster) %>%
    11. dplyr::filter(avg_log2FC > 1) %>%
    12. slice_head(n = 10) %>%
    13. ungroup() -> top10
    14. DoHeatmap(pbmc, features = top10$gene) + NoLegend()

    11,定义细胞类群
    Cluster IDMarkersCell Type
    0IL7R, CCR7Naive CD4+ T
    1CD14, LYZCD14+ Mono
    2IL7R, S100A4Memory CD4+
    3MS4A1B
    4CD8ACD8+ T
    5FCGR3A, MS4A7FCGR3A+ Mono
    6GNLY, NKG7NK
    7FCER1A, CST3DC
    8PPBPPlatelet
    1. #为聚类分配单元类型标识
    2. new.cluster.ids <- c("Naive CD4 T", "CD14+ Mono", "Memory CD4 T", "B", "CD8 T", "FCGR3A+ Mono",
    3. "NK", "DC", "Platelet")
    4. names(new.cluster.ids) <- levels(pbmc)
    5. pbmc <- RenameIdents(pbmc, new.cluster.ids)
    6. DimPlot(pbmc, reduction = "umap", label = TRUE, pt.size = 0.5) + NoLegend()
    7. #进行图片保存
    8. #library(ggplot2)
    9. #plot <- DimPlot(pbmc, reduction = "umap", label = TRUE, label.size = 4.5) + xlab("UMAP 1") + ylab("UMAP 2") +
    10. # theme(axis.title = element_text(size = 18), legend.text = element_text(size = 18)) + guides(colour = guide_legend(override.aes = list(size = 10)))
    11. #ggsave(filename = "../output/images/pbmc3k_umap.jpg", height = 7, width = 12, plot = plot, quality = 50)
    12. saveRDS(pbmc, file = "pbmc3k_final.rds")#用于cellchat准备

    参考:

    1:Analysis, visualization, and integration of Visium HD spatial datasets with Seurat • Seurat (satijalab.org)

    2:单细胞测序分析: Seurat 使用教程 - 简书 (jianshu.com)

    3:单细胞初探(seurat基础流程)(2021公开课配套笔记)-腾讯云开发者社区-腾讯云 (tencent.com)

  • 相关阅读:
    L64.linux命令每日一练 -- 第十章 Linux网络管理命令 -- ifconfig和ifup
    centos7最简洁的搭建samba服务器
    再见了,收费的云笔记,自己搭建的就是好用
    Scala基础【入门及安装】
    面经分享 | 某康安全开发工程师
    只要封装相同,电容体本身大小就一样吗?
    基于 NNCF 和 Optimum 面向 Intel CPU 对 Stable Diffusion 优化
    #分库分表-分片算法
    云计算技术 — 混合云 — 技术架构
    力扣 83. 删除排序链表中的重复元素
  • 原文地址:https://blog.csdn.net/hx2024/article/details/139361074