码农知识堂 - 1000bd
  •   Python
  •   PHP
  •   JS/TS
  •   JAVA
  •   C/C++
  •   C#
  •   GO
  •   Kotlin
  •   Swift
  • 单细胞聚类,究竟聚类聚成多少类比较合适?全代码分享


    单细胞数据分析到最后一步往往都需要聚类,进而给亚群命名。但是我们通常纠结resolution到底选多大为好,究竟聚成多少类比较合适?今天我们使用Silhouette来确定多少类比较合适。

    关注微信:生信小博士

    选择最优聚类有多种方式,今天着重于Silhouette Method。

    Determining optimal number of clusters (k)

    Before we do the actual clustering, we need to identity the Optimal number of clusters (k) for this data set of wholesale customers. The popular way of determining number of clusters are

    1. Elbow Method
    2. Silhouette Method
    3. Gap Static Method

    Elbow and Silhouette methods are direct methods and gap statistic method is the statistics method.

    In this demonstration, we are going to see how silhouette method is used.

    1 最优聚类 数据前处理

    如果你已经准备好前期的数据,此步可以省略。

    这一步的目的:会得到一个名为inttegtaed_data的seurat对象。

    1. slide_files <- list.files("~/silicosis/spatial/prcessed_visum_for_progeny/data/",recursive = TRUE,
    2. all.files = TRUE,full.names = TRUE,pattern = "_")
    3. integrated_data <- map(slide_files, readRDS)
    4. print("You managed to load everything")
    5. print("Object size")
    6. print(object.size(integrated_data))
    7. # Calculate HVG per sample - Here we assume that batch and patient effects aren't as strong
    8. # since cell-types and niches should be greater than the number of batches
    9. Assays(integrated_data [[1]])
    10. str(integrated_data [[1]])
    11. def_assay="Spatial"
    12. hvg_list <- map(integrated_data, function(x) {
    13. DefaultAssay(x) <- def_assay
    14. x <- FindVariableFeatures(x, selection.method = "vst",
    15. nfeatures = 3000)
    16. x@assays[[def_assay]]@var.features
    17. }) %>% unlist()
    18. hvg_list <- table(hvg_list) %>%
    19. sort(decreasing = TRUE)
    20. gene_selection_plt <- hvg_list %>% enframe() %>%
    21. group_by(value) %>%
    22. mutate(value = as.numeric(value)) %>%
    23. summarize(ngenes = length(name)) %>%
    24. ggplot(aes(x = value, y = ngenes)) +
    25. geom_bar(stat = "identity")
    26. gene_selection <- hvg_list[1:3000] %>% names()
    27. #########
    28. # Create merged object ---------------------------------
    29. integrated_data <- purrr::reduce(integrated_data,
    30. merge,
    31. merge.data = TRUE)
    32. print("You managed to merge everything")
    33. print("Object size")
    34. print(object.size(integrated_data))
    35. # Default assay ---------------------------------------
    36. #DefaultAssay(integrated_data) <- def_assay
    37. # Process it before integration -----------------------
    38. integrated_data <- integrated_data %>%
    39. ScaleData(verbose = FALSE) %>%
    40. RunPCA(features = gene_selection,
    41. npcs = 30,
    42. verbose = FALSE)
    43. original_pca_plt <- DimPlot(object = integrated_data,
    44. reduction = "pca",
    45. pt.size = .1,
    46. group.by = "orig.ident")
    47. head(integrated_data@meta.data)
    48. batch_covars="orig.ident"
    49. # Integrate the data -----------------------
    50. integrated_data <- RunHarmony(integrated_data,
    51. group.by.vars = batch_covars,
    52. plot_convergence = TRUE,
    53. assay.use = def_assay,
    54. max.iter.harmony = 20)
    55. # Corrected dimensions -----------------------
    56. corrected_pca_plt <- DimPlot(object = integrated_data,
    57. reduction = "harmony",
    58. pt.size = .1,
    59. group.by = "orig.ident")
    60. # Create the UMAP with new reduction -----------
    61. integrated_data <- integrated_data %>%
    62. RunUMAP(reduction = "harmony", dims = 1:30,
    63. reduction.name = "umap_harmony") %>%
    64. RunUMAP(reduction = "pca", dims = 1:30,
    65. reduction.name = "umap_original")
    66. integrated_data <- FindNeighbors(integrated_data,
    67. reduction = "harmony",
    68. dims = 1:30)
    69. head(integrated_data@meta.data)
    70. colnames(integrated_data@meta.data)=gsub(pattern ="_snn_res.",replacement = "_before",x = colnames(integrated_data@meta.data) )

    准备好的数据如下。这里的数据你可以使用seurat的pbmc教程中的数据,作为练习。

    然后进行下一步

    2 开始最优聚类

    全代码如下

    1. optimize=TRUE
    2. ################opt cluster--------------
    3. if(optimize) {
    4. # Clustering and optimization -------------------------
    5. print("Optimizing clustering")
    6. seq_res <- seq(0.5, 1.5, 0.1)
    7. # seq_res=1
    8. integrated_data <- FindClusters(integrated_data,
    9. resolution = seq_res,
    10. verbose = F)
    11. clustree_plt <- clustree::clustree(integrated_data,
    12. prefix = paste0(DefaultAssay(integrated_data), "_snn_res."))
    13. # Optimize clustering ------------------------------------------------------
    14. cell_dists <- dist(integrated_data@reductions$harmony@cell.embeddings,
    15. method = "euclidean")
    16. head(cell_dists)
    17. cluster_info <- integrated_data@meta.data[,grepl(paste0(DefaultAssay(integrated_data),"_snn_res"),
    18. colnames(integrated_data@meta.data))] %>%
    19. dplyr::mutate_all(as.character) %>%
    20. dplyr::mutate_all(as.numeric)
    21. head(cluster_info)[,1:9]
    22. # head(cell_dists)[,1:9]
    23. si= silhouette(cluster_info[,1], cell_dists) %>%head()
    24. si
    25. silhouette_res <- apply(cluster_info, 2, function(x){
    26. si <- silhouette(x, cell_dists)
    27. if(!any(is.na(si))) {
    28. mean(si[, 'sil_width'])
    29. } else {
    30. NA
    31. }
    32. })
    33. head(silhouette_res)
    34. integrated_data[["opt_clust_integrated"]] <- integrated_data[[names(which.max(silhouette_res))]]
    35. Idents(integrated_data) = "opt_clust_integrated"
    最终会得到一个opt_clust_integrated,代表最优聚类的清晰度。如果你的数据比较大,可以调整   seq_res <- seq(0.5, 1.5, 0.1),把1.5调到3,甚至更高。

    上述代码的部分运行结果:

    根据下面的图,resolution为0.7是最优聚类

    3 找到最优聚类之后,把多余的聚类结果删掉,节省内存

    全代码如下

    1. # Reduce meta-data -------------------------------------------------------------------------
    2. spam_cols <- grepl(paste0(DefaultAssay(integrated_data), "_snn_res"),
    3. colnames(integrated_data@meta.data)) |
    4. grepl("seurat_clusters",colnames(integrated_data@meta.data))
    5. integrated_data@meta.data <- integrated_data@meta.data[,!spam_cols]
    6. } else {
    7. print("Not Optimizing clustering")
    8. seq_res <- seq(0.2, 1.6, 0.2)
    9. integrated_data <- FindClusters(integrated_data,
    10. resolution = seq_res,
    11. verbose = F)
    12. clustree_plt <- clustree(integrated_data,
    13. prefix = paste0(DefaultAssay(integrated_data),
    14. "_snn_res."))
    15. integrated_data <- FindClusters(integrated_data,
    16. resolution = default_resolution,
    17. verbose = F)
    18. integrated_data[["opt_clust_integrated"]] <- integrated_data[["seurat_clusters"]]
    19. spam_cols <- grepl(paste0(DefaultAssay(integrated_data), "_snn_res"),
    20. colnames(integrated_data@meta.data)) |
    21. grepl("seurat_clusters",colnames(integrated_data@meta.data))
    22. integrated_data@meta.data <- integrated_data@meta.data[,!spam_cols]
    23. }

    4 保存对象

    1. # Save object ------------------------------------------------------
    2. saveRDS(integrated_data, file ="~/silicosis/spatial/integrated_slides/integrated_slides.rds" )

     关注微信:生信小博士

    参考: 

    https://medium.com/codesmart/r-series-k-means-clustering-silhouette-794774b46586#id_token=eyJhbGciOiJSUzI1NiIsImtpZCI6ImEwNmFmMGI2OGEyMTE5ZDY5MmNhYzRhYmY0MTVmZjM3ODgxMzZmNjUiLCJ0eXAiOiJKV1QifQ.eyJpc3MiOiJodHRwczovL2FjY291bnRzLmdvb2dsZS5jb20iLCJhenAiOiIyMTYyOTYwMzU4MzQtazFrNnFlMDYwczJ0cDJhMmphbTRsamRjbXMwMHN0dGcuYXBwcy5nb29nbGV1c2VyY29udGVudC5jb20iLCJhdWQiOiIyMTYyOTYwMzU4MzQtazFrNnFlMDYwczJ0cDJhMmphbTRsamRjbXMwMHN0dGcuYXBwcy5nb29nbGV1c2VyY29udGVudC5jb20iLCJzdWIiOiIxMTIyMzMxNDE1MDU0MDY1NzA5OTMiLCJlbWFpbCI6ImFudGkuY2FuY2VyLmZpZ2h0ZXJAZ21haWwuY29tIiwiZW1haWxfdmVyaWZpZWQiOnRydWUsIm5iZiI6MTY5ODE5ODYzMCwibmFtZSI6InlhbmcgeWFuZyIsInBpY3R1cmUiOiJodHRwczovL2xoMy5nb29nbGV1c2VyY29udGVudC5jb20vYS9BQ2c4b2NKUjVFOGJoU2wxRUVtUDcyVXhPUExGU2VlNlcza2hlZVItVnVaZDZiei09czk2LWMiLCJnaXZlbl9uYW1lIjoieWFuZyIsImZhbWlseV9uYW1lIjoieWFuZyIsImxvY2FsZSI6ImVuIiwiaWF0IjoxNjk4MTk4OTMwLCJleHAiOjE2OTgyMDI1MzAsImp0aSI6ImJlNjYyNmYyMWMwZGE4MjgzZWUzMWEzMGU0ZGY2MWQ0Y2M3YzlkODgifQ.FddjhGbYiuFDFjWFmjGcAXNdbuR3XyqzqWWv8NvyJnREj48qhcFsI4HHv0DWnwewOsmr2jnCnC51nInSyBvgZJSjAQMdUAVpoZTwvroliR3wB6q8In29eXhZ2N_WgqVqoTcMT5yECF4oJrVZgDYxG5t17KAIYso3pvbpCy-5oVq8zBPZ8M8HRd83upKAODDC4bAKtGsGqlFYDV-bD7L1pXsRw3HWebcyY7bfk74FVtnveGyk-A0VIHjDIAdxxjqMiZmuntRX7PUV-nXhSeiBxzI8W4kjHO8uQKlwaCgjyOARTNQ2b-AOY5f5gr6BP0kin9DN6rk7QuPfrvVIgg4yGwicon-default.png?t=N7T8https://medium.com/codesmart/r-series-k-means-clustering-silhouette-794774b46586#id_token=eyJhbGciOiJSUzI1NiIsImtpZCI6ImEwNmFmMGI2OGEyMTE5ZDY5MmNhYzRhYmY0MTVmZjM3ODgxMzZmNjUiLCJ0eXAiOiJKV1QifQ.eyJpc3MiOiJodHRwczovL2FjY291bnRzLmdvb2dsZS5jb20iLCJhenAiOiIyMTYyOTYwMzU4MzQtazFrNnFlMDYwczJ0cDJhMmphbTRsamRjbXMwMHN0dGcuYXBwcy5nb29nbGV1c2VyY29udGVudC5jb20iLCJhdWQiOiIyMTYyOTYwMzU4MzQtazFrNnFlMDYwczJ0cDJhMmphbTRsamRjbXMwMHN0dGcuYXBwcy5nb29nbGV1c2VyY29udGVudC5jb20iLCJzdWIiOiIxMTIyMzMxNDE1MDU0MDY1NzA5OTMiLCJlbWFpbCI6ImFudGkuY2FuY2VyLmZpZ2h0ZXJAZ21haWwuY29tIiwiZW1haWxfdmVyaWZpZWQiOnRydWUsIm5iZiI6MTY5ODE5ODYzMCwibmFtZSI6InlhbmcgeWFuZyIsInBpY3R1cmUiOiJodHRwczovL2xoMy5nb29nbGV1c2VyY29udGVudC5jb20vYS9BQ2c4b2NKUjVFOGJoU2wxRUVtUDcyVXhPUExGU2VlNlcza2hlZVItVnVaZDZiei09czk2LWMiLCJnaXZlbl9uYW1lIjoieWFuZyIsImZhbWlseV9uYW1lIjoieWFuZyIsImxvY2FsZSI6ImVuIiwiaWF0IjoxNjk4MTk4OTMwLCJleHAiOjE2OTgyMDI1MzAsImp0aSI6ImJlNjYyNmYyMWMwZGE4MjgzZWUzMWEzMGU0ZGY2MWQ0Y2M3YzlkODgifQ.FddjhGbYiuFDFjWFmjGcAXNdbuR3XyqzqWWv8NvyJnREj48qhcFsI4HHv0DWnwewOsmr2jnCnC51nInSyBvgZJSjAQMdUAVpoZTwvroliR3wB6q8In29eXhZ2N_WgqVqoTcMT5yECF4oJrVZgDYxG5t17KAIYso3pvbpCy-5oVq8zBPZ8M8HRd83upKAODDC4bAKtGsGqlFYDV-bD7L1pXsRw3HWebcyY7bfk74FVtnveGyk-A0VIHjDIAdxxjqMiZmuntRX7PUV-nXhSeiBxzI8W4kjHO8uQKlwaCgjyOARTNQ2b-AOY5f5gr6BP0kin9DN6rk7QuPfrvVIgg4yGw

  • 相关阅读:
    mac火狐浏览器快速下载,aria2设置
    ssm+java+vue微信小程序的驾校预约管理系统#毕业设计
    Spring AOP 详解及@Trasactional
    NS2安装及入门实例——(ns2.35 / Ubuntu20.04)
    IOday1
    十六、一起学习Lua 文件 I/O
    线上又出问题了!又是特殊场景,哎呀,当时怎么没有想到!
    Nginx 文件名逻辑漏洞(CVE-2013-4547)(Vulhub)
    webservice原始调用(无论是java的webservice还是.net或是php的都可用)
    MySQL8安装步骤
  • 原文地址:https://blog.csdn.net/qq_52813185/article/details/134027916
  • 最新文章
  • 攻防演习之三天拿下官网站群
    数据安全治理学习——前期安全规划和安全管理体系建设
    企业安全 | 企业内一次钓鱼演练准备过程
    内网渗透测试 | Kerberos协议及其部分攻击手法
    0day的产生 | 不懂代码的"代码审计"
    安装scrcpy-client模块av模块异常,环境问题解决方案
    leetcode hot100【LeetCode 279. 完全平方数】java实现
    OpenWrt下安装Mosquitto
    AnatoMask论文汇总
    【AI日记】24.11.01 LangChain、openai api和github copilot
  • 热门文章
  • 十款代码表白小特效 一个比一个浪漫 赶紧收藏起来吧!!!
    奉劝各位学弟学妹们,该打造你的技术影响力了!
    五年了,我在 CSDN 的两个一百万。
    Java俄罗斯方块,老程序员花了一个周末,连接中学年代!
    面试官都震惊,你这网络基础可以啊!
    你真的会用百度吗?我不信 — 那些不为人知的搜索引擎语法
    心情不好的时候,用 Python 画棵樱花树送给自己吧
    通宵一晚做出来的一款类似CS的第一人称射击游戏Demo!原来做游戏也不是很难,连憨憨学妹都学会了!
    13 万字 C 语言从入门到精通保姆级教程2021 年版
    10行代码集2000张美女图,Python爬虫120例,再上征途
Copyright © 2022 侵权请联系2656653265@qq.com    京ICP备2022015340号-1
正则表达式工具 cron表达式工具 密码生成工具

京公网安备 11010502049817号