• 使用pixy计算群体遗传学统计量


    1 数据过滤

    过滤参数:过滤掉次等位基因频率(minor allele frequency,MAF)低于0.05、哈达-温伯格平衡(Hardy– Weinberg equilibrium,HWE)对应的P值低于1e-10或杂合率(heterozygosity rates)偏差过大(± 3 SD)的位点:
    去除杂合率(heterozygosity rates)偏差过大(± 3 SD)的个体:
    假设,基于Plink计算结果,需要移除sample1(高杂合或低杂合):

    #vcftools version:
    nohup vcftools --vcf snps_filtered.vcf --remove-indels --maf 0.05 --hwe 1e-10 --max-missing 0.8 --min-meanDP 20 --max-meanDP 500 --remove-indv sample1 --recode --stdout > snps_maf0_05_hwe1e-10_missing0_8.vcf &
    
    
    • 1
    • 2
    • 3

    vcftools生成的文件中会包含命令行输出,使用sed移除:

    nohup sed -i '1,30d' snps_maf0_05_hwe1e-10_missing0_8.vcf &
    
    • 1

    压缩:

    bgzip snps_maf0_05_hwe1e-10_missing0_8.vcf
    tabix snps_maf0_05_hwe1e-10_missing0_8.vcf.gz
    
    • 1
    • 2

    2 计算 F S T 、 D X Y 、 P i F_{ST}、D_{XY}、Pi FSTDXYPi

    安装软件包

    
    nohup pixy --stats pi fst dxy --vcf snps_maf0_05_hwe1e-10_missing0_8.vcf.gz --populations pop.txt --window_size 10000 --bypass_invariant_check 'yes' --n_cores 15 --output_folder results &
    
    
    • 1
    • 2
    • 3

    3 可视化

    可视化之前需要将染色体编号替换为数值:

    bash ~/gaoyue/GWAs/script/chr_tran.sh raw_results/pixy_dxy.txt results/pixy_dxy.txt
    bash ~/gaoyue/GWAs/script/chr_tran.sh raw_results/pixy_fst.txt results/pixy_fst.txt
    bash ~/gaoyue/GWAs/script/chr_tran.sh raw_results/pixy_pi.txt results/pixy_pi.txt
    
    • 1
    • 2
    • 3
    #load packages:
    library(ggplot2)
    library(tidyverse)
    
    #---------------------------------------------------------------------------------#
    #             1.define a function to convert the pixy outputs                     #
    #---------------------------------------------------------------------------------#
    pixy_to_long <- function(pixy_files){
    
      pixy_df <- list()
    
      for(i in 1:length(pixy_files)){
    
        stat_file_type <- gsub(".*_|.txt", "", pixy_files[i])
    
        if(stat_file_type == "pi"){
    
          df <- read_delim(pixy_files[i], delim = "\t")
          df <- df %>%
            gather(-pop, -window_pos_1, -window_pos_2, -chromosome,
                   key = "statistic", value = "value") %>%
            rename(pop1 = pop) %>%
            mutate(pop2 = NA)
    
          pixy_df[[i]] <- df
    
    
        } else{
    
          df <- read_delim(pixy_files[i], delim = "\t")
          df <- df %>%
            gather(-pop1, -pop2, -window_pos_1, -window_pos_2, -chromosome,
                   key = "statistic", value = "value")
          pixy_df[[i]] <- df
    
        }
    
      }
    
      bind_rows(pixy_df) %>%
        arrange(pop1, pop2, chromosome, window_pos_1, statistic)
    
    }
    
    #---------------------------------------------------------------------------------#
    #                      2.use new function we just defined:                        #
    #---------------------------------------------------------------------------------#
    ## Rcau则替换为对应的文件夹
    pixy_folder <- "/nfs_fs/nfs3/gaoyue/gaoyue/Fst/Rdeb_Fst/results/"
    pixy_files <- list.files(pixy_folder, full.names = TRUE)
    pixy_df <- pixy_to_long(pixy_files)
    
    #---------------------------------------------------------------------------------#
    #                                      3.plot:                                    #
    #---------------------------------------------------------------------------------#
    # create a custom labeller for special characters in pi/dxy/fst
    pixy_labeller <- as_labeller(c(avg_pi = "pi",
                                 avg_dxy = "D[XY]",
                                 avg_wc_fst = "F[ST]"),
                                 default = label_parsed)
    
    # plotting summary statistics across all chromosomes
    pixy_df %>%
      mutate(chrom_color_group = case_when(as.numeric(chromosome) %% 2 != 0 ~ "even",
                                     chromosome == "X" ~ "even",
                                     TRUE ~ "odd" )) %>%
      mutate(chromosome = factor(chromosome, levels = c(1:22, "X", "Y"))) %>%
      filter(statistic %in% c("avg_pi", "avg_dxy", "avg_wc_fst")) %>%
      ggplot(aes(x = (window_pos_1 + window_pos_2)/2, y = value, color = chrom_color_group))+
      geom_point(size = 0.5, alpha = 0.5, stroke = 0)+
      facet_grid(statistic ~ chromosome,
                 scales = "free_y", switch = "x", space = "free_x",
                 labeller = labeller(statistic = pixy_labeller,
                                     value = label_value))+
      xlab("Chromsome")+
      ylab("Statistic Value")+
      scale_color_manual(values = c("grey50", "black"))+
      theme_classic()+
      theme(axis.text.x = element_blank(),
            axis.ticks.x = element_blank(),
            panel.spacing = unit(0.1, "cm"),
            strip.background = element_blank(),
            strip.placement = "outside",
            legend.position ="none")+
      scale_x_continuous(expand = c(0, 0)) +
      scale_y_continuous(expand = c(0, 0), limits = c(0,NA))
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
    • 11
    • 12
    • 13
    • 14
    • 15
    • 16
    • 17
    • 18
    • 19
    • 20
    • 21
    • 22
    • 23
    • 24
    • 25
    • 26
    • 27
    • 28
    • 29
    • 30
    • 31
    • 32
    • 33
    • 34
    • 35
    • 36
    • 37
    • 38
    • 39
    • 40
    • 41
    • 42
    • 43
    • 44
    • 45
    • 46
    • 47
    • 48
    • 49
    • 50
    • 51
    • 52
    • 53
    • 54
    • 55
    • 56
    • 57
    • 58
    • 59
    • 60
    • 61
    • 62
    • 63
    • 64
    • 65
    • 66
    • 67
    • 68
    • 69
    • 70
    • 71
    • 72
    • 73
    • 74
    • 75
    • 76
    • 77
    • 78
    • 79
    • 80
    • 81
    • 82
    • 83
    • 84
    • 85
    • 86

    在这里插入图片描述

    Ending!

  • 相关阅读:
    P10 属性分组
    c++之new和delete
    网络基础--1.网络纵横
    IDEA创建Springboot多模块项目
    Qt在qml组件中传递自定义类对象或结构体到cpp对象
    中国车用氮氧传感器行业研究与投资前景报告(2022版)
    线路横断面测量坐标转换程序
    简析CloudCompare文件夹之间的关系
    visual studio开发过程中常见操作
    你能懂的 Reflect 反射
  • 原文地址:https://blog.csdn.net/SUN5_The_answer/article/details/134429202