• Experiment 8: Basic Statistical Analysis (I)


    Experiment 8: Basic Statistical Analysis (I)

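    1. Basic experiment

    R's built-in Titanic data set records the survival outcomes of the people aboard the Titanic. It cross-classifies them by four categorical variables: cabin class (Class), sex (Sex), age (Age), and survival status (Survived). Use this data set to produce the following frequency tables.

    a) Build a two-way contingency table of Sex and Survived, and add marginal sums to it.

    b) Build a multi-way contingency table of the four variables Class, Sex, Age, and Survived.

    c) Convert the contingency table from b) into a data frame of category frequencies.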

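    # Titanic is a 4-way table; flatten it to one row per category combination
    titanic_df <- as.data.frame(Titanic)
    # a) two-way table of Sex by Survived, then the same table with marginal sums
    sex_surv <- xtabs(Freq ~ Sex + Survived, data = titanic_df)
    sex_surv
    addmargins(sex_surv)
    # b) four-way contingency table
    tab <- xtabs(Freq ~ Class + Sex + Age + Survived, data = titanic_df)
    tab
    # c) back to a data frame with one Freq column per category combination
    as.data.frame(tab)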
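Because Titanic is already stored as a four-way table, the two-way table in a) can also be obtained without converting to a data frame first. The following is a minimal sketch of that alternative, assuming only base R; the object names (dims, sex_surv, via_xtabs) are illustrative and not part of the exercise.

```r
# Titanic ships with base R as a 4-way table: Class x Sex x Age x Survived
dims <- names(dimnames(Titanic))

# Keep only the Sex and Survived dimensions; Class and Age are summed out
sex_surv <- margin.table(Titanic, match(c("Sex", "Survived"), dims))

# Add row and column totals
sex_surv_m <- addmargins(sex_surv)
print(sex_surv_m)

# Cross-check against the xtabs() route used in the exercise
via_xtabs <- xtabs(Freq ~ Sex + Survived, data = as.data.frame(Titanic))
same_counts <- all(sex_surv == via_xtabs)
print(same_counts)
```

Both routes should give the same counts; margin.table() simply skips the round trip through a data frame.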
    

    2. Verification experiment

    Code Listing 7-3

    library(Hmisc)
    myvars <- c("mpg", "hp", "wt")
    describe(mtcars[myvars])
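If Hmisc is not installed, a rough base R counterpart can be sketched as below. It reports only a handful of statistics and is not a reimplementation of describe(); the helper name basic_stats is hypothetical.

```r
myvars <- c("mpg", "hp", "wt")

# Hypothetical helper: a few summary statistics for one numeric vector
basic_stats <- function(x) {
  x_ok <- x[!is.na(x)]                  # drop missing values before summarising
  c(n        = length(x_ok),
    missing  = sum(is.na(x)),
    distinct = length(unique(x_ok)),
    mean     = mean(x_ok),
    sd       = sd(x_ok),
    min      = min(x_ok),
    median   = median(x_ok),
    max      = max(x_ok))
}

# One column of statistics per variable
stats <- sapply(mtcars[myvars], basic_stats)
print(round(stats, 3))
```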
    

    Code Listing 7-4

    library(pastecs)
    myvars <- c("mpg", "hp", "wt")
    stat.desc(mtcars[myvars])
    

    Code Listing 7-11

    library(grid)
    library(vcd)
    mytable <- xtabs(~ Treatment+Sex+Improved, data=Arthritis)
    mytable
    ftable(mytable) 
    margin.table(mytable, 1)
    margin.table(mytable, 2)
    margin.table(mytable, 3)
    margin.table(mytable, c(1,3))
    ftable(prop.table(mytable, c(1,2)))
    ftable(addmargins(prop.table(mytable, c(1, 2)), 3))
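Listing 7-11 needs the vcd package for the Arthritis data. To see what the margin arguments mean without that dependency, the sketch below applies the same base R calls to a three-way table built from Titanic. The stand-in table is an assumption made here for illustration and is not part of the listing.

```r
# Stand-in 3-way table from base R: Class x Sex x Survived (Age summed out)
tab3 <- margin.table(Titanic, c(1, 2, 4))

# Marginal totals over the first dimension (Class)
class_totals <- margin.table(tab3, 1)
print(class_totals)

# Proportions of Survived within each Class x Sex combination
p <- prop.table(tab3, c(1, 2))

# Adding a Sum margin over dimension 3 should give 1 for every combination
p_sum <- addmargins(p, 3)
print(ftable(p_sum))
```

With margin c(1, 2), the proportions are computed separately inside each combination of the first two dimensions, which is why the added Sum column is 1 throughout.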
    

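    3. Design experiment

    Generate a data frame df with three columns y1, y2, and y3 of ten values each, with values in the range [1, 20], and set the 3rd and 8th values of y2 to missing. Call describe() from the Hmisc package on the data frame to produce descriptive statistics, and examine the result.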

    > y1<-round(runif(10,1,20))
    > y2<-round(runif(10,1,20))
    > y2[c(3,8)]<-NA
    > y3<-round(runif(10,1,20))
    > df<-data.frame(y1,y2,y3)
    > library(Hmisc)
    > describe(df)
    df 
    
     3  Variables      10  Observations
    -----------------------------------------------------------------------------------------------------------------------------
    y1 
           n  missing distinct     Info     Mean      Gmd 
          10        0        6    0.964      9.3    4.867 
    
    lowest :  2  5  7  8 11, highest:  5  7  8 11 15
                                      
    Value        2   5   7   8  11  15
    Frequency    1   1   1   2   3   2
    Proportion 0.1 0.1 0.1 0.2 0.3 0.2
    -----------------------------------------------------------------------------------------------------------------------------
    y2 
           n  missing distinct     Info     Mean      Gmd 
           8        2        6    0.952    10.12    6.607 
    
    lowest :  4  5  7 16 17, highest:  5  7 16 17 18
                                                  
    Value          4     5     7    16    17    18
    Frequency      1     1     3     1     1     1
    Proportion 0.125 0.125 0.375 0.125 0.125 0.125
    -------------------------------------------------------------------------------------------
    y3 
           n  missing distinct     Info     Mean      Gmd 
          10        0        8    0.988     13.4    7.511 
    
    lowest :  2  6  7 11 14, highest: 11 14 18 19 20
                                              
    Value        2   6   7  11  14  18  19  20
    Frequency    1   1   1   1   1   2   2   1
    Proportion 0.1 0.1 0.1 0.1 0.1 0.2 0.2 0.1
    -------------------------------------------------------------------------------------------
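The transcript above cannot be reproduced exactly because no random seed was set. The sketch below is one way to make the data generation repeatable; the seed value 123 is arbitrary and chosen here only for illustration. It also draws with sample() instead of round(runif()), since rounding a uniform draw on [1, 20] gives the endpoints 1 and 20 only half the probability of the other integers.

```r
set.seed(123)  # arbitrary seed, used only so that reruns give the same data

# Integers drawn uniformly from 1..20, ten per column
y1 <- sample(1:20, 10, replace = TRUE)
y2 <- sample(1:20, 10, replace = TRUE)
y3 <- sample(1:20, 10, replace = TRUE)

# Mark the 3rd and 8th values of y2 as missing
y2[c(3, 8)] <- NA
df <- data.frame(y1, y2, y3)

# Base R check of the counts that describe() reports as n and missing
n_missing <- colSums(is.na(df))
n_present <- colSums(!is.na(df))
print(rbind(n = n_present, missing = n_missing))
```

The counts for y2 should show 8 observed and 2 missing values, matching the n and missing columns in the describe() output.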
    

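    4. Design experiment

    The blood lead levels of 10 workers not exposed to lead and 7 lead-exposed workers were measured; the values are entered as x and y in the code below. Use the Wilcoxon rank-sum test to assess whether blood lead levels differ between the two groups.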

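    # Blood lead values; x = workers not exposed to lead, y = lead-exposed workers
    # (the task states 10 unexposed workers, but only 7 values are entered for x)
    x <- c(24, 26, 29, 34, 43, 58, 63)
    y <- c(82, 87, 97, 121, 164, 208, 213)
    # One-sided test (x shifted below y), normal approximation, no continuity correction
    wilcox.test(x, y, alternative = "less", exact = FALSE, correct = FALSE)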
    

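    H0: blood lead levels do not differ between the two groups of workers.

    H1: blood lead levels of the workers not exposed to lead (x) tend to be lower than those of the lead-exposed workers (y), which is the one-sided alternative requested by alternative = "less" in the call above.

    Since p = 0.0008726 < 0.05, H0 is rejected at the 0.05 level: the blood lead levels of the two groups differ, with the unexposed group lower.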
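To show where this p-value comes from, the sketch below recomputes it by hand using the normal approximation without continuity correction, which is what exact = FALSE and correct = FALSE request. It uses the same seven-value vectors as the call above; the variable names are illustrative.

```r
x <- c(24, 26, 29, 34, 43, 58, 63)
y <- c(82, 87, 97, 121, 164, 208, 213)
n_x <- length(x)
n_y <- length(y)

# Rank the pooled sample; W is the rank sum of x minus its smallest possible value
r <- rank(c(x, y))
W <- sum(r[seq_len(n_x)]) - n_x * (n_x + 1) / 2

# Under H0, W has mean n_x*n_y/2 and, with no ties, variance n_x*n_y*(n_x+n_y+1)/12
z <- (W - n_x * n_y / 2) / sqrt(n_x * n_y * (n_x + n_y + 1) / 12)

p_less <- pnorm(z)                          # one-sided, alternative = "less"
p_two_sided <- 2 * min(p_less, 1 - p_less)  # for comparison with a two-sided test

# Same test via wilcox.test(), to confirm the hand calculation
res <- wilcox.test(x, y, alternative = "less", exact = FALSE, correct = FALSE)
print(c(W = W, z = z, p_less = p_less, p_two_sided = p_two_sided))
```

Every value in x lies below every value in y, so W is 0, the most extreme value possible in that direction. The two-sided p-value is twice the one-sided one and is still below 0.05.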

    5. Draw a radar chart from the data table; any radar chart style is acceptable.

    The data set gives three evaluation metrics (AE of Best, AE of Mean, AE of Worst) for the seven algorithms being compared.

    library(fmsb)
    # Three metric values for each of the seven algorithms
    a <- c(0.106, 0.16, 0.135)
    b <- c(0.065, 0.177, 0.103)
    c <- c(0.076, 0.11, 0.096)
    d <- c(0.235, 0.293, 0.271)
    e <- c(0.187, 0.248, 0.222)
    f <- c(0.119, 0.169, 0.134)
    g <- c(0.091, 0.129, 0.108)
    df <- data.frame(a, b, c, d, e, f, g)
    rownames(df) <- c("AE of Best", "AE of Mean", "AE of Worst")
    colnames(df) <- c("MS", "HLMS", "BIWOA", "BMMVO", "BSCA", "BHHA", "BSSA")
    # radarchart() expects row 1 to hold each axis maximum and row 2 each axis minimum
    axis_max <- rep(0.3, ncol(df))
    axis_min <- rep(0.05, ncol(df))
    df <- rbind(axis_max, axis_min, df)
    line_cols <- c("#00AFBB", "#E7B800", "#3401c9")
    # seg = 5 needs six axis labels: 0.05 to 0.30 in steps of 0.05, matching the axis limits
    radarchart(df, seg = 5,
               axistype = 1,
               pcol = line_cols,
               cglcol = "grey",
               plty = 1,
               plwd = 2,
               pty = 16,
               cglty = 1,
               cglwd = 0.8,
               axislabcol = "grey",
               vlcex = 0.7,
               caxislabels = seq(0.05, 0.30, by = 0.05))
    # Rows 3 to 5 are the three metrics, in the same order as line_cols
    legend(
      x = "bottomleft", legend = rownames(df)[3:5], horiz = TRUE,
      bty = "n", pch = 16, col = line_cols,
      text.col = "black", cex = 1, pt.cex = 1.5
    )
    
    

  • Original article: https://blog.csdn.net/W_chuanqi/article/details/127718233