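1. Basic experiment
R's built-in dataset Titanic records the survival and death of passengers on the Titanic. It contains four categorical variables: cabin class (Class), sex (Sex), age (Age), and survival status (Survived). Use this dataset to produce the following frequency tables.
a) Build a two-way contingency table of Sex and Survived, and add marginal sums to it.
b) Build a multi-way contingency table of the four variables Class, Sex, Age, and Survived.
c) Convert the contingency table from b) into a data frame with a frequency column for each combination of categories.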
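> data <- data.frame(Titanic)
> # a) Sex by Survived, then the same table with marginal sums
> xtabs(Freq ~ Sex + Survived, data = data)
> addmargins(xtabs(Freq ~ Sex + Survived, data = data))
> # b) four-way contingency table
> tab <- xtabs(Freq ~ Class + Sex + Age + Survived, data = data)
> tab
> # c) back to a data frame with one row per cell and its count in Freq
> as.data.frame(tab)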
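As a cross-check on parts a) and c), the same margins can be taken directly from the built-in table without converting to a data frame first. This is a minimal sketch using only base R; the names `sex_surv`, `long`, and `via_xtabs` are illustrative and not part of the original exercise.

```r
# Titanic is a built-in 4-way table with dimensions Class, Sex, Age, Survived.
# Collapse it to Sex x Survived by summing over the other two dimensions.
sex_surv <- margin.table(Titanic, c(2, 4))
addmargins(sex_surv)

# Long format: one row per cell, with the count in the Freq column.
long <- as.data.frame(Titanic)
nrow(long)       # 4 * 2 * 2 * 2 = 32 cells
sum(long$Freq)   # 2201 people in total

# The xtabs() route gives the same counts as margin.table().
via_xtabs <- xtabs(Freq ~ Sex + Survived, data = long)
all(via_xtabs == sex_surv)
```

If the two routes ever disagree, the formula passed to xtabs() is the first thing to check.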
2. Verification experiment
Code listing 7-3
library(Hmisc)
myvars <- c("mpg", "hp", "wt")
describe(mtcars[myvars])
Code listing 7-4
library(pastecs)
myvars <- c("mpg", "hp", "wt")
stat.desc(mtcars[myvars])
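The figures reported by describe() and stat.desc() can be sanity-checked with base R. The sketch below is not a replacement for either package function; `basic_stats` is a hypothetical helper written for this illustration, and it covers only a few of the statistics those functions print.

```r
myvars <- c("mpg", "hp", "wt")

# Summary function applied to one numeric column (missing values dropped).
basic_stats <- function(x) {
  x <- x[!is.na(x)]
  c(n = length(x), mean = mean(x), sd = sd(x),
    median = median(x), min = min(x), max = max(x))
}

# One column of results per variable, one row per statistic.
stats <- sapply(mtcars[myvars], basic_stats)
round(stats, 3)
```

The mean, median, min, and max rows should agree with the corresponding entries of the stat.desc() output for the same variables.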
Code listing 7-11
library(grid)
library(vcd)
mytable <- xtabs(~ Treatment+Sex+Improved, data=Arthritis)
mytable
ftable(mytable)
margin.table(mytable, 1)
margin.table(mytable, 2)
margin.table(mytable, 3)
margin.table(mytable, c(1,3))
ftable(prop.table(mytable, c(1,2)))
ftable(addmargins(prop.table(mytable, c(1, 2)), 3))
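The margin argument of prop.table() is easy to misread, so here is a small sketch of what `c(1, 2)` means for a three-way table. To keep it free of package dependencies it uses the built-in table UCBAdmissions (Admit x Gender x Dept) as a stand-in for mytable; that substitution is an assumption made for illustration only.

```r
tab3 <- UCBAdmissions   # built-in 3-way table: Admit, Gender, Dept

# Proportions computed within each (Admit, Gender) combination,
# i.e. each cell is divided by the total over the third dimension.
p <- prop.table(tab3, c(1, 2))

# Summing over the third dimension therefore gives 1 in every cell.
cell_totals <- apply(p, c(1, 2), sum)
cell_totals

# addmargins(p, 3) appends that total as an extra "Sum" level of dimension 3.
ftable(addmargins(p, 3))
```

The same reading applies to the Arthritis table: with margins c(1, 2), the Improved proportions sum to 1 within each Treatment and Sex combination.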
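3. Design experiment
Generate the data frame df below, with values in the range [1, 20], and set the 3rd and 8th values of y2 to missing. Call describe() from the Hmisc package to produce descriptive statistics for the data frame, and examine the result.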
> y1<-round(runif(10,1,20))
> y2<-round(runif(10,1,20))
> y2[c(3,8)]<-NA
> y3<-round(runif(10,1,20))
> df<-data.frame(y1,y2,y3)
> library(Hmisc)
> describe(df)
df
 3  Variables      10  Observations
--------------------------------------------------------------------------------
y1
       n  missing distinct     Info     Mean      Gmd
      10        0        6    0.964      9.3    4.867
lowest :  2  5  7  8 11, highest:  5  7  8 11 15
Value        2   5   7   8  11  15
Frequency    1   1   1   2   3   2
Proportion 0.1 0.1 0.1 0.2 0.3 0.2
--------------------------------------------------------------------------------
y2
       n  missing distinct     Info     Mean      Gmd
       8        2        6    0.952    10.12    6.607
lowest :  4  5  7 16 17, highest:  5  7 16 17 18
Value          4     5     7    16    17    18
Frequency      1     1     3     1     1     1
Proportion 0.125 0.125 0.375 0.125 0.125 0.125
--------------------------------------------------------------------------------
y3
       n  missing distinct     Info     Mean      Gmd
      10        0        8    0.988     13.4    7.511
lowest :  2  6  7 11 14, highest: 11 14 18 19 20
Value        2   6   7  11  14  18  19  20
Frequency    1   1   1   1   1   2   2   1
Proportion 0.1 0.1 0.1 0.1 0.1 0.2 0.2 0.1
--------------------------------------------------------------------------------
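Because runif() is random, the describe() output above changes on every run, and round(runif(10, 1, 20)) makes the endpoints 1 and 20 half as likely as the other integers. The following is a minimal sketch of a reproducible variant, assuming equally likely integers are what is wanted. The seed value 2024 is an arbitrary illustrative choice, and `checks` is a hypothetical name; the checks use base R only.

```r
set.seed(2024)  # arbitrary seed (assumption), only to make the run repeatable

# Draw integers 1..20 with equal probability.
y1 <- sample(1:20, 10, replace = TRUE)
y2 <- sample(1:20, 10, replace = TRUE)
y3 <- sample(1:20, 10, replace = TRUE)
y2[c(3, 8)] <- NA  # 3rd and 8th values of y2 are missing
df <- data.frame(y1, y2, y3)

# Base R counterparts of the n / missing / distinct / Mean columns of describe().
checks <- rbind(
  n        = colSums(!is.na(df)),
  missing  = colSums(is.na(df)),
  distinct = sapply(df, function(v) length(unique(na.omit(v)))),
  mean     = colMeans(df, na.rm = TRUE)
)
checks
```

For y2, n should be 8 and missing should be 2, matching the pattern in the describe() output.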
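4. Design experiment
Blood lead values were measured for 10 non-lead workers and 7 lead workers, as shown in the table below. Use the Wilcoxon rank-sum test to analyze whether the blood lead values of the two groups differ.
> # x: non-lead workers (7 values are entered here); y: lead workers
> x <- c(24, 26, 29, 34, 43, 58, 63)
> y <- c(82, 87, 97, 121, 164, 208, 213)
> # One-sided test: alternative = "less" asks whether x tends to be lower than y
> wilcox.test(x, y, alternative = "less", exact = FALSE, correct = FALSE)
H0: the blood lead values of the two groups of workers do not differ.
H1: the blood lead values of non-lead workers are lower than those of lead workers (the one-sided alternative tested by alternative = "less").
p = 0.0008726 < 0.05, so H0 is rejected in favor of H1: the blood lead values of the two groups differ, with the non-lead workers having the lower values.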
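To see where the p-value comes from, the rank-sum statistic and its normal approximation can be computed by hand. This is a sketch under the same settings as the call above (no exact test, no continuity correction) and it relies on the fact that these data contain no ties; the names `W`, `z`, `p_less`, and `p_two` are illustrative.

```r
x <- c(24, 26, 29, 34, 43, 58, 63)
y <- c(82, 87, 97, 121, 164, 208, 213)
n1 <- length(x)
n2 <- length(y)

# Rank all observations together, then sum the ranks of the x group.
r <- rank(c(x, y))
W <- sum(r[seq_len(n1)]) - n1 * (n1 + 1) / 2  # statistic W as reported by wilcox.test

# Normal approximation without ties and without continuity correction.
mu    <- n1 * n2 / 2
sigma <- sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
z     <- (W - mu) / sigma

p_less <- pnorm(z)                           # one-sided: x lower than y
p_two  <- 2 * min(pnorm(z), 1 - pnorm(z))    # two-sided alternative

c(W = W, z = z, p_less = p_less, p_two = p_two)
```

Every x value is smaller than every y value, so W takes its smallest possible value. The two-sided p-value is twice the one-sided one, which would be the figure to report if H1 were stated simply as "the two groups differ".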
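5. Draw a radar chart from the table below; any radar chart style is acceptable.
The dataset gives three evaluation metrics (AE of Best, AE of Mean, AE of Worst) for 7 compared algorithms.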
library(fmsb)

# Rows are the three evaluation metrics; columns are the seven algorithms.
metrics <- c("AE of Best", "AE of Mean", "AE of Worst")
df <- data.frame(
  MS    = c(0.106, 0.160, 0.135),
  HLMS  = c(0.065, 0.177, 0.103),
  BIWOA = c(0.076, 0.110, 0.096),
  BMMVO = c(0.235, 0.293, 0.271),
  BSCA  = c(0.187, 0.248, 0.222),
  BHHA  = c(0.119, 0.169, 0.134),
  BSSA  = c(0.091, 0.129, 0.108),
  row.names = metrics
)

# radarchart() expects row 1 to hold the axis maximum and row 2 the axis minimum.
# Avoid naming these max/min, which would mask the base R functions.
df <- rbind(rep(0.30, ncol(df)), rep(0.05, ncol(df)), df)

cols <- c("#00AFBB", "#E7B800", "#3401c9")

# The axis runs from 0.05 to 0.30, so 5 segments need 6 labels in steps of 0.05.
radarchart(df, seg = 5,
           axistype = 1,
           pcol = cols,
           cglcol = "grey",
           plty = 1,
           plwd = 2,
           pty = 16,
           cglty = 1,
           cglwd = 0.8,
           axislabcol = "grey",
           vlcex = 0.7,
           caxislabels = seq(0.05, 0.30, by = 0.05))
legend(
  x = "bottomleft", legend = metrics, horiz = TRUE,
  bty = "n", pch = 16, col = cols,
  text.col = "black", cex = 1, pt.cex = 1.5
)
Output: [radar chart figure]
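In a radar chart the axis labels must match the maximum and minimum rows passed to radarchart(): with `seg` segments there are `seg + 1` labels, running from the minimum to the maximum. The sketch below derives consistent limits, segment count, and labels from the data itself using base R only. The 0.05 step is an assumed choice, and `axis_min`, `axis_max`, `seg`, and `axis_labels` are hypothetical names.

```r
# All 21 values from the table (7 algorithms x 3 metrics).
values <- c(0.106, 0.160, 0.135,  0.065, 0.177, 0.103,
            0.076, 0.110, 0.096,  0.235, 0.293, 0.271,
            0.187, 0.248, 0.222,  0.119, 0.169, 0.134,
            0.091, 0.129, 0.108)

step <- 0.05                       # assumed spacing between grid rings
rng  <- range(values)

# Round the data range outward to multiples of the step.
axis_min <- floor(rng[1] / step) * step
axis_max <- ceiling(rng[2] / step) * step

# Number of segments and the matching seg + 1 labels.
seg         <- round((axis_max - axis_min) / step)
axis_labels <- seq(axis_min, axis_max, by = step)

stopifnot(length(axis_labels) == seg + 1)
list(min = axis_min, max = axis_max, seg = seg, labels = axis_labels)
```

These values can then be used for the first two rows of the data frame and for the `seg` and `caxislabels` arguments, so the printed scale always matches the plotted one.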