岭回归(英文名:ridge regression, Tikhonov regularization)是一种专用于共线性数据分析的有偏估计回归方法,实质上是一种改良的最小二乘估计法,通过放弃最小二乘法的无偏性,以损失部分信息、降低精度为代价获得回归系数更为符合实际、更可靠的回归方法,对病态数据的拟合要强于最小二乘法。
迄今为止,许多遗传变异已被确定为与不同的表型性状相关。然而,已确定的关联通常只能解释性状遗传性的一小部分,并且仅包含已知相关变异的模型的预测能力很小。多元回归是一种流行的框架,它同时考虑许多遗传变异的联合效应。由于遗传数据的高维数和预测因子之间的相关结构,普通多元回归很少适用于遗传数据。人们重新对使用惩罚性回归技术来规避这些困难产生了兴趣。在本文中着重于岭回归,一种惩罚回归方法,已被证明在多变量预测中提供良好的性能问题。岭回归应用中的一个挑战是控制量的岭参数的选择收缩的回归系数。我们提出了一种基于数据确定脊形参数的方法。
目的:在高维预测问题中获得良好的性能。为我们的方法建立了理论依据,并在模拟遗传数据和实际数据实例上验证了其性能。岭回归模型拟合数十万到数百万的基因变异同时存在,这给计算带来了挑战。
ridge 软件包安装非常简单,直接install.packages()即可。
- install.packages("ridge")
- devtools::install_github("selva86/InformationValue")
输入:自变量X至少一项或以上的定量变量或二分类定类变量,因变量Y要求为定量变量(若为定类变量,请使用逻辑回归)。 输出:模型检验优度的结果,自变量对因变量的线性关系等等。
- library(ridge)
- data(GenCont)
- head(GenCont[, 1:10])
- ## Phenotypes SNP1 SNP2 SNP3 SNP4 SNP5 SNP6 SNP7 SNP8 SNP9
- ## [1,] 1.4356316 1 1 1 0 0 1 0 0 0
- ## [2,] 2.9226960 1 0 0 0 1 0 0 1 0
- ## [3,] 0.5669319 0 0 0 0 0 0 0 0 0
- ## [4,] 4.8515051 1 0 0 0 1 0 0 0 0
- ## [5,] 0.1525582 1 1 1 0 0 1 0 0 0
- ## [6,] 2.9522701 1 0 0 0 1 0 0 0 0
- data(GenBin)
- head(GenBin)
- ## Phenotypes SNP1 SNP2 SNP3 SNP4 SNP5 SNP6 SNP7 SNP8 SNP9 SNP10 SNP11 SNP12
- ## [1,] 0 1 0 0 0 0 0 0 1 0 1 0 0
- ## [2,] 0 0 0 0 0 0 0 1 0 0 0 0 0
- ## [3,] 1 0 0 0 0 0 0 2 0 1 0 0 1
- ## [4,] 0 0 0 0 0 0 1 0 0 0 0 0 0
- ## [5,] 0 0 0 0 0 0 0 1 0 0 0 0 0
- ## [6,] 0 0 0 0 0 0 0 0 0 0 0 0 0
- ## SNP13 SNP14
- ## [1,] 0 1
- ## [2,] 1 0
- ## [3,] 1 0
- ## [4,] 0 0
- ## [5,] 0 0
- ## [6,] 0 0
这个我们采用glmnet里面的函数和参数解决实际的问题,也可以使用R包ridge,常规数据我们可以模仿Lasso回归的模式来做。
这个数据集经常使用,在分享的机器学习里面的例子中经常出现,并且有对数据进行处理的常规步骤:
- library(glmnet)
- library(caret)
- BreastCancer <- read.csv("wisc_bc_data.csv", stringsAsFactors = FALSE)
- BreastCancer = BreastCancer[, -1]
- dim(BreastCancer)
- ## [1] 568 31
- str(BreastCancer)
- ## 'data.frame': 568 obs. of 31 variables:
- ## $ diagnosis : chr "M" "M" "M" "M" ...
- ## $ radius_mean : num 20.6 19.7 11.4 20.3 12.4 ...
- ## $ texture_mean : num 17.8 21.2 20.4 14.3 15.7 ...
- ## $ perimeter_mean : num 132.9 130 77.6 135.1 82.6 ...
- ## $ area_mean : num 1326 1203 386 1297 477 ...
- ## $ smoothness_mean : num 0.0847 0.1096 0.1425 0.1003 0.1278 ...
- ## $ compactne_mean : num 0.0786 0.1599 0.2839 0.1328 0.17 ...
- ## $ concavity_mean : num 0.0869 0.1974 0.2414 0.198 0.1578 ...
- ## $ concave_points_mean : num 0.0702 0.1279 0.1052 0.1043 0.0809 ...
- ## $ symmetry_mean : num 0.181 0.207 0.26 0.181 0.209 ...
- ## $ fractal_dimension_mean : num 0.0567 0.06 0.0974 0.0588 0.0761 ...
- ## $ radius_se : num 0.543 0.746 0.496 0.757 0.335 ...
- ## $ texture_se : num 0.734 0.787 1.156 0.781 0.89 ...
- ## $ perimeter_se : num 3.4 4.58 3.44 5.44 2.22 ...
- ## $ area_se : num 74.1 94 27.2 94.4 27.2 ...
- ## $ smoothness_se : num 0.00522 0.00615 0.00911 0.01149 0.00751 ...
- ## $ compactne_se : num 0.0131 0.0401 0.0746 0.0246 0.0335 ...
- ## $ concavity_se : num 0.0186 0.0383 0.0566 0.0569 0.0367 ...
- ## $ concave_points_se : num 0.0134 0.0206 0.0187 0.0188 0.0114 ...
- ## $ symmetry_se : num 0.0139 0.0225 0.0596 0.0176 0.0216 ...
- ## $ fractal_dimension_se : num 0.00353 0.00457 0.00921 0.00511 0.00508 ...
- ## $ radius_worst : num 25 23.6 14.9 22.5 15.5 ...
- ## $ texture_worst : num 23.4 25.5 26.5 16.7 23.8 ...
- ## $ perimeter_worst : num 158.8 152.5 98.9 152.2 103.4 ...
- ## $ area_worst : num 1956 1709 568 1575 742 ...
- ## $ smoothness_worst : num 0.124 0.144 0.21 0.137 0.179 ...
- ## $ compactne_worst : num 0.187 0.424 0.866 0.205 0.525 ...
- ## $ concavity_worst : num 0.242 0.45 0.687 0.4 0.535 ...
- ## $ concave_points_worst : num 0.186 0.243 0.258 0.163 0.174 ...
- ## $ symmetry_worst : num 0.275 0.361 0.664 0.236 0.399 ...
- ## $ fractal_dimension_worst: num 0.089 0.0876 0.173 0.0768 0.1244 ...
- table(BreastCancer$diagnosis)
- ##
- ## B M
- ## 357 211
- # 删除方差为0的变量
- zerovar = nearZeroVar(BreastCancer[, -1])
- zerovar
- ## integer(0)
- # 首先删除强相关的变量
- descrCorr = cor(BreastCancer[, -1])
- descrCorr[1:5, 1:5]
- ## radius_mean texture_mean perimeter_mean area_mean
- ## radius_mean 1.0000000 0.32938305 0.9978764 0.9873442
- ## texture_mean 0.3293830 1.00000000 0.3359176 0.3261929
- ## perimeter_mean 0.9978764 0.33591759 1.0000000 0.9865482
- ## area_mean 0.9873442 0.32619289 0.9865482 1.0000000
- ## smoothness_mean 0.1680940 -0.01776898 0.2045046 0.1748380
- ## smoothness_mean
- ## radius_mean 0.16809398
- ## texture_mean -0.01776898
- ## perimeter_mean 0.20450464
- ## area_mean 0.17483805
- ## smoothness_mean 1.00000000
- highCorr = findCorrelation(descrCorr, 0.9)
- BreastCancer = BreastCancer[, -(highCorr + 1)]
- # 随后解决多重共线性,本例中不存在多重共线性问题
- comboInfo = findLinearCombos(BreastCancer[, -1])
- Process = preProcess(BreastCancer)
- BreastCancer = predict(Process, BreastCancer)
- library(tidyverse)
- library(sampling)
- set.seed(123)
- # 每层抽取70%的数据
- train_id <- strata(BreastCancer, "diagnosis", size = rev(round(table(BreastCancer$diagnosis) *
- 0.7)))$ID_unit
- # 训练数据
- trainData <- BreastCancer[train_id, ]
- # 测试数据
- testData <- BreastCancer[-train_id, ]
-
- # 查看训练、测试数据中正负样本比例
- prop.table(table(trainData$diagnosis))
- ##
- ## B M
- ## 0.6281407 0.3718593
- prop.table(table(testData$diagnosis))
- ##
- ## B M
- ## 0.6294118 0.3705882
- prop.table(table(BreastCancer$diagnosis))
- ##
- ## B M
- ## 0.6285211 0.3714789
- #4. How to visualize the importance of variables using featurePlot()
- featurePlot(x = trainData[, 2:21],
- y = as.factor(trainData$diagnosis),
- plot = "box", #For classification:box, strip, density, pairs or ellipse. For regression, pairs or scatter
- strip=strip.custom(par.strip.text=list(cex=.7)),
- scales = list(x = list(relation="free"),
- y = list(relation="free"))
- )
- x = as.matrix(trainData[, -1])
- y = ifelse(trainData$diagnosis == "M", 1, 0)
- ridge <- glmnet(x, y, family = "binomial", alpha = 0)
- plot(ridge, xvar = "lambda", label = TRUE)
- ridge_model <- cv.glmnet(x, y, alpha = 0, nfolds = 3)
- plot(ridge_model)
plot(ridge_model$glmnet.fit, "lambda", label = T)
在cv.glmnet()函数的输出结果中,包含一个lambda.min取值,且ridge_model$lambda.min = 0.15,该值为在模型均方误差最小的情况下参数lambda的取值,可以使用该值建立Ridge回归模型。下面使用该参数对全部的训练数据集建立Ridge模型。
- ridge_min <- ridge_model$lambda.min
- ## 使用ridge_min 拟合ridge模型
- ridge_best <- glmnet(x, y, alpha = 0, lambda = ridge_min)
- coef(ridge_best)
- ## 21 x 1 sparse Matrix of class "dgCMatrix"
- ## s0
- ## (Intercept) 0.3729332067
- ## area_mean 0.0885358564
- ## smoothness_mean 0.0052045287
- ## compactne_mean -0.0003435624
- ## symmetry_mean 0.0075012310
- ## fractal_dimension_mean -0.0839607648
- ## radius_se 0.0636875678
- ## texture_se 0.0027998059
- ## smoothness_se 0.0388690882
- ## compactne_se -0.0551225591
- ## concavity_se -0.0524564964
- ## concave_points_se 0.0598000796
- ## symmetry_se -0.0177510728
- ## fractal_dimension_se 0.0119540616
- ## texture_worst 0.0557146338
- ## smoothness_worst 0.0310525197
- ## compactne_worst -0.0004504186
- ## concavity_worst 0.0892093379
- ## concave_points_worst 0.1296407274
- ## symmetry_worst 0.0614660535
- ## fractal_dimension_worst 0.0430986166
- ridge.coef <- predict(ridge_model, s = 0.05, type = "coefficients")
- ridge.coef
- ## 21 x 1 sparse Matrix of class "dgCMatrix"
- ## s1
- ## (Intercept) 0.3729040858
- ## area_mean 0.0894265533
- ## smoothness_mean 0.0055832621
- ## compactne_mean 0.0009547588
- ## symmetry_mean 0.0076406106
- ## fractal_dimension_mean -0.0794539760
- ## radius_se 0.0632058362
- ## texture_se 0.0022559684
- ## smoothness_se 0.0356625085
- ## compactne_se -0.0512845944
- ## concavity_se -0.0465064056
- ## concave_points_se 0.0570207458
- ## symmetry_se -0.0175198162
- ## fractal_dimension_se 0.0076460181
- ## texture_worst 0.0559848075
- ## smoothness_worst 0.0332178942
- ## compactne_worst 0.0054483603
- ## concavity_worst 0.0835264079
- ## concave_points_worst 0.1249131201
- ## symmetry_worst 0.0592973698
- ## fractal_dimension_worst 0.0381440680
- ridge.y <- predict(ridge_model, newx = as.matrix(testData[, -1]), type = "response",
- s = 0.05)
- library(InformationValue)
- actuals <- ifelse(testData$diagnosis == "M", 1, 0)
- misClassError(actuals, ridge.y)
- ## [1] 0.0588
- plotROC(actuals, ridge.y)
这样的数据我们使用特殊的软件包 ridge 来做,非常方便,并且里面内置了基因型预测函数,非常方便对基因型和表型的数据进行处理。
- data(GenBin)
- head(GenBin)
- ## Phenotypes SNP1 SNP2 SNP3 SNP4 SNP5 SNP6 SNP7 SNP8 SNP9 SNP10 SNP11 SNP12
- ## [1,] 0 1 0 0 0 0 0 0 1 0 1 0 0
- ## [2,] 0 0 0 0 0 0 0 1 0 0 0 0 0
- ## [3,] 1 0 0 0 0 0 0 2 0 1 0 0 1
- ## [4,] 0 0 0 0 0 0 1 0 0 0 0 0 0
- ## [5,] 0 0 0 0 0 0 0 1 0 0 0 0 0
- ## [6,] 0 0 0 0 0 0 0 0 0 0 0 0 0
- ## SNP13 SNP14
- ## [1,] 0 1
- ## [2,] 1 0
- ## [3,] 1 0
- ## [4,] 0 0
- ## [5,] 0 0
- ## [6,] 0 0
- table(GenBin[, 1])
- ##
- ## 0 1
- ## 250 250
- beta_logisticRidge <- logisticRidge(Phenotypes ~ ., data = as.data.frame(GenBin))
- summary(beta_logisticRidge)
- ##
- ## Call:
- ## logisticRidge(formula = Phenotypes ~ ., data = as.data.frame(GenBin))
- ##
- ##
- ## Coefficients:
- ## Estimate Scaled estimate Std. Error (scaled) t value (scaled)
- ## (Intercept) 0.002782 NA NA NA
- ## SNP1 0.166067 1.914369 0.666082 2.874
- ## SNP2 0.110745 0.269635 0.963936 0.280
- ## SNP3 -0.284802 -0.491809 0.999027 -0.492
- ## SNP4 -1.007615 -1.422131 0.990998 -1.435
- ## SNP5 0.911082 0.910170 0.986458 0.923
- ## SNP6 -0.026877 -0.355500 0.959424 -0.371
- ## SNP7 -0.068540 -1.022005 0.922686 -1.108
- ## SNP8 0.149821 1.729257 0.667675 2.590
- ## SNP9 -0.159520 -0.795170 0.948859 -0.838
- ## SNP10 -0.207470 -1.214400 0.893062 -1.360
- ## SNP11 0.027842 0.067788 0.999575 0.068
- ## SNP12 0.090563 0.651787 0.958376 0.680
- ## SNP13 -0.051310 -0.705257 0.947386 -0.744
- ## SNP14 -0.196030 -0.990720 0.904846 -1.095
- ## Pr(>|t|)
- ## (Intercept) NA
- ## SNP1 0.00405 **
- ## SNP2 0.77969
- ## SNP3 0.62252
- ## SNP4 0.15127
- ## SNP5 0.35618
- ## SNP6 0.71098
- ## SNP7 0.26802
- ## SNP8 0.00960 **
- ## SNP9 0.40202
- ## SNP10 0.17389
- ## SNP11 0.94593
- ## SNP12 0.49644
- ## SNP13 0.45662
- ## SNP14 0.27356
- ## ---
- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
- ##
- ## Ridge paramter: 0.1244006, chosen automatically, computed using 3 PCs
- ##
- ## Degrees of freedom: model 7.02 , variance 4.066
- pvalsMod <- pvals(beta_logisticRidge)
- print(pvalsMod)
- ##
- ## lambda 0.124401 chosen automatically using 3 PCs
- ##
- ## Estimate (scaled) Std. Error (scaled) t value (scaled) Pr(>|t|)
- ## SNP1 1.91437 0.66608 2.874 0.00405 **
- ## SNP2 0.26964 0.96394 0.280 0.77969
- ## SNP3 -0.49181 0.99903 -0.492 0.62252
- ## SNP4 -1.42213 0.99100 -1.435 0.15127
- ## SNP5 0.91017 0.98646 0.923 0.35618
- ## SNP6 -0.35550 0.95942 -0.371 0.71098
- ## SNP7 -1.02201 0.92269 -1.108 0.26802
- ## SNP8 1.72926 0.66767 2.590 0.00960 **
- ## SNP9 -0.79517 0.94886 -0.838 0.40202
- ## SNP10 -1.21440 0.89306 -1.360 0.17389
- ## SNP11 0.06779 0.99958 0.068 0.94593
- ## SNP12 0.65179 0.95838 0.680 0.49644
- ## SNP13 -0.70526 0.94739 -0.744 0.45662
- ## SNP14 -0.99072 0.90485 -1.095 0.27356
- ## ---
- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
- plot(pvalsMod)
- pred_phen <- predict(beta_logisticRidge, type = "response")
- res <- cbind(GenBin, pred_phen)
- head(res)
- ## Phenotypes SNP1 SNP2 SNP3 SNP4 SNP5 SNP6 SNP7 SNP8 SNP9 SNP10 SNP11 SNP12
- ## 1 0 1 0 0 0 0 0 0 1 0 1 0 0
- ## 2 0 0 0 0 0 0 0 1 0 0 0 0 0
- ## 3 1 0 0 0 0 0 0 2 0 1 0 0 1
- ## 4 0 0 0 0 0 0 1 0 0 0 0 0 0
- ## 5 0 0 0 0 0 0 0 1 0 0 0 0 0
- ## 6 0 0 0 0 0 0 0 0 0 0 0 0 0
- ## SNP13 SNP14
- ## 1 0 1 0.4788050
- ## 2 1 0 0.4707666
- ## 3 1 0 0.4367005
- ## 4 0 0 0.4939766
- ## 5 0 0 0.4835665
- ## 6 0 0 0.5006956
- genotypesfile <- system.file("extdata", "GenBin_genotypes.txt", package = "ridge")
- phenotypesfile <- system.file("extdata", "GenBin_phenotypes.txt", package = "ridge")
- beta_logisticRidgeGenotypes <- logisticRidgeGenotypes(genotypesfilename = genotypesfile,
- phenotypesfilename = phenotypesfile)
- beta_logisticRidgeGenotypes
- ## B
- ## Intercept 0.002782
- ## SNP1 0.166067
- ## SNP2 0.110745
- ## SNP3 -0.284802
- ## SNP4 -1.007615
- ## SNP5 0.911082
- ## SNP6 -0.026877
- ## SNP7 -0.068540
- ## SNP8 0.149821
- ## SNP9 -0.159520
- ## SNP10 -0.207470
- ## SNP11 0.027842
- ## SNP12 0.090563
- ## SNP13 -0.051310
- ## SNP14 -0.196030
- ## compare to output of logisticRidge
- results <- as.data.frame(cbind(round(coef(beta_logisticRidge), 6), beta_logisticRidgeGenotypes))
- results
- ## round(coef(beta_logisticRidge), 6) B
- ## (Intercept) 0.002782 0.002782
- ## SNP1 0.166067 0.166067
- ## SNP2 0.110745 0.110745
- ## SNP3 -0.284802 -0.284802
- ## SNP4 -1.007615 -1.007615
- ## SNP5 0.911082 0.911082
- ## SNP6 -0.026877 -0.026877
- ## SNP7 -0.068540 -0.068540
- ## SNP8 0.149821 0.149821
- ## SNP9 -0.159520 -0.159520
- ## SNP10 -0.207470 -0.207470
- ## SNP11 0.027842 0.027842
- ## SNP12 0.090563 0.090563
- ## SNP13 -0.051310 -0.051310
- ## SNP14 -0.196030 -0.196030
- data(GenCont)
- head(GenCont)
- ## Phenotypes SNP1 SNP2 SNP3 SNP4 SNP5 SNP6 SNP7 SNP8 SNP9 SNP10 SNP11 SNP12
- ## [1,] 1.4356316 1 1 1 0 0 1 0 0 0 0 1 0
- ## [2,] 2.9226960 1 0 0 0 1 0 0 1 0 0 1 0
- ## [3,] 0.5669319 0 0 0 0 0 0 0 0 0 0 2 0
- ## [4,] 4.8515051 1 0 0 0 1 0 0 0 0 0 1 0
- ## [5,] 0.1525582 1 1 1 0 0 1 0 0 0 0 1 0
- ## [6,] 2.9522701 1 0 0 0 1 0 0 0 0 0 0 0
- beta_linearRidge <- linearRidge(Phenotypes ~ ., data = as.data.frame(GenCont))
- summary(beta_linearRidge)
- ##
- ## Call:
- ## linearRidge(formula = Phenotypes ~ ., data = as.data.frame(GenCont))
- ##
- ##
- ## Coefficients:
- ## Estimate Scaled estimate Std. Error (scaled) t value (scaled)
- ## (Intercept) 1.533386 NA NA NA
- ## SNP1 0.277296 4.045409 0.266120 15.201
- ## SNP2 -0.110458 -1.256154 0.216332 5.807
- ## SNP3 -0.110458 -1.256154 0.216332 5.807
- ## SNP4 0.005230 0.011635 0.371693 0.031
- ## SNP5 0.531173 6.323006 0.315368 20.050
- ## SNP6 -0.119164 -1.373227 0.223047 6.157
- ## SNP7 0.113844 0.113730 0.372181 0.306
- ## SNP8 -0.099149 -1.028581 0.355807 2.891
- ## SNP9 -0.008321 -0.008312 0.372386 0.022
- ## SNP10 0.058562 0.101128 0.371567 0.272
- ## SNP11 -0.096526 -1.495699 0.329250 4.543
- ## SNP12 -0.334279 -0.333945 0.372248 0.897
- ## Pr(>|t|)
- ## (Intercept) NA
- ## SNP1 < 2e-16 ***
- ## SNP2 6.38e-09 ***
- ## SNP3 6.38e-09 ***
- ## SNP4 0.97503
- ## SNP5 < 2e-16 ***
- ## SNP6 7.43e-10 ***
- ## SNP7 0.75993
- ## SNP8 0.00384 **
- ## SNP9 0.98219
- ## SNP10 0.78549
- ## SNP11 5.55e-06 ***
- ## SNP12 0.36966
- ## ---
- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
- ##
- ## Ridge parameter: 2.206848, chosen automatically, computed using 1 PCs
- ##
- ## Degrees of freedom: model 3.121 , variance 1.205 , residual 5.036
- pvalsMod <- pvals(beta_linearRidge)
- print(pvalsMod)
- ##
- ## lambda 2.206848 chosen automatically using 1 PCs
- ##
- ## Estimate (scaled) Std. Error (scaled) t value (scaled) Pr(>|t|)
- ## SNP1 4.045409 0.266120 15.201 < 2e-16 ***
- ## SNP2 -1.256154 0.216332 5.807 6.38e-09 ***
- ## SNP3 -1.256154 0.216332 5.807 6.38e-09 ***
- ## SNP4 0.011635 0.371693 0.031 0.97503
- ## SNP5 6.323006 0.315368 20.050 < 2e-16 ***
- ## SNP6 -1.373227 0.223047 6.157 7.43e-10 ***
- ## SNP7 0.113730 0.372181 0.306 0.75993
- ## SNP8 -1.028581 0.355807 2.891 0.00384 **
- ## SNP9 -0.008312 0.372386 0.022 0.98219
- ## SNP10 0.101128 0.371567 0.272 0.78549
- ## SNP11 -1.495699 0.329250 4.543 5.55e-06 ***
- ## SNP12 -0.333945 0.372248 0.897 0.36966
- ## ---
- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
- plot(pvalsMod)
- pred_phen <- predict(beta_linearRidge)
- res <- cbind(GenCont, pred_phen)
- head(res)
- ## Phenotypes SNP1 SNP2 SNP3 SNP4 SNP5 SNP6 SNP7 SNP8 SNP9 SNP10 SNP11 SNP12
- ## 1 1.4356316 1 1 1 0 0 1 0 0 0 0 1 0
- ## 2 2.9226960 1 0 0 0 1 0 0 1 0 0 1 0
- ## 3 0.5669319 0 0 0 0 0 0 0 0 0 0 2 0
- ## 4 4.8515051 1 0 0 0 1 0 0 0 0 0 1 0
- ## 5 0.1525582 1 1 1 0 0 1 0 0 0 0 1 0
- ## 6 2.9522701 1 0 0 0 1 0 0 0 0 0 0 0
- ## pred_phen
- ## 1 1.374077
- ## 2 2.146179
- ## 3 1.340333
- ## 4 2.245328
- ## 5 1.374077
- ## 6 2.341855
- genotypesfile <- system.file("extdata", "GenCont_genotypes.txt", package = "ridge")
- phenotypesfile <- system.file("extdata", "GenCont_phenotypes.txt", package = "ridge")
- betafile <- tempfile(pattern = "beta", fileext = ".dat")
- beta_linearRidgeGenotypes <- linearRidgeGenotypes(genotypesfilename = genotypesfile,
- phenotypesfilename = phenotypesfile, betafilename = betafile)
- pred_phen_geno <- linearRidgeGenotypesPredict(genotypesfilename = genotypesfile,
- betafilename = betafile)
-
- ## compare to output of linearRidge
- results <- cbind(pred_phen_geno, pred_phen)
- head(results)
- ## PredictedPhenotypes pred_phen
- ## 1 1.374076 1.374077
- ## 2 2.146180 2.146179
- ## 3 1.340334 1.340333
- ## 4 2.245329 2.245328
- ## 5 1.374076 1.374077
- ## 6 2.341855 2.341855
Significance testing in ridge regression for genetic data. Cule, E. et al (2011) BMC Bioinformatics, 12:372 A semi-automatic method to guide the choice of ridge parameter in ridge regression. Cule, E. and De Iorio, M. (2012) arXiv:1205.0686v1
Cule, Erika, and Maria De Iorio. "Ridge regression in prediction problems: automatic choice of the ridge parameter." Genetic epidemiology 37.7 (2013): 704-714.