I work in healthcare and need help using R. To explain: I have a data set like this:

S1      S2      S3      S4      S5
0.498   1.48    1.43    0.536   0.548
2.03    1.7     3.74    2.13    2.02
0.272   0.242   0.989   0.534   0.787
0.986   2.03    2.53    1.65    2.31
0.307   0.934   0.633   0.36    0.281
0.78    0.76    0.706   0.81    1.11
0.829   2.03    0.667   1.48    1.42
0.497   1.27    0.952   1.23    1.73
0.553   0.286   0.513   0.422   0.573

Here are my objectives:

Compute the correlation between every pair of columns
Calculate p-values
Calculate R-squared
Only show pairs where R2 > 0.5 and p-value < 0.05

Here is my code so far (it's not the most efficient, but it works):

> e<-read.table('Workbook8nm.csv', header=TRUE, sep=",", dec=".", na.strings="NA")
> f<-data.frame(e)
> M<-cor(f, use="complete") # Computes the correlations the way I want
> library('psych')
> N<-corr.test(f) # Gives me the p-values

So far I have my correlations in M and my p-values in N. How do I get R2?

Second, how do I make R show only the pairs where R2 > 0.5 and p-value < 0.05, for example? As practice, I used this line:

P<-M[which(M>0.9)]

to show only the cases where the Pearson coefficient is greater than 0.9. But it just gives me a list of every value above 0.9, so I don't know which pair of columns each coefficient comes from. Ideally it would show the significant values in a table with the column names, so that I can easily identify them afterwards. The reason I want this is that my table is 570 by 570, so I can't look through every p-value by hand to keep only the significant ones.

I hope that was clear! This is my first post here, so tell me if I made any mistakes!

Thanks for your help!

1 Answer

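I'm sure there's a function somewhere in R that does this faster, but I wrote a quick function that expands a matrix into a data.frame with "row" and "col" as columns and the values as a third column.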

matrixToFrame <- function(m, name) {
    e <- expand.grid(row=rownames(m), col=colnames(m))
    e[name] <- as.vector(m)
    e
}

We can convert the correlation matrix to a data frame like this:

> matrixToFrame(cor(f), "cor")
   row col       cor
1   S1  S1 1.0000000
2   S2  S1 0.5322052
3   S3  S1 0.8573687
4   S4  S1 0.8542438
5   S5  S1 0.6820144
6   S1  S2 0.5322052
....
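
As a quick check of what the reshaping does, here is a minimal sketch on a toy matrix with made-up values. It also shows a base R route, `as.data.frame(as.table(m))`, which appears to produce the same long layout; the names `long1` and `long2` are only illustrative.

```r
# Same helper as above, repeated so this snippet runs on its own.
matrixToFrame <- function(m, name) {
    e <- expand.grid(row = rownames(m), col = colnames(m))
    e[name] <- as.vector(m)
    e
}

# Toy 2 x 2 "correlation" matrix (made-up values) with dimnames.
m <- matrix(c(1, 0.3, 0.3, 1), nrow = 2,
            dimnames = list(c("A", "B"), c("A", "B")))

# One row per matrix cell, in column-major order.
long1 <- matrixToFrame(m, "cor")

# Base R alternative: a table converts to a long data frame directly.
long2 <- as.data.frame(as.table(m), responseName = "cor")
names(long2)[1:2] <- c("row", "col")

print(long1)
print(long2)
```

Both versions list every cell, including the diagonal and both (A, B) and (B, A), so a symmetric matrix yields each pair twice.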

We can merge the results of corr.test and cor because the row and col columns match up:

> b <- merge(matrixToFrame(corr.test(f)$p, "p"), matrixToFrame(cor(f), "cor"))
> head(b)
   row col            p       cor
1   S1  S1 0.0000000000 1.0000000
2   S1  S2 0.2743683745 0.5322052
3   S1  S3 0.0281656707 0.8573687
4   S1  S4 0.0281656707 0.8542438
5   S1  S5 0.2134783039 0.6820144
6   S2  S1 0.1402243214 0.5322052

Then we can filter for the elements we want (the thresholds here are just an example):

> b[b$cor > .5 & b$p > .2,]
   row col         p       cor
2   S1  S2 0.2743684 0.5322052
5   S1  S5 0.2134783 0.6820144
8   S2  S3 0.2743684 0.5356585
10  S2  S5 0.2134783 0.6724486
15  S3  S5 0.2134783 0.6827349
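
The filter above uses example thresholds. For the criteria in the question (R2 > 0.5 and p < 0.05), a minimal base R sketch could look like the following. It assumes Pearson correlations, no missing values, and raw (unadjusted) p-values computed from the usual t statistic with n - 2 degrees of freedom; R2 is simply the squared correlation. The names `res` and `hits` are illustrative, and each pair is listed once by keeping only the upper triangle.

```r
# The nine example rows from the question.
f <- data.frame(
    S1 = c(0.498, 2.03, 0.272, 0.986, 0.307, 0.78, 0.829, 0.497, 0.553),
    S2 = c(1.48, 1.7, 0.242, 2.03, 0.934, 0.76, 2.03, 1.27, 0.286),
    S3 = c(1.43, 3.74, 0.989, 2.53, 0.633, 0.706, 0.667, 0.952, 0.513),
    S4 = c(0.536, 2.13, 0.534, 1.65, 0.36, 0.81, 1.48, 1.23, 0.422),
    S5 = c(0.548, 2.02, 0.787, 2.31, 0.281, 1.11, 1.42, 1.73, 0.573)
)

r <- cor(f)      # Pearson correlation matrix
n <- nrow(f)     # observations per column (assumes no NAs)

# Positions above the diagonal: each unordered pair appears once.
keep <- upper.tri(r)
idx <- which(keep, arr.ind = TRUE)

res <- data.frame(
    row = rownames(r)[idx[, "row"]],
    col = colnames(r)[idx[, "col"]],
    cor = r[keep]
)
res$r2 <- res$cor^2                        # R-squared of the pairwise fit

# Two-sided p-value from the t statistic of a Pearson correlation.
tstat <- res$cor * sqrt((n - 2) / (1 - res$r2))
res$p <- 2 * pt(-abs(tstat), df = n - 2)

# Keep only the pairs meeting both criteria from the question.
hits <- res[res$r2 > 0.5 & res$p < 0.05, ]
print(hits)
```

With 570 columns there are 570 * 569 / 2 = 162,165 pairs, so some correction for multiple testing (for example via `p.adjust`) is probably worth considering before reading too much into individual raw p-values.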

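EDIT: I found "R matrix to rownames colnames values", which offers a few attempts at matrixToFrame; nothing more elegant than what I have here, though.

EDIT2: Make sure to read the documentation for corr.test carefully; it looks like different information is encoded above and below the diagonal (?), so the results here may be deceptive. You may want to do some filtering with lower.tri or upper.tri before the final filtering step.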
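
To illustrate the kind of triangle filtering mentioned in EDIT2, here is a minimal sketch on a made-up 3 x 3 matrix. The numbers are hypothetical and are not output from corr.test; which triangle holds which kind of p-value should be confirmed in the psych documentation. The names `pm` and `pairs` are illustrative.

```r
# Hypothetical p-value matrix whose two triangles carry different values.
pm <- matrix(c(0,    0.03, 0.20,
               0.01, 0,    0.20,
               0.10, 0.20, 0),
             nrow = 3, byrow = TRUE,
             dimnames = list(c("S1", "S2", "S3"), c("S1", "S2", "S3")))

# Cells strictly below the diagonal: one row per unordered pair.
idx <- which(lower.tri(pm), arr.ind = TRUE)

pairs <- data.frame(
    row     = rownames(pm)[idx[, "row"]],
    col     = colnames(pm)[idx[, "col"]],
    p_lower = pm[lower.tri(pm)],
    # Transposing first picks the mirrored cell from above the diagonal.
    p_upper = t(pm)[lower.tri(pm)]
)
print(pairs)
```

Keeping the two triangles in separate columns makes it explicit which value a later filter such as `p < 0.05` is applied to.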

Answered 2015-08-13T15:11:20.827