
I want to use the gowdis function to find the distance between two data samples, but the function produces NaN values. My data looks like the following.

df1>
      A    B   C  D  E F   G H I J
    tcp http 198  4  4 0 246 1 0 0
df2>
      A    B   C  D  E F   G H I J
  1 tcp http 145  1  1 0 255 1 0 0
  2 tcp http 207 11 11 0 255 1 0 0
  3 tcp http 296 10 10 0 255 1 0 0
  4 tcp http 212  9  9 0 255 1 0 0

The code I wrote is as follows:

W<-nrow(df1)+1
E<-nrow(df1)+nrow(df2)
mixusefull2<- rbind(df1,df2)
dist_mixusefull2<- as.matrix(FD::gowdis(mixusefull2))
idxs_mixusefull2 <- KernelKnn::distMat.knn.index.dist(dist_mixusefull2, 
TEST_indices = c(W:E), k = 1, threads = 1, minimize = TRUE)

Thanks for your help.


1 Answer


Sample data

df1 = as.data.frame(matrix(c('tcp', 'http', 198, 4, 4, 0, 246, 1, 0, 0),
                           nrow = 1, ncol = 10), stringsAsFactors = TRUE)
colnames(df1) = toupper(letters[1:10])
df1

df2 = as.data.frame(matrix(c(rep('tcp', 4), rep('http', 4), c(145, 207, 296, 212),
                             c(1, 11, 10, 9), c(1, 11, 10, 9), rep(0, 4), rep(255, 4),
                             rep(1, 4), rep(0, 4), rep(0, 4)), nrow = 4, ncol = 10,
                           byrow = FALSE), stringsAsFactors = TRUE)
colnames(df2) = toupper(letters[1:10])
df2

W <- nrow(df1) + 1
E <- nrow(df1) + nrow(df2)
mixusefull2 <- rbind(df1, df2)
str(mixusefull2)

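I converted columns 3 to 10 to numeric. I don't know the use case for your data, but be aware that FD::gowdis accepts numeric, ordered, or factor variables (see ?FD::gowdis). You may therefore have to adjust the column types accordingly.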
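To see why the column types matter, here is a minimal base-R sketch. The toy vector `f` is made up for illustration and is not taken from the question. Calling as.numeric() directly on a factor returns its internal level codes, so going through as.character() first is the usual way to recover the values as printed.

```r
# Toy factor holding numbers stored as text (illustrative values only)
f <- factor(c("198", "145", "207"))

# Direct conversion returns the level codes, i.e. each value's
# position among the sorted levels "145", "198", "207"
codes <- as.numeric(f)

# Converting to character first recovers the values as printed
values <- as.numeric(as.character(f))

print(codes)
print(values)
```

Checking the result with str() or sapply(df, class) before computing distances is a cheap way to confirm that each column has the intended type.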

for (col in 3:10) {
  # go through as.character() first: as.numeric() on a factor returns
  # the level codes, not the values
  mixusefull2[, col] = as.numeric(as.character(mixusefull2[, col]))
}
str(mixusefull2)

dist_mixusefull2 <- as.matrix(FD::gowdis(mixusefull2))
dist_mixusefull2

idxs_mixusefull2 <- KernelKnn::distMat.knn.index.dist(dist_mixusefull2,
                        TEST_indices = c(W:E), k = 1, threads = 1, minimize = TRUE)
idxs_mixusefull2

$test_knn_idx
     [,1]
[1,]    1
[2,]    1
[3,]    1
[4,]    1

$test_knn_dist
          [,1]
[1,] 0.1950993
[2,] 0.2459603
[3,] 0.2849007
[4,] 0.2092715

I did not get any NA values. Is the output what you actually expected?
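For intuition about what a Gower-style dissimilarity computes, here is a hand-rolled sketch in base R. It is an illustration under simplifying assumptions (no weights, no missing values, no ordered variables) and is not the FD::gowdis implementation; the function name `gower_pair` and the data frame `toy` are hypothetical. Numeric columns contribute the absolute difference divided by the column range, factor columns contribute 0 or 1, and the result is the mean over columns.

```r
# Hand-rolled Gower-style dissimilarity between rows i and j of a data frame.
# Illustrative only; FD::gowdis may treat weights, missing values and
# ordered variables differently.
gower_pair <- function(df, i, j) {
  d <- vapply(df, function(col) {
    if (is.numeric(col)) {
      r <- diff(range(col))
      # Guard for constant columns: a naive 0 / 0 here would be NaN in R
      if (r == 0) 0 else abs(col[i] - col[j]) / r
    } else {
      # Categorical column: 0 if the two values match, 1 otherwise
      as.numeric(as.character(col[i]) != as.character(col[j]))
    }
  }, numeric(1))
  mean(d)
}

# Made-up data: one factor, one numeric, one constant numeric column
toy <- data.frame(
  proto = factor(c("tcp", "tcp", "udp")),
  bytes = c(100, 150, 200),
  flag  = c(0, 0, 0)
)

d12 <- gower_pair(toy, 1, 2)
d13 <- gower_pair(toy, 1, 3)
```

The explicit guard for zero-range columns is a design choice of this sketch: in a hand-written version without it, a constant column is one way NaN values can appear in the result.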

Answered 2018-05-31T19:07:02.033