
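I managed to do this with one subset of my data frame, but I can't seem to get it to work with my other subset. There is information on about 4000 orders, with months ranging from 0 to 8 and sentiment from 0 to 5.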

The goal is to melt the data with "order" and "month.of.service" as ids and aggregate to the mean sentiment for that month. The data frame looks like this:

order | month | sentiment
123   |   0   |     3
123   |   0   |     4
123   |   1   |     3
124   |   0   |     2

I'd like it to look like this:

123   |   0   |    3.5
123   |   1   |    3
124   |   0   |    2

Here is the actual code I used:

sentiment.md <- melt(sentiment, id = c('Related.order', 'Lifespan'))
sentiment.dc <- dcast(sentiment.md, Related.order + Lifespan ~ value, sum)

> head(sentiment.md)
  Related.order Lifespan  variable value
1         12771        0 Sentiment     5
2         11188        1 Sentiment     3
3         12236        3 Sentiment     5
4         12925        0 Sentiment     5
5         12151        3 Sentiment     5
6         12338        0 Sentiment     5

> head(sentiment.dc)
  Related.order Lifespan   0   1   2   3   4   5
1          4976        0 NaN NaN NaN   3 NaN NaN
2          4976        1 NaN NaN NaN   3 NaN NaN
3          4976        2 NaN NaN NaN NaN   4 NaN
4          4976        3 NaN NaN NaN NaN   4 NaN
5          4976        4 NaN NaN NaN NaN   4 NaN
6          4976        5 NaN NaN NaN NaN   4 NaN

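To further demonstrate what I want it to look like, here is the exact same thing applied to the only other column in the data frame, interactions, which does come out in the format I want: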

interactions.md <- melt(interactions, id = c('Related.order', 'Lifespan'))
interactions.dc <- dcast(interactions.md, Related.order + Lifespan ~ value, sum)

> head(interactions.md)
  Related.order Lifespan variable value
1         12771        0    Event     1
2         11188        1    Event     1
3         12236        3    Event     1
4         12925        0    Event     1
5         12151        3    Event     1
6         12338        0    Event     1
> head(interactions.dc)
  Related.order Lifespan 1
1          4976        0 6
2          4976        1 3
3          4976        2 3
4          4976        3 1
5          4976        4 2
6          4976        5 2

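I thought maybe I was using the wrong structure or something, but I couldn't identify anything. For reference, here is a screenshot of RStudio:

[screenshot of RStudio]

Thanks in advance for your help.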

2 Answers

Perhaps what you want is more of an aggregation/collapse than a dcast:

library(data.table)
setDT(df)[, .(sentiment = mean(sentiment)), by = .(order, month)]
#   order month sentiment
#1:   123     0       3.5
#2:   123     1       3.0
#3:   124     0       2.0

If you really do want to use dcast, you could try:

dcast(df, order + month ~ ., mean, value.var = "sentiment")
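To tie this back to the melt/dcast code in the question, here is a minimal sketch using reshape2. The toy data frame reuses the question's column names (Related.order, Lifespan, Sentiment), but its values are made up for illustration. Casting on variable instead of value, with mean as the aggregator, should give one Sentiment column holding the monthly average:

```r
library(reshape2)

# Toy stand-in for the question's data (values are illustrative only)
sentiment <- data.frame(
  Related.order = c(123L, 123L, 123L, 124L),
  Lifespan      = c(0L, 0L, 1L, 0L),
  Sentiment     = c(3, 4, 3, 2)
)

# Long format: one row per (order, month, variable) observation
sentiment.md <- melt(sentiment, id.vars = c("Related.order", "Lifespan"))

# Cast on `variable`, not `value`: one output column per measured variable,
# aggregated with mean() within each order/month combination
sentiment.dc <- dcast(sentiment.md, Related.order + Lifespan ~ variable, mean)
sentiment.dc
```

Putting value on the right-hand side of the formula creates one column per distinct score, which is why the question's output has columns 0 through 5.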

Or with dplyr:

library(dplyr)
df %>% group_by(order, month) %>% summarise(sentiment = mean(sentiment))

These are just a few of the many ways to aggregate in R.


Data:

df <- structure(list(order = c(123L, 123L, 123L, 124L), month = c(0L, 
0L, 1L, 0L), sentiment = c(3L, 4L, 3L, 2L)), .Names = c("order", 
"month", "sentiment"), row.names = c(NA, -4L), class = "data.frame")
answered 2018-03-28T18:08:13.433

In base R, use aggregate.

aggregate(sentiment ~ month + order, sentiment, mean, na.rm = TRUE)[c(2, 1, 3)]
#  order month sentiment
#1   123     0       3.5
#2   123     1       3.0
#3   124     0       2.0
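The same idea might carry over to the question's real column names along these lines. This is only a sketch: the data frame name orders and its values are hypothetical, and it sorts the rows explicitly instead of reordering the columns, since aggregate varies the first grouping variable fastest.

```r
# Hypothetical stand-in for the question's data (names from the question,
# values invented for illustration)
orders <- data.frame(
  Related.order = c(4976L, 4976L, 4976L, 4977L),
  Lifespan      = c(0L, 0L, 1L, 0L),
  Sentiment     = c(3, 4, 3, 2)
)

# Mean sentiment per order and month
res <- aggregate(Sentiment ~ Related.order + Lifespan, data = orders, FUN = mean)

# Sort by order, then by month, and reset the row names
res <- res[order(res$Related.order, res$Lifespan), ]
rownames(res) <- NULL
res
```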

Data.

sentiment <- read.table(text = "
order | month | sentiment
123   |   0   |     3
123   |   0   |     4
123   |   1   |     3
124   |   0   |     2
", header = TRUE, sep = "|")
answered 2018-03-28T18:12:49.047