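I'm looking for an "elegant" way to split a data frame by the levels of one column variable and then build a new output data frame, reshaped so that the factor variable is dropped and a new column is added for each of its levels. I can do this with functions like split(), but that seems like a messy way to me. I've been trying to use the melt() and cast() functions from the plyr package, but haven't managed to get the exact output I need.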

Here's what my data looks like:

> jumbo.df = read.csv(...)
> head(jumbo.df)
         PricingDate  Name     Rate 
    186  2012-03-05   Type A   2.875  
    187  2012-03-05   Type B   3.250  
    188  2012-03-05   Type C   3.750  
    189  2012-03-05   Type D   3.750  
    190  2012-03-05   Type E   4.500  
    191  2012-03-06   Type A   2.875

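What I'd like to do is split by the variable Name, drop the Name and Rate columns, and then output columns for Type A, Type B, Type C, Type D and Type E holding the corresponding Rate series, with the date as the ID: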

> head(output.df)
         PricingDate  Type A   Type B    Type C    Type D    Type E 
         2012-03-05    2.875    3.250     3.750     3.750     4.500  
         2012-03-06    2.875    ...

Thanks!

2 Answers

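Not sure if I understood correctly, but are you just trying to reshape the data to wide format? If so, you have to use the melt and cast functions of the reshape (!) package, not plyr; reshape2 is basically the same. Since your data is already in molten, i.e. long, format, a one-liner does what you want: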

df <- read.table(textConnection("PricingDate  Name     Rate 
                                 2012-03-05   TypeA   2.875  
                                 2012-03-05   TypeB   3.250  
                                 2012-03-05   TypeC   3.750  
                                 2012-03-05   TypeD   3.750  
                                 2012-03-05   TypeE   4.500  
                                 2012-03-06   TypeA   2.875"), header=TRUE, row.names=NULL)
library(reshape2)
dcast(df, PricingDate ~ Name)
Using Rate as value column: use value.var to override.
  PricingDate TypeA TypeB TypeC TypeD TypeE
1  2012-03-05 2.875  3.25  3.75  3.75   4.5
2  2012-03-06 2.875    NA    NA    NA    NA
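The message in the output above appears because dcast() had to guess the value column. A minimal sketch that names it explicitly through value.var, assuming a small subset of the question's data with the spaces kept in Name (the df and wide names are only illustrative):

```r
library(reshape2)

# Illustrative subset of the question's data; Name keeps its spaces
df <- data.frame(
  PricingDate = c("2012-03-05", "2012-03-05", "2012-03-06"),
  Name        = c("Type A", "Type B", "Type A"),
  Rate        = c(2.875, 3.250, 2.875),
  stringsAsFactors = FALSE
)

# One row per PricingDate, one column per level of Name, filled with Rate.
# Combinations that are missing in the long data come out as NA.
wide <- dcast(df, PricingDate ~ Name, value.var = "Rate")
print(wide)
```

Columns whose names contain spaces have to be accessed as wide[["Type A"]] or with backticks after $.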
answered 2012-06-01T15:27:26.137
    library(plyr)
    library(reshape2)

    data <- structure(list(PricingDate = c("2012-03-05", "2012-03-05", "2012-03-05", 
    "2012-03-05", "2012-03-05", "2012-03-06", "2012-03-06", "2012-03-06", 
    "2012-03-06", "2012-03-06"), Name = c("Type A", "Type B", "Type C", 
    "Type D", "Type E", "Type A", "Type B", "Type C", "Type D", "Type E"
    ), Rate = c(2.875, 3.25, 3.75, 3.75, 4.5, 4.875, 5.25, 6.75, 
    7.75, 8.5)), .Names = c("PricingDate", "Name", "Rate"), class = "data.frame", row.names = c("186", 
    "187", "188", "189", "190", "191", "192", "193", "194", "195"
    ))


    > data
        PricingDate   Name  Rate
    186  2012-03-05 Type A 2.875
    187  2012-03-05 Type B 3.250
    188  2012-03-05 Type C 3.750
    189  2012-03-05 Type D 3.750
    190  2012-03-05 Type E 4.500
    191  2012-03-06 Type A 4.875
    192  2012-03-06 Type B 5.250
    193  2012-03-06 Type C 6.750
    194  2012-03-06 Type D 7.750
    195  2012-03-06 Type E 8.500


    ddply(data, .(PricingDate), function(x) reshape(x, idvar="PricingDate", timevar="Name", direction="wide"))



    PricingDate Rate.Type A Rate.Type B Rate.Type C Rate.Type D
    1  2012-03-05       2.875        3.25        3.75        3.75
    2  2012-03-06       4.875        5.25        6.75        7.75
      Rate.Type E
    1         4.5
    2         8.5
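As a possible simplification, base reshape() can already produce one row per idvar on its own, so the ddply() wrapper may not be needed. A minimal sketch under that assumption, using a subset of the data above and stripping the "Rate." prefix to match the column names requested in the question (the wide name is only illustrative):

```r
# Subset of the answer's data; only base R is used here
data <- data.frame(
  PricingDate = rep(c("2012-03-05", "2012-03-06"), each = 2),
  Name        = rep(c("Type A", "Type B"), times = 2),
  Rate        = c(2.875, 3.25, 4.875, 5.25),
  stringsAsFactors = FALSE
)

# Long to wide: one row per PricingDate, one Rate.<Name> column per level
wide <- reshape(data, idvar = "PricingDate", timevar = "Name",
                direction = "wide")

# Drop the "Rate." prefix that reshape() adds, and reset the row names
names(wide) <- sub("^Rate\\.", "", names(wide))
rownames(wide) <- NULL
print(wide)
```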
answered 2012-06-01T15:34:27.943