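My data is currently organized as follows (the actual data is shown in the first table below). I am only showing part of it, because the full table is very large (more than 100 rows).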

Row   September        October         November        December      January        February       March        April       May            June          July       
1     Chino Hills      Huntington Bea~ Fountain Valley Anaheim       Fountain Vall~ Arcadia        Anaheim      Newport Be~ Santa Ana      NA            NA         
2     Irvine           Cerritos        Long Beach      Chino Hills   Cerritos       Anaheim        NA           Banning     Newport Beach  Anaheim       NA         
3     Glendale         NA              West Covina     Monterey Park Encino         NA             Monterey Pa~ NA          Los Angeles    Cerritos      Beverly Hi~
4     Norco            Fountain Valley NA              Monterey Park NA             Long Beach     NA           Santa Ana   Huntington Be~ Fountain Val~ NA         
5     Los Angeles      Inglewood       West Covina     Glendale      NA             Glendale       NA           Granada Hi~ Chino          West Covina   Tarzana

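I want to reorganize it so that it looks like the following. To be clear, the result should include every city, not just the ones I chose to list here. This table is incomplete, but it conveys the idea: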

+-------------+------------------+--------+----------+
| Chino Hills | Huntington Beach | Irvine | Glendale |
+-------------+------------------+--------+----------+
| Row 1       | Row 1            | Row 2  | Row 3    |
| Row 2       |                  |        | Row 5    |
|             |                  |        | Row 5    |
+-------------+------------------+--------+----------+

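I tried tidyr::separate_rows(dfl, col), but that only works when the cities are all in a single cell; in my data they are spread over multiple cells in multiple rows. This is what happened when I tried tidyr::separate_rows(dfl, col):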

Row   September        October         November        December      January        February       March        April       May            June          July       
   <chr> <chr>            <chr>           <chr>           <chr>         <chr>          <chr>          <chr>        <chr>       <chr>          <chr>         <chr>      
 1 1     Chino Hills      Huntington Bea~ Fountain Valley Anaheim       Fountain Vall~ Arcadia        Anaheim      Newport Be~ Santa Ana      NA            NA         
 2 2     Irvine           Cerritos        Long Beach      Chino Hills   Cerritos       Anaheim        NA           Banning     Newport Beach  Anaheim       NA         
 3 3     Glendale         NA              West Covina     Monterey Park Encino         NA             Monterey Pa~ NA          Los Angeles    Cerritos      Beverly Hi~
 4 4     Norco            Fountain Valley NA              Monterey Park NA             Long Beach     NA           Santa Ana   Huntington Be~ Fountain Val~ NA         
 5 5     Los Angeles      Inglewood       West Covina     Glendale      NA             Glendale       NA           Granada Hi~ Chino          West Covina   Tarzana

As you can see, the only thing it did was add another column of row numbers that I don't need.

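In short, I need R to find every city and tell me which rows it appears in. If a city appears more than once in a row, that row may be listed more than once. The result will have multiple columns, not just the single column that tidyr normally works with, and the number of columns will depend on the number of distinct cities.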

1 Answer

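We can reshape the data to long format, keep only the unique combinations of Row and value, and then reshape it back to wide format. This assumes df is the name of the data frame.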

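library(dplyr)
library(tidyr)

df %>%
  # one row per (Row, month) cell; NA cells are dropped
  pivot_longer(cols = -Row, values_drop_na = TRUE) %>%
  # keep each Row/city pair only once
  distinct(Row, value) %>%
  # number the occurrences of each city so they stack within its column
  group_by(value) %>%
  mutate(row = row_number()) %>%
  # one column per city, holding the Row ids where it appears
  pivot_wider(names_from = value, values_from = Row)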
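
To make the steps easier to follow, here is a minimal sketch of the same pipeline on a small made-up data frame. The data frame toy, the helper rows_by_city() and its keep_repeats argument are hypothetical names introduced only for this illustration; they are not part of the original question or of dplyr/tidyr. With keep_repeats = TRUE the distinct() step is skipped, which appears to be closer to what the question asks for (a Row listed once per occurrence of a city), assuming that is really the desired output.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data with the same shape as the question:
# a Row id column plus one column per month.
toy <- tibble(
  Row       = c("1", "2", "3"),
  September = c("Chino Hills", "Irvine", "Glendale"),
  October   = c("Irvine", "Chino Hills", NA),
  November  = c("Chino Hills", NA, "Glendale")
)

# Hypothetical helper wrapping the long -> (distinct) -> wide steps.
rows_by_city <- function(data, keep_repeats = FALSE) {
  long <- data %>%
    # one row per non-missing cell; the month name is not needed afterwards
    pivot_longer(cols = -Row, values_drop_na = TRUE) %>%
    select(Row, value)

  # optionally collapse repeated Row/city pairs
  if (!keep_repeats) long <- distinct(long)

  long %>%
    # position of each Row id within its city's column
    group_by(value) %>%
    mutate(row = row_number()) %>%
    ungroup() %>%
    # one column per city, filled with the Row ids; shorter columns get NA
    pivot_wider(names_from = value, values_from = Row)
}

unique_rows   <- rows_by_city(toy)
repeated_rows <- rows_by_city(toy, keep_repeats = TRUE)
```

Printing unique_rows and repeated_rows side by side shows the effect of distinct(): in the toy data, Chino Hills occurs twice in Row 1, so it contributes one entry in the first result and two in the second. The helper column row is only there to line the values up and can be dropped with select(-row) afterwards.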
answered 2020-07-10T02:54:53.950