
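I am having trouble reading data from an *.xls file into R. I am trying to use readxl::read_xls() to read data from a Microsoft Excel file at the following URL: https://www.misoenergy.org/Library/Repository/Market%20Reports/20171114_5min_exante_lmp.xls. I am on R version 3.4.1 (Single Candle), and the output of sessionInfo() is pasted at the very bottom of this post.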

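The file has 6 worksheets that contain data. As a minimal example, consider reading the second sheet, named RT Ex-Ante 5 Minute LMPs(1). The code below is my first attempt at reading this sheet: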

library(readxl)
fpath <- '/Users/bmosovsky/Downloads/20171114_5min_exante_lmp.xls'
data <- read_excel( path=fpath, sheet=2, col_names=FALSE )

This lets read_excel() guess both the data range to read and the column types. I get the warning message:

Warning message:
In read_fun(path = path, sheet = sheet, limits = limits, shim = shim,  :
  Expecting logical in B65535 / R65535C2: got 'IPL.CC.IPLEV01'

str(data) returns

Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   65535 obs. of  6 variables:
 $ X__1: POSIXct, format: "2017-11-13 04:35:00" "2017-11-13 04:35:00" "2017-11-13 04:35:00" "2017-11-13 04:35:00" ...
 $ X__2: logi  NA NA NA NA NA NA ...
 $ X__3: logi  NA NA NA NA NA NA ...
 $ X__4: logi  NA NA NA NA NA NA ...
 $ X__5: logi  NA NA NA NA NA NA ...
 $ X__6: logi  NA NA NA NA NA NA ...

Thinking that read_excel() might simply be guessing the column types incorrectly, I then tried:

data1 <- read_excel( path=fpath, sheet=2, col_names=FALSE, 
                    col_types=c('text', 'text', 'numeric', 'numeric', 'numeric', 'numeric') )

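This eliminates the warning, since the columns now have the correct types, but I still get NA values for every column except the first. This time str(data1) returns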

Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   65535 obs. of  6 variables:
 $ X__1: chr  "43052.2" "43052.2" "43052.2" "43052.2" ...
 $ X__2: chr  NA NA NA NA ...
 $ X__3: num  NA NA NA NA NA NA NA NA NA NA ...
 $ X__4: num  NA NA NA NA NA NA NA NA NA NA ...
 $ X__5: num  NA NA NA NA NA NA NA NA NA NA ...
 $ X__6: num  NA NA NA NA NA NA NA NA NA NA ...

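Finally, I pasted the first 10 rows of data (formatting and all) from the second sheet of the Excel file into a new Excel workbook, saved it as test.xls, and tried the following: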

fpath_test <- '/Users/bmosovsky/Downloads/test.xls'
data_test <- read_excel( path=fpath_test, sheet=1, col_names=FALSE,
                         col_types=c('text', 'text', 'numeric', 'numeric', 'numeric', 'numeric') )

Now str(data_test) returns the correct result:

Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   10 obs. of  6 variables:
 $ X__1: chr  "43052.2" "43052.2" "43052.2" "43052.2" ...
 $ X__2: chr  "CIN.MARKLND.3" "CIN.MIAMWAB.1" "CIN.MIAMWAB.2" "CIN.MIAMWAB.3" ...
 $ X__3: num  22.4 22.6 22.6 22.6 22.5 ...
 $ X__4: num  21.6 21.6 21.6 21.6 21.6 21.6 21.6 21.6 21.6 21.6
 $ X__5: num  0.8 1.02 1.02 1.02 0.92 0.93 1.29 1.29 1.29 0.06
 $ X__6: num  0.04 0.01 0.01 0.01 0.01 0.01 0.05 0.05 0.05 0.06

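So my question is: what is unique about the downloaded Excel file that prevents its data from being read into R correctly? I am trying to read this data as part of an automated data collection process, so manually manipulating the Excel file in any way is not an option as a workaround. Can anyone offer some insight into how to get the data from all sheets of the .xls file into R for processing?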
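
For context, this is roughly the all-sheets read I am aiming for once the per-sheet problem is understood. It is only a minimal sketch: read_all_sheets is a hypothetical helper name, and the demo runs on the datasets.xls example workbook bundled with readxl (located with readxl_example()) rather than on the MISO file, so it does not reproduce the NA problem described above.

```r
library(readxl)

# Hypothetical helper (not part of readxl): read every sheet of a workbook
# into a list of data frames, named after the sheets.
read_all_sheets <- function(path) {
  sheet_names <- excel_sheets(path)          # names of all sheets in the file
  sheets <- lapply(sheet_names, function(s) {
    # Same call as in my attempts above, letting readxl guess range and types
    read_excel(path = path, sheet = s, col_names = FALSE)
  })
  setNames(sheets, sheet_names)
}

# Demo on an example workbook shipped with readxl (assumption: readxl >= 1.0.0)
demo_path  <- readxl_example("datasets.xls")
all_sheets <- read_all_sheets(demo_path)
names(all_sheets)
```

In the real pipeline, demo_path would be replaced by the path of the downloaded file, for example the fpath used above.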

Here is the output from sessionInfo():

R version 3.4.1 (2017-06-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] tools     stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] bindrcpp_0.2      rvest_0.3.2       xml2_1.1.1        RPostgreSQL_0.6-2 DBI_0.7-12        lubridate_1.6.0   dplyr_0.7.2       readxl_1.0.0     

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.12.3   tidyr_0.6.3      assertthat_0.2.0 cellranger_1.1.0 R6_2.2.2         magrittr_1.5     httr_1.2.1       rlang_0.1.1      stringi_1.1.5   
[10] curl_2.8.1       stringr_1.2.0    glue_1.1.1       compiler_3.4.1   pkgconfig_2.0.1  bindr_0.1        tibble_1.3.3 