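I'm having trouble reading data from an *.xls file into R. I'm trying to use readxl::read_xls() to read data from a Microsoft Excel file at the following URL: https://www.misoenergy.org/Library/Repository/Market%20Reports/20171114_5min_exante_lmp.xls. I'm on R version 3.4.1 (Single Candle), and the output of sessionInfo() is pasted at the very bottom of this post.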
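The file has 6 worksheets containing data. As a minimal example, consider reading the second sheet, named RT Ex-Ante 5 Minute LMPs(1). The code below is my first attempt at reading this sheet: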
library(readxl)
fpath <- '/Users/bmosovsky/Downloads/20171114_5min_exante_lmp.xls'
data <- read_excel( path=fpath, sheet=2, col_names=FALSE )
This lets read_excel guess both the range of data to read and the column types. I get the warning message
Warning message:
In read_fun(path = path, sheet = sheet, limits = limits, shim = shim, :
Expecting logical in B65535 / R65535C2: got 'IPL.CC.IPLEV01'
and str(data) returns
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 65535 obs. of 6 variables:
$ X__1: POSIXct, format: "2017-11-13 04:35:00" "2017-11-13 04:35:00" "2017-11-13 04:35:00" "2017-11-13 04:35:00" ...
$ X__2: logi NA NA NA NA NA NA ...
$ X__3: logi NA NA NA NA NA NA ...
$ X__4: logi NA NA NA NA NA NA ...
$ X__5: logi NA NA NA NA NA NA ...
$ X__6: logi NA NA NA NA NA NA ...
Thinking that read_excel() might simply be guessing the column types incorrectly, I then tried:
data1 <- read_excel( path=fpath, sheet=2, col_names=FALSE,
col_types=c('text', 'text', 'numeric', 'numeric', 'numeric', 'numeric') )
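This eliminates the warning, because the column types are now specified correctly, but I still get NA values for every column except the first. This time str(data1) returns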
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 65535 obs. of 6 variables:
$ X__1: chr "43052.2" "43052.2" "43052.2" "43052.2" ...
$ X__2: chr NA NA NA NA ...
$ X__3: num NA NA NA NA NA NA NA NA NA NA ...
$ X__4: num NA NA NA NA NA NA NA NA NA NA ...
$ X__5: num NA NA NA NA NA NA NA NA NA NA ...
$ X__6: num NA NA NA NA NA NA NA NA NA NA ...
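Finally, I pasted the first 10 rows of data from the second sheet of the Excel file (formatting and all) into a new Excel workbook, saved it as test.xls, and then tried the following: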
fpath_test <- '/Users/bmosovsky/Downloads/test.xls'
data_test <- read_excel( path=fpath_test, sheet=1, col_names=FALSE,
col_types=c('text', 'text', 'numeric', 'numeric', 'numeric', 'numeric') )
Now str(data_test) returns the correct result:
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 10 obs. of 6 variables:
$ X__1: chr "43052.2" "43052.2" "43052.2" "43052.2" ...
$ X__2: chr "CIN.MARKLND.3" "CIN.MIAMWAB.1" "CIN.MIAMWAB.2" "CIN.MIAMWAB.3" ...
$ X__3: num 22.4 22.6 22.6 22.6 22.5 ...
$ X__4: num 21.6 21.6 21.6 21.6 21.6 21.6 21.6 21.6 21.6 21.6
$ X__5: num 0.8 1.02 1.02 1.02 0.92 0.93 1.29 1.29 1.29 0.06
$ X__6: num 0.04 0.01 0.01 0.01 0.01 0.01 0.05 0.05 0.05 0.06
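So my question is: what is unique about the downloaded Excel file that prevents its data from being read into R correctly? I'm reading this data as part of an automated data-collection process, so manually manipulating the Excel file is not an option as a workaround. Can anyone offer some insight into how to get the data from all sheets in the .xls file into R for processing?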
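For reference, a minimal sketch of what reading every sheet could look like once the per-sheet read returns real values. It relies only on readxl's excel_sheets() and read_excel(); the helper name read_all_sheets is hypothetical and not part of readxl, and the sketch does nothing to address the NA problem described above.

```r
library(readxl)

# Hypothetical helper (not part of readxl): read every sheet of a workbook
# into a named list of tibbles. Extra arguments are passed to read_excel().
read_all_sheets <- function(path, ...) {
  sheet_names <- excel_sheets(path)
  sheets <- lapply(sheet_names, function(s) {
    read_excel(path = path, sheet = s, ...)
  })
  names(sheets) <- sheet_names
  sheets
}

# Same local path as in the attempts above; skipped if the file is absent.
fpath <- '/Users/bmosovsky/Downloads/20171114_5min_exante_lmp.xls'
if (file.exists(fpath)) {
  all_data <- read_all_sheets(fpath, col_names = FALSE)
  str(all_data, max.level = 1)
}
```

Keeping the sheets in a named list means each one can still be inspected with str() individually, which matters here because the sheets may not all share the same layout.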
Here is the output of sessionInfo():
R version 3.4.1 (2017-06-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] tools stats graphics grDevices utils datasets methods base
other attached packages:
[1] bindrcpp_0.2 rvest_0.3.2 xml2_1.1.1 RPostgreSQL_0.6-2 DBI_0.7-12 lubridate_1.6.0 dplyr_0.7.2 readxl_1.0.0
loaded via a namespace (and not attached):
[1] Rcpp_0.12.12.3 tidyr_0.6.3 assertthat_0.2.0 cellranger_1.1.0 R6_2.2.2 magrittr_1.5 httr_1.2.1 rlang_0.1.1 stringi_1.1.5
[10] curl_2.8.1 stringr_1.2.0 glue_1.1.1 compiler_3.4.1 pkgconfig_2.0.1 bindr_0.1 tibble_1.3.3