When I run the application, I get the following error.
Error in FUN: invalid input 'at my monthly blog stats and we’re nearly on 4000 for April which is amazing – thank you Jx 😘😘' in 'utf8towcs'
Because the blogs.txt file contains emoticons and similar characters, I tried to convert the data as follows.
blogs<-iconv(blogs, "latin1", "ASCII", sub="")
news<-iconv(news, "latin1", "ASCII", sub="")
twitter<-iconv(twitter, "latin1", "ASCII", sub="")
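To check what this kind of conversion does on a single line, here is a minimal sketch in base R. The string `sample_line` is a hypothetical stand-in for one line of blogs.txt, and the sketch assumes the text is really UTF-8 (hence `from = "UTF-8"`), which may not match how the files were actually read.

```r
# Hypothetical sample line; \u escapes keep the script itself pure ASCII.
sample_line <- "nearly on 4000 for April \u2013 thank you Jx \U0001F618\U0001F618"

# Assumption: the input is UTF-8. Bytes with no ASCII equivalent are dropped.
cleaned <- iconv(sample_line, from = "UTF-8", to = "ASCII", sub = "")

# TRUE when every remaining code point is in the ASCII range.
all_ascii <- all(utf8ToInt(cleaned) < 128L)

print(cleaned)
print(all_ascii)
```

If `all_ascii` is TRUE here but the error still appears in the app, the text reaching the corpus is probably not the converted vector.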
I also used the iconv function as follows:
# Create the corpus and clean the data
corpus <- VCorpus(VectorSource(data.sample))
toSpace <- content_transformer(function(x, pattern) gsub(pattern, " ", x))
corpus <- tm_map(corpus, toSpace, "(f|ht)tp(s?)://(.*)[.][a-z]+")
tospace <- tm_map(corpus,
                  content_transformer(function(x)
                    iconv(x, to="UTF-8", sub="byte")),
                  mc.cores=1)
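For comparison, a small base-R sketch of what `iconv(..., to = "UTF-8", sub = "byte")` appears to do on its own, outside of tm. Both input strings are hypothetical, and `from = "UTF-8"` is an assumption made so the example does not depend on the session locale. As far as I can tell, `sub = "byte"` only rewrites bytes that are invalid in the source encoding, so well-formed emoji pass through unchanged.

```r
# Hypothetical inputs: one with a stray latin1 byte, one with valid UTF-8 emoji.
bad_line   <- "caf\xe9 au lait"
emoji_line <- "thank you Jx \U0001F618"

# Invalid bytes are replaced by their hex code in angle brackets.
fixed <- iconv(bad_line, from = "UTF-8", to = "UTF-8", sub = "byte")

# Valid UTF-8 has nothing to substitute, so the emoji should survive.
kept <- iconv(emoji_line, from = "UTF-8", to = "UTF-8", sub = "byte")

print(fixed)
print(validUTF8(fixed))
print(identical(utf8ToInt(kept), utf8ToInt(emoji_line)))
```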
However, I still get the same error.
Please help me with this.
Session info:
=====================
R version 3.4.2 (2017-09-28)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] stringr_1.2.0 shiny_1.0.5 slam_0.1-40 ggplot2_2.2.1 RWeka_0.4-35 tm_0.7-1 NLP_0.1-11
[8] stringi_1.1.5
loaded via a namespace (and not attached):
[1] Rcpp_0.12.13 magrittr_1.5 RWekajars_3.9.1-4 munsell_0.4.3 colorspace_1.3-2
[6] xtable_1.8-2 R6_2.2.2 rlang_0.1.4 plyr_1.8.4 tools_3.4.2
[11] parallel_3.4.2 grid_3.4.2 gtable_0.2.0 htmltools_0.3.6 yaml_2.1.14
[16] lazyeval_0.2.1 digest_0.6.12 tibble_1.3.4 rJava_0.9-9 rsconnect_0.8.5
[21] mime_0.5 compiler_3.4.2 scales_0.5.0 jsonlite_1.5 httpuv_1.3.5