I am pulling some data from an Oracle DB to do some text mining. My data is UTF-8, and the vocab cannot handle it.
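library(text2vec)
library(DBI)
library(ROracle)  # assumed: ROracle supplies the "Oracle" driver for dbDriver()

Sys.setenv(TZ = "+03:00")
drv <- dbDriver("Oracle")
con <- dbConnect(drv, username = "user", password = "pass",
                 dbname = "IP:port/servicename")

# Pull the test table; FNAME holds the text and ID the document ids.
list <- dbGetQuery(con, statement = "select * from test")

# Lower-case and word-tokenize each FNAME value.
it_list <- itoken(list$FNAME,
                  preprocessor = tolower,
                  tokenizer = word_tokenizer,
                  ids = list$ID,
                  progressbar = FALSE)

# Build a vocabulary of unigrams and bigrams.
vocab <- create_vocabulary(it_list, ngram = c(ngram_min = 1L, ngram_max = 2L))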
However, the resulting vocab contains only the English words.
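A quick way to check whether the strings come back from the database marked as UTF-8 is to inspect their declared encoding in base R. The sketch below uses a hypothetical stand-in vector (`fname`) rather than the real list$FNAME column, and it only diagnoses the encoding; it is not a confirmed fix for the vocabulary problem.

```r
# Hypothetical stand-in for list$FNAME: one ASCII name and one non-ASCII name
# written with \u escapes so the script itself stays ASCII-only.
fname <- c("john smith", "\u0645\u062d\u0645\u062f \u0639\u0644\u064a")

# Declared encoding of each string: "unknown" for plain ASCII,
# "UTF-8" when R has the string marked as UTF-8.
enc <- Encoding(fname)
print(enc)

# TRUE where the raw bytes form valid UTF-8, regardless of the declared mark.
valid <- validUTF8(fname)
print(valid)

# enc2utf8() converts strings to UTF-8 and marks them accordingly;
# strings already marked as UTF-8 are returned unchanged.
fname_utf8 <- enc2utf8(fname)
print(Encoding(fname_utf8))
```

Running the same three calls on list$FNAME would show whether the non-English rows are marked "unknown" (in which case they are interpreted in the native Windows code page) or "UTF-8".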
- The list variable object is available at this link (it can be loaded with load())
- I am using Windows
- Version:
platform       x86_64-w64-mingw32
arch           x86_64
os             mingw32
system         x86_64, mingw32
status
major          3
minor          3.0
year           2016
month          05
day            03
svn rev        70573
language       R
version.string Oracle Distribution of R version 3.3.0 (2016-05-03)
nickname       Supposedly Educational