linux - tm and Snowball package commands run slowly on Linux

Question

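I am using the tm and Snowball packages in R for text mining. I first ran my code on a laptop with Windows 7 and 8 GB of RAM. Later I tried the same operations on a Linux (Ubuntu) machine with 64 GB of RAM. Both machines are 64-bit and both run the 64-bit build of R. However, the Windows machine has R 3.0.0, while the Linux machine has R 2.14.

Some commands are much slower on Linux than on Windows.

Corpus command

On Windows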

    d <- data.frame(chatTranscripts$chatConcat)
    ds <- DataframeSource(d)
    t1 <- Sys.time()
    dsc<-Corpus(ds)
    print(Sys.time() - t1)
    Time difference of 46.86169 secs

This took only 47 seconds on the Windows machine.

On Linux

    t1 <- Sys.time()
    dsc<-Corpus(ds)
    print(Sys.time() - t1)
    Time difference of 3.674376 mins

This took about 220 seconds on the Linux machine.

Snowball stemming

On Windows

    t1 <- Sys.time()
    dsc <- tm_map(dsc,stemDocument)
    print(Sys.time() - t1)
    Time difference of 12.05321 secs

This took only 12 seconds on the Windows machine.

On Linux

    t1 <- Sys.time()
    dsc <- tm_map(dsc,stemDocument)
    print(Sys.time() - t1)
    Time difference of 4.832964 mins

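This took about 290 seconds on the Linux machine.

Is there a way to speed up these commands on the Linux machine? Can the R version make such a large difference? Thank you.

Ravi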

score 0 · Accepted Answer

Corpus() on a VectorSource() appears to be faster than Corpus() on a DataframeSource().

You could try

    d <- chatTranscripts$chatConcat
    ds <- VectorSource(d)
    dsc <- Corpus(ds)
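
To check whether this helps on a given machine, the corpus construction can be timed with system.time(). The following is only a minimal sketch: it assumes the tm package is installed, and the `texts` vector is a made-up stand-in for chatTranscripts$chatConcat, so the timings it prints say nothing about the real data.

```r
library(tm)  # assumption: the tm package is installed

# Hypothetical stand-in for chatTranscripts$chatConcat (not the real data)
texts <- rep(c("The customer was asking about billing issues",
               "Agents were responding to connection problems"),
             times = 500)

# Build the corpus straight from the character vector and time the call
timing <- system.time(
  dsc <- Corpus(VectorSource(texts))
)

# Elapsed wall-clock time in seconds for the Corpus() call
print(timing["elapsed"])
```

Running the same script on both the Windows and the Linux machine gives directly comparable numbers for this step.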
