r - POS tagging a list/vector of terms in R

Question

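I have a .csv file with a single column of 1000 rows. Each row contains one word (bag-of-words model). Now I want to find out whether each word is a noun, verb, adjective, etc. I would like a second column (also 1000 rows) in which each row holds the part of speech (e.g. noun or verb) of the word in the first column.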

I have already imported the csv into R. But what do I do now?

Here is an example. I have these words and want to know whether each one is a noun, verb, etc.

score 1 · Accepted Answer

There are several options, but you can use udpipe for this.

terms <- data.frame(term = c("unit", "determine", "generate", "digital", "mount", "control", "position", "input", "output", "user"),
                    stringsAsFactors = FALSE)

library(udpipe)

# Load the model from disk if it was already downloaded; otherwise download it first.
model_file <- "english-ud-2.0-170801.udpipe"
if (file.exists(model_file)) {
  ud_model <- udpipe_load_model(file = model_file)
} else {
  downloaded <- udpipe_download_model(language = "english")
  ud_model <- udpipe_load_model(downloaded$file_model)
}


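# No need for dependency parsing, as this data only contains single words.
# (The result is named `annotated` rather than `t` to avoid masking base::t.)
annotated <- udpipe_annotate(ud_model, x = terms$term, parser = "none")
annotated <- as.data.frame(annotated)
# One token per term, so the rows line up with the rows of `terms`.
terms$POSTAG <- annotated$upos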

terms
        term POSTAG
1       unit   NOUN
2  determine   VERB
3   generate   VERB
4    digital    ADJ
5      mount   NOUN
6    control   NOUN
7   position   NOUN
8      input   NOUN
9     output   NOUN
10      user   NOUN
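
One caveat with assigning the tags by position: the tagger tokenizes its input, so a term that splits into several tokens would produce more annotation rows than there are terms. Below is a minimal sketch of aligning by document id instead. It assumes the annotation data frame has `doc_id` and `upos` columns and that unnamed input gets ids of the form doc1, doc2, and so on. `align_tags` is a hypothetical helper name, and the `annotation` data frame is a hand-built stand-in, not real tagger output.

```r
# Hypothetical helper: collapse token-level tags to one tag string per term.
# `annotation` is assumed to have the columns doc_id and upos, like the data
# frame obtained from as.data.frame() on the tagger output.
align_tags <- function(doc_ids, annotation) {
  # Group the tags by document, keeping token order, and join them with "+".
  per_doc <- vapply(split(annotation$upos, annotation$doc_id),
                    paste, character(1), collapse = "+")
  # Look up each term's document; ids without any tokens yield NA.
  unname(per_doc[doc_ids])
}

terms <- data.frame(term = c("unit", "user interface", "digital"),
                    stringsAsFactors = FALSE)

# Hand-built stand-in for tagger output (illustrative values only):
# "user interface" is split into two tokens, so there are four rows.
annotation <- data.frame(
  doc_id = c("doc1", "doc2", "doc2", "doc3"),
  token  = c("unit", "user", "interface", "digital"),
  upos   = c("NOUN", "NOUN", "NOUN", "ADJ"),
  stringsAsFactors = FALSE
)

terms$POSTAG <- align_tags(paste0("doc", seq_len(nrow(terms))), annotation)
print(terms)
```

With single-word terms this gives the same column as the positional assignment; it only differs when a term yields more than one token, where it avoids a length mismatch error.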

score 0 · Accepted Answer

You can use spacyr, which is an R wrapper for the Python package spaCy.

Note: you must first

set up spaCy: https://spacy.io/usage/
install the English language model: https://spacy.io/usage/models

library(spacyr)

spacy_initialize(python_executable = '/path/to/python')

Then, with your terms:

Terms <- data.frame(Term = c("unit",
                    "determine",
                    "generate",
                    "digital",
                    "mount",
                    "control",
                    "position",
                    "input",
                    "output",
                    "user"), stringsAsFactors = FALSE)

Use the function spacy_parse() to tag your terms and add the tags to your data frame:

Terms$POS_TAG <- spacy_parse(Terms$Term)$pos

The result is:

        Term POS_TAG
1       unit    NOUN
2  determine    VERB
3   generate    VERB
4    digital     ADJ
5      mount    VERB
6    control    NOUN
7   position    NOUN
8      input    NOUN
9     output    NOUN
10      user    NOUN
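
To tie this back to the CSV in the question, the tag column can be written out next to the terms. The following is a minimal sketch, assuming a headerless one-column file. `tag_csv`, the temporary file paths, and `stub_tagger` are hypothetical names used for illustration only; the stub returns a fixed lookup so the example runs without any tagger installed.

```r
# Hypothetical helper: read a headerless one-column CSV of terms, tag each
# term with the supplied tagger function, and write a two-column CSV.
tag_csv <- function(path_in, path_out, tagger) {
  terms <- read.csv(path_in, header = FALSE, col.names = "Term",
                    stringsAsFactors = FALSE)
  tags <- tagger(terms$Term)
  # Guard against taggers that return more (or fewer) tags than terms.
  stopifnot(length(tags) == nrow(terms))
  terms$POS_TAG <- tags
  write.csv(terms, path_out, row.names = FALSE)
  invisible(terms)
}

# Stub tagger standing in for a real one (fixed lookup, illustrative only).
stub_tagger <- function(x) {
  lookup <- c(unit = "NOUN", determine = "VERB", digital = "ADJ")
  unname(lookup[x])
}

# Demo on temporary files so nothing in the working directory is touched.
path_in  <- tempfile(fileext = ".csv")
path_out <- tempfile(fileext = ".csv")
writeLines(c("unit", "determine", "digital"), path_in)

tagged <- tag_csv(path_in, path_out, stub_tagger)
print(read.csv(path_out, stringsAsFactors = FALSE))
```

In real use, the `tagger` argument could be something like `function(x) spacy_parse(x)$pos`. The length check then fails loudly if any term is split into several tokens, rather than silently misaligning the rows.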
