I tried stemming with the SnowballC stemmer, but it gives inconsistent output for the same word. A single word is stemmed as expected:
wordStem("waiting", language = "porter")
## [1] wait
The result above is correct, but whenever I pass a set of tokens as input, such as
c("htc", "makes", "bad", "cheap", "phones", "dont", "buy", "cheap",
"phones", "battery", "jock", "taiwanese", "buying", "htc", "desire",
"phone", "htc", "mobile", "specifications", "battery", "pick",
"low", "light", "camera", "experience", "htc", "e8", "desire",
"10", "pro", "phone", "performance", "excellent", "camera", "nice",
"model", "cam", "battery", "realy", "hand", "set", "phone", "coming",
"phone", "price", "range", "average", "performer", "worst", "battery",
"features", "goooood", "htc", "real", "hero", "amazing", "x9",
"features", "battery", "poor", "camera", "battery", "life", "x9",
"e9", "worry", "phone", "processor", "happy", "battery", "life",
"drain", "faster", "buy", "product", "heating", "issue", "concern",
"front", "facing", "camera", "awful", "htc", "phones", "heats",
"quickly", "pity", "phone", "beautiful", "potential", "stylish",
"htc", "fan", "wise", "ofcourse", "htc", "overpriced", "compared",
"xiaomi", "redmi", "note", "3", "design", "fingerprint", "reader",
"capacitive", "buttons", "screen", "iam", "100", "satisfied",
"phone", "brought", "2014", "smoothly", "touch", "battery", "backup",
"buying", "phone", "total", "waste", "money", "nice", "phone",
"price", "range", "device", "front", "facing", "camera", "awful",
"htc", "phones", "heats", "quickly", "pity", "phone", "beautiful",
"potential", "stylish", "htc", "fan", "wise", "ofcourse", "htc",
"overpriced", "compared", "xiaomi", "redmi", "note", "3", "design",
"fingerprint", "reader", "capacitive", "buttons", "screen", "iam",
"100", "satisfied", "phone", "brought", "2014", "htc", "desire",
"eye", "happy", "phone", "phone", "appearance", "nice", "mobile",
"plz", "update", "price", "phone", "meant", "performance", "camera",
"htc", "nice", "mobiles", "waiting", "mobile", "htc", "10", "pro",
"nice", "camera", "beautiful", "design", "fingerprint", "sensor",
"overheating", "issue", "typical", "htc", "crappy", "specs", "poo")
it returns the words unchanged (for example, "waiting" at position 184 is not stemmed):
wordStem(htcdtokensstop, language = "porter")
## [1] "htc" "makes" "bad" "cheap" "phones" "dont" "buy" "cheap"
## [9] "phones" "battery" "jock" "taiwanese" "buying" "htc" "desire" "phone"
## [17] "htc" "mobile" "specifications" "battery" "pick" "low" "light" "camera"
## [25] "experience" "htc" "e8" "desire" "10" "pro" "phone" "performance"
## [33] "excellent" "camera" "nice" "model" "cam" "battery" "realy" "hand"
## [41] "set" "phone" "coming" "phone" "price" "range" "average" "performer"
## [49] "worst" "battery" "features" "goooood" "htc" "real" "hero" "amazing"
## [57] "x9" "features" "battery" "poor" "camera" "battery" "life" "x9"
## [65] "e9" "worry" "phone" "processor" "happy" "battery" "life" "drain"
## [73] "faster" "buy" "product" "heating" "issue" "concern" "front" "facing"
## [81] "camera" "awful" "htc" "phones" "heats" "quickly" "pity" "phone"
## [89] "beautiful" "potential" "stylish" "htc" "fan" "wise" "ofcourse" "htc"
## [97] "overpriced" "compared" "xiaomi" "redmi" "note" "3" "design" "fingerprint"
## [105] "reader" "capacitive" "buttons" "screen" "iam" "100" "satisfied" "phone"
## [113] "brought" "2014" "smoothly" "touch" "battery" "backup" "buying" "phone"
## [121] "total" "waste" "money" "nice" "phone" "price" "range" "device"
## [129] "front" "facing" "camera" "awful" "htc" "phones" "heats" "quickly"
## [137] "pity" "phone" "beautiful" "potential" "stylish" "htc" "fan" "wise"
## [145] "ofcourse" "htc" "overpriced" "compared" "xiaomi" "redmi" "note" "3"
## [153] "design" "fingerprint" "reader" "capacitive" "buttons" "screen" "iam" "100"
## [161] "satisfied" "phone" "brought" "2014" "htc" "desire" "eye" "happy"
## [169] "phone" "phone" "appearance" "nice" "mobile" "plz" "update" "price"
## [177] "phone" "meant" "performance" "camera" "htc" "nice" "mobiles" "waiting"
## [185] "mobile" "htc" "10" "pro" "nice" "camera" "beautiful" "design"
## [193] "fingerprint" "sensor" "overheating" "issue" "typical" "htc" "crappy" "specs"
## [201] "poo"
It would be very helpful if there were a way to stem every word in the tokens.
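To make the goal concrete, here is a minimal sketch of what I am after. It assumes the token object might be a list (or some other container) rather than a plain character vector, which I have not confirmed. The `tokens` sample is hypothetical and only stands in for `htcdtokensstop`; `unlist()` and `as.character()` are base R, and `wordStem()` is the same SnowballC call used above.

```r
library(SnowballC)

# Hypothetical sample standing in for htcdtokensstop (assumed here to be a list)
tokens <- list(c("phones", "buying", "waiting", "htc"))

# Flatten to a plain character vector so each element is a single word
words <- as.character(unlist(tokens, use.names = FALSE))

# Stem every word; the result has one stem per input word
stems <- wordStem(words, language = "porter")
print(stems)
```

Checking `class(htcdtokensstop)` or `str(htcdtokensstop)` should show whether the object really is a character vector, since that is only my guess about why the output is unchanged.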