I am trying to create a term-document matrix in R from the following dataset:
EmailSubject
Buy the stunning new phone
The game changer is here.
Experience a phone ahead of its time.
Thank You Chennai
Limited Period offer
Valentines day special
Buy a phone at 10000 and get a new sim free
Limited Period offer
Valentines day special
Buy a phone at 10000 and get a new sim free
Buy the stunning new phone
The game changer is here.
Experience a phone ahead of its time.
Thank You Chennai
Limited Period offer
Valentines day special
Buy a phone at 10000 and get a new sim free
Thank You Chennai
Limited Period offer
Valentines day special
Buy a phone at 10000 and get a new sim free
Buy a phone at 10000 and get a new sim free
Buy the stunning new phone
The game changer is here.
Experience a phone ahead of its time.
Thank You Chennai
Limited Period offer
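To make the problem reproducible, here is a minimal sketch that builds a small data frame from the distinct subject lines and counts terms with base R only. The name `DF` matches the call used further down and the column name `EmailSubject` comes from the header line; the tokenising rule (lower-case, split on anything that is not a letter or digit) is my own assumption and is not what qdap or tm do internally.

```r
# The seven distinct subject lines from the dataset above (duplicates omitted).
subjects <- c(
  "Buy the stunning new phone",
  "The game changer is here.",
  "Experience a phone ahead of its time.",
  "Thank You Chennai",
  "Limited Period offer",
  "Valentines day special",
  "Buy a phone at 10000 and get a new sim free"
)
DF <- data.frame(EmailSubject = subjects, stringsAsFactors = FALSE)

# Lower-case, split on runs of non-alphanumeric characters, drop empty tokens.
tokens <- unlist(strsplit(tolower(DF$EmailSubject), "[^a-z0-9]+"))
tokens <- tokens[nzchar(tokens)]

# Count each term, most frequent first.
term_freq <- sort(table(tokens), decreasing = TRUE)
head(term_freq)
```

This only gives a baseline to compare package output against; it does no stop-word removal or stemming.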
I have used qdap and its freq_terms function. Below are the call and the expected output:
freq_terms(DF)
Expected Output Frequency
Buy 4
Get 5
a 7
thank 12
Stunning 6
The 7
New 10
Valentines 4
phone 7
The following special characters keep appearing and make the data unusable:
valentinea€™s and a€™s instead of valentines and as. I have tried the same with the tm package as well.
I have used gsub to replace these characters, but it is not very effective. Can someone suggest an approach?
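For reference, a minimal sketch of the kind of cleanup I mean. It assumes the debris is a curly apostrophe (U+2019) whose UTF-8 bytes were decoded as Windows-1252, which would produce the three characters U+00E2, U+20AC, U+2122; that is a guess on my part, not something I have confirmed. The helper name `clean_subject` is hypothetical.

```r
# Assumed mojibake sequence for a mis-decoded right single quote (U+2019).
mojibake <- "\u00e2\u20ac\u2122"

clean_subject <- function(x) {
  # Remove the assumed mojibake sequence literally (fixed = TRUE, no regex).
  x <- gsub(mojibake, "", x, fixed = TRUE)
  # Remove any correctly decoded curly apostrophe as well.
  x <- gsub("\u2019", "", x, fixed = TRUE)
  # Drop whatever other non-ASCII characters remain.
  iconv(x, from = "UTF-8", to = "ASCII", sub = "")
}

dirty <- c(paste0("valentine", mojibake, "s day special"), "Thank You Chennai")
clean_subject(dirty)
```

If the guess about the encoding is right, reading the file as UTF-8 in the first place (for example via the `fileEncoding` argument of `read.csv`) might avoid the problem entirely, but I have not verified that for my data.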