I created a term-document matrix with the R tm package and exported it to CSV by converting it to a data frame.
A sample of the term-document matrix:
1 10 12 14 15 16 17
century 0 4 0 0 1 5 3
pete 0 2 0 6 1 0 0
additive 2 0 0 0 0 0 0
administration 1 5 3 0 3 0 0
administration 1 0 0 0 0 0 5
administrator 0 0 0 0 0 0 0
aeronautical 3 0 0 45 5 0 0
agency 0 0 5 0 0 0 0
amateur 0 0 6 0 0 0 0
anchor 5 0 1 0 0 6 0
basic 0 0 0 0 0 0 0
charles 0 0 6 0 0 0 0
commercial 0 6 0 0 0 4 0
commercial 0 0 0 0 0 2 0
commission 0 0 3 7 2 0 0
committee 0 4 0 0 1 5 3
compelling 0 2 7 6 1 0 0
construction 2 0 0 0 0 0 0
controlled 1 5 6 0 3 0 0
cooperating 1 0 0 0 0 0 5
cost 0 0 0 0 0 0 0
crewmember 3 0 0 45 0 0 0
depressed 0 0 0 0 0 0 0
developer 0 0 8 0 0 0 0
development 5 0 0 0 0 0 0
development 0 0 0 0 0 0 0
direct 0 0 0 0 0 0 0
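How can I convert it into a table like the one below, which lists each title together with only the terms that actually occur in it, so that I can do further analysis on the table?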
Title term freq
1 additive 2
1 administration 1
1 administration 1
1 aeronautical 3
1 anchor 5
1 construction 2
1 controlled 1
1 cooperating 1
1 crewmember 3
1 development 5
10 century 4
10 pete 2
10 administration 5
10 commercial 6
10 committee 4
10 compelling 2
10 controlled 5
12 administration 3
12 agency 5
12 amateur 6
12 anchor 1
12 charles 6
12 commission 3
12 compelling 7
12 controlled 6
12 developer 8
. ... ..
. ... ..
. ... ..
. ... ..
. ... ..
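
For reference, the reshaping described above can be sketched in base R as follows. This is only a minimal illustration, not a definitive solution: the matrix `m` is a hypothetical stand-in built from the first few rows and columns of the sample, and the names `m` and `long` are assumptions introduced here.

```r
# Hypothetical stand-in for the exported matrix:
# terms as row names, document titles as column names.
m <- matrix(
  c(0, 4, 0,
    0, 2, 0,
    2, 0, 0,
    1, 5, 3),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    Terms = c("century", "pete", "additive", "administration"),
    Docs  = c("1", "10", "12")
  )
)

# Treating the matrix as a table yields one row per (term, document) pair.
long <- as.data.frame(as.table(m), stringsAsFactors = FALSE)
names(long) <- c("term", "Title", "freq")

# Keep only the terms that occur in a document, with Title as the first column.
long <- long[long$freq > 0, c("Title", "term", "freq")]

# Sort by title (numeric here because the sample titles are numbers).
long <- long[order(as.numeric(long$Title)), ]
rownames(long) <- NULL

print(long)
```

With a real tm object, `as.matrix()` on the term-document matrix should give a matrix of this shape that can be used in place of `m`; if the matrix is read back from the CSV instead, the term column would first need to be moved into the row names.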