For example, say I have a document that contains 2 sentences: I am a person. He also likes apples. Do we need to count the co-occurrence of "person" and "He"?
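Documents are separated by newlines, and the context window for co-occurrence is confined to a single document.
This is based on the implementation here.
A newline is treated as the start of a new document (the context does not cross newlines).
So, depending on how you prepare your sentences, you may get different results: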
Setup 1: ('He', 'person') co-occur
...
I am a person. He also likes apples.
...
Setup 2: ('He', 'person') do not co-occur
...
I am a person.
He also likes apples.
...
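To make the difference between the two setups concrete, here is a minimal sketch of newline-bounded co-occurrence counting. It is not the actual implementation referred to above; the function name `cooccurrence_counts`, the regex tokenizer, and the window size of 5 are all assumptions made for illustration.

```python
import re
from collections import Counter


def cooccurrence_counts(corpus: str, window: int = 5) -> Counter:
    """Count unordered word pairs that fall within `window` tokens of each other.

    Hypothetical helper: each newline-separated line is treated as its own
    document, so a context window never crosses a newline.
    """
    counts = Counter()
    for document in corpus.split("\n"):
        # Simplistic tokenizer (an assumption): runs of word characters.
        tokens = re.findall(r"\w+", document)
        for i, word in enumerate(tokens):
            # Pair the word with the next `window` tokens in the same document.
            for other in tokens[i + 1 : i + 1 + window]:
                counts[tuple(sorted((word, other)))] += 1
    return counts


setup_1 = "I am a person. He also likes apples."
setup_2 = "I am a person.\nHe also likes apples."

print(("He", "person") in cooccurrence_counts(setup_1))  # True
print(("He", "person") in cooccurrence_counts(setup_2))  # False
```

In Setup 1 both sentences sit on one line, so "person" and "He" are adjacent tokens in the same document. In Setup 2 the newline puts them in separate documents, so the pair is never counted.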