I have a list of sentences, for example:
Sentence 1.
And Sentence 2.
Or Sentence 3.
New Sentence 4.
New Sentence 5.
And Sentence 6.
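I am trying to group these sentences by a "conjunction criterion": if the next sentence starts with a conjunction (currently only "And" or "Or"), it should go into the same group as the sentence before it: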
Group 1:
Sentence 1.
And Sentence 2.
Or Sentence 3.
Group 2:
New Sentence 4.
Group 3:
New Sentence 5.
And Sentence 6.
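I wrote the code below. It detects some runs of consecutive sentences, but not all of them.
How can I write this recursively? I tried an iterative version, but it fails in some cases, and I cannot work out how to express it with recursion.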
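import nltk  # word_tokenize needs the NLTK tokenizer data to be installed

tokens = ["Sentence 1.", "And Sentence 2.", "Or Sentence 3.", "New Sentence 4.", "New Sentence 5.", "And Sentence 6."]
# conjucture_list is not defined in the snippet as posted; assumed here to hold the two conjunctions mentioned above
conjucture_list = ["and", "or"]
already_selected = []  # indices of sentences already attached to an earlier sentence
attachlist = {}        # maps each sentence to the sentences attached to it
for i in tokens:
    attachlist[i] = []
for i in range(len(tokens)):
    if i in already_selected:
        pass
    else:
        # walk forward from i and attach every following sentence that starts with a conjunction
        for j in range(i+1, len(tokens)):
            if j not in already_selected:
                first_word = nltk.tokenize.word_tokenize(tokens[j].lower())[0]
                if first_word in conjucture_list:
                    attachlist[tokens[i]].append(tokens[j])
                    already_selected.append(j)
                else:
                    break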
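For comparison, here is a minimal sketch of the grouping described above, written as a single pass and again as a recursive function. It is only one possible approach, not a definitive answer. The names CONJUNCTIONS, starts_with_conjunction, group_iterative and group_recursive are hypothetical, and the first word is taken with a plain str.split() instead of nltk.tokenize.word_tokenize so that the snippet runs without NLTK.

```python
# Hypothetical names throughout; CONJUNCTIONS mirrors the "And"/"Or" rule from the question.
CONJUNCTIONS = {"and", "or"}


def starts_with_conjunction(sentence):
    """Return True if the first whitespace-separated word is a conjunction."""
    words = sentence.split()
    # Strip trailing punctuation so that "And," still counts as "and".
    return bool(words) and words[0].strip(".,;:!?").lower() in CONJUNCTIONS


def group_iterative(sentences):
    """Single pass: a sentence either joins the last group or starts a new one."""
    groups = []
    for sentence in sentences:
        if groups and starts_with_conjunction(sentence):
            groups[-1].append(sentence)
        else:
            groups.append([sentence])
    return groups


def group_recursive(sentences, groups=None):
    """Same rule, recursing on the tail of the list."""
    if groups is None:
        groups = []
    if not sentences:  # base case: nothing left to place
        return groups
    head, rest = sentences[0], sentences[1:]
    if groups and starts_with_conjunction(head):
        groups[-1].append(head)
    else:
        groups.append([head])
    return group_recursive(rest, groups)


tokens = ["Sentence 1.", "And Sentence 2.", "Or Sentence 3.",
          "New Sentence 4.", "New Sentence 5.", "And Sentence 6."]
print(group_iterative(tokens))
# → [['Sentence 1.', 'And Sentence 2.', 'Or Sentence 3.'], ['New Sentence 4.'], ['New Sentence 5.', 'And Sentence 6.']]
```

Both functions return a list of lists rather than a dict keyed by sentence, so duplicate sentences do not collide and attached sentences do not get empty entries of their own. The recursive version goes one call deep per sentence, so for long lists the single-pass loop is the safer choice under Python's default recursion limit.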