Hi everyone! I am very sorry to ask this, but I have no experience with regex and I would like to know whether something is truly possible to do.

I am working on a corpus of news stories taken from BBC News. However, some news items are repeated in my corpus, and I would like to know whether something can be done to highlight these duplicates without sorting out my data. Thank you so much, and I do apologise again for this possibly naive question.

1 Answer

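I usually sort the data with duplicates removed and save the result to a separate file (leaving the original file unchanged). Then I compare the two files with a diff tool (Total Commander, ExamDiff, ...).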
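If the original order of the corpus needs to be preserved, the same idea (build a deduplicated copy, then compare it against the original) can be done in one pass without sorting. The following is only a minimal Python sketch, not a regex solution. It assumes each news item is one string in a list and that two items count as duplicates when their text is identical after trimming surrounding whitespace. The function name `split_duplicates` and the sample corpus are made up for illustration.

```python
def split_duplicates(items):
    """Split items into first occurrences and later repeats, preserving order.

    Returns (unique, repeats). Each repeat is a tuple
    (position, first_position, item), where first_position is the index
    of the earlier entry with the same text.
    """
    first_seen = {}  # trimmed text -> index of its first occurrence
    unique = []
    repeats = []
    for position, item in enumerate(items):
        # Assumption: items that differ only in leading or trailing
        # whitespace are treated as the same story.
        key = item.strip()
        if key in first_seen:
            repeats.append((position, first_seen[key], item))
        else:
            first_seen[key] = position
            unique.append(item)
    return unique, repeats


# Hypothetical sample corpus, one story per entry.
corpus = [
    "Story A",
    "Story B",
    "Story A",
    "Story C",
    "Story B",
]

unique, repeats = split_duplicates(corpus)
for position, first_position, item in repeats:
    print(f"entry {position} repeats entry {first_position}: {item}")
```

Here `unique` plays the role of the separate deduplicated file, and `repeats` lists what a diff between the two files would flag. The original list is never modified. Near-duplicates, such as the same story with a slightly edited headline, would not be caught by this exact-match comparison.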

Answered 2018-06-05T12:55:19.363