Hello everyone! I am so sorry for this question, but I don't have any experience with regex and I would like to know whether something is actually possible.
I am working on a corpus of news stories taken from BBC News. However, some news items are repeated in my corpus, and I would like to know whether these duplicates can be highlighted without sorting my data. Thank you so much, and I do apologise again for this perhaps naive question.
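Here is one possible approach. It assumes that each news story occupies a single line of the corpus file. The idea is a regular expression with a backreference inside a lookahead: it flags a line whenever an identical line appears later in the text, so the data never has to be sorted. The pattern and the helper name `find_duplicate_lines` are hypothetical and for illustration only.

```python
import re

# Assumption: each news story sits on exactly one line of the corpus.
# (.+) captures a whole line; the lookahead (?=...) then checks, without
# consuming text, whether that exact line (^\1$) occurs again further down.
# re.MULTILINE makes ^ and $ match at the start and end of every line.
DUPLICATE_LINE = re.compile(r"^(.+)$(?=[\s\S]*^\1$)", re.MULTILINE)


def find_duplicate_lines(corpus: str) -> list[str]:
    """Return every line that is repeated later on, in original order."""
    return [m.group(1) for m in DUPLICATE_LINE.finditer(corpus)]


sample = "Story A\nStory B\nStory A\nStory C\nStory B\nStory A"
print(find_duplicate_lines(sample))
```

A line is reported once for each later copy it has, so a story that appears three times is listed twice. The lookahead rescans the rest of the text for every line, which makes this quadratic and potentially slow on a very large corpus. Keeping a set of already-seen lines would scale better, but it would no longer be a pure regex solution.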