I have a list that contains a lot of tagged bigrams. Some of the bigrams are not tagged correctly, so I want to remove them from the master list. One of the words in such a bigram tends to recur frequently, so I can remove a bigram if it contains one of these unwanted words. A pseudo-example is below:
master_list = ['this is', 'is a', 'a sample', 'sample word', 'sample text', 'this book', 'a car', 'literary text', 'new book', 'them about', 'on the', 'in that', 'tagged corpus', 'on top', 'a car', 'an orange', 'the book', 'them what', 'then how']
unwanted_words = ['this', 'is', 'a', 'on', 'in', 'an', 'the', 'them']
new_list = [item for item in master_list if not [x for x in unwanted_words] in item]
I can remove the items one word at a time, i.e. each time I create a new list without the items that contain a given word, say 'on'. This is tedious: it would take hours to create a new filtered list for each unwanted word. I thought a loop would help. However, I get the following TypeError:
Traceback (most recent call last):
  File "<pyshell#21>", line 1, in <module>
    new_list = [item for item in master_list if not [x for x in unwanted_words] in item]
  File "<pyshell#21>", line 1, in <listcomp>
    new_list = [item for item in master_list if not [x for x in unwanted_words] in item]
TypeError: 'in <string>' requires string as left operand, not list
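The error seems to come from the left operand of `in`: `[x for x in unwanted_words]` is a whole list, while `item` is a string, and a string membership test needs a string on the left. A minimal sketch of the filtering being attempted, assuming that a bigram should be dropped when either of its whitespace-separated words appears in unwanted_words (the helper name `unwanted` is only illustrative):

```python
master_list = ['this is', 'is a', 'a sample', 'sample word', 'sample text',
               'this book', 'a car', 'literary text', 'new book', 'them about',
               'on the', 'in that', 'tagged corpus', 'on top', 'a car',
               'an orange', 'the book', 'them what', 'then how']
unwanted_words = ['this', 'is', 'a', 'on', 'in', 'an', 'the', 'them']

# A set makes each membership test O(1) (illustrative helper, not in the original).
unwanted = set(unwanted_words)

# Keep a bigram only if none of its words is unwanted.
# Assumption: the two words of a bigram are separated by whitespace.
new_list = [item for item in master_list
            if not any(word in unwanted for word in item.split())]

print(new_list)
# ['sample word', 'sample text', 'literary text', 'new book', 'tagged corpus', 'then how']
```

Splitting each bigram and comparing whole words matters here: a plain substring test such as `'a' in 'sample word'` is True, so it would also discard bigrams that merely contain an unwanted word as part of a longer word.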
Your help is highly appreciated!