So I have the following code:
def stripNonAlphaNum(text):
    import re
    return re.compile(r'\W+', re.UNICODE).split(text)

def readText(fileStub):
    words = open(fileStub, 'r').read()
    words = words.lower() # Make it lowercase
    wordlist = sorted(stripNonAlphaNum(words))
    wordfreq = []
    for w in wordlist: # Increase count of one upon every iteration of the word.
        wordfreq.append(wordlist.count(w))
    return list(zip(wordlist, wordfreq))
It reads a file and then builds pairs of each word and the number of times it occurs. The issue I'm facing is that when I print the result, every pair is repeated once for each occurrence of the word instead of appearing a single time.
For a given input, I might get output like this:
('and', 27), ('and', 27), ('and', 27), ('and', 27), ('and', 27), ('and', 27), ('and', 27), ... (27 times)
This is NOT what I want.
Instead, I would like each word to appear exactly once, paired with its count, like so:
('and', 27), ('able', 5), ('bat', 6), ... etc.
So how do I fix this?
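One possible fix, as a minimal sketch rather than the only way to do it: the duplicates appear because the loop visits every entry in wordlist, including repeats, and emits one pair per entry. Counting each distinct word once avoids that. The sketch below uses collections.Counter from the standard library; the names strip_non_alphanum, word_frequencies, and sample are made up for illustration, and it takes a string instead of a file name so it can run on its own.

```python
import re
from collections import Counter


def strip_non_alphanum(text):
    # Split on runs of non-word characters and drop the empty strings
    # that appear when the text starts or ends with punctuation.
    return [w for w in re.split(r'\W+', text) if w]


def word_frequencies(text):
    # Counter tallies each distinct word exactly once, in a single pass.
    counts = Counter(strip_non_alphanum(text.lower()))
    # Sort alphabetically to mirror the sorted() call in the question.
    return sorted(counts.items())


# Hypothetical sample input, just to show the shape of the result.
sample = "And the bat and the ball, and AND able."
print(word_frequencies(sample))
# [('able', 1), ('and', 4), ('ball', 1), ('bat', 1), ('the', 2)]
```

To use it with a file as in the question, the file contents could be read first and passed in, for example word_frequencies(open(fileStub).read()). If ordering by frequency is preferred, Counter also provides most_common().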