I have a table called 'ticket_diary_comment' with a column called 'comment_text'. This column is populated with text data. I would like to get the frequency of all the words occurring in this entire column. Ex:
Comment_Text
I am a good guy
I am a bad guy
I am not a guy
What I want:
Word Frequency
I 3
good 1
bad 1
not 1
guy 3
Notice that I have also removed the stop words ("am" and "a" in this example) from the output. I know that calculating the frequency of one particular word is not difficult, but I am looking for something that counts every word appearing in the column while removing the stop words.
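To make the expected result concrete, here is a minimal Python sketch of the logic I am after, run on the three example rows held in memory. It is only an illustration of the desired output, not something that would scale to the real table. The function name `word_frequencies` and the two-word stop list are assumptions made up for this example, and splitting on whitespace is a simplification that ignores punctuation.

```python
from collections import Counter

# Assumed stop-word list for this illustration only; a real list would be longer.
STOP_WORDS = {"am", "a"}

def word_frequencies(comments, stop_words=STOP_WORDS):
    """Count every whitespace-separated word across all comments, skipping stop words."""
    counts = Counter()
    for text in comments:
        for word in text.split():
            # Compare case-insensitively against the stop list, but count the word as written.
            if word.lower() not in stop_words:
                counts[word] += 1
    return counts

# The three example rows of comment_text from above.
comments = ["I am a good guy", "I am a bad guy", "I am not a guy"]
freq = word_frequencies(comments)

for word, count in freq.items():
    print(word, count)
```

What I need is the equivalent of this as a query over the 'comment_text' column.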
I would appreciate any kind of help on this issue. I should also mention that I have to run this query on a big-ish dataset (about 1 TB), so performance is a concern.