I need to count the number of words in a UTF-8 string, i.e. I need to write a Python function that takes "एक बार,एक कौआ, बहुत प्यासा, था" as input and returns 7 (the number of words).
I tried the regular expression "\b" as shown below, but the result is not what I expected.
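# -*- coding: utf-8 -*-
import re
wordCntExp = re.compile(ur'\b', re.UNICODE)
sen = 'एक बार,एक कौआ, बहुत प्यासा, था'
# Each word should contribute two boundaries, so halve the boundary count.
print len(wordCntExp.findall(sen.decode('utf-8'))) >> 1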
12
Any interpretation of the above result, or any other approach to solving this problem, would be appreciated.
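For reference, a likely explanation of the 12: in Python's `re` module, `\w` (and therefore `\b`) appears to treat only letters, digits and the underscore as word characters, not combining marks. Devanagari vowel signs such as U+093E and the virama U+094D are combining marks, so `\b` also matches inside words like "बार" and "प्यासा". A minimal sketch of one possible alternative is to walk the string and treat letters, marks and numbers (Unicode general categories L, M and N) as word characters. The function name `count_words` and the choice of categories are assumptions made for illustration, not an established API.

```python
# -*- coding: utf-8 -*-
import unicodedata


def count_words(text):
    """Count maximal runs of letters, combining marks and numbers.

    Assumption: anything whose Unicode general category does not start
    with L, M or N (spaces, commas, other punctuation) separates words.
    `text` must be a Unicode string, not UTF-8 encoded bytes.
    """
    count = 0
    in_word = False
    for ch in text:
        is_word_char = unicodedata.category(ch)[0] in ('L', 'M', 'N')
        if is_word_char and not in_word:
            # A new word starts where a word character follows a separator.
            count += 1
        in_word = is_word_char
    return count


sentence = u'एक बार,एक कौआ, बहुत प्यासा, था'
print(count_words(sentence))  # 7
```

On Python 2 the input would first have to be decoded, e.g. `count_words(sen.decode('utf-8'))`. Keeping marks inside the word is what stops the vowel signs from splitting a word in two. This sketch does not attempt full Unicode word segmentation.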