由于每个人都在摇摆...
与此不同的是正则表达式将文本与标点符号分开。我用\b\w+\b
import re
article='''Richard II (13671400) was King of England, a member of the House of Plantagenet and the last of its main-line kings. He ruled from 1377 until he was deposed in 1399. Richard was a son of Edward, the Black Prince, and was born during the reign of his grandfather, Edward III. Richard was tall, good-looking and intelligent. Although probably not insane, as earlier historians believed, he may have suffered from one or several personality disorders that may have become more apparent toward the end of his reign. Less of a warrior than either his father or grandfather, he sought to bring an end to the Hundred Years' War that Edward III had started. He was a firm believer in the royal prerogative, which led him to restrain the power of his nobility and rely on a private retinue for military protection instead. He also cultivated a courtly atmosphere where the king was an elevated figure, and art and culture were at the centre, in contrast to the fraternal, martial court of his grandfather. Richard's posthumous reputation has to a large extent been shaped by Shakespeare, whose play Richard II portrays Richard's misrule and Bolingbroke's deposition as responsible for the 15th-century Wars of the Roses. Most authorities agree that the way in which he carried his policies out was unacceptable to the political establishment, and this led to his downfall.'''
words = {}
for word in re.findall(r'\b\w+\b', article):
word=word.lower()
if word in words:
words[word]+=1
else:
words[word]=1
print [(k,v) for v, k in sorted(((v, k) for k, v in words.items()), reverse=True)]
打印出按频率排序的 (word, count) 元组列表:
[('the', 15), ('of', 11), ('was', 8), ('and', 8), ('to', 7), ('his', 7), ('he', 7),
('a', 7), ('richard', 6), ('in', 4), ('that', 3), ('s', 3), ('grandfather', 3),
('edward', 3), ('which', 2), ('reign', 2), ('or', 2), ('may', 2), ('led', 2),
('king', 2), ('iii', 2), ('ii', 2), ('have', 2), ('from', 2), ('for', 2), ('end', 2),
('as', 2), ('an', 2), ('years', 1), ('whose', 1), ('where', 1), ('were', 1), ('way', 1), ('wars', 1), ('warrior', 1), ('war', 1), ('until', 1), ('unacceptable', 1), ('toward', 1), ('this', 1), ('than', 1), ('tall', 1), ('suffered', 1), ('started', 1), ('sought', 1), ('son', 1), ('shaped', 1), ('shakespeare', 1), ('several', 1), ('ruled', 1), ('royal', 1), ('roses', 1), ('retinue', 1), ('restrain', 1), ('responsible', 1), ('reputation', 1), ('rely', 1), ('protection', 1), ('probably', 1), ('private', 1), ('prince', 1), ('prerogative', 1), ('power', 1), ('posthumous', 1), ('portrays', 1), ('political', 1), ('policies', 1), ('play', 1), ('plantagenet', 1), ('personality', 1), ('out', 1), ('one', 1), ('on', 1), ('not', 1), ('nobility', 1), ('most', 1), ('more', 1), ('misrule', 1), ('military', 1), ('member', 1), ('martial', 1), ('main', 1), ('looking', 1), ('line', 1), ('less', 1), ('last', 1), ('large', 1), ('kings', 1), ('its', 1), ('intelligent', 1), ('instead', 1), ('insane', 1), ('hundred', 1), ('house', 1), ('historians', 1), ('him', 1), ('has', 1), ('had', 1), ('good', 1), ('fraternal', 1), ('firm', 1), ('figure', 1), ('father', 1), ('extent', 1), ('establishment', 1), ('england', 1), ('elevated', 1), ('either', 1), ('earlier', 1), ('during', 1), ('downfall', 1), ('disorders', 1), ('deposition', 1), ('deposed', 1), ('culture', 1), ('cultivated', 1), ('courtly', 1), ('court', 1), ('contrast', 1), ('century', 1), ('centre', 1), ('carried', 1), ('by', 1), ('bring', 1), ('born', 1), ('bolingbroke', 1), ('black', 1), ('believer', 1), ('believed', 1), ('been', 1), ('become', 1), ('authorities', 1), ('atmosphere', 1), ('at', 1), ('art', 1), ('apparent', 1), ('although', 1), ('also', 1), ('agree', 1), ('15th', 1), ('1399', 1), ('1377', 1), ('13671400', 1)]