I have to count the symbols "a..z", "A..Z", space, "." and "," in some data.

This is the code I have tried:

import codecs
from collections import Counter

fh = codecs.open("mydata.txt", encoding = "utf-8")
text = fh.read()
fh1 = unicode(text)
dic_freq_signs = dict(Counter(fh1.split()))
All_freq_signs = dic_freq_signs.items()
List_signs = dic_freq_signs.keys()
List_freq_signs = dic_freq_signs.values()

But it gives me all the signs instead of only the ones I am looking for. Can anyone help?

(And it has to be Unicode.)

2 Answers

Look into iterating over the dictionary and filtering its items:

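# Filter the (key, count) pairs with an inline condition on the key...
All_freq_signs = [item for item in dic_freq_signs.items() if item[0] == "somevalue"]

# ...or with a predicate function; each item is a (key, count) tuple
def criteria(item):
    return item[1] % 2 == 0    # placeholder test: keep entries with an even count
All_freq_signs = [item for item in dic_freq_signs.items() if criteria(item)]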
answered 2015-01-14T08:46:20.317

Make sure you import the string module; with it you can easily get the character ranges a to z and A to Z.

import string
from collections import Counter    # needed for the Counter calls below

Counter(any_string) gives the count of each character in the string. With split(), the Counter instead returns the count of each whitespace-separated word, which is not what you asked for. So I have assumed that you need character counts.
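To make the difference concrete, here is a small sketch on a made-up sample string (the names `sample`, `char_counts` and `word_counts` are only for illustration and are not part of your code):

```python
from collections import Counter

sample = u"ab ab, c"    # hypothetical sample text

# Counting the string itself counts individual characters,
# including the spaces and the comma.
char_counts = Counter(sample)

# Counting the result of split() counts whitespace-separated words,
# so punctuation stays attached to its word ("ab," differs from "ab").
word_counts = Counter(sample.split())

print(char_counts[u"a"])      # count of the character "a"
print(word_counts[u"ab,"])    # count of the word "ab,"
```

Note that after split() there is no entry for the space at all, so spaces could never be counted that way.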

dic_all_chars = dict(Counter(fh1))    # counts of every character in the string
# the characters you want to check; ascii_lowercase/ascii_uppercase do not depend on the locale
signs = string.ascii_lowercase + string.ascii_uppercase + ' .,'

# dict comprehension: keep only the keys that are among the wanted characters
dic_freq_signs = {key: value for key, value in dic_all_chars.items()
                  if key in signs}

dic_freq_signs then contains only the signs you want to count as keys, with their counts as values.
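Put together, the whole thing could look roughly like the sketch below. The names `SIGNS` and `count_signs` are my own, and I am assuming the wanted set is exactly a-z, A-Z, space, "." and ","; it filters while counting instead of counting everything first, which comes to the same result.

```python
import string
from collections import Counter

# Assumed set of wanted characters: a-z, A-Z, space, period and comma.
SIGNS = set(string.ascii_letters + " .,")

def count_signs(text):
    """Return a dict mapping each wanted character found in text to its count."""
    # Only characters that are in SIGNS are passed to the Counter.
    return dict(Counter(ch for ch in text if ch in SIGNS))

sample = u"Hello, World. Hi!"    # hypothetical sample text
freq = count_signs(sample)
print(sorted(freq.items()))
```

For your file you would pass in the text you already read from mydata.txt instead of `sample`. Characters that never occur are simply absent from the result, so use `freq.get(ch, 0)` if you need a zero for them.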

answered 2015-01-14T14:27:56.990