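I think what you're looking for is a simple dictionary. It lets you keep track not only of the words you're looking for, but also of how many times each one occurs.
A dictionary stores things as key/value pairs. So, for example, you could have the key 'alice' (the word you want to find) and set its value to the number of times you've found that word.
The easiest way to check whether something is in a dictionary is with Python's `in` keyword, i.e.

    if 'pie' in words_in_my_dict:
        pass  # do something here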
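As a quick illustration of the key/value idea and the `in` check, here is a minimal sketch. The dictionary contents are made up for the example and are not part of the word counter itself:

```python
# A tiny dictionary mapping each word to the number of times it was seen.
# The words and counts here are invented purely for illustration.
words_in_my_dict = {'pie': 2, 'cake': 0}

# `in` tests for the presence of a key, not a value.
if 'pie' in words_in_my_dict:
    words_in_my_dict['pie'] += 1  # update the value stored under that key

has_pie = 'pie' in words_in_my_dict    # the key exists
has_tart = 'tart' in words_in_my_dict  # this key was never added
pie_count = words_in_my_dict['pie']    # look up the value by key

print(has_pie, has_tart, pie_count)
```

Note that looking up a missing key with square brackets raises a `KeyError`, which is why the `in` check comes first.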
With that information, setting up a word counter is easy!

    def get_word_counts(words_to_count, file_contents):
        # Split the text on single spaces to get a list of words.
        words = file_contents.split(' ')
        for word in words:
            # Only tally the words we were asked to track.
            if word in words_to_count:
                words_to_count[word] += 1
        return words_to_count

    if __name__ == '__main__':
        fake_file_contents = (
            "Alice's Adventures in Wonderland (commonly shortened to "
            "Alice in Wonderland) is an 1865 novel written by English"
            " author Charles Lutwidge Dodgson under the pseudonym Lewis"
            " Carroll.[1] It tells of a girl named Alice who falls "
            "down a rabbit hole into a fantasy world populated by peculiar,"
            " anthropomorphic creatures. The tale plays with logic, giving "
            "the story lasting popularity with adults as well as children."
            "[2] It is considered to be one of the best examples of the literary "
            "nonsense genre,[2][3] and its narrative course and structure, "
            "characters and imagery have been enormously influential[3] in "
            "both popular culture and literature, especially in the fantasy genre."
        )
        # Every word we care about starts with a count of zero.
        words_to_count = {
            'alice': 0,
            'and': 0,
            'the': 0
        }
        print(get_word_counts(words_to_count, fake_file_contents))

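This gives the output (the order of the keys may differ depending on your Python version):

    {'and': 4, 'the': 5, 'alice': 0}

Note that 'alice' stays at 0 because the comparison is case-sensitive: the text contains "Alice" and "Alice's", but never the lowercase 'alice'.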
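Since the `dictionary` stores both the words we want to count and the number of times they occur, the whole algorithm just checks whether each word is in the `dict`, and if it is, adds `1` to that word's value.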
You can read more about dictionaries in the Python documentation.
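The example above counts words in a string that stands in for a file. If the text lives in a real file, one way to wire it up is sketched below. The helper `count_words_in_file` and the temporary demo file are assumptions made for illustration, not part of the original answer:

```python
import os
import tempfile

def get_word_counts(words_to_count, file_contents):
    # Same idea as above: tally only the words we were asked to track.
    for word in file_contents.split(' '):
        if word in words_to_count:
            words_to_count[word] += 1
    return words_to_count

def count_words_in_file(words_to_count, path):
    # Hypothetical helper: read the whole file, then reuse get_word_counts.
    with open(path) as f:
        return get_word_counts(words_to_count, f.read())

# Write a throwaway file so the sketch is self-contained.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("the cat and the hat and the bat")
    demo_path = tmp.name

file_counts = count_words_in_file({'and': 0, 'the': 0, 'alice': 0}, demo_path)
os.remove(demo_path)  # clean up the demo file
print(file_counts)
```

Keep in mind that `split(' ')` only splits on single spaces. A real file usually contains line breaks as well, so calling `split()` with no argument, which splits on any run of whitespace, is often the better fit there.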
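Edit:
If you want to count all of the words and then look up a specific set, dictionaries are still great (and fast!) for the task.
The only change we need to make is to first check whether the `key` exists in the dictionary, and if it doesn't, add it.
Example: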

    def get_all_word_counts(file_contents):
        words = file_contents.split(' ')
        word_counts = {}
        for word in words:
            if word not in word_counts:  # If not already there,
                word_counts[word] = 0    # add it in.
            word_counts[word] += 1       # Increment the count accordingly.
        return word_counts

Run on the same sample text and printed one entry per line, this gives the output:
and : 4
shortened : 1
named : 1
popularity : 1
peculiar, : 1
be : 1
populated : 1
is : 2
(commonly : 1
nonsense : 1
an : 1
down : 1
fantasy : 2
as : 2
examples : 1
have : 1
in : 4
girl : 1
tells : 1
best : 1
adults : 1
one : 1
literary : 1
story : 1
plays : 1
falls : 1
author : 1
giving : 1
enormously : 1
been : 1
its : 1
The : 1
to : 2
written : 1
under : 1
genre,[2][3] : 1
literature, : 1
into : 1
pseudonym : 1
children.[2] : 1
imagery : 1
who : 1
influential[3] : 1
characters : 1
Alice's : 1
Dodgson : 1
Adventures : 1
Alice : 2
popular : 1
structure, : 1
1865 : 1
rabbit : 1
English : 1
Lutwidge : 1
hole : 1
Carroll.[1] : 1
with : 2
by : 2
especially : 1
a : 3
both : 1
novel : 1
anthropomorphic : 1
creatures. : 1
world : 1
course : 1
considered : 1
Lewis : 1
Charles : 1
well : 1
It : 2
tale : 1
narrative : 1
Wonderland) : 1
culture : 1
of : 3
Wonderland : 1
the : 5
genre. : 1
logic, : 1
lasting : 1
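Once every word has been counted, looking up a specific set is just a matter of reading those keys back out. The sketch below uses `collections.Counter` from the standard library as a more compact alternative to the hand-written loop. It is not what the code above uses, and the sample text and the `wanted` list are made up for illustration:

```python
from collections import Counter

sample_text = "the cat and the hat and the bat"  # made-up sample text

# Counter tallies every item in the list in one step.
all_counts = Counter(sample_text.split(' '))

# Pull out just the words of interest; .get() falls back to 0 for absent words.
wanted = ['the', 'and', 'alice']
selected = {word: all_counts.get(word, 0) for word in wanted}
print(selected)
```

A `Counter` is a `dict` subclass, so the `in` check and key lookups described earlier work on it unchanged.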
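Note: As you can see, there are a few "failures" when we `split(' ')` the file. Specifically, some words have an opening or closing parenthesis attached, and others carry trailing punctuation or footnote markers. You'll have to account for that in your file handling. But I'll leave that for you to figure out!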
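If you want a starting point for that cleanup, one possible sketch is to lowercase each word and trim punctuation from both ends before counting. The helper names `normalize` and `get_normalized_word_counts` are hypothetical, and this is only one of several reasonable approaches:

```python
import string

def normalize(word):
    # Hypothetical helper: trim punctuation from both ends, then lowercase.
    return word.strip(string.punctuation).lower()

def get_normalized_word_counts(text):
    word_counts = {}
    for raw_word in text.split():  # split on any whitespace
        word = normalize(raw_word)
        if not word:  # skip tokens that were nothing but punctuation
            continue
        word_counts[word] = word_counts.get(word, 0) + 1
    return word_counts

sample = "Alice in Wonderland) is a novel. (commonly Alice, the girl)"
normalized_counts = get_normalized_word_counts(sample)
print(normalized_counts)
```

This is deliberately simple and has limits. A footnote marker such as the one in "Carroll.[1]" is only partly trimmed, because the digit stops the stripping, and a possessive like "Alice's" is still counted separately from "Alice".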