python - How to count the number of times the user says the word "the" or "The"

Question

sentence2 = raw_input("Enter the sentence on the StringLab3 WS: ")

sentence.split(sentence2)
for word in default_sentence:
    if word == (chr(84)+chr(104)+chr(101)) or (chr(116)+chr(104)+chr(101)):
        words += 1

print "The amounf of times 'the' or 'The' appear is a total of", words, "times."

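This is what I have right now, and the output for this sentence is currently 961:

This is a day of national consecration. And I am certain that on this day my fellow Americans expect that on my induction into the Presidency, I will address them with a candor and a decision which the present situation of our people impels. This is preeminently the time to speak the truth, the whole truth, frankly and boldly. Nor need we shrink from honestly facing conditions in our country today. This great Nation will endure, as it has endured, will revive and will prosper. So, first of all, let me assert my firm belief that the only thing we have to fear is fear itself, nameless, unreasoning, unjustified terror which paralyzes needed efforts to convert retreat into advance. In every dark hour of our national life, a leadership of frankness and of vigor has met with that understanding and support of the people themselves which is essential to victory. And I am convinced that you will again give that support to leadership in these critical days.

We are supposed to have the user input this. Any suggestions?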

score 4 · Accepted Answer

The simplest implementation, and probably the fastest, is:

sentence.lower().split().count('the')

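Take the paragraph, turn it to lowercase, split it into words, and count how many of those words are 'the'. It is almost a direct translation of the problem description.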

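The first problem with your attempt is that you read the user's input into a variable named sentence2, then use it as a separator to split some other variable named sentence, throw away the result, and then loop over yet another variable named default_sentence. That is not going to work. Python won't guess what you mean just because the variable names are kind of similar. You have to get those first three lines right: read the input into one variable, split that same variable, store the resulting list, and loop over that list.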
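To see why the original first lines cannot work, here is a small sketch of how str.split behaves. The sample strings are made up for illustration, and the snippet is written for Python 3, whereas the thread itself uses Python 2.

```python
# Hypothetical sample text, standing in for what the user would type.
sentence2 = "The cat and the dog"

# str.split(sep) splits on sep. Passing the user's whole text as the
# separator for some other string does not split the user's text at all.
other = "unrelated text"
print(other.split(sentence2))   # ['unrelated text']: the separator never occurs

# str.split() returns a new list and leaves the string unchanged, so the
# result has to be stored in the variable that the loop iterates over.
default_sentence = sentence2.split()
print(default_sentence)         # ['The', 'cat', 'and', 'the', 'dog']
```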

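The second problem is that your or expression doesn't mean what you think it means. This has already been explained in many other questions; you can start with "What's going on with my if-else statement?", and if that doesn't explain it, follow the related links and duplicates from there.

If you fix those two problems, your code actually works: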
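As a quick illustration of what the or expression actually evaluates to, here is a minimal sketch using string literals in place of the chr() calls. The value of word is an arbitrary example.

```python
word = "cat"

# == binds tighter than or, so this is (word == "The") or "the".
result = word == "The" or "the"
print(repr(result))          # 'the': a non-empty string, which is truthy

# Either compare explicitly on both sides, or use a membership test.
explicit = word == "The" or word == "the"
membership = word in ("The", "the")
print(explicit, membership)  # False False
```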

sentence = raw_input("Enter the sentence on the StringLab3 WS: ")
default_sentence = sentence.split()
words = 0
for word in default_sentence:
    if word in ((chr(84)+chr(104)+chr(101)), (chr(116)+chr(104)+chr(101))):
        words += 1

print "The amount of times 'the' or 'The' appear is a total of", words, "times."

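I don't know why everyone else is overcomplicating things in the name of efficiency: replacing count with an explicit sum over a comprehension, using a regular expression, or using map to call lower after the split instead of before. They are actually making things slower as well as harder to read, which is usually the case with micro-optimizations like this. For example: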

In [2829]: %timeit paragraph.lower().split().count('the')
100000 loops, best of 3: 14.2 µs per loop
In [2830]: %timeit sum([1 for word in paragraph.lower().split() if word == 'the'])
100000 loops, best of 3: 18 µs per loop
In [2831]: %timeit sum(1 for word in paragraph.lower().split() if word == 'the')
100000 loops, best of 3: 17.8 µs per loop
In [2832]: %timeit re.findall(r'\bthe\b', paragraph, re.I)
10000 loops, best of 3: 38.3 µs per loop
In [2834]: %timeit list(map(lambda word: word.lower(), paragraph.split())).count("the")
10000 loops, best of 3: 49.6 µs per loop
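For anyone who wants to repeat this comparison outside IPython, here is a rough sketch using the standard timeit module. The sample text and the repeat count are arbitrary choices of mine, not the setup used for the numbers above, so absolute timings will differ.

```python
import re
import timeit

# Hypothetical sample text; the thread timed the full speech paragraph.
paragraph = "This is preeminently the time to speak the truth, the whole truth. " * 20

candidates = {
    "count": lambda: paragraph.lower().split().count("the"),
    "sum": lambda: sum(1 for w in paragraph.lower().split() if w == "the"),
    "regex": lambda: len(re.findall(r"\bthe\b", paragraph, re.I)),
    "map": lambda: list(map(str.lower, paragraph.split())).count("the"),
}

# Check that every approach agrees before comparing speed.
results = {name: func() for name, func in candidates.items()}

for name, func in candidates.items():
    seconds = timeit.timeit(func, number=1000)
    print("%-6s count=%d  %.4f s per 1000 calls" % (name, results[name], seconds))
```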

score 2 · Accepted Answer

I recommend this:

map(lambda word: word.lower(), paragraph.split()).count("the")

Output:

>>> paragraph = "This is a day of national consecration. And I am certain that on this day my fellow Americans expect that on my induction into the Presidency, I will address them with a candor and a decision which the present situation of our people impels. This is preeminently the time to speak the truth, the whole truth, frankly and boldly. Nor need we shrink from honestly facing conditions in our country today. This great Nation will endure, as it has endured, will revive and will prosper. So, first of all, let me assert my firm belief that the only thing we have to fear is fear itself, nameless, unreasoning, unjustified terror which paralyzes needed efforts to convert retreat into advance. In every dark hour of our national life, a leadership of frankness and of vigor has met with that understanding and support of the people themselves which is essential to victory. And I am convinced that you will again give that support to leadership in these critical days."
>>> map(lambda word: word.lower(), paragraph.split()).count("the")
7

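Since my solution might look strange, here is a short explanation from left to right:

map(function, target): This applies function to every element of target, so target must be a list or some other iterable. In this case we are mapping a lambda function, which can be a little scary, so read about that below.

.lower(): This returns the lowercase version of whatever string it is applied to, word in this case. This is done to make sure that "the", "The", "THE", "tHe", etc. are all counted.

.split(): This splits the string (paragraph) into a list, using the separator given in the parentheses. When no separator is given (as here), whitespace is used as the separator. Note that when the separator is omitted, consecutive separators are merged.

.count(item): This counts the occurrences of item in the list it is applied to. Note that this is not the most efficient way to count things (if you care about speed, you have to use regular expressions).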
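The steps described above can be sketched one at a time. This is an adaptation for Python 3, where map returns an iterator with no count method, so the result is wrapped in list(); on Python 2, as used in this answer, map already returns a list. The sample sentence is made up.

```python
paragraph = "The quick fox saw THE dog   and the cat"

# No separator given: split on runs of whitespace, so the triple space
# does not produce empty strings.
words = paragraph.split()

# Apply the lambda to every word; list() is needed on Python 3.
lowered = list(map(lambda word: word.lower(), words))

# Count the list elements equal to "the".
total = lowered.count("the")

print(words)
print(lowered)
print(total)   # 3
```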

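The scary lambda function:

Lambda functions are not easy to explain or understand. It took me a long time to grasp what they are and when they are useful. I found this tutorial helpful.

My best attempt at a tl;dr is that lambda functions are small anonymous functions that can be used for convenience. I know this is incomplete at best, but I think it should suffice for the scope of this question.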
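As a small illustration of that tl;dr, the sketch below puts the lambda from this answer next to an equivalent named function. The names lower_lambda and lower_def are made up for the example, and list() is used so that it also runs on Python 3.

```python
# A lambda is an expression that builds a small unnamed function.
lower_lambda = lambda word: word.lower()

# The same behavior written as a named function.
def lower_def(word):
    return word.lower()

words = ["The", "THE", "then"]
print(list(map(lower_lambda, words)))  # ['the', 'the', 'then']
print(list(map(lower_def, words)))     # ['the', 'the', 'then']

# The existing string method can also be passed directly.
print(list(map(str.lower, words)))     # ['the', 'the', 'then']
```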

score 1 · Accepted Answer

The reason your code doesn't work is that you wrote

if word == (chr(84)+chr(104)+chr(101)) or (chr(116)+chr(104)+chr(101)):
# evaluates to: if word == "The" or "the":
# evaluates to: if False or "the":
# evaluates to: if "the":

instead of

if (word == (chr(84)+chr(104)+chr(101))) or (word == (chr(116)+chr(104)+chr(101))):
# evaluates to: if (word == "The") or (word == "the")

More importantly, as Barmar pointed out, using the string literal 'the' is more readable.

So you probably want something like this:

count = 0
for word in default_sentence.split():
    if word == 'the' or word == 'The':
        count += 1

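wnnmaw has an equivalent one-liner that is almost as efficient. map(lambda word: word.lower()) doesn't quite fit, though, because according to the OP's specification we only want to count 'the' and 'The', not 'THE'.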
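To make the difference concrete, here is a minimal sketch comparing a strict count of the two spellings with the lowercase-everything approach. Both function names and the sample sentence are invented for this illustration.

```python
def count_the_strict(text):
    """Count only the exact spellings 'the' and 'The'."""
    return sum(1 for word in text.split() if word in ("the", "The"))

def count_the_any_case(text):
    """Count 'the' in any capitalization, including 'THE'."""
    return text.lower().split().count("the")

sample = "The end of THE road is the start"
print(count_the_strict(sample))    # 2
print(count_the_any_case(sample))  # 3
```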

score 1 · Accepted Answer

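You can do it like this, using a regular expression:

#!/usr/bin/env python
import re

input_string = raw_input("Enter your string: ")
# \b(T|t)he\b matches 'the' or 'The' as a whole word only
print("Total occurrences of the word 'the': %d" % len(re.findall(r'\b(T|t)he\b', input_string)))

If you want it to be case-insensitive, the re.findall call can be changed to re.findall(r'\bthe\b', input_string, re.I)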
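One practical difference between the regex and the split-based approaches is how they treat punctuation attached to a word. The following sketch, with a made-up sample string and [Tt] used as an equivalent of (T|t), shows the three variants side by side.

```python
import re

sample = "Say the, then The. Other theme; THE end"

# Whole-word match for 'the' or 'The' only.
strict = len(re.findall(r"\b[Tt]he\b", sample))

# Whole-word match ignoring case, so 'THE' is included.
any_case = len(re.findall(r"\bthe\b", sample, re.I))

# Whitespace split keeps punctuation attached: 'the,' and 'the.' do not equal 'the'.
by_split = sample.lower().split().count("the")

print(strict, any_case, by_split)  # 2 3 1
```

Whether "the," should count is a question for the assignment; the regex counts it, while the plain split does not.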

score 1 · Accepted Answer

The problem is this line:

if word == (chr(84)+chr(104)+chr(101)) or (chr(116)+chr(104)+chr(101)):

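Comparisons in most programming languages can't be abbreviated the way they are in English. You can't write "equal to A or B" as shorthand for "equal to A or equal to B"; you need to write it out: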

if word == (chr(84)+chr(104)+chr(101)) or word == (chr(116)+chr(104)+chr(101)):

What you wrote is parsed as:

if (word == (chr(84)+chr(104)+chr(101))) or (chr(116)+chr(104)+chr(101)):

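Since the second expression in the or is always true (it is a string, and all non-empty strings are true), the if always succeeds, so you count every word, not just the and The.

There is also no good reason to use that verbose chr() syntax; just write: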
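The effect on the loop can be shown with a short sketch, using string literals and a made-up six-word sentence: the buggy condition counts every word, the corrected one only the two spellings.

```python
words = "The cat sat on the mat".split()

buggy = 0
for word in words:
    if word == "The" or "the":          # right-hand side is always truthy
        buggy += 1

fixed = 0
for word in words:
    if word == "The" or word == "the":  # both sides are real comparisons
        fixed += 1

print(buggy, fixed)  # 6 2
```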

if word == "the" or word == "The":

There are other errors in your code. The split line should be:

default_sentence = sentence2.split()
