python - Python: Using .isalpha() to count specific words/characters in a word count

Question

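I created a function that counts a specific word or character in a text file.

But I want to add a condition so that the function only counts a character when it is surrounded by letters. For example, given this text in the file: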

'This test is an example, this text doesn't have any meaning. It is only an example.'

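If I run this text through my function and count the apostrophe ('), it returns 3. I want it to return 1 instead, counting only apostrophes that sit between two letters (as in isn't or won't) and ignoring every apostrophe that is not surrounded by letters, such as single quotes.

I tried using the .isalpha() method, but I'm having trouble with the syntax.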

score 0 · Accepted Answer

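If you just want to discount the quotes that enclose the string itself, the simplest approach is probably to strip those quotes off the ends of the string before counting.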

>>> text = "'This test is an example, this text doesn't have any meaning. It is only an example.'"
>>> text.strip("'").count("'")
1

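Another approach is to use a regular expression such as \w'\w, that is, a word character (a letter, digit, or underscore), then ', then another word character: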

>>> import re
>>> sum(1 for _ in re.finditer(r"\w'\w", text))
1

This also works for quotes inside the string:

>>> text = "Text that has a 'quote' in it."
>>> sum(1 for _ in re.finditer(r"\w'\w", text))
0

But it will also miss apostrophes that are not followed by another letter:

>>> text = "All the houses' windows were broken."
>>> sum(1 for _ in re.finditer(r"\w'\w", text))
0
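One more caveat with \w'\w: each match consumes the characters on both sides, so two apostrophes that share a neighbouring letter are only counted once. A minimal sketch that avoids this with lookarounds is below; the names ENCLOSED_APOSTROPHE and count_enclosed_apostrophes are made up for illustration, and the pattern assumes that only ASCII letters should count as "letters".

```python
import re

# Lookbehind/lookahead test the neighbours without consuming them,
# so apostrophes that share a letter are each counted.
# Assumption: "letter" means an ASCII letter here.
ENCLOSED_APOSTROPHE = re.compile(r"(?<=[A-Za-z])'(?=[A-Za-z])")


def count_enclosed_apostrophes(text):
    """Count apostrophes that have a letter directly on both sides."""
    return len(ENCLOSED_APOSTROPHE.findall(text))


text = "'This test is an example, this text doesn't have any meaning. It is only an example.'"
print(count_enclosed_apostrophes(text))           # 1
print(count_enclosed_apostrophes("rock'n'roll"))  # 2
print(len(re.findall(r"\w'\w", "rock'n'roll")))   # 1, the shared "n" is consumed
```

For ordinary contractions both patterns give the same count; the difference only shows up when apostrophes are one letter apart.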

score 0 · Accepted Answer

As xnx already pointed out, the right way to do this is with a regular expression:

import re

text = "'This test is an example, this text doesn't have any meaning. It is only an example.'"

print(len(re.findall("[a-zA-Z]'[a-zA-Z]", text)))
"""
Out:
    1
"""

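Here the apostrophe in the pattern is surrounded by a set of English letters, but there are many predefined character sets; see the RE documentation for details.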
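If the text may contain accented or other non-English letters, [a-zA-Z] will not treat them as letters. One common workaround, sketched here under the assumption that you are on Python 3 with str patterns, is the class [^\W\d_], which matches word characters other than decimal digits and the underscore, so roughly "any letter". The variable names and the sample sentence are only illustrative.

```python
import re

# ASCII letters only, as in the pattern above.
ascii_only = re.compile(r"[a-zA-Z]'[a-zA-Z]")

# Word characters minus digits and underscore: approximately any Unicode letter.
any_letter = re.compile(r"[^\W\d_]'[^\W\d_]")

sample = "J'ai vu l'été"
print(len(ascii_only.findall(sample)))  # 1, only J'a (é is not in a-z)
print(len(any_letter.findall(sample)))  # 2, J'a and l'é
```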

score 0 · Accepted Answer

You should just use a regular expression:

import re

text = "'This test is an example, this text doesn't have any meaning. It is only an example.'"

wordWrappedApos = re.compile(r"\w'\w")
found = re.findall(wordWrappedApos, text)
print(found)
print(len(found))

If you want to make sure no digits are matched, replace "\w" with "[A-Za-z]".
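To see why the choice of character class matters, here is a small comparison. The sample string and variable names are made up for the demonstration; \w accepts digits (and underscores), while [A-Za-z] does not.

```python
import re

sample = "5'10 isn't a word"

# \w matches letters, digits and underscore, so the apostrophe in 5'10 counts too.
word_chars = re.findall(r"\w'\w", sample)

# [A-Za-z] restricts both neighbours to ASCII letters.
letters_only = re.findall(r"[A-Za-z]'[A-Za-z]", sample)

print(word_chars)    # ["5'1", "n't"]
print(letters_only)  # ["n't"]
```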

score 0 · Accepted Answer

I think a regular expression is the better tool for this, but if you have to use isalpha, something like:

s = "'This test is an example, this text doesn't have any meaning. It is only an example.'"
sum(s[i-1].isalpha() and s[i]=="'" and s[i+1].isalpha() for i in range(1,len(s)-1))

returns 1.
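Since the question's own function was not posted, here is a rough sketch of how the same isalpha check could be wrapped into a reusable function that takes the character to count as a parameter. The name count_enclosed and its signature are assumptions for illustration, not the asker's code.

```python
def count_enclosed(text, char="'"):
    """Count occurrences of `char` that have a letter directly on both sides."""
    # Start at 1 and stop before the last index so that text[i - 1]
    # and text[i + 1] always exist; the first and last characters
    # can never be enclosed by letters anyway.
    return sum(
        1
        for i in range(1, len(text) - 1)
        if text[i] == char and text[i - 1].isalpha() and text[i + 1].isalpha()
    )


s = "'This test is an example, this text doesn't have any meaning. It is only an example.'"
print(count_enclosed(s))               # 1
print(count_enclosed("a-b - c", "-"))  # 1, only the hyphen in "a-b"
```

Note that str.isalpha() is Unicode-aware, so accented letters count as letters here, unlike with [A-Za-z]. To use it on a file, read the contents into a string first and pass that string in.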
