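How many words contain some two-letter sequence repeated 3 times? For example, "contentment" and "maintaining" are such words, because "contentment" has the sequence "nt" repeated three times and "maintaining" has the sequence "in" repeated three times.
Here is my code: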
len([f for f in file if re.match(r'(.*?[a-z]{2}.*?){3}',f)])
Here is a simple regular expression:
.*(\w{2}).*\1.*\1
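It captures two word characters (letters, digits, or underscore) in a group with (\w{2}). The backreference \1 then requires that same two-character sequence to appear two more times.
Here is a working example: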
import re

text = """
How many words contain some two-letter sequence repeated 3 times? For example, "contentment" and "maintaining" are such words because "contentment" has the sequence "nt" repeated three times and "maintaining" has the sequence "in" repeated three times.
"""

def check(word):
    # Capture a two-character sequence, then require it twice more via \1.
    return re.match(r".*(\w{2}).*\1.*\1", word)

def main():
    # Print every whitespace-separated token that passes the check.
    for word in text.split():
        if check(word):
            print(word)

main()
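To connect this back to the count the question asks for, here is a minimal sketch comparing the question's pattern with the backreference pattern. The list `words` is a hypothetical stand-in for the question's `file`, and the names `loose` and `strict` are illustrative only. The question's pattern has no backreference, so it accepts any word containing three pairs of lowercase letters, whether or not the pairs are the same.

```python
import re

# Hypothetical word list standing in for the question's `file`.
# If the words come from a real file, strip the trailing newline first.
words = ["contentment", "maintaining", "banana", "letter", "regex"]

# Pattern from the question: three pairs of lowercase letters, no backreference.
loose = re.compile(r"(.*?[a-z]{2}.*?){3}")
# Pattern from the answer: the captured pair must occur twice more.
strict = re.compile(r".*(\w{2}).*\1.*\1")

loose_hits = [w for w in words if loose.match(w)]
strict_hits = [w for w in words if strict.match(w)]

print(len(loose_hits), loose_hits)
print(len(strict_hits), strict_hits)
```

In this sample, "banana" and "letter" pass the question's pattern simply because they have six letters, while the backreference pattern keeps only the two words where the same pair really occurs three times.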
You can use
\b(?=\w*(\w{2})(?:\w*\1){2})\w+
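See the regex demo.
Details
\b - a word boundary
(?=\w*(\w{2})(?:\w*\1){2}) - a positive lookahead: immediately to the right there must be 0+ word characters, then two word characters captured into Group 1, then two repetitions of any 0+ word characters followed by the same value as in Group 1
\w+ - consumes one or more word characters.
See the Python demo: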
import re

text = "contentment and maintaining are such words"
# Compile once and reuse the match list for both the words and the count.
pattern = re.compile(r'\b(?=\w*(\w{2})(?:\w*\1){2})\w+')
matches = [x.group() for x in pattern.finditer(text)]
print(matches)
# => ['contentment', 'maintaining']
print(len(matches))
# => 2
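One caveat: \w also matches digits and underscores, so the pattern above accepts tokens that are not made of letters only. The sketch below shows a letters-only variant, assuming "letter" means ASCII a-z matched case-insensitively. The names `word_chars` and `letters_only` and the sample string are illustrative, not part of the original answer.

```python
import re

# Hypothetical sample containing one token built around repeated digits.
sample = "12a12b12 contentment maintaining"

# Original pattern: \w allows digits and underscores in the repeated pair.
word_chars = re.compile(r"\b(?=\w*(\w{2})(?:\w*\1){2})\w+")
# Assumed letters-only variant: every \w is replaced by [a-z].
letters_only = re.compile(
    r"\b(?=[a-z]*([a-z]{2})(?:[a-z]*\1){2})[a-z]+", re.IGNORECASE
)

wide = [m.group() for m in word_chars.finditer(sample)]
narrow = [m.group() for m in letters_only.finditer(sample)]

print(wide)
print(narrow)
```

The first list includes "12a12b12" because "12" occurs three times. The letters-only variant skips it and keeps the two real words.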