python - Using a regex to find words whose characters are the same or different

Question

I have a list of words, for example:

l = """abca
bcab
aaba
cccc
cbac
babb
"""

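I want to find the words whose first and last characters are the same, and whose two middle characters differ from that first/last character.

Desired final result: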

['abca', 'bcab', 'cbac']

I tried this:

re.findall('^(.)..\\1$', l, re.MULTILINE)

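But it also returns all the unwanted words. I would like to use [^...] somehow, but I can't figure out how. It can be done with sets (by filtering the results of the search above), but I'm looking for a regular expression.

Is it possible?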

score 3 · Accepted Answer

There are lots of ways to do this. Here is probably the simplest:

re.findall(r'''
           \b          #The beginning of a word (a word boundary)
           ([a-z])     #One letter
           (?!\w*\1\B) #The rest of this word may not contain the starting letter except at the end of the word
           [a-z]*      #Any number of other letters
           \1          #The starting letter we captured in step 2
           \b          #The end of the word (another word boundary)
           ''', l, re.IGNORECASE | re.VERBOSE)

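If you like, you can loosen the restriction by replacing [a-z] with \w. That allows digits and underscores as well as letters. You can also restrict it to 4-character words by changing the last * in the pattern to {2}.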

Also note that I'm not very familiar with Python, so I'm assuming your use of findall is correct.
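
One caveat about that assumption: when a pattern contains a single capturing group, Python's re.findall returns the text of that group rather than the whole match, so the call as written yields the captured first letters. A minimal sketch that collects whole words with re.finditer instead, using the question's sample data (the names `pattern`, `first_letters` and `words` are only illustrative):

```python
import re

l = """abca
bcab
aaba
cccc
cbac
babb
"""

pattern = re.compile(r'''
    \b            # start of a word
    ([a-z])       # first letter, captured as group 1
    (?!\w*\1\B)   # group 1 must not reappear before the end of the word
    [a-z]*        # the middle letters
    \1            # the first letter again
    \b            # end of the word
    ''', re.IGNORECASE | re.VERBOSE)

# findall returns group 1 only, because the pattern has one capturing group
first_letters = pattern.findall(l)

# finditer yields match objects; group(0) is the whole matched word
words = [m.group(0) for m in pattern.finditer(l)]

print(first_letters)
print(words)
```

Wrapping the whole pattern in an extra group would also work, but findall would then return tuples, so finditer tends to be the simpler choice.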

score 3 · Accepted Answer

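Edit: fixed to use negative lookahead assertions instead of negative lookbehind assertions. See the comments by @AlanMoore and @bukzor for the explanation.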

>>> [s for s in l.splitlines() if re.search(r'^(.)(?!\1).(?!\1).\1$', s)]
['abca', 'bcab', 'cbac']

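This solution uses negative lookahead assertions, which mean "match at the current position only if it is not followed by a match for something else." Now look at the lookahead assertion (?!\1). Placed in front of a dot, it means "match the next character only if it is not the same as the first character."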
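
To see that a lookahead consumes nothing, here is a small sketch (the two test strings are made up for illustration). The assertion is checked at the position just before the dot, so it constrains the very character that the dot is about to consume:

```python
import re

# (?!\1) is zero-width: it only checks that the next character differs
# from group 1, and then '.' consumes that same character.
differs = re.match(r'(.)(?!\1).', 'ab')
same_as_first = re.match(r'(.)(?!\1).', 'aa')

print(differs.group(0))   # the whole match
print(same_as_first)      # no match object is returned
```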

score 1 · Accepted Answer

To hell with regular expressions.

[
    word
    for word in words.split()  # split() skips the empty string left by a trailing newline
    if word[0] == word[-1]
    and word[0] not in word[1:-1]
]

score 1 · Accepted Answer

Do you need to use a regular expression? Here is a more Pythonic way to do the same thing:

l = """abca
bcab
aaba
cccc
cbac
babb
"""

for word in l.split():
    # Same first and last letter, and that letter appears nowhere in between
    if word[-1] == word[0] and word[0] not in word[1:-1]:
        print(word)

score 1 · Accepted Answer

Here is how I would do it:

result = re.findall(r"\b([a-z])(?:(?!\1)[a-z]){2}\1\b", subject)

This is similar to Justin's answer, except that where that one does a single lookahead, this one checks each letter as it is consumed.

\b
([a-z])  # Capture the first letter.
(?:
  (?!\1)   # Unless it's the same as the first letter...
  [a-z]    # ...consume another letter.
){2}
\1
\b

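I don't know what your real data looks like, so I chose [a-z] arbitrarily because it works with your sample data. I limited the length to four characters for the same reason. As with Justin's answer, you may want to change the {2} to *, + or some other quantifier.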
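
A rough sketch of how the quantifier choice changes the result, run on a few made-up words of different lengths (the helper `find_words` and the sample string are assumptions for illustration, not part of the answer). It uses re.finditer with group(0) so that whole words come back rather than the captured first letter:

```python
import re

def find_words(quantifier, subject):
    """Return the whole words matched when the middle part uses `quantifier`."""
    pattern = r"\b([a-z])(?:(?!\1)[a-z])" + quantifier + r"\1\b"
    return [m.group(0) for m in re.finditer(pattern, subject)]

subject = "abca abcda aba aa"

exactly_two = find_words("{2}", subject)  # exactly two middle letters
one_or_more = find_words("+", subject)    # one or more middle letters
zero_or_more = find_words("*", subject)   # zero or more middle letters

print(exactly_two)
print(one_or_more)
print(zero_or_more)
```

With `*`, a two-letter word such as "aa" also qualifies, since the middle part may be empty.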

score 0 · Accepted Answer

You can do this with negative lookahead or lookbehind assertions; see http://docs.python.org/library/re.html for details.

score 0 · Accepted Answer

I'm not a Python guru, but maybe this:

re.findall(r'^(.)(?:(?!\1).)*\1$', l, re.MULTILINE)

Expanded (using the multiline modifier):

^                # begin of line
  (.)            # capture grp 1, any char except newline
  (?:            # grouping
     (?!\1)         # Lookahead assertion, not what was in capture group 1 (backref to 1)
     .              # this is ok, grab any char except newline
  )*             # end grouping, do 0 or more times (could force length with {2} instead of *)
  \1             # backref to group 1, this character must be the same
$                # end of line
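
The expanded form can be run nearly verbatim by passing re.VERBOSE alongside re.MULTILINE. A minimal sketch on the question's data, written as a raw string so that \1 stays a backreference instead of being read as an octal escape, with the length forced to {2}, and using re.finditer so whole lines are returned rather than group 1 (the variable names are illustrative):

```python
import re

l = "abca\nbcab\naaba\ncccc\ncbac\nbabb\n"

pattern = re.compile(r"""
    ^            # beginning of line
    (.)          # group 1: any character except newline
    (?:
      (?!\1)     # the next character must differ from group 1
      .          # consume it
    ){2}         # exactly two middle characters; use * for any length
    \1           # same character as group 1
    $            # end of line
    """, re.VERBOSE | re.MULTILINE)

matches = [m.group(0) for m in pattern.finditer(l)]
print(matches)
```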
