python - Find words that have a specific suffix in Python

Question

I am working on a Chinese NLP problem and need to find the words that end with specific suffixes. For example, I have two lists:

suffixs = ['aaa', 'bbb', 'cc', ...]

words_list = ['oneaaa','twobbb','three','four']

for w in words_list:
    if w has suffix in suffixs:  # pseudocode: w ends with some suffix s
        func(s, w)

I know I can use the re package, but re can only handle fewer than 100 suffixes, and I have 1000+ suffixes. I tried using

for w in words_list:
    for s in suffixs:  # suffixs is sorted by length
        if w.endswith(s):
            func(s, w)
            break

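but it is too slow.
func(s, w) splits the word w into the word without its suffix and the suffix itself.
For example, 'oneaaa' becomes ['one', 'aaa'], but func depends on some conditions and is more complex, so any does not work here.
So I would like to know whether there is a better way to handle this.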

score 1 · Accepted Answer

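If you just want to see which words have one of the suffixes (named back_fixs in the code below; "suffix" is the proper word, by the way), you can use str.endswith combined with any: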

for w in words_list:
    if any(w.endswith(b) for b in back_fixs):
        print(w)

Or pass all the suffixes to endswith at once, but for that they have to be in a tuple, not a list:

back_fixs = tuple(back_fixs)
for w in words_list:
    if w.endswith(back_fixs):
        print(w)
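As a quick check of the tuple form, here is a minimal sketch using the sample data from the question. The three-element suffix tuple is only illustrative; the real list would hold 1000+ entries.

```python
# Sample data from the question; the real suffix list would be much longer.
back_fixs = ('aaa', 'bbb', 'cc')
words_list = ['oneaaa', 'twobbb', 'three', 'four']

# str.endswith accepts a tuple and returns True if any element matches.
matched = [w for w in words_list if w.endswith(back_fixs)]
print(matched)  # ['oneaaa', 'twobbb']
```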

If you also need to know which suffix matched, you can get it with next, or None if nothing matches:

for w in words_list:
    b = next((b for b in back_fixs if w.endswith(b)), None)
    if b:
        print(w, b)

Or, shorter, using filter: b = next(filter(w.endswith, back_fixs), None)
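To connect this with the question's example of splitting 'oneaaa' into ['one', 'aaa'], the lookup can be wrapped in a small helper. This is only a sketch: split_suffix is a hypothetical name, and the real func from the question applies extra conditions that are not shown there.

```python
def split_suffix(word, suffixes):
    """Return [stem, suffix] for the first matching suffix, or None.

    Hypothetical helper; a simplified stand-in for the question's func(s, w).
    """
    # First suffix in iteration order that the word ends with, else None.
    suffix = next(filter(word.endswith, suffixes), None)
    if not suffix:
        return None
    return [word[:-len(suffix)], suffix]


back_fixs = ('aaa', 'bbb', 'cc')
words_list = ['oneaaa', 'twobbb', 'three', 'four']
for w in words_list:
    print(w, split_suffix(w, back_fixs))
```

Because the first match wins, the order of the suffixes matters whenever one suffix is itself the ending of another.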

Or, without a default value, using try/except:

for w in words_list:
    try:
        print(w, next(filter(w.endswith, back_fixs)))
    except StopIteration:
        pass
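With 1000+ suffixes, the loops above may still test the suffixes one by one for each word. A possible alternative, not part of the original answer, is to store the suffixes in a set and test one slice of the word per distinct suffix length. The sketch below assumes the longest matching suffix is the one wanted; find_suffix is a hypothetical name.

```python
def find_suffix(word, suffix_set, lengths):
    """Return the longest suffix of word found in suffix_set, or None."""
    for n in lengths:  # lengths are sorted longest first
        if n <= len(word) and word[-n:] in suffix_set:
            return word[-n:]
    return None


back_fixs = ['aaa', 'bbb', 'cc']
suffix_set = set(back_fixs)
# Distinct non-zero suffix lengths, longest first.
lengths = sorted({len(s) for s in suffix_set if s}, reverse=True)

words_list = ['oneaaa', 'twobbb', 'three', 'four']
for w in words_list:
    s = find_suffix(w, suffix_set, lengths)
    if s is not None:
        print(w, s)
```

The number of set lookups per word then depends on how many distinct suffix lengths there are rather than on how many suffixes there are.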
