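I want to check whether a sentence contains elongated words, for example soooo, tooooo, thaaatttt, and so on. I don't know in advance what the user might type: I have a list of sentences that may or may not contain elongated words. How can I check for this in Python? I am new to Python.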
4 Answers
3
@HughBothwell had a good idea. As far as I know, no English word repeats the same letter three times in a row. So you can search for words that do:
>>> from re import search
>>> mystr = "word word soooo word tooo thaaatttt word"
>>> # The parentheses create group 1, which \1 refers back to.
>>> [x for x in mystr.split() if search(r'(?i)([a-z])\1\1+', x)]
['soooo', 'tooo', 'thaaatttt']
Anything this finds is an elongated word.
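Splitting on whitespace keeps any punctuation attached to a word (a trailing comma, for instance). If you want the bare words, one option is to let the regex pick out the letters itself. This is a minimal sketch, not part of the answer above; `find_elongated` and `ELONGATED_WORD` are hypothetical names, and it assumes plain ASCII letters are enough:

```python
import re

# A run of letters that contains some letter three or more times in a row.
ELONGATED_WORD = re.compile(r"[a-z]*([a-z])\1{2,}[a-z]*", re.IGNORECASE)

def find_elongated(sentence):
    """Return the elongated words in sentence, without surrounding punctuation."""
    return [m.group(0) for m in ELONGATED_WORD.finditer(sentence)]

print(find_elongated("It was soooo good, thaaatttt much!"))
```

Calling `find_elongated` on each sentence in your list gives an empty list for sentences without elongated words.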
answered 2013-11-24T01:50:42.670
3
Try this:
import re
s1 = "This has no long words"
s2 = "This has oooone long word"
# A letter followed by two or more copies of itself (three or more in a row).
elong = re.compile(r"([a-zA-Z])\1{2,}")
def has_long(sentence):
    return bool(elong.search(sentence))
print(has_long(s1))  # False
print(has_long(s2))  # True
answered 2013-11-24T01:50:12.990
1
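You need a reference list of valid English words. On *NIX systems, you can use /etc/share/dict/words or /usr/share/dict/words or an equivalent, and store all the words in a set object.
Then, for each word in the sentence, you need to check that:
- the word is not itself a valid word (i.e., word not in all_words); and
- when you shorten every run of repeated letters to one or two letters, the resulting word is a valid word.
Here is one way to try to extract all the possibilities: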
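import re
import itertools
# Any word character followed by two more copies of itself (group 1 is the character).
regex = re.compile(r'(\w)\1\1')
# get_all_words() is a placeholder: it should return the words from your word list.
all_words = set(get_all_words())
def without_elongations(word):
    # Base case: no run of three is left, so this spelling is fully shortened.
    if regex.search(word) is None:
        return [word]
    # Shorten the first run to one letter and to two letters, then recurse on both.
    replacing_with_one_letter = regex.sub(r'\1', word, count=1)
    replacing_with_two_letters = regex.sub(r'\1\1', word, count=1)
    return list(itertools.chain(
        without_elongations(replacing_with_one_letter),
        without_elongations(replacing_with_two_letters),
    ))
# sentence is the string you want to check.
for word in sentence.split():
    if word not in all_words:
        if any(w in all_words for w in without_elongations(word)):
            print('%(word)s is elongated' % {'word': word})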
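For a version of this dictionary check that runs on its own, here is a minimal sketch. The tiny `ALL_WORDS` set is an assumption that stands in for a real word list, and `candidates` and `is_elongated` are hypothetical names. In practice you would fill the set from a word file, if your system has one.

```python
import re

# Assumption: a tiny stand-in for a real dictionary loaded from a word-list file.
ALL_WORDS = {"so", "too", "that", "good", "word"}

# Any run of three or more identical lowercase letters.
RUN = re.compile(r"([a-z])\1{2,}")

def candidates(word):
    """Return every spelling obtained by shrinking each long run to 1 or 2 letters."""
    match = RUN.search(word)
    if match is None:
        return {word}
    letter = match.group(1)
    head, tail = word[:match.start()], word[match.end():]
    results = set()
    for rest in candidates(tail):
        results.add(head + letter + rest)
        results.add(head + letter * 2 + rest)
    return results

def is_elongated(word):
    """True if word is not in the dictionary but a shortened spelling is."""
    word = word.lower()
    if word in ALL_WORDS:
        return False
    return any(c in ALL_WORDS for c in candidates(word))

for w in "soooo tooooo thaaatttt word xyzzzy".split():
    print(w, is_elongated(w))
```

Collapsing each run in a single pass keeps the number of candidate spellings at two per run, which stays small for ordinary input.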
answered 2013-11-24T02:40:20.323
1
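Well, you could simply list every elongated word. Then loop over the words in the sentence and, for each one, over the words in the list to find the elongated ones.
sentence = "Hoow arre you doing?"
elongated = ["hoow", "arre", "youu", "yoou", "meee"]  # You will need a much larger list
# split() yields words; iterating over the string itself would yield single characters.
for word in sentence.split():
    word = word.lower()
    for e_word in elongated:
        if e_word == word:
            print("Found an elongated word!")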
If you want to do what Hugh Bothwell suggested, then:
sentence = "Hooow arrre you doooing?"
elongations = ["aaa", "ooo", "rrr", "bbb", "ccc"]  # continue for all the letters
for word in sentence.split():
    for x in elongations:
        if x in word.lower():
            print('"' + word + '" is an elongated word')
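Rather than typing out the triple for every letter by hand, the list can be generated. This is a minimal sketch, assuming plain ASCII letters are enough; `elongated_words` is a hypothetical helper name:

```python
import string

# "aaa", "bbb", ..., "zzz": one triple per lowercase ASCII letter.
elongations = [letter * 3 for letter in string.ascii_lowercase]

def elongated_words(sentence):
    """Return the words of sentence that contain any tripled letter."""
    return [word for word in sentence.split()
            if any(x in word.lower() for x in elongations)]

print(elongated_words("Hooow arrre you doooing?"))
```

Because the words come from `split()`, any attached punctuation stays on the returned words.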
answered 2013-11-24T01:40:58.357