0

I have the following Python code:

import regex
original = " the  quick ' brown 1 fox! jumps-over the 'lazy' doG? !  "
s = [i for i in original.split(" ")]

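I want to write a function called get_sentence that takes an element of s and returns, as a string, the sentence that the element belongs to. For example: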

"brown" ->  "the  quick ' brown 1 fox!"

If the first "the" is passed to the function, then:

"the" -> "the  quick ' brown 1 fox!"

If the second:

"the" -> "jumps-over the 'lazy' doG?"

What would you pass as the argument to such a function? In C++ I would probably pass in a std::vector::const_iterator. In C I would pass in an int (an array index), or maybe even a pointer.

4 Answers

2
>>> import re
>>> from itertools import product, chain
>>> # Assuming your original sentence is
>>> original = " the  quick ' brown 1 fox! jumps-over the 'lazy' doG? !  "
>>> # Sentence terminators are
>>> sent_term = "[?!.;]"
>>> # I will use a regex to split it into sentences
>>> re.split(sent_term, original.strip())
["the  quick ' brown 1 fox", " jumps-over the 'lazy' doG", ' ', '']
>>> # And then split each sentence into words
>>> # I could have used str.split, but that would include punctuation,
>>> # which you may not be interested in
>>> # For each of the words, I create a mapping to its sentence using product
>>> word_map = (product(re.split(r"\W", e), [e])
...             for e in re.split(sent_term, original.strip()))
>>> # Chain it into a single iterable
>>> word_map = chain(*(product(re.split(r"\W", e), [e])
...                    for e in re.split(sent_term, original.strip())))
>>> from collections import defaultdict
>>> # Create a default dict
>>> words = defaultdict(list)
>>> # And populate it with all non-trivial words
>>> for k, v in word_map:
...     if k.strip():
...         words[k].append(v)
...
>>> words
defaultdict(<type 'list'>, {'brown': ["the  quick ' brown 1 fox"], 'lazy': [" jumps-over the 'lazy' doG"], 'jumps': [" jumps-over the 'lazy' doG"], 'fox': ["the  quick ' brown 1 fox"], 'doG': [" jumps-over the 'lazy' doG"], '1': ["the  quick ' brown 1 fox"], 'quick': ["the  quick ' brown 1 fox"], 'the': ["the  quick ' brown 1 fox", " jumps-over the 'lazy' doG"], 'over': [" jumps-over the 'lazy' doG"]})
>>> # Now to get the first sentence
>>> words['the'][0]
"the  quick ' brown 1 fox"
>>> #Now to get the second sentence
>>> words['the'][1]
" jumps-over the 'lazy' doG"
answered 2013-01-30T16:41:12.043
0

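I'm not entirely sure I understand what you're trying to do, but you would probably just pass an integer index. You can't pass a reference to "the", because the two strings are exactly identical.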
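
One way to sketch the integer-index idea, using the question's own original and s. The three-argument signature, the choice of ., ! and ? as sentence terminators, and returning None for a token outside any sentence are assumptions made for illustration, not something the question specifies.

```python
import re

original = " the  quick ' brown 1 fox! jumps-over the 'lazy' doG? !  "
s = original.split(" ")

def get_sentence(tokens, index, text):
    """Return the sentence containing tokens[index], or None if there is none."""
    # str.split(" ") is lossless (" ".join(tokens) == text), so the character
    # offset of a token is the length of all earlier tokens plus one
    # separator for each of them.
    offset = sum(len(tok) + 1 for tok in tokens[:index])
    # Assumption: a sentence is a run of text ending in '.', '!' or '?'.
    for match in re.finditer(r"[^.!?]*[.!?]", text):
        if match.start() <= offset < match.end():
            return match.group().strip()
    return None

# Positions of every "the" in the token list
positions = [i for i, tok in enumerate(s) if tok == "the"]
first_the = get_sentence(s, positions[0], original)
second_the = get_sentence(s, positions[1], original)
```

Because the index identifies a position rather than a value, the two occurrences of "the" stay distinguishable even though the strings themselves compare equal.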

answered 2013-01-30T16:19:16.520
0

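The "Pythonic" way would be to build a dictionary where the keys are words and the values are sentences, or a list of the sentences the key belongs to.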

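lookup = {}
# split_to_sentences, split_to_words and large_text are placeholders
# that you have to supply yourself
sentences = split_to_sentences(large_text)
for idx_sentence, sentence in enumerate(sentences):
    for word in split_to_words(sentence):
        # Record the index of every sentence the word occurs in
        lookup.setdefault(word, set()).add(idx_sentence)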

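Now lookup is a dictionary in which each word is assigned the indices of the sentences it occurs in. By the way, you could rewrite this with some very nice list comprehensions.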
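
A minimal runnable sketch of the loop above. The answer leaves split_to_sentences, split_to_words and large_text undefined, so the regex-based definitions here are assumptions chosen for illustration, with the question's string standing in for large_text.

```python
import re

large_text = " the  quick ' brown 1 fox! jumps-over the 'lazy' doG? !  "

def split_to_sentences(text):
    # Assumption: a sentence ends at '.', '!' or '?'
    return [m.strip() for m in re.findall(r"[^.!?]*[.!?]", text)]

def split_to_words(sentence):
    # Assumption: a word is a run of letters, digits or underscores
    return re.findall(r"\w+", sentence)

sentences = split_to_sentences(large_text)
lookup = {}
for idx_sentence, sentence in enumerate(sentences):
    for word in split_to_words(sentence):
        # Map each word to the set of sentence indices it occurs in
        lookup.setdefault(word, set()).add(idx_sentence)

# Sentences containing "the", in document order
the_sentences = [sentences[i] for i in sorted(lookup["the"])]
```

Storing indices in a set keeps the lookup compact, but it cannot tell apart two occurrences of a word inside the same sentence. If that distinction matters, a list of indices would preserve it.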

answered 2013-01-30T16:22:44.607
0

You can do this with a dictionary that maps each word to a list of sentences:

import re
original = " the  quick ' brown 1 fox! jumps-over the 'lazy' doG? !  "

index={}

for sentence in re.findall(r'(\b.*?[.!?])',original):
    for word in re.findall(r'\w+',sentence):
        index.setdefault(word,[]).append(sentence)

print(index)

Prints:

{'brown': ["the  quick ' brown 1 fox!"], 'lazy': ["jumps-over the 'lazy' doG?"], 'jumps': ["jumps-over the 'lazy' doG?"], 'fox': ["the  quick ' brown 1 fox!"], 'doG': ["jumps-over the 'lazy' doG?"], '1': ["the  quick ' brown 1 fox!"], 'quick': ["the  quick ' brown 1 fox!"], 'the': ["the  quick ' brown 1 fox!", "jumps-over the 'lazy' doG?"], 'over': ["jumps-over the 'lazy' doG?"]}

The first "the" is given by index['the'][0] and the second by index['the'][1].
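
The lookup described above could be wrapped into the get_sentence function the question asks for. This is only a sketch: build_index, the (word, occurrence) parameters, and returning None for a missing word or occurrence are hypothetical choices, not part of the answer.

```python
import re

original = " the  quick ' brown 1 fox! jumps-over the 'lazy' doG? !  "

def build_index(text):
    """Map each word to the list of sentences it occurs in, in order."""
    index = {}
    # Same patterns as the answer: a sentence starts at a word boundary
    # and runs to the next '.', '!' or '?'.
    for sentence in re.findall(r"\b.*?[.!?]", text):
        for word in re.findall(r"\w+", sentence):
            index.setdefault(word, []).append(sentence)
    return index

def get_sentence(index, word, occurrence=0):
    """Return the sentence of the n-th occurrence of word, or None."""
    sentences = index.get(word, [])
    if 0 <= occurrence < len(sentences):
        return sentences[occurrence]
    return None

index = build_index(original)
first = get_sentence(index, "the", 0)
second = get_sentence(index, "the", 1)
```

The argument passed to the function is then a pair of word and occurrence number, which plays the role the iterator or array index would play in C++ or C.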

answered 2013-01-30T16:51:22.467