python - Using concordance in Python NLTK to return the value of a key found in a dictionary

Question

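I want to use a concordance to find instances of words or phrases in a text, then look up the found words/phrases in a dictionary and return the corresponding value. Here is the code I have so far.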

from __future__ import division
import nltk, re, pprint
OutFileName = "shark_uri.txt"
OutFile = open(OutFileName, 'w')
book1 = open('shark_test.txt', 'rU').read() 
token1 = nltk.word_tokenize(book1)
text1 = nltk.Text(token1)
LineNumber = 0
for k, v in bio_dict.iteritems(): 
        text1.concordance(k)
    #if k is found then print v, else go on to next k
    if k #is found:
        OutFile.write(v)
        OutFile.write('\n')
        LineNumber += 1
    else
        LineNumber += 1
OutFile.close()

This code is supposed to read a paragraph about sharks from the shark_test.txt file. bio_dict contains key-value pairs like this:

'ovoviviparous':'http://dbpedia.org/resource/Ovoviviparity', 
'predator':'http://dbpedia.org/resource/Predation',

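The keys are the words or phrases the program is looking for. The values are the DBpedia URIs corresponding to each word/phrase. The idea is that when a word like "predator" is found in the text, the program returns the DBpedia URI for Predation. I have been getting a lot of strange results, and I think it is because I need to tell the program to return v if k is found, and otherwise move on to the next k. I put a placeholder for this in the code block above. I am not sure how to express this in Python. Would it be something like if k == True? Without that condition, it seems to just go through the dictionary and print all the values, regardless of whether the key was found. Any suggestions? Thanks in advance.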

score 1 · Accepted Answer

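The way your code works now, you iterate over all key, value pairs in the bio_dict dictionary and then use concordance to print the lines of text1 in which k occurs. The important thing to note here is that concordance does not return anything; it only prints. So even if you tried to use its return value (which your code does not actually do), you could not. When you write if k:, the condition will always be True, assuming your keys are non-empty strings (none of your keys evaluates to False).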
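To illustrate both points without depending on NLTK, here is a minimal sketch in Python 3. `print_only` is a hypothetical stand-in for any function that prints its output and has no return statement; it is not NLTK's implementation.

```python
def print_only(word):
    """Hypothetical stand-in: prints something, returns nothing."""
    print("Displaying matches for", word)


# A function without a return statement evaluates to None when called,
# so there is nothing useful to test in an `if`.
result = print_only("predator")
print(result)

# Any non-empty string is truthy, so `if k:` cannot tell you
# whether k was actually found in the text.
key_is_truthy = bool("predator")
empty_is_truthy = bool("")
print(key_is_truthy, empty_is_truthy)
```

In other words, the check has to be made against the text itself (for example with `in`), not against the key or against the result of a printing call.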

If I understand your question correctly, you really should not be using concordance. Instead, do the following:

for word in token1:                        # Go through every word in your text
    if word in bio_dict:                   # Check if the word is in the dict
        OutFile.write(bio_dict[word]+'\n') # Output the value to your file

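Also, your LineNumber counter does not actually count what you want, because you read the input file in one go and tokenize the whole thing into token1. But since you never actually use LineNumber, you can remove the variable and still get the desired output.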
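If line numbers do turn out to matter, one option is to process the text line by line instead of reading it all at once. The following is a rough Python 3 sketch under a few assumptions: `find_uris_by_line` and `sample_lines` are hypothetical names made up for illustration, and a simple regular expression stands in for `nltk.word_tokenize` so that the example runs without NLTK.

```python
import re

# The two dictionary entries shown in the question.
bio_dict = {
    'ovoviviparous': 'http://dbpedia.org/resource/Ovoviviparity',
    'predator': 'http://dbpedia.org/resource/Predation',
}


def find_uris_by_line(lines, lookup):
    """Return (line_number, word, uri) for every token that is a key in lookup."""
    matches = []
    for line_number, line in enumerate(lines, start=1):
        # Assumption: lowercase alphabetic runs are a good enough
        # stand-in for nltk.word_tokenize in this illustration.
        for word in re.findall(r"[a-z]+", line.lower()):
            if word in lookup:
                matches.append((line_number, word, lookup[word]))
    return matches


# Hypothetical sample text; in practice, iterate over the open file instead.
sample_lines = [
    "Sharks are found in all seas.",
    "The great white is an apex predator.",
    "Many species are ovoviviparous.",
]

matches = find_uris_by_line(sample_lines, bio_dict)
for line_number, word, uri in matches:
    print(line_number, word, uri)
```

Here `enumerate(..., start=1)` does the counting, so no separate LineNumber variable is needed.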

score -1 · Accepted Answer

I managed to get what I needed with this code.

from __future__ import division
import urllib
import re, pprint, time
in_file_name = "shark_id.txt"
in_file = open(in_file_name, 'r')
out_file_name = "shark_uri.txt"
out_file = open(out_file_name, 'w')

for line in in_file:
    line = line.strip()
    # Fetch the EOL data object for this ID and lowercase it for matching
    address = 'http://eol.org/api/data_objects/1.0/' + line + '.xml'
    web_content = urllib.urlopen(address)
    results = web_content.read().lower()
    temp_file_name = "Temp_file.xml"
    temp_file = open(temp_file_name, 'w')
    temp_file.write(results)
    temp_file.close()
    print line
    print len(results)
    temp_file = open('Temp_file.xml')
    data = temp_file.read()
    temp_file.close()
    # Write one "id,key,uri" row for every key that occurs in the content
    for k, v in bio_dict.iteritems():
        if k in data:
            out_file.write(line + ',')
            out_file.write(k + ',')
            out_file.write(v)
            out_file.write('\n')
    # Pause between requests
    time.sleep(.5)
in_file.close()                                                     
out_file.close()
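One thing worth keeping in mind about the `if k in data` test above: on a string, `in` is a substring check, so it also matches keys inside longer words, and it matches multi-word phrases as well. Whether that is desirable depends on the data. A small Python 3 sketch of the difference, with a made-up sample string and a simple regular expression standing in for a real tokenizer:

```python
import re

# Hypothetical sample content, already lowercased like `results` above.
data = "sharks are predators of the open sea."
key = "predator"

# Substring test: succeeds because 'predator' occurs inside 'predators'.
substring_hit = key in data

# Token test: compares against whole words only, so 'predators' does not count.
tokens = re.findall(r"[a-z]+", data)
token_hit = key in tokens

# Substring tests also handle multi-word phrases, which a per-token test cannot.
phrase_hit = "open sea" in data

print(substring_hit, token_hit, phrase_hit)
```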
