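I am pulling data from the OpenCalais API. Here are the details:
Input: some paragraph (a string, e.g. "Barack Obama is the President of United States."). The API also returns a number of entity instances, each with an offset and a length, but not necessarily in order of occurrence.
Output (what I want): the same string, but with the identified entity instances turned into hyperlinks (this is also a string), i.e.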
output = '<a href="https://en.wikipedia.org/Barack_Obama">Barack Obama</a> is the President of <a href="https://en.wikipedia.org/United_States">United States</a>.'
But this is really a Python question.
This is what I have:
# API calls above are omitted because they are not relevant.
output = input
for x in range(len(result.entities)):
    print(len(result.entities[x]["instances"]))
    previdx = 0
    idx = 0
    for y in range(len(result.entities[x]["instances"])):
        try:
            url = "https://permid.org/1-" + result.entities[x]["resolutions"][0]["permid"]
        except (KeyError, IndexError):
            # No PermID resolution available: fall back to a Wikipedia link
            url = "https://en.wikipedia.org/wiki/" + result.entities[x]["name"].replace(" ", "_")
            print("Generating wiki page link")
        print(url + "\n")
        # THE PROBLEM STARTS HERE
        # The offsets refer to the original input, but output has already
        # grown by the markup inserted on earlier iterations.
        offsetstr = result.entities[x]["instances"][y]["offset"]
        lenstr = result.entities[x]["instances"][y]["length"]
        output = output[:offsetstr] + '<a href="' + url + '">' + output[offsetstr:offsetstr + lenstr] + "</a>" + output[offsetstr + lenstr:]
print(output)
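The problem, if you read the code carefully, is that the output string changes after the first iteration, so on later iterations the offsets (which refer to the original input) no longer point at the right characters. As a result, I cannot make the intended replacements.
Basically, I am trying to get: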
input = "Barack Obama is the President of United States"
output = '<a href="https://en.wikipedia.org/Barack_Obama">Barack Obama</a> is the President of <a href="https://en.wikipedia.org/United_States">United States</a>'
How can this be done? I tried slicing and splicing the string, but the result comes out garbled.
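One possible way around the shifting offsets, sketched here under some assumptions, is to collect every instance first and then apply the replacements from the rightmost offset to the leftmost, so that inserted markup never moves text that still has to be processed. The helper name `link_entities` and the `(offset, length, url)` tuples are hypothetical, introduced only for illustration; the offsets and URLs are made up for the example sentence, and the spans are assumed not to overlap.

```python
def link_entities(text, spans):
    """Wrap each (offset, length, url) span of text in an <a> tag.

    Offsets refer to the ORIGINAL text, so spans are applied from the
    rightmost to the leftmost: inserting markup at a later position
    never shifts the characters that come before it.
    Assumption: the spans do not overlap.
    """
    output = text
    # Tuples sort by their first element (the offset); reverse=True
    # processes the highest offset first.
    for offset, length, url in sorted(spans, reverse=True):
        end = offset + length
        output = (output[:offset]
                  + '<a href="' + url + '">' + output[offset:end] + "</a>"
                  + output[end:])
    return output


text = "Barack Obama is the President of United States"
# Deliberately listed out of order, as the API may return them.
spans = [
    (33, 13, "https://en.wikipedia.org/wiki/United_States"),
    (0, 12, "https://en.wikipedia.org/wiki/Barack_Obama"),
]
linked = link_entities(text, spans)
print(linked)
```

In the loop above, this would presumably mean building one flat list of spans across all entities and all of their instances before touching the string, since sorting within a single entity is not enough when instances of different entities interleave.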