python - Changing substrings with multiple values using offset and length

Question

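I am pulling data from the OpenCalais API. Here are the details:

Input: a paragraph (a string, e.g. "Barack Obama is the President of United States."). The API also returns entity instances, each with an offset and a length, not necessarily in order of appearance.

Output (what I want): the same string, but with each recognized entity instance turned into a hyperlink (the result is also a string), i.e.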

output = '<a href="https://en.wikipedia.org/Barack_Obama">Barack Obama</a> is the President of <a href="https://en.wikipedia.org/United_States">United States</a>.'

But this is really a Python question.

This is what I have:

# API calls above are not relevant.

output=input
for x in range(0,result.print_entities()):
    print(len(result.entities[x]["instances"]))
    previdx=0
    idx=0
    for y in range(0,len(result.entities[x]["instances"])):

        try:
            url= "https://permid.org/1-" + result.entities[x]['resolutions'][0]['permid']

        except:
            url="https://en.wikipedia.org/wiki/"+result.entities[x]["name"].replace(" ", "_")

        print("Generating wiki page link")
        print(url+"\n")

        # THE PROBLEM STARTS HERE

        offsetstr=result.entities[x]["instances"][y]["offset"]
        lenstr=result.entities[x]["instances"][y]["length"]

        output=output[:offsetstr]+"<a href=" + url + ">" + output[offsetstr:offsetstr+lenstr] + "</a>" + output[offsetstr+lenstr:]

print(output)

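The problem is that after the first iteration the output string has changed, so in later iterations the offsets no longer point at the same positions. As a result, I can't make the intended changes.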
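
To make the drift concrete, here is a minimal sketch in Python 3. The `wrap` helper and the `<b>` tags are stand-ins invented for illustration (they are not part of the code above), and the offsets are worked out by hand for the example sentence rather than taken from an API response.

```python
def wrap(s, offset, length):
    """Wrap s[offset:offset+length] in <b>...</b> (a stand-in for the link markup)."""
    end = offset + length
    return s[:offset] + "<b>" + s[offset:end] + "</b>" + s[end:]


text = "Barack Obama is the President of United States"

# Offsets relative to the ORIGINAL string:
# "Barack Obama" is (0, 12) and "United States" is (33, 13).
step1 = wrap(text, 0, 12)

# The same offset selects different characters once the string has been edited.
before = text[33:33 + 13]    # "United States"
after = step1[33:33 + 13]    # "ent of United"

# Everything after the first insertion moved right by the length of the markup.
shift = len(step1) - len(text)
corrected = step1[33 + shift:33 + shift + 13]    # "United States" again
```

Tracking `shift` by hand only works when the insertions happen in increasing offset order, which is why the order of the instances matters here.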

Basically, I'm trying to get:

input = "Barack Obama is the President of United States"

output = '<a href="https://en.wikipedia.org/Barack_Obama">Barack Obama</a> is the President of <a href="https://en.wikipedia.org/United_States">United States</a>.'

How can this be done? I tried splicing and slicing, but the string comes out garbled.

score 0 · Accepted Answer

Try using another variable to store the result:

output=input
res,preOffsetstr  = [],0
for x in range(0,result.print_entities()):
    print(len(result.entities[x]["instances"]))
    for y in range(0,len(result.entities[x]["instances"])):

        try:
            url= "https://permid.org/1-" + result.entities[x]['resolutions'][0]['permid']

        except:
            url="https://en.wikipedia.org/wiki/"+result.entities[x]["name"].replace(" ", "_")

        print("Generating wiki page link")
        print(url+"\n")

        offsetstr=result.entities[x]["instances"][y]["offset"]
        lenstr=result.entities[x]["instances"][y]["length"]

        # Slice from the untouched input, so the offsets stay valid.
        # NOTE: this only works if the instances are visited in increasing offset order.
        res.append(output[preOffsetstr:offsetstr]+"<a href=" + url + ">" + output[offsetstr:offsetstr+lenstr] + "</a>")

        # Continue from the end of this entity, not from its start.
        preOffsetstr = offsetstr+lenstr

# Text after the last entity.
res.append(output[preOffsetstr:])
print(''.join(res))
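
If the instances cannot be assumed to arrive in order, one alternative is to apply the insertions from the highest offset down to the lowest, so that text to the left of each edit is never shifted. The following is a rough Python 3 sketch of that idea, not the API's own method: `link_entities_reverse` and the `(offset, length, url)` tuple layout are names made up for this example, and it assumes the instances do not overlap.

```python
def link_entities_reverse(text, spans):
    """Wrap each (offset, length, url) span of text in an <a> tag.

    Works right to left, so earlier offsets stay valid after each edit.
    Assumes the spans do not overlap.
    """
    for offset, length, url in sorted(spans, key=lambda s: s[0], reverse=True):
        end = offset + length
        text = (text[:offset]
                + '<a href="' + url + '">' + text[offset:end] + "</a>"
                + text[end:])
    return text


# Hand-made sample spans, deliberately out of order.
sample_text = "Barack Obama is the President of United States"
sample_spans = [
    (33, 13, "https://en.wikipedia.org/wiki/United_States"),
    (0, 12, "https://en.wikipedia.org/wiki/Barack_Obama"),
]
linked = link_entities_reverse(sample_text, sample_spans)
```

This rebuilds the string once per instance, which is fine for a paragraph; for very long texts, collecting pieces in a list and joining them once is likely cheaper.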

score 0 · Accepted Answer

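I finally solved it. It took some arithmetic, but the hunch from my last comment worked: "Maybe one solution is to store the {offset, length} tuples in an array, sort them by offset, and then run a loop. Any help building that structure?"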

output=input
l=[]
for x in range(0,result.print_entities()):
    print(len(result.entities[x]["instances"]))

    for y in range(0,len(result.entities[x]["instances"])):

        # Build the URL already wrapped in double quotes, ready for href=...
        try:
            url='"'+ "https://permid.org/1-" + result.entities[x]['resolutions'][0]['permid'] + '"'

        except:
            url='"'+"https://en.wikipedia.org/wiki/"+result.entities[x]["name"].replace(" ", "_") + '"'

        print("Generating wiki page link")

        # THE PROBLEM WAS HERE

        offsetstr=result.entities[x]["instances"][y]["offset"]
        lenstr=result.entities[x]["instances"][y]["length"]

        # THE KEY TO THE SOLUTION IS HERE: record the span instead of editing output right away
        l.append((offsetstr,lenstr,url))

print(l)

def getKey(item):
    return item[0]

# Sort the (offset, length, url) tuples by offset.
l_sorted=sorted(l, key=getKey)

a=[]
pos=0  # index in output just past the previous entity
# And then simply run a for loop
for offsetstr, lenstr, url in l_sorted:
    end=offsetstr+lenstr
    # Plain text before the entity, then the entity wrapped in a link.
    a.append(output[pos:offsetstr] + "<a href=" + str(url) + ">" + output[offsetstr:end] + "</a>")
    pos=end

# Text after the last entity.
a.append(output[pos:])

print("".join(a))

And voilà! :) Thanks for the help, everyone. Hopefully this helps someone someday.
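
For anyone who wants to try the sort-then-loop idea without an API key, here is a self-contained Python 3 sketch. The sample `entities` list only mimics the fields used in the code above (`name`, `instances`, `resolutions`); its values, including the placeholder PermID, are made up, and the function names `collect_spans` and `link_entities` are my own. Skipping overlapping instances is a choice made for this sketch, not something the API prescribes.

```python
def collect_spans(entities):
    """Flatten entities into (offset, length, url) tuples."""
    spans = []
    for entity in entities:
        resolutions = entity.get("resolutions") or []
        if resolutions and "permid" in resolutions[0]:
            url = "https://permid.org/1-" + resolutions[0]["permid"]
        else:
            # Fall back to a Wikipedia-style URL built from the entity name.
            url = "https://en.wikipedia.org/wiki/" + entity["name"].replace(" ", "_")
        for instance in entity["instances"]:
            spans.append((instance["offset"], instance["length"], url))
    return spans


def link_entities(text, spans):
    """Build the linked string in one left-to-right pass over sorted spans."""
    pieces = []
    pos = 0  # index in text just past the previous linked span
    for offset, length, url in sorted(spans, key=lambda s: s[0]):
        if offset < pos:
            continue  # overlaps the previous span; skipped in this sketch
        end = offset + length
        pieces.append(text[pos:offset])
        pieces.append('<a href="{}">{}</a>'.format(url, text[offset:end]))
        pos = end
    pieces.append(text[pos:])  # text after the last span
    return "".join(pieces)


# Made-up sample data, deliberately out of order.
paragraph = "Barack Obama is the President of United States"
entities = [
    {"name": "United States", "instances": [{"offset": 33, "length": 13}]},
    {"name": "Barack Obama",
     "resolutions": [{"permid": "0000000000"}],  # placeholder, not a real PermID
     "instances": [{"offset": 0, "length": 12}]},
]
html = link_entities(paragraph, collect_spans(entities))
```

Because every slice is taken from the original `paragraph`, no offset ever needs adjusting, and the pieces are joined with an empty string so the spacing of the input is preserved.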
