Getting the positions of < and > when they are embedded in something like <tag "510270">calculate</>

I have sentences like these:

sentence1 = ("After six weeks and seventeen tentative approaches the only serious "
"tender came from Daniel. He had offered a paltry #2 a week for the one-time "
"woodman's home, sane enough in this, at least, to <tag \"510270\">calculate</> "
"safety to the nearest new penny piece. ")

sentence2 = ("After six weeks and seventeen tentative approaches the only serious "
"tender came from Daniel. He had offered a paltry #2 a week for the one-time "
"woodman's < home, sane enough in this, at least, to <tag \"510270\">calculate</> "
"safety to the nearest new penny > piece. ")

sentence3 = ("After six weeks and seventeen tentative approaches the only serious "
"tender came from Daniel. He had offered a paltry #2 a week for the one-time "
"woodman's > home, sane enough in this, at least, to <tag \"510270\">calculate</> "
"safety to the nearest new penny < piece. ")

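I need cfrom and incfrom to be the positions of the first and second < of <tag "XXXX">...</>, and I need cto and incto to be the positions of the second and first > of <tag "XXXX">...</>.

How can I do this for sentences like sentence2 and sentence3, where < and > also appear outside the <tag "XXXX">...</>?

For sentence1, I can simply do this: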

cfrom, cto = 0, 0
for i, c in enumerate(sentence1):
  if c == "<":
    cfrom = i  # first "<"
    break

for i, c in enumerate(reversed(sentence1)):
  if c == ">":
    cto = len(sentence1) - 1 - i  # last ">"
    break

incfrom, incto = 0, 0
for i, c in enumerate(sentence1[cfrom:]):
  if c == ">":
    incto = cfrom + i  # first ">"
    break

for i, c in enumerate(sentence1[incto:cto]):
  if c == "<":
    incfrom = incto + i  # second "<"
    break

2 Answers

You can do this if you keep track of where you are as you find the tags, like so:

def parseSentence(sentence):
    cfrom, cto, incfrom, incto = 0, 0, 0, 0
    place = ''  # keeps track of where we are

    for i, c in enumerate(sentence):
        if c == '<':
            # check for 'cfrom'
            if sentence.startswith('<tag', i):
                cfrom = i
                place = 'botag'  # begin-open-tag
            # check for 'incfrom' (startswith avoids an IndexError on a trailing '<')
            elif sentence.startswith('</', i) and place == 'intag':
                incfrom = i
                place = 'bctag'  # begin-close-tag
        elif c == '>':
            # check for 'cto'
            if place == 'botag':  # just after '<tag...'
                cto = i
                place = 'intag'  # now within the XML tag
            # check for 'incto'
            elif place == 'bctag':
                incto = i
                place = ''
                # one tuple per complete <tag ...>...</>
                yield (cfrom, cto, incfrom, incto)

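This should work for all of your sentences, but note that it only really works properly when the sentence contains a single <tag>...</>. If there are several, it will return the positions of the last <tag>...</>.

Edit: if you add a yield to the function, it will iterate over the positions of all the tags in the sentence when there are multiple <tag>...</> (see above).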
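
To illustrate the multiple-tag case, here is a minimal sketch that uses a regular expression instead of the state tracking above. The name find_tag_spans and the pattern TAG_RE are assumptions made for illustration, not part of the original answer. The sketch assumes every tag has the form <tag "...">...</> and keeps this answer's meaning of cfrom, cto, incfrom and incto.

```python
import re

# Hypothetical pattern: an opening <tag ...>, the tagged text, then the closing </>.
TAG_RE = re.compile(r'<tag\b[^>]*>(.*?)</>', re.DOTALL)

def find_tag_spans(sentence):
    """Yield (cfrom, cto, incfrom, incto) for each <tag ...>...</> in sentence."""
    for m in TAG_RE.finditer(sentence):
        cfrom = m.start()        # '<' of the opening tag
        cto = m.start(1) - 1     # '>' that ends the opening tag
        incfrom = m.end(1)       # '<' of the closing '</>'
        incto = m.end() - 1      # '>' of the closing '</>'
        yield cfrom, cto, incfrom, incto

# Stray '<' and '>' outside the tags are skipped because they do not match the pattern.
text = 'a < b <tag "1">x</> c > d <tag "2">yy</>'
for span in find_tag_spans(text):
    print(span)
```

Because the pattern only matches a '<' that is immediately followed by tag, the loose < and > in sentence2 and sentence3 should not affect the result.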

answered 2013-06-05T16:06:28.990

If I understand correctly, this should work (assuming you don't change the variables i and c):

cfrom, cto = 0, 0
for i, c in enumerate(sentence1):
  if sentence1.startswith("<tag", i):
    cfrom = i
    break

for i, c in enumerate(sentence1[cfrom:]):  # going forward from cfrom
  if c == ">":
    cto = cfrom + i
    break

incfrom, incto = 0, 0
for i, c in enumerate(sentence1[cto:]):  # after the tag is opened, look for the start of the closing tag
  if sentence1.startswith("</", cto + i):
    incfrom = cto + i
    break
for i, c in enumerate(sentence1[incfrom:]):  # going forward from incfrom
  if c == ">":
    incto = incfrom + i
    break
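
The same four lookups can also be sketched with str.find, which searches for a substring from a given start index and returns -1 when nothing is found. The helper name tag_positions is hypothetical, and the sketch assumes at most one <tag ...>...</> per sentence, with cto and incto meaning the same as in the loops above.

```python
def tag_positions(sentence):
    """Return (cfrom, cto, incfrom, incto) for the first <tag ...>...</>, or None."""
    cfrom = sentence.find("<tag")                            # start of the opening tag
    cto = sentence.find(">", cfrom) if cfrom != -1 else -1   # end of the opening tag
    incfrom = sentence.find("</", cto) if cto != -1 else -1  # start of the closing tag
    incto = sentence.find(">", incfrom) if incfrom != -1 else -1  # end of the closing tag
    if incto == -1:
        return None  # no complete tag was found
    return cfrom, cto, incfrom, incto

# Shortened, made-up variant of sentence2 with stray '<' and '>' around the tag.
example = 'woodman\'s < home, to <tag "510270">calculate</> safety > piece.'
print(tag_positions(example))
```

Since the search starts at the literal <tag rather than at the first <, a stray < or > before the tag is ignored.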
answered 2013-06-05T15:25:26.990