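The following code shows that FMc's code doesn't work.
The line
from name_of_file import olding_the_new,finding
refers to the code in my own answer to this question, elsewhere in this thread.
* Give the name name_of_file to the file containing the script from my other answer in this thread, and it will run.
* Or, if you'd rather not copy-paste my code, just comment out this import line and the code below will still run, because I put in a try-except instruction that reacts correctly to the absence of olding_the_new and finding.
I use two ways to verify the results of FMc's code:
-1/ comparing the span returned by his code with the index of 'f' and the index of 'r', since we search for the phrase 'foobar' and I arranged the texts so that no f or r occurs before those of the first foobar
-2/ comparing it with the first span returned by my code, hence the need for the import from name_of_file above
Nota bene
If disp = None is changed to disp = True, the execution displays intermediate results that help in understanding the algorithm.
.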
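The first of these two checks can be sketched as a small helper. This is only an illustration: check_against_f_and_r is a name assumed here, not something from FMc's code or from the other answer, and the spans are treated as half-open (start, end) pairs, which is what the results further down suggest.

```python
def check_against_f_and_r(text_orig, span):
    """Return (start_ok, end_ok) for a half-open span (S, E).

    Assumes the first 'f' and the first 'r' of text_orig are those of
    the first 'foobar', which holds for the test texts used here.
    """
    S, E = span
    return (text_orig.index('f') == S, text_orig.index('r') + 1 == E)


text_orig = 'BLAH ##BLAH fo####o##ba###r## BL##AH'
# (12, 26) is the span reported for this text in the results.
print(check_against_f_and_r(text_orig, (12, 26)))   # (True, False)
# (12, 27) is the span that actually covers 'fo####o##ba###r'.
print(check_against_f_and_r(text_orig, (12, 27)))   # (True, True)
```

The two booleans correspond to the two True/False columns printed on each "span =" line of the results.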
import re
from name_of_file import olding_the_new, finding


def main():
    # Two versions of the text: the original,
    # and one without any of the "#" markers.
    for text_orig in ('BLAH ##BLAH fo####o##ba###r## BL##AH',
                      'jkh##jh#f',
                      '#f#oo##ba###r##',
                      'a##xf#oo##ba###r##',
                      'ax##f#oo##ba###r##',
                      'ab###xyf#oo##ba###r##',
                      'abx###yf#oo##ba###r##',
                      'abxy###f#oo##ba###r##',
                      'iji#hkh#f#oo##ba###r##',
                      'mn##pps#f#oo##ba###r##',
                      'mn##pab###xyf#oo##ba###r##',
                      'lmn#pab###xyf#oo##ba###r##',
                      'fo##o##ba###r## aaaaaBLfoob##arAH',
                      'fo#o##ba####r## aaaaaBLfoob##ar#AH',
                      'f##oo##ba###r## aaaaaBLfoob##ar',
                      'f#oo##ba####r## aaaaBL#foob##arAH',
                      'f#oo##ba####r## aaaaBL#foob##ar#AH',
                      'foo##ba#####r## aaaaBL#foob##ar',
                      '#f#oo##ba###r## aaaBL##foob##arAH',
                      '#foo##ba####r## aaaBL##foob##ar#AH',
                      '#af#oo##ba##r## aaaBL##foob##ar',
                      '##afoo##ba###r## aaaaaBLfoob##arAH',
                      'BLAHHfo##o##ba###r aaBLfoob##ar#AH',
                      'BLAH#fo##o##ba###r aaBLfoob##ar',
                      'BLA#Hfo##o##ba###r###BLfoob##ar',
                      'BLA#Hfo##o##ba###r#BL##foob##ar',
                      ):
        text_clean = text_orig.replace('#', '')
        # Collect data on the positions and widths
        # of the markers in the original text.
        rgx = re.compile(r'#+')
        markers = [(m.start(), len(m.group()))
                   for m in rgx.finditer(text_orig)]
        # Find the location of the search phrase in the cleaned text.
        # At that point you'll have all the data you need to compute
        # the span of the phrase in the original text.
        search = 'foobar'
        try:
            i = text_clean.index(search)
            print('text_clean == %s\n'
                  "text_clean.index('%s')==%d len('%s') == %d\n"
                  'text_orig == %s\n'
                  'markers == %s'
                  % (text_clean,
                     search, i, search, len(search),
                     text_orig,
                     markers))
            S, E = compute_span(i, len(search), markers)
            try:
                mine = list(finding(search, text_orig, '#+'))
            except NameError:
                # The import line above was commented out.
                mine = 'finding() not imported'
            print("span = (%d,%d) %s %s %s"
                  % (S, E,
                     text_orig.index('f') == S,
                     text_orig.index('r') + 1 == E,
                     mine))
        except ValueError:
            print('text_clean == %s\n'
                  "text_clean.index('%s') ***Not found***\n"
                  'text_orig == %s\n'
                  'markers == %s'
                  % (text_clean,
                     search,
                     text_orig,
                     markers))
        print('--------------------------------')


def compute_span(start, width, markers):
    # start and width are in expurgated text
    # markers are in original text
    disp = None  # if disp==True => displaying of intermediary results
    span_start = start
    if disp:
        print('\nAt beginning in compute_span():\n'
              ' span_start==start==%d width==%d'
              % (start, width))
    for s, w in markers:  # s and w are in original text
        if disp:
            print('\ns,w==%d,%d'
                  ' s+w-1(%d)<start(%d) %s'
                  ' s(%d)==start(%d) %s'
                  % (s, w, s+w-1, start, s+w-1 < start, s, start, s == start))
        if s + w - 1 < start:
            #mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmmwmwmwmwmwm
            # the following if-else section is justified to be used
            # only after correction of the above line to this one:
            # if s+w-1 <= start or s==start:
            #mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm
            if s + w - 1 <= start and disp:
                print(' 1a) s + w - 1 (%d) <= start (%d) marker at left'
                      % (s+w-1, start))
            elif disp:
                print(' 1b) s(%d) == start(%d)' % (s, start))
            #mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmmwmwmwmwmwm
            # Situation: marker fully to left of our text.
            # Adjust our start points rightward.
            start += w
            span_start += w
            if disp:
                print(' span_start == %d start, width == %d, %d'
                      % (span_start, start, width))
        elif start + width - 1 < s:
            if disp:
                print(' 2) start + width - 1 (%d) < s (%d) marker at right\n'
                      ' break' % (start+width-1, s))
            # Situation: marker fully to the right of our text.
            break
        else:
            # Situation: marker interrupts our text.
            # Advance the start point for the remaining text
            # rightward, and reduce the remaining width.
            if disp:
                print(" 3) In 'else': s - start == %d marker interrupts"
                      % (s - start))
            start += w
            width = width - (s - start)
            if disp:
                print(' span_start == %d start, width == %d, %d'
                      % (span_start, start, width))
    return (span_start, start + width)


main()
Result
>>>
text_clean == BLAH BLAH foobar BLAH
text_clean.index('foobar')==10 len('foobar') == 6
text_orig == BLAH ##BLAH fo####o##ba###r## BL##AH
markers == [(5, 2), (14, 4), (19, 2), (23, 3), (27, 2), (32, 2)]
span = (12,26) True False [(12, 27)]
--------------------------------
text_clean == jkhjhf
text_clean.index('foobar') ***Not found***
text_orig == jkh##jh#f
markers == [(3, 2), (7, 1)]
--------------------------------
text_clean == foobar
text_clean.index('foobar')==0 len('foobar') == 6
text_orig == #f#oo##ba###r##
markers == [(0, 1), (2, 1), (5, 2), (9, 3), (13, 2)]
span = (0,11) False False [(1, 13)]
--------------------------------
text_clean == axfoobar
text_clean.index('foobar')==2 len('foobar') == 6
text_orig == a##xf#oo##ba###r##
markers == [(1, 2), (5, 1), (8, 2), (12, 3), (16, 2)]
span = (2,16) False True [(4, 16)]
--------------------------------
text_clean == axfoobar
text_clean.index('foobar')==2 len('foobar') == 6
text_orig == ax##f#oo##ba###r##
markers == [(2, 2), (5, 1), (8, 2), (12, 3), (16, 2)]
span = (2,15) False False [(4, 16)]
--------------------------------
text_clean == abxyfoobar
text_clean.index('foobar')==4 len('foobar') == 6
text_orig == ab###xyf#oo##ba###r##
markers == [(2, 3), (8, 1), (11, 2), (15, 3), (19, 2)]
span = (4,19) False True [(7, 19)]
--------------------------------
text_clean == abxyfoobar
text_clean.index('foobar')==4 len('foobar') == 6
text_orig == abx###yf#oo##ba###r##
markers == [(3, 3), (8, 1), (11, 2), (15, 3), (19, 2)]
span = (4,18) False False [(7, 19)]
--------------------------------
text_clean == abxyfoobar
text_clean.index('foobar')==4 len('foobar') == 6
text_orig == abxy###f#oo##ba###r##
markers == [(4, 3), (8, 1), (11, 2), (15, 3), (19, 2)]
span = (4,19) False True [(7, 19)]
--------------------------------
text_clean == ijihkhfoobar
text_clean.index('foobar')==6 len('foobar') == 6
text_orig == iji#hkh#f#oo##ba###r##
markers == [(3, 1), (7, 1), (9, 1), (12, 2), (16, 3), (20, 2)]
span = (7,18) False False [(8, 20)]
--------------------------------
text_clean == mnppsfoobar
text_clean.index('foobar')==5 len('foobar') == 6
text_orig == mn##pps#f#oo##ba###r##
markers == [(2, 2), (7, 1), (9, 1), (12, 2), (16, 3), (20, 2)]
span = (7,18) False False [(8, 20)]
--------------------------------
text_clean == mnpabxyfoobar
text_clean.index('foobar')==7 len('foobar') == 6
text_orig == mn##pab###xyf#oo##ba###r##
markers == [(2, 2), (7, 3), (13, 1), (16, 2), (20, 3), (24, 2)]
span = (9,24) False True [(12, 24)]
--------------------------------
text_clean == lmnpabxyfoobar
text_clean.index('foobar')==8 len('foobar') == 6
text_orig == lmn#pab###xyf#oo##ba###r##
markers == [(3, 1), (7, 3), (13, 1), (16, 2), (20, 3), (24, 2)]
span = (9,24) False True [(12, 24)]
--------------------------------
text_clean == foobar aaaaaBLfoobarAH
text_clean.index('foobar')==0 len('foobar') == 6
text_orig == fo##o##ba###r## aaaaaBLfoob##arAH
markers == [(2, 2), (5, 2), (9, 3), (13, 2), (27, 2)]
span = (0,9) True False [(0, 13), (23, 31)]
--------------------------------
text_clean == foobar aaaaaBLfoobarAH
text_clean.index('foobar')==0 len('foobar') == 6
text_orig == fo#o##ba####r## aaaaaBLfoob##ar#AH
markers == [(2, 1), (4, 2), (8, 4), (13, 2), (27, 2), (31, 1)]
span = (0,7) True False [(0, 13), (23, 31)]
--------------------------------
text_clean == foobar aaaaaBLfoobar
text_clean.index('foobar')==0 len('foobar') == 6
text_orig == f##oo##ba###r## aaaaaBLfoob##ar
markers == [(1, 2), (5, 2), (9, 3), (13, 2), (27, 2)]
span = (0,11) True False [(0, 13), (23, 31)]
--------------------------------
text_clean == foobar aaaaBLfoobarAH
text_clean.index('foobar')==0 len('foobar') == 6
text_orig == f#oo##ba####r## aaaaBL#foob##arAH
markers == [(1, 1), (4, 2), (8, 4), (13, 2), (22, 1), (27, 2)]
span = (0,8) True False [(0, 13), (23, 31)]
--------------------------------
text_clean == foobar aaaaBLfoobarAH
text_clean.index('foobar')==0 len('foobar') == 6
text_orig == f#oo##ba####r## aaaaBL#foob##ar#AH
markers == [(1, 1), (4, 2), (8, 4), (13, 2), (22, 1), (27, 2), (31, 1)]
span = (0,8) True False [(0, 13), (23, 31)]
--------------------------------
text_clean == foobar aaaaBLfoobar
text_clean.index('foobar')==0 len('foobar') == 6
text_orig == foo##ba#####r## aaaaBL#foob##ar
markers == [(3, 2), (7, 5), (13, 2), (22, 1), (27, 2)]
span = (0,7) True False [(0, 13), (23, 31)]
--------------------------------
text_clean == foobar aaaBLfoobarAH
text_clean.index('foobar')==0 len('foobar') == 6
text_orig == #f#oo##ba###r## aaaBL##foob##arAH
markers == [(0, 1), (2, 1), (5, 2), (9, 3), (13, 2), (21, 2), (27, 2)]
span = (0,11) False False [(1, 13), (23, 31)]
--------------------------------
text_clean == foobar aaaBLfoobarAH
text_clean.index('foobar')==0 len('foobar') == 6
text_orig == #foo##ba####r## aaaBL##foob##ar#AH
markers == [(0, 1), (4, 2), (8, 4), (13, 2), (21, 2), (27, 2), (31, 1)]
span = (0,12) False False [(1, 13), (23, 31)]
--------------------------------
text_clean == afoobar aaaBLfoobar
text_clean.index('foobar')==1 len('foobar') == 6
text_orig == #af#oo##ba##r## aaaBL##foob##ar
markers == [(0, 1), (3, 1), (6, 2), (10, 2), (13, 2), (21, 2), (27, 2)]
span = (2,10) True False [(2, 13), (23, 31)]
--------------------------------
text_clean == afoobar aaaaaBLfoobarAH
text_clean.index('foobar')==1 len('foobar') == 6
text_orig == ##afoo##ba###r## aaaaaBLfoob##arAH
markers == [(0, 2), (6, 2), (10, 3), (14, 2), (28, 2)]
span = (1,14) False True [(3, 14), (24, 32)]
--------------------------------
text_clean == BLAHHfoobar aaBLfoobarAH
text_clean.index('foobar')==5 len('foobar') == 6
text_orig == BLAHHfo##o##ba###r aaBLfoob##ar#AH
markers == [(7, 2), (10, 2), (14, 3), (27, 2), (31, 1)]
span = (5,14) True False [(5, 18), (23, 31)]
--------------------------------
text_clean == BLAHfoobar aaBLfoobar
text_clean.index('foobar')==4 len('foobar') == 6
text_orig == BLAH#fo##o##ba###r aaBLfoob##ar
markers == [(4, 1), (7, 2), (10, 2), (14, 3), (27, 2)]
span = (4,16) False False [(5, 18), (23, 31)]
--------------------------------
text_clean == BLAHfoobarBLfoobar
text_clean.index('foobar')==4 len('foobar') == 6
text_orig == BLA#Hfo##o##ba###r###BLfoob##ar
markers == [(3, 1), (7, 2), (10, 2), (14, 3), (18, 3), (27, 2)]
span = (5,14) True False [(5, 18), (23, 31)]
--------------------------------
text_clean == BLAHfoobarBLfoobar
text_clean.index('foobar')==4 len('foobar') == 6
text_orig == BLA#Hfo##o##ba###r#BL##foob##ar
markers == [(3, 1), (7, 2), (10, 2), (14, 3), (18, 1), (21, 2), (27, 2)]
span = (5,14) True False [(5, 18), (23, 31)]
--------------------------------
>>>
.
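The last column of each "span =" line comes from finding() in the other answer, which is not reproduced here. For readers who want an independent cross-check without that file, a stand-in can be sketched as follows. It is not that finding() function: reference_span and its marker parameter are names assumed for this illustration, and it only handles the first occurrence of the phrase, with single-character '#' markers as in the test texts.

```python
def reference_span(text_orig, search, marker='#'):
    """Half-open span of the first `search` in text_orig, ignoring markers.

    Returns None when the phrase is absent from the cleaned text.
    """
    # Index in the original text of every character that is not a marker.
    kept = [i for i, ch in enumerate(text_orig) if ch != marker]
    clean = ''.join(text_orig[i] for i in kept)
    i = clean.find(search)
    if i < 0:
        return None
    # Map the first and last characters of the match back to the original.
    return (kept[i], kept[i + len(search) - 1] + 1)


print(reference_span('BLAH ##BLAH fo####o##ba###r## BL##AH', 'foobar'))  # (12, 27)
print(reference_span('a##xf#oo##ba###r##', 'foobar'))                    # (4, 16)
print(reference_span('jkh##jh#f', 'foobar'))                             # None
```

Because it maps each cleaned index back to its original index directly, this sketch never has to compare indices taken from two different strings.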
---------------------------------------------
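FMc's code is very subtle; it took me a long time to understand its principle well enough to correct it.
I leave the task of understanding the algorithm to the reader. I will only state the corrections needed to make FMc's code work correctly:
.
First correction: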
if s + w - 1 < start:
# must be changed to
if s + w - 1 <= start or (s==start):
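EDIT
In my initial answer,
I wrote ... or (s<=start)
.
That was a mistake on my part; what I actually meant to write was
.. or (s==start)
NOTA BENE about this EDIT:
This mistake has no consequence in the code corrected with the two corrections I describe here, which correct FMc's initial code (the very first one, since he has now changed it twice).
Indeed, if you apply the two corrections, you get correct results for all 25 examples used for text_orig, with ... or (s <= start) as well as with ... or (s==start).
So I believed that the case where s < start is True can never occur when the first condition s+w-1 <= start is False, presumably because w is always greater than 0, together with some other reason arising from the layout of the marker and non-marker sequences...
I tried to find a proof of this impression... and failed.
Moreover, I reached a state in which I no longer even understand FMc's algorithm (the very first one, before he made any edits)!
Despite this, I am leaving this answer as it is, and at the end of it I post explanations of why these corrections are needed.
But a warning: FMc's very first algorithm is outlandish and hard to follow, because it compares indices that belong to two different strings: text_orig, which contains the #### markers, and the text cleaned of all those markers... and I am no longer convinced that this makes sense...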
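The open question above can be probed by brute force rather than by proof. The sketch below is neither FMc's code nor the code from the other answer: compute_span_variant, span_for and the left_test parameter are names assumed here, and 'a####xfoobar' is a hypothetical input that is not among the 25 examples. It applies both corrections with either form of the extra test, and it suggests that the two forms can disagree when a marker at least 3 characters wide starts before start and ends after it (s < start < s+w-1).

```python
import re


def compute_span_variant(start, width, markers, left_test):
    """Both corrections applied; left_test(s, start) is the extra 'or' clause."""
    span_start = start
    for s, w in markers:
        if s + w - 1 <= start or left_test(s, start):
            # Marker counted as lying to the left of the phrase.
            start += w
            span_start += w
        elif start + width - 1 < s:
            # Marker fully to the right of the phrase.
            break
        else:
            # Marker treated as interrupting the phrase.
            width -= (s - start)
            start = s + w
    return (span_start, start + width)


def span_for(text_orig, search, left_test):
    markers = [(m.start(), len(m.group()))
               for m in re.finditer(r'#+', text_orig)]
    i = text_orig.replace('#', '').index(search)
    return compute_span_variant(i, len(search), markers, left_test)


text = 'a####xfoobar'   # hypothetical input; 'foobar' sits at (6, 12)
print(span_for(text, 'foobar', lambda s, start: s == start))   # (2, 12)
print(span_for(text, 'foobar', lambda s, start: s <= start))   # (6, 12)
```

On this input only the s <= start form returns the expected span. Since w is at least 1, s+w-1 <= start already implies s <= start, so under this reading the whole test could be reduced to s <= start; that simplification is an observation about the sketch, not something stated in the original answers.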
.
Second correction:
start += w
width = width - (s - start)
# must be changed to
width -= (s-start) # this line MUST BE before the following one
start = s + w # because start += (s-start) + w
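To see why the order of these two lines matters, here is a minimal sketch of a single pass through the else branch. The function names interrupt_original and interrupt_corrected are assumed for this illustration; the numbers come from the first test text, where after the first marker start is 12, width is 6, and the next marker is (14, 4).

```python
def interrupt_original(start, width, s, w):
    """The else branch as first posted: start is shifted before it is used."""
    start += w
    width = width - (s - start)   # s - start is now negative, so width grows
    return start, width


def interrupt_corrected(start, width, s, w):
    """The else branch with the second correction applied."""
    width -= (s - start)          # characters of the phrase before the marker
    start = s + w                 # first index after the marker
    return start, width


# 'fo####o##ba###r': 'fo' at 12-13, marker at 14-17, 'o' at 18.
print(interrupt_original(12, 6, 14, 4))    # (16, 8)
print(interrupt_corrected(12, 6, 14, 4))   # (18, 4)
```

After the two characters 'fo', four characters ('obar') remain and the next one sits at index 18, which is what the corrected version returns.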
------------------
I am astonished that 2 people upvoted FMc's answer even though it gives wrong code. It means they upvoted the answer without testing the code it contains.
--------------------------------------
.
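EDIT
Why must the condition if s + w - 1 < start:
be changed to this one:
if s + w - 1 <= start or (s==start):
?
Because it can happen that s + w - 1 < start is False while s equals start.
In that case, execution goes to the else section and executes (in the corrected code):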
width -= (s - start)
start = s + w
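As a result, width is not changed (s - start is 0), and neither is span_start, although the sequence concerned clearly shows that the marker lies before the phrase and the start of the span should move past it.
This case can occur when the first marker is examined, as in the following sequences:
'#f#oo##ba###r##' : s,w==0,1 , 0==s==start==0
'ax##f#oo##ba###r##' : s,w==2,2 , 2==s==start==2
'abxy###f#oo##ba###r##' : s,w==4,3 , 4==s==start==4
'#f#oo##ba###r## aaaBL##foob##arAH' : s,w==0,1 , 0==s==start==0
'BLAH#fo##o##ba###r aaBLfoob##ar' : s,w==4,1 , 4==s==start==4
In the following cases, it occurs when the second marker is examined:
'iji#hkh#f#oo##ba###r##' : s,w==7,1 , 7==s==start==7
'mn##pps#f#oo##ba###r##' : s,w==7,1 , 7==s==start==7
It is easier to understand by running my code with disp = True.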
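The sequences listed above can also be located mechanically. The following sketch is an illustration only: first_boundary_marker is a name assumed here, and it replays just the left-shifting part of the uncorrected loop to report the first marker for which s + w - 1 < start is False while s equals start.

```python
import re


def first_boundary_marker(text_orig, search='foobar'):
    """Return (marker_number, s, w) for the first marker that begins exactly
    at `start` while the uncorrected test s + w - 1 < start is False;
    return None if the loop meets no such marker."""
    start = text_orig.replace('#', '').index(search)
    for n, m in enumerate(re.finditer(r'#+', text_orig), 1):
        s, w = m.start(), len(m.group())
        if s + w - 1 < start:
            start += w            # marker to the left: shift start, as FMc does
        elif s == start:
            return (n, s, w)      # the troublesome situation
        else:
            return None           # marker interrupts or follows the phrase
    return None


print(first_boundary_marker('#f#oo##ba###r##'))          # (1, 0, 1)
print(first_boundary_marker('abxy###f#oo##ba###r##'))    # (1, 4, 3)
print(first_boundary_marker('iji#hkh#f#oo##ba###r##'))   # (2, 7, 1)
```

The marker numbers and the (s, w) pairs agree with the values listed for these three texts.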
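When s + w - 1 <= start holds, the fact that s may equal start is not a problem, because execution does not go to the else section; it goes to the first section, which only adds w to start and to span_start.
But when s + w - 1 <= start is False while s equals start, execution goes to the else section, where the instruction width -= (s-start) does not change the value of width at all, and that is the problem.
So the condition or (s==start) must be added to keep execution out of the else section, and it has to be attached with an or so that it does so even when s+w-1 <= start is False, which can happen, as some of the examples show.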
.
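As for the fact that the test s+w-1 < start must be changed to s+w-1 <= start (with =), that is because of the case w==1, a marker made of a single # character, as in mn##pps#f#oo##ba###r## (second marker) and BLAH#fo##o##ba###r (first marker).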