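For anyone else still interested in this: the key is to move backwards from the end of the text (as described here). If you do so, you only ever compare against elements that have already been memoized.
Say words is a list of strings to be wrapped to a width of textwidth. Then, in the notation of the lecture, the task reduces to three lines of code: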
import numpy as np
textwidth = 80
# DP[i] is the minimal total badness of wrapping words[i:]; DP[len(words)] = 0
DP = [0]*(len(words)+1)
for i in range(len(words)-1,-1,-1):
    DP[i] = np.min([DP[j] + badness(words[i:j],textwidth) for j in range(i+1,len(words)+1)])
with:
def badness(line,textwidth):
    # Number of gaps (one space between adjacent words)
    length_line = len(line) - 1
    for word in line:
        length_line += len(word)
    # A line that does not fit is never allowed
    if length_line > textwidth: return float('inf')
    return ( textwidth - length_line )**3
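The same recurrence can also be written top-down, which may make the "already memoized" part easier to see. The following is only a sketch, not the lecture's code: min_total_badness and cost are names made up for illustration, and functools.lru_cache does the memoization. Note that the recursion depth grows with the number of words, so for long texts the bottom-up loop above is the safer choice.

```python
from functools import lru_cache

def badness(line, textwidth):
    # Same cost as above: word lengths plus one space per gap,
    # cubic penalty for the unused width, infinity if the line does not fit.
    length_line = sum(len(word) for word in line) + len(line) - 1
    if length_line > textwidth:
        return float('inf')
    return (textwidth - length_line) ** 3

def min_total_badness(words, textwidth):
    # Hypothetical helper: minimal total badness over all ways to wrap words.
    words = tuple(words)
    n = len(words)

    @lru_cache(maxsize=None)
    def cost(i):
        # Cost of optimally wrapping words[i:]; an empty suffix costs nothing.
        if i == n:
            return 0
        return min(cost(j) + badness(words[i:j], textwidth)
                   for j in range(i + 1, n + 1))

    return cost(0)

print(min_total_badness(["aaa", "bb", "cc", "ddddd"], 6))  # prints 29
```

In this small example a greedy wrap ("aaa bb" / "cc" / "ddddd") costs 0 + 64 + 1 = 65, while the optimum ("aaa" / "bb cc" / "ddddd") costs 27 + 1 + 1 = 29.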
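The lecturer also mentions that a second list can be added to keep track of the break positions. You can do so by changing the code to: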
DP = [0]*(len(words)+1)
breaks = [0]*(len(words)+1)
for i in range(len(words)-1,-1,-1):
    temp = [DP[j] + badness(words[i:j],textwidth) for j in range(i+1,len(words)+1)]
    index = np.argmin(temp)
    # temp[0] corresponds to j = i+1, so shift the index accordingly:
    # breaks[i] is the index of the first word of the next line
    breaks[i] = index + i + 1
    DP[i] = temp[index]
To recover the text, just follow the list of break positions:
def reconstruct_text(words,breaks):
    lines = []
    i = 0
    # Line starting at word i ends just before word breaks[i]
    while i < len(words):
        lines.append( ' '.join( words[ i : breaks[i] ] ) )
        i = breaks[i]
    return lines
Result (text = reconstruct_text(words,breaks)):
Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy
eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam
voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet
clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit
amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam
nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed
diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet
clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.
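For reference, the pieces above can be put together without NumPy. This is a minimal sketch under some assumptions: wrap is a name chosen here for illustration, the input is split on whitespace, and min over (cost, j) pairs stands in for np.argmin (ties go to the smallest j, matching np.argmin's first-occurrence behaviour).

```python
def badness(line, textwidth):
    # Word lengths plus one space per gap; cubic penalty for unused width.
    length_line = sum(len(word) for word in line) + len(line) - 1
    if length_line > textwidth:
        return float('inf')
    return (textwidth - length_line) ** 3

def wrap(text, textwidth):
    # Hypothetical wrapper around the DP, break tracking and reconstruction.
    words = text.split()
    n = len(words)
    DP = [0] * (n + 1)
    breaks = [0] * (n + 1)
    # Work backwards so that DP[j] is already known for every j > i.
    for i in range(n - 1, -1, -1):
        DP[i], breaks[i] = min(
            (DP[j] + badness(words[i:j], textwidth), j)
            for j in range(i + 1, n + 1)
        )
    # Follow the break positions from the first word to the end.
    lines = []
    i = 0
    while i < n:
        lines.append(' '.join(words[i:breaks[i]]))
        i = breaks[i]
    return lines

print(wrap("aaa bb cc ddddd", 6))  # prints ['aaa', 'bb cc', 'ddddd']
```

A word longer than textwidth makes every candidate line infinitely bad; in this sketch it then simply ends up on a line of its own.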
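One might be tempted to pad the lines with extra whitespace so they are justified. This is quite tricky (since one can come up with all sorts of aesthetic rules), but a naive attempt could be: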
import re
def spacing(text,textwidth,maxspace=4):
    for i in range(len(text)):
        length_line = len(text[i])
        if length_line < textwidth:
            status_length = length_line
            whitespaces_remain = textwidth - status_length
            Nwhitespaces = text[i].count(' ')
            # Leave lines without gaps (empty or single-word lines) alone.
            # If whitespaces (to add) per whitespace exceeds
            # maxspace, don't do anything either.
            if Nwhitespaces == 0 or whitespaces_remain/Nwhitespaces > maxspace-1:
                pass
            else:
                # Widen every gap by the same amount first
                text[i] = text[i].replace(' ',' '*( 1 + whitespaces_remain//Nwhitespaces ) )
                status_length = len(text[i])
                # Periods have highest priority for whitespace insertion
                periods = text[i].split('.')
                # Can we add a whitespace behind each period?
                if len(periods) - 1 + status_length <= textwidth:
                    text[i] = '. '.join(periods).strip()
                status_length = len(text[i])
                whitespaces_remain = textwidth - status_length
                Nwords = len(text[i].split())
                Ngaps = Nwords - 1
                if whitespaces_remain != 0: factor = Ngaps / whitespaces_remain
                # List of whitespaces in line i
                gaps = re.findall(r'\s+', text[i])
                temp = text[i].split()
                for k in range(Ngaps):
                    temp[k] = ''.join([temp[k],gaps[k]])
                # Spread the remaining spaces evenly over the gaps
                for j in range(whitespaces_remain):
                    temp[int(factor*j)] += ' '
                text[i] = ''.join(temp)
    return text
Which gives you (text = spacing(text,textwidth)):
Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy
eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam
voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet
clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit
amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam
nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed
diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet
clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.
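If the period-priority rule is not needed, a simpler rule is to split the missing spaces evenly across the gaps and hand the leftovers to the leftmost gaps. The sketch below is a different, simpler technique than the spacing function above, and justify_line is a name invented for this example. It works on a single line and leaves single-word or overlong lines unchanged.

```python
def justify_line(line, textwidth):
    # Hypothetical helper: pad one line to exactly textwidth characters.
    words = line.split()
    gaps = len(words) - 1
    if gaps <= 0:
        return line  # nothing to stretch on empty or single-word lines
    total_spaces = textwidth - sum(len(word) for word in words)
    if total_spaces < gaps:
        return line  # line is already too long; leave it untouched
    # Every gap gets `base` spaces; the first `extra` gaps get one more.
    base, extra = divmod(total_spaces, gaps)
    parts = []
    for k, word in enumerate(words[:-1]):
        parts.append(word + ' ' * (base + (1 if k < extra else 0)))
    parts.append(words[-1])
    return ''.join(parts)

print(repr(justify_line("aaa bb cc", 12)))  # prints 'aaa   bb  cc'
```

One would typically apply this to every line except the last one of a paragraph, e.g. [justify_line(l, textwidth) for l in text[:-1]] + text[-1:].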