What is the most efficient way in Python to double (or repeat n times) every letter in a string?
"abcd" -> "aabbccdd"
or
"abcd" -> "aaaabbbbccccdddd"
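I have a long string that needs to be transformed this way. My current solution uses a loop with n concatenations for every letter, and I imagine there is a more efficient approach.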
Use str.join:
>>> strs = "abcd"
>>> "".join([x*2 for x in strs])
'aabbccdd'
>>> "".join([x*4 for x in strs])
'aaaabbbbccccdddd'
From the docs:
s = ""
for substring in list:
    s += substring
Use s = "".join(list) instead. The former is a very common and catastrophic mistake when building large strings.
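As a minimal sketch of the difference the documentation describes, the two functions below build the same result, one by repeated concatenation and one with a single join. The names repeat_concat and repeat_join are hypothetical, chosen only for this illustration.

```python
def repeat_concat(s, n):
    """Repeat each character n times using += in a loop (the pattern to avoid)."""
    out = ""
    for ch in s:
        out += ch * n  # may copy the growing string on each iteration
    return out


def repeat_join(s, n):
    """Repeat each character n times with a single str.join call."""
    return "".join([ch * n for ch in s])


print(repeat_join("abcd", 2))
print(repeat_concat("abcd", 4))
```

Both return the same string for any input; the join version builds the pieces first and concatenates them once.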
Since you specifically asked about efficiency:
from itertools import chain
import re

# drewk's answer, optimized by using from_iterable instead of *
def double_chain(s):
    return ''.join(chain.from_iterable(zip(s, s)))
# Ashwini Chaudhary's answer
def double_mult(s):
    return ''.join([x*2 for x in s])
# Jon Clements' answer, but optimized to take the re.compile and *2 out of the loop.
r = re.compile('(.)')
def double_re(s):
    return r.sub(r'\1\1', s)
Now:
In [499]: %timeit double_chain('abcd')
1000000 loops, best of 3: 1.99 us per loop
In [500]: %timeit double_mult('abcd')
1000000 loops, best of 3: 1.25 us per loop
In [501]: %timeit double_re('abcd')
10000 loops, best of 3: 22.2 us per loop
So the itertools approach is about 60% slower than the simplest approach, and the regex is more than an order of magnitude slower still.
But tiny strings like this may not be representative of longer strings, so:
In [504]: %timeit double_chain('abcd' * 10000)
100 loops, best of 3: 4.92 ms per loop
In [505]: %timeit double_mult('abcd' * 10000)
100 loops, best of 3: 5.57 ms per loop
In [506]: %timeit double_re('abcd' * 10000)
10 loops, best of 3: 91.5 ms per loop
As expected, the itertools approach gets better (and now beats the simple approach), while the regex gets even worse as the string gets longer.
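So there is no single "most efficient" way. If you're doubling billions of tiny strings, Ashwini's answer is the best. If you're doubling millions of large strings, or thousands of huge ones, drewk's is the best. And if you're doing neither... there's no reason to optimize this in the first place.
Also, the usual caveats: this test was run on 64-bit CPython 3.3.0 on my Mac with no load; there is no guarantee the same will hold for your Python implementation, version, and platform, in your application, with your real data. A quick test with 32-bit 2.6 showed similar results, but if it matters, you need to run a more realistic and relevant test yourself.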
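To rerun this comparison on your own interpreter without IPython, something like the following sketch could work. It uses the standard timeit module and generalizes the three functions to an arbitrary repeat count n. The names repeat_chain, repeat_mult, repeat_re and bench are hypothetical, and the repetition counts are arbitrary choices that keep the run short; the regex keeps the same '(.)' pattern as double_re, so the inputs here contain no newlines.

```python
import re
import timeit
from itertools import chain

_PATTERN = re.compile(r"(.)")  # same pattern as double_re


def repeat_chain(s, n=2):
    # zip(s, s, ..., s) with n copies yields n-tuples of each character;
    # chain.from_iterable flattens them back into one stream
    return "".join(chain.from_iterable(zip(*[s] * n)))


def repeat_mult(s, n=2):
    return "".join([ch * n for ch in s])


def repeat_re(s, n=2):
    # replacement template is the backreference \1 written n times
    return _PATTERN.sub(r"\1" * n, s)


def bench(s, n=2, number=1000):
    """Return average seconds per call for each candidate on input s."""
    results = {}
    for func in (repeat_chain, repeat_mult, repeat_re):
        total = timeit.timeit(lambda: func(s, n), number=number)
        results[func.__name__] = total / number
    return results


for label, text, number in (("short", "abcd", 10000), ("long", "abcd" * 10000, 5)):
    for name, secs in bench(text, number=number).items():
        print(f"{label:5} {name:12} {secs * 1e6:10.2f} us per call")
```

The absolute numbers will differ from the ones quoted above; what matters is how the three compare on inputs that resemble your real data.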
I would have gone for str.join, so I'll offer re as an alternative:
>>> s = "abcd"
>>> import re
>>> re.sub('(.)', r'\1' * 2, s)
'aabbccdd'
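One thing to watch with the regex approach: by default "." does not match a newline, so newline characters in the input are left as they are instead of being repeated. A minimal sketch of a variant that also repeats newlines, assuming that is the behavior you want, passes re.DOTALL and takes the repeat count as a parameter. The function name repeat_re is hypothetical.

```python
import re


def repeat_re(s, n=2):
    # re.DOTALL makes "." match newlines too; the replacement is the
    # backreference \1 written n times
    return re.sub(r"(.)", r"\1" * n, s, flags=re.DOTALL)


print(repeat_re("abcd"))
print(repr(re.sub(r"(.)", r"\1\1", "a\nb")))  # default: newline is not doubled
print(repr(repeat_re("a\nb")))                # with DOTALL: newline is doubled
```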
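Whenever the question is "what is the most efficient way to map every character of a string to something else", it turns out that str.translate is the best option... for sufficiently big strings:
def double_translate(s):
    return s.translate({ord(x):2*x for x in set(s)})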
Timings against the other answers:
In [5]: %timeit double_chain('abcd')
The slowest run took 11.03 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 992 ns per loop
In [6]: %timeit double_chain('mult')
The slowest run took 13.61 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 1 µs per loop
In [7]: %timeit double_mult('abcd')
The slowest run took 7.59 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 869 ns per loop
In [8]: %timeit double_re('abcd')
The slowest run took 8.63 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 9.4 µs per loop
In [9]: %timeit double_translate('abcd')
The slowest run took 5.80 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 1.78 µs per loop
In [10]: %%timeit t='abcd'*5000
...: double_chain(t)
...:
1000 loops, best of 3: 1.66 ms per loop
In [11]: %%timeit t='abcd'*5000
...: double_mult(t)
...:
100 loops, best of 3: 2.35 ms per loop
In [12]: %%timeit t='abcd'*5000
...: double_re(t)
...:
10 loops, best of 3: 30 ms per loop
In [13]: %%timeit t='abcd'*5000
...: double_translate(t)
...:
1000 loops, best of 3: 1.03 ms per loop
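Note, however, that this solution has the added advantage that in some cases you can avoid rebuilding the table passed to translate, for example:
def double_translate_opt(s, table=None):
    if table is None:
        table = {ord(x):2*x for x in set(s)}
    return s.translate(table)
This avoids some overhead, making it even faster: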
In [19]: %%timeit t='abcd'; table={ord(x):2*x for x in t}
...: double_translate_opt(t, table)
...:
The slowest run took 17.59 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 452 ns per loop
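As you can see, with small strings it is about twice as fast as the current answers, provided you avoid building the table every time. For long texts, the cost of building the table is repaid by the speed of translate (using set is worthwhile in these cases, to avoid many calls to ord).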
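The same idea extends to repeating each character n times. Below is a minimal sketch, assuming Python 3 (where str.translate accepts a dict mapping code points to strings); make_repeat_table and repeat_translate are hypothetical names. Note that characters missing from a prebuilt table are left unchanged, so a reused table has to cover every character you expect in the input.

```python
def make_repeat_table(alphabet, n):
    """Map each character's code point to n copies of that character."""
    return {ord(ch): ch * n for ch in set(alphabet)}


def repeat_translate(s, n=2, table=None):
    # When a prebuilt table is supplied, n is ignored: the table already
    # fixes how many copies each character expands to.
    if table is None:
        table = make_repeat_table(s, n)
    return s.translate(table)


table = make_repeat_table("abcd", 3)
print(repeat_translate("abcd", table=table))
print(repeat_translate("abxd", table=table))  # "x" is not in the table, so it is not repeated
```

Building the table once and passing it in is what pays off when many strings share the same alphabet.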
def double_letter(s, n=2):
    # repeat each character n times by concatenating in a loop
    result = ''
    for ch in s:
        result += ch * n
    return result
# doubling only
def crazy(words):
    return "".join([letter * 2 for letter in words])
# general version: repeat each letter n times
def crazy(words, n):
    return "".join([letter * n for letter in words])
# function call
print(crazy("arpan", 3))
Output:
aaarrrpppaaannn