1

I have a huge tuple of strings that is returned from a program. An example of a returned tuple might look like this:

('(-1,0)', '(1,0)', '(2,0)', '(3,0)', '(4,0)', '(5,0)', '(6,0)')

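I can convert these strings to real tuples (with integers inside), but I am hoping someone knows a nice trick to speed this up. Everything I have come up with feels like a relatively "slow" way of doing it. As I mentioned, these lists can be large, so a fast method would be much appreciated!

Thanks

Edit one: Alright, so it seems that eval is a slower way of doing this. So far I have tested 4 methods; thanks for any comments and submissions! :)

Also, someone asked about the size of my tuples. It ranges from a few to hopefully no more than a few million. Not "too" big, but big enough that speed is an important factor. I am not here to micro-optimize, just to learn any new tricks I might not be aware of. For instance, eval() is something I often forget about, even though it does not seem to do so well in this case.

Edit two: I also want to note that the string format should not change, so there is no need to check the format. Also, this is an embedded Python v2.6.2, so anything that requires 2.6 is fine. 3.0, on the other hand, not so much ;)

Looking great guys, thanks again for all the input :)

Edit 3: One more note. I noticed that I have been returning code that does not result in a "tuple". That is fine, and I am sorry if anyone thought the end result "had" to be a tuple. Something in a similar format is fine.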

import timeit

test_tuple = ('(-1,0)', '(1,0)', '(2,0)', '(3,0)', '(4,0)', '(5,0)', '(6,0)', '(7,0)',)

def timeit_a():
    ''''''
    def convert_tup_strings(tup_string):
        first_int, last_int = tup_string[1:-1].split(',')
        return (int(first_int), int(last_int))

    return map(convert_tup_strings, test_tuple)

def timeit_a_1():
    ''''''
    def convert_tup_strings(tup_string):
        return map(int, tup_string[1:-1].split(','))

    return map(convert_tup_strings, test_tuple)

def timeit_b():
    converted = []

    for tup_string in test_tuple:
        first_int, last_int = tup_string[1:-1].split(',')
        converted.append((int(first_int), int(last_int)))

    return converted

def timeit_b_1():
    converted = []

    for tup_string in test_tuple:
        converted.append(map(int, tup_string[1:-1].split(',')))

    return converted

def timeit_c():
    ''''''
    return [eval(t) for t in test_tuple]

def timeit_d():
    ''''''
    return map(eval, test_tuple)

def timeit_e():
    ''''''
    return map(lambda a: tuple(map(int, a[1:-1].split(','))), test_tuple)

print 'Timeit timeit_a: %s' % timeit.timeit(timeit_a)
print 'Timeit timeit_a_1: %s' % timeit.timeit(timeit_a_1)
print 'Timeit timeit_b: %s' % timeit.timeit(timeit_b)
print 'Timeit timeit_b_1: %s' % timeit.timeit(timeit_b_1)
print 'Timeit timeit_c: %s' % timeit.timeit(timeit_c)
print 'Timeit timeit_d: %s' % timeit.timeit(timeit_d)
print 'Timeit timeit_e: %s' % timeit.timeit(timeit_e)

The results are:

Timeit timeit_a: 15.8954099772
Timeit timeit_a_1: 18.5484214589
Timeit timeit_b: 15.3137666465
Timeit timeit_b_1: 17.8405181116
Timeit timeit_c: 91.9587832802
Timeit timeit_d: 89.8858157489
Timeit timeit_e: 20.1564312947
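
A note on reading these figures: `timeit.timeit` runs the callable 1,000,000 times by default, so each total in seconds is numerically the time per call in microseconds, and each call converts the 8 strings in `test_tuple`. A minimal sketch of that arithmetic using the numbers above (the names `totals`, `NUMBER`, `per_call_us` and `slowdown` are only illustrative, and the code avoids `print` so that it runs on both Python 2.6 and 3):

```python
# Totals reported above, in seconds, for the default number of runs.
totals = {
    'timeit_a': 15.8954099772,
    'timeit_a_1': 18.5484214589,
    'timeit_b': 15.3137666465,
    'timeit_b_1': 17.8405181116,
    'timeit_c': 91.9587832802,
    'timeit_d': 89.8858157489,
    'timeit_e': 20.1564312947,
}
NUMBER = 1000000  # default value of the `number` argument of timeit.timeit

fastest = min(totals.values())

# Microseconds per call, and slowdown relative to the fastest variant.
per_call_us = dict((name, total / NUMBER * 1e6) for name, total in totals.items())
slowdown = dict((name, total / fastest) for name, total in totals.items())
```

With these numbers the fastest variant (`timeit_b`) needs roughly 2 microseconds per string, and the two `eval` variants are roughly six times slower.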

8 Answers

10

I would not recommend using eval. It is slow and unsafe. You can do this instead:

result = map(lambda a: tuple(map(int, a[1:-1].split(','))), s)

The numbers speak for themselves:

timeit.Timer("map(lambda a: tuple(map(int, a[1:-1].split(','))), s)", "s = ('(-1,0)', '(1,0)', '(2,0)', '(3,0)', '(4,0)', '(5,0)', '(6,0)')").timeit(100000)

1.8787779808044434

timeit.Timer("map(eval, s)", "s = ('(-1,0)', '(1,0)', '(2,0)', '(3,0)', '(4,0)', '(5,0)', '(6,0)')").timeit(100000)

11.571426868438721
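
A sketch of the same conversion written as a named function, which also shows one reason the explicit parse is safer than `eval`: anything that is not an integer makes `int()` raise `ValueError` instead of being executed. The name `parse_pair` and the sample inputs are illustrative only, and a list comprehension is used so that the code behaves the same on Python 2.6 and 3.

```python
def parse_pair(text):
    """Parse a string such as '(-1,0)' into a tuple of ints."""
    # Drop the surrounding parentheses, split on the comma, convert each part.
    return tuple(int(part) for part in text[1:-1].split(','))

s = ('(-1,0)', '(1,0)', '(2,0)')
result = [parse_pair(item) for item in s]

# Non-numeric content is rejected by int() rather than evaluated.
try:
    parse_pair("(1,__import__('os'))")
except ValueError:
    rejected = True
else:
    rejected = False
```
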
answered 2009-10-24T20:59:28.033
3
map(eval, tuples)

This won't account for the case where one of the tuples isn't syntactically correct. For that, I'd recommend something like:

def do(tup):
    try:
        return eval(tup)
    except Exception:  # malformed input: return None instead of raising
        return None

map(do, tuples)

Both methods tested for speed:

>>> import time
>>> tuples = ["(1,0)"] * 1000000

>>> # map eval
>>> st = time.time(); parsed = map(eval, tuples); print "%.2f s" % (time.time() - st)
16.02 s

>>> # map do
>>> st = time.time(); parsed = map(do, tuples); print "%.2f s" % (time.time() - st)
18.46 s

For 1,000,000 tuples that's not bad (but isn't great either). The overhead, presumably, is in parsing Python one million times by using eval. However, it is the easiest way to do what you're after.

The answer using list comprehension instead of map is about as slow as my try/except case (interesting in itself):

>>> st = time.time(); parsed = [eval(t) for t in tuples]; print "%.2f s" % (time.time() - st)
18.13 s

All that being said, I'm going to venture premature optimization is at work here -- parsing strings is always slow. How many tuples are you expecting?

answered 2009-10-24T20:39:58.650
2

My computer is slower than Nadia's, but this runs faster

>>> timeit.Timer(
    "list((int(a),int(c)) for a,b,c in (x[1:-1].partition(',') for x in s))", 
    "s = ('(-1,0)', '(1,0)', '(2,0)', '(3,0)', '(4,0)', '(5,0)', '(6,0)')").timeit(100000)
3.2250211238861084

than this

>>> timeit.Timer(
    "map(lambda a: tuple(map(int, a[1:-1].split(','))), s)", 
    "s = ('(-1,0)', '(1,0)', '(2,0)', '(3,0)', '(4,0)', '(5,0)', '(6,0)')").timeit(100000)
3.8979239463806152

Using a list comprehension is faster still

>>> timeit.Timer(
    "[(int(a),int(c)) for a,b,c in (x[1:-1].partition(',') for x in s)]", 
    "s = ('(-1,0)', '(1,0)', '(2,0)', '(3,0)', '(4,0)', '(5,0)', '(6,0)')").timeit(100000)
2.452484130859375
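
To make the `partition` trick above easier to follow: `str.partition(',')` splits at the first comma and always returns three parts (the text before it, the separator, and the text after it), so the result can be unpacked directly. A minimal sketch; the helper name `parse_all` is made up for illustration:

```python
text = '(-1,0)'
inner = text[1:-1]                      # strip the parentheses
head, sep, tail = inner.partition(',')  # three parts: before, separator, after
pair = (int(head), int(tail))

def parse_all(strings):
    """Convert an iterable of '(a,b)' strings into a list of int pairs."""
    return [(int(a), int(c))
            for a, _sep, c in (x[1:-1].partition(',') for x in strings)]
```
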
answered 2009-10-25T18:10:30.040
2

If you know the format, I would do the string parsing myself. It is faster than eval().

>>> tuples = ["(1,0)"] * 1000000
>>> import time
>>> st = time.time(); parsed = map(eval, tuples); print "%.2f s" % (time.time() - st)
32.71 s
>>> def parse(s) :
...   return s[1:-1].split(",")
...
>>> parse("(1,0)")
['1', '0']
>>> st = time.time(); parsed = map(parse, tuples); print "%.2f s" % (time.time() - st)
5.05 s

If you need ints:

>>> def parse(s) :
...   return map(int, s[1:-1].split(","))
...
>>> parse("(1,0)")
[1, 0]
>>> st = time.time(); parsed = map(parse, tuples); print "%.2f s" % (time.time() - st)
9.62 s
answered 2009-10-24T20:54:02.633
1

You can get a parser up and running pretty quickly with YAPPS.

answered 2009-10-24T20:34:10.300
1

You can just use yaml or json to parse it into tuples for you.
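
JSON has no tuple syntax, so `(1,0)` is not valid JSON as written. A minimal sketch of one way to make the JSON route work, assuming the fixed `(a,b)` format from the question: swap the outer parentheses for brackets before calling `json.loads` (the `json` module is in the standard library from Python 2.6). The helper name `parse_with_json` is hypothetical.

```python
import json

def parse_with_json(text):
    """Parse '(a,b)' by rewriting it as the JSON array '[a,b]'."""
    # '(1,0)' -> '[1,0]', which json.loads turns into the list [1, 0].
    return tuple(json.loads('[' + text[1:-1] + ']'))

strings = ('(-1,0)', '(1,0)', '(2,0)')
result = [parse_with_json(item) for item in strings]
```
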

answered 2009-10-24T20:44:09.920
1

If you are sure the input is well-formed:

tuples = ('(-1,0)', '(1,0)', '(2,0)', '(3,0)', '(4,0)', '(5,0)', '(6,0)')
result = [eval(t) for t in tuples]
answered 2009-10-24T20:23:02.993
0
import ast

list_of_tuples = map(ast.literal_eval, tuple_of_strings)
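
`ast.literal_eval` (available from Python 2.6) only evaluates Python literals such as numbers, strings, tuples and lists, so it avoids the safety problem of `eval`. A minimal sketch, using a list comprehension instead of `map` so that the result is a list on Python 3 as well; the sample input is taken from the question and the `rejected` flag is only there for illustration:

```python
import ast

tuple_of_strings = ('(-1,0)', '(1,0)', '(2,0)')
list_of_tuples = [ast.literal_eval(item) for item in tuple_of_strings]

# Anything that is not a plain literal, such as a function call, is refused.
try:
    ast.literal_eval("__import__('os').getcwd()")
except (ValueError, SyntaxError):
    rejected = True
else:
    rejected = False
```
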
answered 2009-10-25T18:53:47.280