
This relates to my previous question: Converting from nested lists to a delimited string

I have an external service that sends data to us in a delimited string format. It is lists of items, up to 3 levels deep. Level 1 is delimited by '|', level 2 by ';' and level 3 by ','. Each level or element can have 0 or more items. A simplified example is:
a,b;c,d|e||f,g|h;;

We have a function that converts this to nested lists, which is how it is manipulated in Python.

def dyn_to_lists(dyn):  
    return [[[c for c in b.split(',')] for b in a.split(';')] for a in dyn.split('|')]

For the example above, this function results in the following:

>>> dyn = "a,b;c,d|e||f,g|h;;"
>>> print(dyn_to_lists(dyn))
[[['a', 'b'], ['c', 'd']], [['e']], [['']], [['f', 'g']], [['h'], [''], ['']]]
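The [['']] entry for the empty field, and the [''] entries from the trailing semicolons, come from how str.split handles empty fields: it never returns an empty list. A quick interpreter check:

```python
# Splitting an empty string yields a one-element list holding '', not [].
print(''.split(','))      # ['']
# Adjacent or trailing delimiters produce empty-string items.
print('h;;'.split(';'))   # ['h', '', '']
print('e||f'.split('|'))  # ['e', '', 'f']
```

One consequence is that every list built from split output has at least one item, so an empty field always arrives as [''] and is reduced to '' by the one-item rule rather than by a len(x) == 0 check.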

At any level, a list with only one item should become a scalar rather than a one-item list, and an empty list should become just an empty string. I've come up with this function, which does work:

def dyn_to_min_lists(dyn):
    def compress(x): 
        return "" if len(x) == 0 else x if len(x) != 1 else x[0]

    return compress([compress([compress([item for item in mv.split(',')]) for mv in attr.split(';')]) for attr in dyn.split('|')])

Applied to the example above, this function returns (*see update below):

[[['a', 'b'], ['c', 'd']], 'e', '', ['f', 'g'], ['h', '', '']]

Being new to Python, I'm not confident this is the best way to do it. Are there any cleaner ways to handle this?

Large amounts of data will potentially pass through this. Are there any more efficient or scalable ways to achieve it?

Update

I found a bug in my original compress function. When a list has a single item that is itself a list, the outer list cannot be removed; otherwise the conversion would not be reversible (for example, [['f', 'g']] collapsed to ['f', 'g'] would read as f;g instead of f,g). To fix this, I've updated @Blender's compress function to:

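def __compress(x):
    if len(x) > 1:
        return x
    elif not x:
        return ''
    elif isinstance(x[0], list):
        # A lone item that is itself a list keeps its wrapper;
        # unwrapping it would make the conversion non-reversible.
        return x
    else:
        return x[0]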

Now it returns the correct output:

[[['a', 'b'], ['c', 'd']], 'e', '', [['f', 'g']], ['h', '', '']]
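The reversibility requirement can be checked with a round trip. This is a minimal sketch, assuming items never contain any of the three delimiters; unparse is a hypothetical helper that is not part of the question or the answer, and compress (the corrected version) and parse (from the answer) are reproduced so the block runs on its own:

```python
DELIMS = ('|', ';', ',')  # level 1, level 2, level 3, as described in the question

def compress(x):
    # Corrected compress from the update above.
    if len(x) > 1:
        return x
    elif not x:
        return ''
    elif isinstance(x[0], list):
        return x  # keep the wrapper so the nesting level is not lost
    else:
        return x[0]

def parse(s):
    return compress([
        compress([
            compress(b.split(',')) for b in a.split(';')
        ]) for a in s.split('|')
    ])

def unparse(value, delims=DELIMS):
    """Hypothetical inverse of parse: rebuild the delimited string."""
    if isinstance(value, str):
        return value  # a scalar (possibly '') is stored as-is
    delim, rest = delims[0], delims[1:]
    return delim.join(unparse(item, rest) for item in value)

dyn = 'a,b;c,d|e||f,g|h;;'
print(parse(dyn))                  # [[['a', 'b'], ['c', 'd']], 'e', '', [['f', 'g']], ['h', '', '']]
print(unparse(parse(dyn)) == dyn)  # True
```

Because a one-item list is only kept when its item is a list, each list can be joined with the delimiter of its own level, which is what makes the inverse this simple.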

Answer


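There are a few things you can do to speed it up:

  • Get rid of the innermost list comprehension: [item for item in mv.split(',')] can simply become mv.split(','), since split already returns a list. The comprehension does nothing useful.
  • Move the compress function outside of dyn_to_min_lists. You don't want to re-create it every time dyn_to_min_lists runs.
  • Truthiness testing is faster than calling len, so replace len(x) == 0 with not x.
  • Reordering the conditions in compress so that the more common case comes first will also speed it up.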
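The first three tips can be checked directly in the interpreter. This is a small sketch; outer is a hypothetical stand-in for dyn_to_min_lists, and the timings it prints vary by machine and Python version, so no particular numbers are claimed:

```python
import timeit

mv = 'a,b,c'
x = ['a', 'b']

# Tip 1: the comprehension only copies the list that split() already returned.
assert [item for item in mv.split(',')] == mv.split(',')

# Tip 2: a def inside a function builds a new function object on every call.
def outer():
    def compress(x):
        return x
    return compress

assert outer() is not outer()

# Tip 3: for lists, "not x" gives the same answer as "len(x) == 0".
for candidate in ([], [''], ['a', 'b']):
    assert (not candidate) == (len(candidate) == 0)

# Rough timings (seconds for 100000 runs); compare pairs, not absolute values.
N = 100000
print(timeit.timeit("[item for item in mv.split(',')]", globals=globals(), number=N))
print(timeit.timeit("mv.split(',')", globals=globals(), number=N))
print(timeit.timeit("len(x) == 0", globals=globals(), number=N))
print(timeit.timeit("not x", globals=globals(), number=N))
```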

So the resulting code is:

def compress(x): 
    if len(x) > 1:
        return x
    elif not x:
        return ''
    else:
        return x[0]

def parse(s):
    return compress([
        compress([
            compress(b.split(',')) for b in a.split(';')
        ]) for a in s.split('|')
    ])
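A possible cleaner variant, not taken from the answer: the three nested comprehensions follow the same pattern, so they can be folded into one recursive function that walks a tuple of delimiters. This sketch assumes the corrected compress from the question's update, and parse_levels is a hypothetical name. It adds a function call per level, so it should not be assumed to be faster than the unrolled version above:

```python
def compress(x):
    # Corrected compress from the question's update.
    if len(x) > 1:
        return x
    elif not x:
        return ''
    elif isinstance(x[0], list):
        return x  # keep the wrapper so the result stays reversible
    else:
        return x[0]

def parse_levels(s, delims=('|', ';', ',')):
    """Hypothetical recursive variant: split on delims[0], recurse on the rest."""
    if len(delims) == 1:
        # Innermost level: split() already returns a list of strings.
        return compress(s.split(delims[0]))
    return compress([parse_levels(part, delims[1:]) for part in s.split(delims[0])])

print(parse_levels('a,b;c,d|e||f,g|h;;'))
# [[['a', 'b'], ['c', 'd']], 'e', '', [['f', 'g']], ['h', '', '']]
```

Passing a different tuple, such as parse_levels(s, ('|', ';')), handles a two-level format without writing another function.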

Here's a speed comparison:

>>> %timeit parse('a,b;c,d|e||f,g|h;;')
100000 loops, best of 3: 10 µs per loop
>>> %timeit dyn_to_min_lists('a,b;c,d|e||f,g|h;;')
10000 loops, best of 3: 15.6 µs per loop
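%timeit is an IPython magic; outside IPython, the same comparison can be run with the standard library's timeit module. The sketch below reproduces both functions so it runs on its own; expect different absolute numbers on other machines and Python versions:

```python
import timeit

def dyn_to_min_lists(dyn):
    # The question's original version, with compress defined inside.
    def compress(x):
        return "" if len(x) == 0 else x if len(x) != 1 else x[0]
    return compress([compress([compress([item for item in mv.split(',')])
                               for mv in attr.split(';')])
                     for attr in dyn.split('|')])

def compress(x):
    # The answer's module-level compress.
    if len(x) > 1:
        return x
    elif not x:
        return ''
    else:
        return x[0]

def parse(s):
    return compress([compress([compress(b.split(',')) for b in a.split(';')])
                     for a in s.split('|')])

sample = 'a,b;c,d|e||f,g|h;;'
assert parse(sample) == dyn_to_min_lists(sample)  # same result before timing

NUMBER = 20000  # calls per timing run; kept small so the check finishes quickly
timings = {}
for name in ('parse', 'dyn_to_min_lists'):
    # Best of 3 runs, mirroring %timeit's "best of 3", reported per call.
    runs = timeit.repeat(f'{name}(sample)', globals=globals(), repeat=3, number=NUMBER)
    timings[name] = min(runs) / NUMBER
    print(f'{name}: {timings[name] * 1e6:.2f} µs per call')
```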

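It takes about 36% less time per call on my computer. If this is a really critical part of your script, implement it in C and compile it as a C extension.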

answered 2013-10-27T04:48:32.180