
I have a string:

A = "{user_id:34dd833,category:secondary,items:camera,type:sg_ser}"

I need to convert it into a Python dictionary, so that:

A = {"user_id":"34dd833", "category": "secondary", "items": "camera", "type": "sg_ser"}

Besides that, there are two more problems:

1: The "items" key can have multiple values, e.g.:

A = {"user_id":"34dd833", "category": "secondary", "items": "camera,vcr,dvd", "type": "sg_ser"}

This obviously comes in as a string:

A = "{user_id:34dd833,category:secondary,items:camera,vcr,dvd,type:sg_ser}"

So any approach based on splitting on commas becomes useless.
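To make the comma problem concrete, here is a small illustration (a sketch added for clarity, not part of the original question): splitting on commas first leaves 'vcr' and 'dvd' as fragments with no key, so they can't be turned into key/value pairs.

```python
A = "{user_id:34dd833,category:secondary,items:camera,vcr,dvd,type:sg_ser}"

# Naive approach: strip the braces and split on every comma.
pieces = A.strip('{}').split(',')
print(pieces)
# ['user_id:34dd833', 'category:secondary', 'items:camera', 'vcr', 'dvd', 'type:sg_ser']

# 'vcr' and 'dvd' contain no ':', so they have no key of their own.
orphans = [p for p in pieces if ':' not in p]
print(orphans)  # ['vcr', 'dvd']

# Building a dict from these pieces fails, because 'vcr'.split(':') has length 1.
try:
    dict(p.split(':') for p in pieces)
except ValueError as exc:
    print("ValueError:", exc)
```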

2: The order within the string can also be random. So the string could also look like this:

A = "{category:secondary,type:sg_ser,user_id:34dd833,items:camera,vcr,dvd}"

This makes any approach that assumes a fixed order wrong.

What can I do in this case? Thanks a lot.


2 Answers


If we can assume that your input doesn't do any quoting or escaping (your example doesn't, but that doesn't necessarily mean it's a good assumption), and that you can never have comma-separated multiple keys, just multiple values (which probably is a good assumption, because otherwise the format is ambiguous…):

First, let's drop the braces, then split on colons:

>>> A = "{user_id:34dd833,category:secondary,items:camera,vcr,dvd,type:sg_ser}"
>>> A[1:-1].split(':')
['user_id', '34dd833,category', 'secondary,items', 'camera,vcr,dvd,type', 'sg_ser']

So the first entry is the first key, the last entry is the last value(s), and every entry in between is the Nth value(s) followed by a comma followed by the (N+1)th key. There may be other commas there, but the last one always splits the Nth value(s) from the (N+1)th key. (And that even works for N=0: there are no commas, so the last comma splits nothing from the 0th key. But it doesn't work for the very last entry, unfortunately. I'll get to that later.)
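To see this concretely, here is a small Python 3 check (added for illustration) that applies rpartition(',') to every entry of the split string. The part after the last comma is a key, the part before it is the previous key's value, and the final entry shows the problem with the very last value: with no comma to split on, its "value part" comes out empty.

```python
A = "{user_id:34dd833,category:secondary,items:camera,vcr,dvd,type:sg_ser}"
entries = A[1:-1].split(':')

# rpartition(',') splits at the LAST comma and always returns a 3-tuple:
# (text before the comma, ',', text after it). If there is no comma,
# it returns ('', '', whole_string).
parts = [entry.rpartition(',') for entry in entries]

for entry, (head, _, tail) in zip(entries, parts):
    print(repr(entry), '-> value part:', repr(head), ' key part:', repr(tail))
```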

There are ways we could make this brief, but let's write that out explicitly as code first, so you understand how it works.

>>> d = {}
>>> entries = A[1:-1].split(':')
>>> for i in range(len(entries)-1):
...     key = entries[i].rpartition(',')[-1]
...     value = entries[i+1].rpartition(',')[0]
...     d[key] = value

This is almost right:

>>> d
{'category': 'secondary', 'items': 'camera,vcr,dvd', 'type': '', 'user_id': '34dd833'}

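As mentioned above, it doesn't work for the last one. It should be obvious why; if not, see what rpartition(',') returns for the last value: 'sg_ser' has no comma, so you get ('', '', 'sg_ser') and [0] is the empty string. You might think of using rsplit(',', 1) instead of rpartition, since for a string with no comma its [0] is the whole string. But that only works if the last value has no commas of its own: in your reordered example, where items:camera,vcr,dvd comes last, 'camera,vcr,dvd'.rsplit(',', 1)[0] gives 'camera,vcr'. The reliable fix is to cheat by tacking an extra , on the end (entries = (A[1:-1] + ',').split(':')), so every entry after the first ends in a comma-separated piece, and the part before its last comma is exactly the value. So let's do that.

So, how can we clean this up a bit?

First let's transform entries into a list of adjacent pairs. Now, for each pair (n, nplus1), n.rpartition(',')[-1] is the key, and nplus1.rpartition(',')[0] is the corresponding value. So:

>>> A = "{user_id:34dd833,category:secondary,items:camera,vcr,dvd,type:sg_ser}"
>>> entries = (A[1:-1] + ',').split(':')
>>> adjpairs = zip(entries, entries[1:])
>>> d = {k.rpartition(',')[-1]: v.rpartition(',')[0] for k, v in adjpairs}
>>> d
{'user_id': '34dd833', 'category': 'secondary', 'items': 'camera,vcr,dvd', 'type': 'sg_ser'}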
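Putting the pieces together, here is a self-contained Python 3 sketch of the adjacent-pairs approach wrapped in a function. The name parse_record is hypothetical, and it relies on the same assumptions stated at the top of this answer (no quoting or escaping, no ':' inside values, keys never comma-separated). It uses the trailing-comma trick mentioned above, which, unlike rsplit(',', 1)[0], keeps the last value intact even when that value contains commas.

```python
def parse_record(s):
    """Parse '{k1:v1,k2:v2a,v2b,...}' into a dict of strings (hypothetical helper).

    Assumes no quoting/escaping, no ':' inside values, and keys that never
    contain ',' or ':'. Values may contain commas.
    """
    body = s.strip()[1:-1]             # drop the surrounding braces
    entries = (body + ',').split(':')  # trailing ',' so the last value is handled like the others
    return {
        # key: text after the last comma of this entry (the whole entry if it has none)
        # value: text before the last comma of the next entry
        k.rpartition(',')[2]: v.rpartition(',')[0]
        for k, v in zip(entries, entries[1:])
    }


print(parse_record("{user_id:34dd833,category:secondary,items:camera,type:sg_ser}"))
print(parse_record("{category:secondary,type:sg_ser,user_id:34dd833,items:camera,vcr,dvd}"))
```

Because the result is a dict, the order of the keys in the input doesn't matter: both orderings from the question produce equal dicts.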
answered 2013-08-09T22:59:46.793

Here's another way (not particularly robust, but it shows on the sample data that it's possible):

import re
text = "{user_id:34dd833,category:secondary,items:camera,vcr,dvd,type:sg_ser}"
print(dict(re.findall(r'(\w+):(.*?)(?=(?:,\w+:)|$)', text.strip('{}'))))
# {'user_id': '34dd833', 'category': 'secondary', 'items': 'camera,vcr,dvd', 'type': 'sg_ser'}
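For readability, the same regex can be written with re.VERBOSE so each part carries a comment. This is a Python 3 sketch with a hypothetical helper name, parse_with_regex, and it keeps the caveats of the one-liner: keys must match \w+, and a value must not itself contain a ",word:" sequence, since the lookahead would treat that as the start of the next key.

```python
import re

# Same pattern as the one-liner, spelled out so each part can be commented.
PAIR_RE = re.compile(r"""
    (\w+)        # key: one or more word characters
    :            # key/value separator
    (.*?)        # value: as short as possible (commas allowed)...
    (?=          # ...stopping just before:
        ,\w+:    #   a comma that starts the next 'key:'
      | $        #   or the end of the string
    )
""", re.VERBOSE)


def parse_with_regex(text):
    """Hypothetical wrapper: strip the braces and collect (key, value) pairs."""
    return dict(PAIR_RE.findall(text.strip('{}')))


print(parse_with_regex("{category:secondary,type:sg_ser,user_id:34dd833,items:camera,vcr,dvd}"))
```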
answered 2013-08-09T23:31:36.967