
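I'd like to get PyYAML's loader to load mappings (and ordered mappings) into the Python 2.7+ OrderedDict type, instead of the vanilla dict and the list of pairs it currently uses.

What's the best way to do that?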


8 Answers


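Python >= 3.6

In Python 3.6+, it seems that dict loading order is preserved by default, without special dictionary types. The default Dumper, on the other hand, sorts dictionaries by key. Starting with pyyaml 5.1, you can turn this off by passing sort_keys=False: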

import yaml

a = dict(zip("unsorted", "unsorted"))
s = yaml.safe_dump(a, sort_keys=False)
b = yaml.safe_load(s)

assert list(a.keys()) == list(b.keys())  # passes: key order survives the round trip
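To make the contrast with the default behaviour visible, here is a small sketch (assuming PyYAML 5.1+ on Python 3.7+; the sample keys are arbitrary) that dumps the same plain dict with and without sort_keys and then loads the unsorted text back.

```python
import yaml

data = {"zeta": 1, "alpha": 2, "mid": 3}

# Default safe_dump sorts mapping keys alphabetically.
print(yaml.safe_dump(data), end="")

# sort_keys=False (PyYAML 5.1+) keeps the dict's insertion order.
text = yaml.safe_dump(data, sort_keys=False)
print(text, end="")

# On Python 3.7+, safe_load builds a plain dict whose order follows the document.
print(list(yaml.safe_load(text)))
```

The first dump starts with alpha, the second keeps zeta first, and the reloaded dict lists its keys in document order.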

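This works because the new dict implementation has been used in PyPy for some time. While still considered an implementation detail in CPython 3.6, as of 3.7+ "the insertion-order preservation nature of dict objects has been declared to be an official part of the Python language spec"; see What's New In Python 3.7.

Note that this is still undocumented on PyYAML's side, so you shouldn't rely on it for safety-critical applications.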

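Original answer (compatible with all known versions)

I like @James' solution for its simplicity. However, it changes the default global yaml.Loader class, which can lead to troublesome side effects. Especially when writing library code, this is a bad idea. Also, it doesn't work directly with yaml.safe_load().

Fortunately, the solution can be improved without much effort: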

import yaml
from collections import OrderedDict

def ordered_load(stream, Loader=yaml.SafeLoader, object_pairs_hook=OrderedDict):
    class OrderedLoader(Loader):
        pass
    def construct_mapping(loader, node):
        loader.flatten_mapping(node)
        return object_pairs_hook(loader.construct_pairs(node))
    OrderedLoader.add_constructor(
        yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG,
        construct_mapping)
    return yaml.load(stream, OrderedLoader)

# usage example:
ordered_load(stream, yaml.SafeLoader)

For serialization, you can use the following function:

def ordered_dump(data, stream=None, Dumper=yaml.SafeDumper, **kwds):
    class OrderedDumper(Dumper):
        pass
    def _dict_representer(dumper, data):
        return dumper.represent_mapping(
            yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG,
            data.items())
    OrderedDumper.add_representer(OrderedDict, _dict_representer)
    return yaml.dump(data, stream, OrderedDumper, **kwds)

# usage:
ordered_dump(data, Dumper=yaml.SafeDumper)

In either case, you could also make the custom subclasses global, so that they don't have to be recreated on each call.
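One way the "make the subclasses global" idea might look is sketched below: the loader and dumper subclasses are defined once at module level and reused on every call. The names OrderedSafeLoader, OrderedSafeDumper and the two helper functions are hypothetical; the construct/represent logic mirrors ordered_load and ordered_dump above.

```python
from collections import OrderedDict

import yaml


class OrderedSafeLoader(yaml.SafeLoader):
    """Hypothetical module-level SafeLoader that builds mappings as OrderedDict."""


def _construct_ordered_mapping(loader, node):
    # Expand "<<" merge keys, then keep the pairs in document order.
    loader.flatten_mapping(node)
    return OrderedDict(loader.construct_pairs(node))


OrderedSafeLoader.add_constructor(
    yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG, _construct_ordered_mapping)


class OrderedSafeDumper(yaml.SafeDumper):
    """Hypothetical module-level SafeDumper that writes OrderedDict as a plain map."""


def _represent_ordered_dict(dumper, data):
    # Passing items() (not the dict itself) means represent_mapping won't sort.
    return dumper.represent_mapping(
        yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG, data.items())


OrderedSafeDumper.add_representer(OrderedDict, _represent_ordered_dict)

# Round trip: order is preserved at every nesting level.
text = "zeta: 1\nalpha: 2\nmid:\n  y: 3\n  x: 4\n"
data = yaml.load(text, Loader=OrderedSafeLoader)
print(list(data))  # ['zeta', 'alpha', 'mid']
print(yaml.dump(data, Dumper=OrderedSafeDumper, default_flow_style=False), end="")
```

Because add_constructor/add_representer are called on the subclasses, yaml.SafeLoader and yaml.SafeDumper themselves stay untouched.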

answered 2014-02-20T15:47:31.933

2018 option:

oyaml is a drop-in replacement for PyYAML that preserves dict ordering. Both Python 2 and Python 3 are supported. Just pip install oyaml, and import as shown below:

import oyaml as yaml

You'll no longer be annoyed by screwed-up mappings when dumping/loading.

Note: I'm the author of oyaml.

answered 2018-02-21T08:06:29.310

The yaml module allows you to specify custom "representers" to convert Python objects to text, and "constructors" to reverse the process.

import collections

import yaml

_mapping_tag = yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG

def dict_representer(dumper, data):
    # items() works on both Python 2 and 3 (iteritems() is Python 2 only)
    return dumper.represent_dict(data.items())

def dict_constructor(loader, node):
    loader.flatten_mapping(node)  # resolve "<<" merge keys before reading the pairs
    return collections.OrderedDict(loader.construct_pairs(node))

yaml.add_representer(collections.OrderedDict, dict_representer)
yaml.add_constructor(_mapping_tag, dict_constructor)
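A quick way to see what this global registration does and does not affect is sketched below, assuming PyYAML 5.1+, where add_constructor() without a Loader argument patches yaml.Loader, yaml.FullLoader and yaml.UnsafeLoader, and add_representer() without a Dumper argument patches yaml.Dumper. The sample document is arbitrary.

```python
import collections

import yaml

_mapping_tag = yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG


def dict_representer(dumper, data):
    return dumper.represent_dict(data.items())


def dict_constructor(loader, node):
    loader.flatten_mapping(node)
    return collections.OrderedDict(loader.construct_pairs(node))


# Global registration, as in the answer above.
yaml.add_representer(collections.OrderedDict, dict_representer)
yaml.add_constructor(_mapping_tag, dict_constructor)

text = "zeta: 1\nalpha: 2\n"
full = yaml.load(text, Loader=yaml.FullLoader)  # patched: goes through dict_constructor
safe = yaml.safe_load(text)                     # SafeLoader is not patched
print(type(full).__name__, list(full))
print(type(safe).__name__)
print(yaml.dump(full, default_flow_style=False), end="")  # keys in original order
```

This illustrates both points raised elsewhere in this thread: the change leaks into every user of the default loaders in the process, yet yaml.safe_load() still returns a plain dict.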
answered 2014-01-10T15:26:03.057

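2015 (and later) option:

ruamel.yaml is a drop-in replacement for PyYAML (disclaimer: I am the author of that package). Preserving the order of the mappings was one of the things added in the first version (0.1) back in 2015. Not only does it preserve the order of your dictionaries, it will also preserve comments, anchor names and tags, and it supports the YAML 1.2 specification (released 2009).

The specification says that the ordering is not guaranteed, but of course there is ordering in the YAML file, and an appropriate parser can just hold on to that and transparently generate an object that keeps the ordering. You just need to choose the right parser, loader and dumper¹: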

import sys
from ruamel.yaml import YAML

yaml_str = """\
3: abc
conf:
    10: def
    3: gij     # h is missing
more:
- what
- else
"""

yaml = YAML()
data = yaml.load(yaml_str)
data['conf'][10] = 'klm'
data['conf'][3] = 'jig'
yaml.dump(data, sys.stdout)

which will give you:

3: abc
conf:
  10: klm
  3: jig       # h is missing
more:
- what
- else

data is of type CommentedMap, which functions like a dict but has extra information that is kept around until it is dumped (including the preserved comment!).

answered 2015-06-10T18:02:07.317

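Note: there is a library, based on the following answer, which also implements the CLoader and CDumpers: Phynix/yamlloader

I very much doubt that this is the best way to do it, but this is the way I came up with, and it does work. Also available as a gist.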

import yaml
import yaml.constructor

try:
    # included in standard lib from Python 2.7
    from collections import OrderedDict
except ImportError:
    # try importing the backported drop-in replacement
    # it's available on PyPI
    from ordereddict import OrderedDict

class OrderedDictYAMLLoader(yaml.Loader):
    """
    A YAML loader that loads mappings into ordered dictionaries.
    """

    def __init__(self, *args, **kwargs):
        yaml.Loader.__init__(self, *args, **kwargs)

        self.add_constructor(u'tag:yaml.org,2002:map', type(self).construct_yaml_map)
        self.add_constructor(u'tag:yaml.org,2002:omap', type(self).construct_yaml_map)

    def construct_yaml_map(self, node):
        data = OrderedDict()
        yield data
        value = self.construct_mapping(node)
        data.update(value)

    def construct_mapping(self, node, deep=False):
        if isinstance(node, yaml.MappingNode):
            self.flatten_mapping(node)
        else:
            raise yaml.constructor.ConstructorError(None, None,
                'expected a mapping node, but found %s' % node.id, node.start_mark)

        mapping = OrderedDict()
        for key_node, value_node in node.value:
            key = self.construct_object(key_node, deep=deep)
            try:
                hash(key)
            except TypeError as exc:  # "except X, e" is Python 2-only syntax
                raise yaml.constructor.ConstructorError('while constructing a mapping',
                    node.start_mark, 'found unacceptable key (%s)' % exc, key_node.start_mark)
            value = self.construct_object(value_node, deep=deep)
            mapping[key] = value
        return mapping
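One caveat with the loader above: it also routes the !!omap tag through construct_yaml_map, but in the YAML spec's form an !!omap is a sequence of single-pair mappings, and the overridden construct_mapping raises ConstructorError for any node that is not a MappingNode. A separate constructor for that tag might look like the sketch below; OmapLoader and construct_omap are hypothetical names, and it builds on yaml.SafeLoader rather than yaml.Loader.

```python
from collections import OrderedDict

import yaml
from yaml.constructor import ConstructorError


class OmapLoader(yaml.SafeLoader):
    """Hypothetical SafeLoader subclass that loads !!omap into an OrderedDict."""


def construct_omap(loader, node):
    # Spec form:  !!omap
    #             - b: 1
    #             - a: 2
    if not isinstance(node, yaml.SequenceNode):
        raise ConstructorError("while constructing an ordered map", node.start_mark,
                               "expected a sequence, but found %s" % node.id,
                               node.start_mark)
    omap = OrderedDict()
    for subnode in node.value:
        # Every item must be a mapping with exactly one key/value pair.
        if not isinstance(subnode, yaml.MappingNode) or len(subnode.value) != 1:
            raise ConstructorError("while constructing an ordered map", node.start_mark,
                                   "expected a single-pair mapping", subnode.start_mark)
        key_node, value_node = subnode.value[0]
        key = loader.construct_object(key_node, deep=True)
        omap[key] = loader.construct_object(value_node, deep=True)
    return omap


OmapLoader.add_constructor("tag:yaml.org,2002:omap", construct_omap)

data = yaml.load("--- !!omap\n- b: 1\n- a: 2\n", Loader=OmapLoader)
print(list(data.items()))  # [('b', 1), ('a', 2)]
```

Plain (untagged) mappings are unaffected here; combining this with an ordered constructor for the map tag would cover both cases from the question.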
answered 2011-02-25T19:55:12.770

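Update: the library was deprecated in favor of yamlloader (which is based on yamlordereddictloader).

I've just found a Python library (https://pypi.python.org/pypi/yamlordereddictloader/0.1.1) which was created based on answers to this question and is pretty simple to use: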

import yaml
import yamlordereddictloader

with open('myfile.yml') as f:  # close the file once loading is done
    datas = yaml.load(f, Loader=yamlordereddictloader.Loader)
answered 2016-02-20T13:26:09.570

In my PyYAML installation for Python 2.7, I updated __init__.py, constructor.py and loader.py. The load commands now support an object_pairs_hook option. The diffs of the changes I made are below.

__init__.py

$ diff __init__.py Original
64c64
< def load(stream, Loader=Loader, **kwds):
---
> def load(stream, Loader=Loader):
69c69
<     loader = Loader(stream, **kwds)
---
>     loader = Loader(stream)
75c75
< def load_all(stream, Loader=Loader, **kwds):
---
> def load_all(stream, Loader=Loader):
80c80
<     loader = Loader(stream, **kwds)
---
>     loader = Loader(stream)

constructor.py

$ diff constructor.py Original
20,21c20
<     def __init__(self, object_pairs_hook=dict):
<         self.object_pairs_hook = object_pairs_hook
---
>     def __init__(self):
27,29d25
<     def create_object_hook(self):
<         return self.object_pairs_hook()
<
54,55c50,51
<         self.constructed_objects = self.create_object_hook()
<         self.recursive_objects = self.create_object_hook()
---
>         self.constructed_objects = {}
>         self.recursive_objects = {}
129c125
<         mapping = self.create_object_hook()
---
>         mapping = {}
400c396
<         data = self.create_object_hook()
---
>         data = {}
595c591
<             dictitems = self.create_object_hook()
---
>             dictitems = {}
602c598
<             dictitems = value.get('dictitems', self.create_object_hook())
---
>             dictitems = value.get('dictitems', {})

loader.py

$ diff loader.py Original
13c13
<     def __init__(self, stream, **constructKwds):
---
>     def __init__(self, stream):
18c18
<         BaseConstructor.__init__(self, **constructKwds)
---
>         BaseConstructor.__init__(self)
23c23
<     def __init__(self, stream, **constructKwds):
---
>     def __init__(self, stream):
28c28
<         SafeConstructor.__init__(self, **constructKwds)
---
>         SafeConstructor.__init__(self)
33c33
<     def __init__(self, stream, **constructKwds):
---
>     def __init__(self, stream):
38c38
<         Constructor.__init__(self, **constructKwds)
---
>         Constructor.__init__(self)
answered 2013-08-25T21:48:01.947

Here's a simple solution that also checks for duplicate top-level keys in the map.

import yaml
import re
from collections import OrderedDict

def yaml_load_od(fname):
    "load a yaml file as an OrderedDict"
    # detects any duped keys (fail on this) and preserves order of top level keys
    with open(fname, 'r') as f:
        lines = f.read().splitlines()  # read via the handle opened above
    top_keys = []
    duped_keys = []
    for line in lines:
        m = re.search(r'^([A-Za-z0-9_]+) *:', line)
        if m:
            if m.group(1) in top_keys:
                duped_keys.append(m.group(1))
            else:
                top_keys.append(m.group(1))
    if duped_keys:
        raise Exception('ERROR: duplicate keys: {}'.format(duped_keys))
    # 2nd pass to set up the OrderedDict
    with open(fname, 'r') as f:
        d_tmp = yaml.safe_load(f)  # yaml.load(f) without a Loader fails on PyYAML 6
    return OrderedDict([(key, d_tmp[key]) for key in top_keys])
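The regex pass above only sees unindented keys made of [A-Za-z0-9_]. As an alternative technique (not the approach of this answer), duplicate detection can be moved into a mapping constructor, so it applies at every nesting level and uses PyYAML's own parsing. In this sketch UniqueKeyOrderedLoader and construct_unique_ordered_mapping are hypothetical names; note that a key that overrides a "<<" merged key is also reported as a duplicate, since flatten_mapping puts the merged pairs first.

```python
from collections import OrderedDict

import yaml
from yaml.constructor import ConstructorError


class UniqueKeyOrderedLoader(yaml.SafeLoader):
    """Hypothetical SafeLoader: OrderedDict mappings, duplicate keys rejected."""


def construct_unique_ordered_mapping(loader, node):
    # Expand "<<" merge keys; overriding a merged key counts as a duplicate here.
    loader.flatten_mapping(node)
    mapping = OrderedDict()
    for key_node, value_node in node.value:
        key = loader.construct_object(key_node, deep=True)
        if key in mapping:
            raise ConstructorError("while constructing a mapping", node.start_mark,
                                   "found duplicate key %r" % (key,), key_node.start_mark)
        mapping[key] = loader.construct_object(value_node, deep=True)
    return mapping


UniqueKeyOrderedLoader.add_constructor(
    yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG, construct_unique_ordered_mapping)

data = yaml.load("b: 1\na:\n  y: 2\n  x: 3\n", Loader=UniqueKeyOrderedLoader)
print(list(data), list(data["a"]))  # ['b', 'a'] ['y', 'x']

try:
    yaml.load("a: 1\nb:\n  c: 2\n  c: 3\n", Loader=UniqueKeyOrderedLoader)
except ConstructorError as exc:
    print("rejected:", exc.problem)  # rejected: found duplicate key 'c'
```

The error carries the position of the offending key, which the top-level regex approach cannot report.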
answered 2015-07-06T16:47:31.807