In my Python Utilities GitHub repository, I have a function that strips non-printable characters and invalid Unicode bytes from strings, mappings, and sequences:

import unicodedata

def filterCharacters(s):
    """
    Strip non printable characters

    @type s dict|list|tuple|bytes|string
    @param s Object to remove non-printable characters from

    @rtype dict|list|tuple|bytes|string
    @return An object that corresponds with the original object, nonprintable characters removed.
    """

    validCategories = (
        'Lu', 'Ll', 'Lt', 'LC', 'Lm', 'Lo', 'L', 'Mn', 'Mc', 'Me', 'M', 'Nd', 'Nl', 'No', 'N', 'Pc',
        'Pd', 'Ps', 'Pe', 'Pi', 'Pf', 'Po', 'P', 'Sm', 'Sc', 'Sk', 'So', 'S', 'Zs', 'Zl', 'Zp', 'Z'
    )
    convertToBytes = False

    if isinstance(s, dict):
        new = {}
        for k,v in s.items(): # This is the offending line
            new[k] = filterCharacters(v)
        return new

    if isinstance(s, list):
        new = []
        for item in s:
            new.append(filterCharacters(item))
        return new

    if isinstance(s, tuple):
        new = []
        for item in s:
            new.append(filterCharacters(item))
        return tuple(new)

    if isinstance(s, bytes):
        s = s.decode('utf-8')
        convertToBytes = True

    if isinstance(s, str):
        s = ''.join(c for c in s if unicodedata.category(c) in validCategories)
        if convertToBytes:
            s = s.encode('utf-8')
        return s

    else:
        return None

Sometimes this function throws an exception:

Traceback (most recent call last):
  File "./util.py", line 56, in filterCharacters
    for k,v in s.items():
RuntimeError: dictionary changed size during iteration

I don't see where I change the dictionary that is passed in as the argument. So why is this exception raised?

Thanks!

1 Answer

In Python 3, dict.items() returns a dict view object (dict_items), not a list as in Python 2. Browsing the CPython code, I noticed comments like this:

Objects/dictobject.c

dict_items(register PyDictObject *mp) 
{
    ...
    /* Preallocate the list of tuples, to avoid allocations during
     * the loop over the items, which could trigger GC, which
     * could resize the dict. :-(
     */
    ...

    if (n != mp->ma_used) {
        /* Durnit.  The allocations caused the dict to resize.
         * Just start over, this shouldn't normally happen.
         */
        Py_DECREF(v);
        goto again;
    }
    ...
}

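So not only dict deletions and insertions can potentially trigger this error, but any allocation can! Ouch!

The resizing process is also interesting. Look at: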
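
To make the error itself concrete, here is a minimal sketch that reproduces it by changing a dict's size while iterating over its live items() view. The helper name mutate_while_iterating is hypothetical and not part of the question's code; it shows only the ordinary insertion case, not the allocation scenario described above.

```python
def mutate_while_iterating():
    """Insert a key while looping over the live items() view."""
    d = {"a": 1, "b": 2}
    try:
        for key, value in d.items():
            # Adding a key changes the dict's size in the middle of the loop.
            d[key + "_copy"] = value
    except RuntimeError as exc:
        return str(exc)
    return None


# A view tracks the dict it came from: keys added later are visible through it.
d = {"a": 1}
view = d.items()
d["b"] = 2
view_tracks_dict = ("b", 2) in view

message = mutate_while_iterating()
print(view_tracks_dict)
print(message)
```

The size check happens when the iterator is asked for its next item, so the exception surfaces on the iteration after the insertion.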

static int
dictresize(PyDictObject *mp, Py_ssize_t minused)
{
    ...
}

But these are all internals.

Solution
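
Resizing can also be observed from Python without reading the C source. The sketch below assumes CPython, where sys.getsizeof reports a dict's current memory footprint; the helper observed_sizes is a hypothetical name, and the exact byte counts vary between versions, so none are shown.

```python
import sys


def observed_sizes(n_items):
    """Record each distinct sys.getsizeof value seen while filling a dict."""
    d = {}
    sizes = [sys.getsizeof(d)]
    for i in range(n_items):
        d[i] = None
        size = sys.getsizeof(d)
        if size != sizes[-1]:
            # The footprint changed, so the table was reallocated.
            sizes.append(size)
    return sizes


sizes = observed_sizes(1000)
print(sizes)
```

Each jump in the printed list corresponds to the dict growing its internal table as keys are inserted.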

Try converting the dict view to a list with:

if isinstance(s, dict):
    new = {}
    # Snapshot the items so the loop no longer iterates over the live view
    items = list(s.items())
    for k, v in items:
        new[k] = filterCharacters(v)
    return new
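
For reference, here is a self-contained sketch of the whole function with the snapshot applied. The name filter_characters_snapshot is hypothetical, and the category test is a shortcut I am assuming is equivalent: the question's validCategories tuple covers every L, M, N, P, S and Z category, so it amounts to rejecting categories that start with "C".

```python
import unicodedata


def filter_characters_snapshot(obj):
    """Recursively drop characters whose Unicode category starts with 'C'."""
    if isinstance(obj, dict):
        # list(...) copies the items first, so the loop never touches the live view.
        return {k: filter_characters_snapshot(v) for k, v in list(obj.items())}
    if isinstance(obj, list):
        return [filter_characters_snapshot(item) for item in obj]
    if isinstance(obj, tuple):
        return tuple(filter_characters_snapshot(item) for item in obj)
    if isinstance(obj, bytes):
        # Decode, filter as text, then encode back, as in the original function.
        return filter_characters_snapshot(obj.decode("utf-8")).encode("utf-8")
    if isinstance(obj, str):
        return "".join(c for c in obj if unicodedata.category(c)[0] != "C")
    # Like the original, any other type is replaced by None.
    return None


sample = {"name": "a\x00b", "tags": ["x\ty", b"ok\x07"], "pair": ("\u200bz", 5)}
cleaned = filter_characters_snapshot(sample)
print(cleaned)
```

The snapshot only protects the loop itself. If another thread is modifying the dict, the copied items may already be out of date by the time they are processed.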
answered 2013-07-19T18:55:48.430