python - Python 'string' % [1, 2, 3] doesn't raise TypeError

Question

Is the exact behavior of str.__mod__ documented?

These two lines of code work as expected:

>>> 'My number is: %s.' % 123
'My number is: 123.'
>>> 'My list is: %s.' % [1, 2, 3]
'My list is: [1, 2, 3].'

This line also behaves as expected:

>>> 'Not a format string' % 123
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: not all arguments converted during string formatting

But what does this line do, and why doesn't it raise any error?

>>> 'Not a format string' % [1, 2, 3]
'Not a format string'

P.S.

>>> import sys
>>> print(sys.version)
3.3.2 (default, Aug 15 2013, 23:43:52) 
[GCC 4.7.3]

score 9 · Accepted Answer

I believe the responsible lines can be found in the CPython source code; I checked out git tag v3.8.2:

In the function

PyObject *
PyUnicode_Format(PyObject *format, PyObject *args)

in Objects/unicodeobject.c, line 14944, there are the following lines:

Objects/unicodeobject.c, line 15008

if (ctx.argidx < ctx.arglen && !ctx.dict) {
    PyErr_SetString(PyExc_TypeError,
                    "not all arguments converted during string formatting");
    goto onError;
}

This raises an error if arglen does not match, but not if ctx.dict is "true". When is it "true"?

Objects/unicodeobject.c, line 14976

if (PyMapping_Check(args) && !PyTuple_Check(args) && !PyUnicode_Check(args))
    ctx.dict = args;
else
    ctx.dict = NULL;

OK, PyMapping_Check checks the passed args; if that is "true" and args is neither a tuple nor a unicode string, ctx.dict is set to args.

What does PyMapping_Check do?

Objects/abstract.c, line 2110

int
PyMapping_Check(PyObject *o)
{
    return o && o->ob_type->tp_as_mapping &&
        o->ob_type->tp_as_mapping->mp_subscript;
}

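As I understand it, this returns 1 if the object can be used as a "mapping", that is, if it can be indexed/subscripted. In that case ctx.dict is set to args, which is non-zero, so the error branch is not entered.

Both dict and list can be used as such mappings, so no error is raised when they are passed as the argument. tuple is explicitly excluded by the check at line 14976, probably because it is used to pass variadic arguments to the formatter.

It is not clear to me whether or why this behavior is intentional; that part of the source code is uncommented.

Based on this, we can try: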
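The decision described above can be modelled in a few lines of Python. This is only a rough sketch: looks_like_mapping, treated_as_dict and percent_raises_type_error are hypothetical helper names, and hasattr(type(obj), "__getitem__") is an approximation of PyMapping_Check, which really inspects the C-level tp_as_mapping->mp_subscript slot, so some C-implemented types may behave differently. It models only the ctx.dict assignment; whether an error is raised also depends on the argument count.

```python
def looks_like_mapping(obj):
    """Rough Python analogue of PyMapping_Check (an approximation)."""
    return hasattr(type(obj), "__getitem__")


def treated_as_dict(args):
    """Model of the ctx.dict assignment: mapping-like, but not tuple or str."""
    return looks_like_mapping(args) and not isinstance(args, (tuple, str))


def percent_raises_type_error(args):
    """Return True if 'foo' % args raises TypeError."""
    try:
        'foo' % args
    except TypeError:
        return True
    return False


# Print the model's prediction next to the observed behavior.
for value in ([1, 2, 3], {3: 4}, (1, 2), 'abc', 123):
    print(repr(value), treated_as_dict(value), percent_raises_type_error(value))
```

For these five values, the argument is silently accepted exactly when the model says it is treated as a dict.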

assert 'foo' % [1, 2] == 'foo'
assert 'foo' % {3: 4} == 'foo'

class A:
    pass

try:
    'foo' % A()
except TypeError as e:
    print(e)  # not all arguments converted during string formatting

class B:
    def __getitem__(self, key):
        pass

assert 'foo' % B() == 'foo'

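So it is enough for an object to define a __getitem__ method for the error not to be triggered.

EDIT: In v3.3.2, which the OP refers to, the offending lines are 13922, 13459 and 1918 in the same files; the logic looks the same.

EDIT2: In v3.0, the checks are at lines 8841 and 9226 of Objects/unicodeobject.c; PyMapping_Check from Objects/abstract.c is not yet used in the Unicode formatting code.

EDIT3: According to some bisecting and git blame, the core logic (on ASCII strings, not unicode strings) goes back to Python 1.2 and was implemented by GvR himself 25 years ago: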
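To see what the formatter actually does with such an object, here is a small sketch using a hypothetical Recorder class (not part of any library) that logs every key it is asked for. It suggests that __getitem__ is never called when the format string has no %(name)s specifiers, and is called with string keys when it does.

```python
class Recorder:
    """Hypothetical helper: remembers every key that % formatting asks for."""

    def __init__(self):
        self.keys = []

    def __getitem__(self, key):
        self.keys.append(key)
        return key.upper()


r = Recorder()

plain = 'foo' % r                  # no %(name)s specifiers: nothing is looked up
keys_after_plain = list(r.keys)

named = '%(a)s-%(b)s' % r          # each %(name)s triggers r['name']
keys_after_named = list(r.keys)

print(plain, keys_after_plain)
print(named, keys_after_named)
```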

commit caeaafccf7343497cc654943db09c163e320316d
Author: Guido van Rossum <guido@python.org>
Date:   Mon Feb 27 10:13:23 1995 +0000

    don't complain about too many args if arg is a dict

diff --git a/Objects/stringobject.c b/Objects/stringobject.c
index 7df894e12c..cb76d77f68 100644
--- a/Objects/stringobject.c
+++ b/Objects/stringobject.c
@@ -921,7 +921,7 @@ formatstring(format, args)
                        XDECREF(temp);
                } /* '%' */
        } /* until end */
-       if (argidx < arglen) {
+       if (argidx < arglen && !dict) {
                err_setstr(TypeError, "not all arguments converted");
                goto error;
        }

Maybe GvR can tell us why this is the intended behavior.

score 5 · Accepted Answer

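After the newest printf-style formatting was added, quite a few small quirks seem to have crept into % formatting. Today (version 3.8) this is documented, and it was already mentioned in the version 3.3 documentation.

The formatting operations described here exhibit a variety of quirks that lead to a number of common errors (such as failing to display tuples and dictionaries correctly). Using the newer formatted string literals, the str.format() interface, or template strings may help avoid these errors. Each of these alternatives provides their own trade-offs and benefits of simplicity, flexibility, and/or extensibility.

In this specific case, Python sees a non-tuple value with a __getitem__ method on the right-hand side of % and assumes that a format_map has to be done. This is typically done with a dict, but it can indeed be done with any object that has a __getitem__ method.

In particular, a format_map is allowed to ignore unused keys, since you typically do not iterate over the items of a mapping to access them.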
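As an illustration of the tuple quirk the documentation mentions, the sketch below formats the same tuple with % and with each of the suggested alternatives. The variable names are made up for the example.

```python
from string import Template

point = (1, 2)

# Passing the tuple directly makes % unpack it as two positional arguments.
try:
    "Point: %s" % point
except TypeError as exc:
    error = str(exc)

# Wrapping it in a one-element tuple is the usual workaround with %.
wrapped = "Point: %s" % (point,)

# The alternatives take the tuple as a single value without special casing.
fstring = f"Point: {point}"
formatted = "Point: {}".format(point)
templated = Template("Point: $p").substitute(p=point)

print(error)
print(wrapped, fstring, formatted, templated, sep="\n")
```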

>>> "Include those items: %(foo)s %(bar)s" % {"foo": 1, "bar": 2, "ignored": 3}
'Include those items: 1 2'

Your example uses that feature in a case where all keys of the container are ignored.

>>> "Include no items:" % {"foo": 1, "bar": 2}
'Include no items:'
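For comparison, a short sketch of how unused entries are treated in the mapping case versus the positional case. str.format_map is shown alongside only as a parallel; the variable names are illustrative.

```python
data = {"foo": 1, "bar": 2, "ignored": 3}

# Mapping argument: keys that the format string never names are ignored.
by_percent = "%(foo)s %(bar)s" % data
by_format_map = "{foo} {bar}".format_map(data)

# Positional (tuple) argument: a leftover value is an error.
try:
    "%s %s" % (1, 2, 3)
except TypeError as exc:
    positional_error = str(exc)

print(by_percent)
print(by_format_map)
print(positional_error)
```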

If you want further proof of this, check what happens when you use a list as the right-hand side.

>>> lst = ["foo", "bar", "baz"]
>>> "Include those items: %(0)s, %(2)s" % lst
TypeError: list indices must be integers or slices, not str

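Python really does try to get lst["0"]. Unfortunately there is no way to specify that "0" should be converted to an int, so this is doomed to fail with the % syntax.

Older versions

For the record, this seems to be a quirk that appeared well before Python 3.0, since I get the same behavior as far back as I can go, even though the documentation only starts mentioning it in version 3.3.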
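Two possible ways around this, sketched under the assumption that indexing a list by position is the goal: str.format converts purely numeric indices inside {0[...]} to integers, and a small wrapper (the hypothetical IntKeys class below) can do the conversion itself so that % still works.

```python
lst = ["foo", "bar", "baz"]

# str.format treats an all-digit index inside [] as an integer.
with_format = "Include those items: {0[0]}, {0[2]}".format(lst)


class IntKeys:
    """Hypothetical wrapper: converts the string keys that % passes into ints."""

    def __init__(self, seq):
        self._seq = seq

    def __getitem__(self, key):
        return self._seq[int(key)]


with_percent = "Include those items: %(0)s, %(2)s" % IntKeys(lst)

print(with_format)
print(with_percent)
```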

Python 3.0.1+ (unknown, May  5 2020, 09:41:19) 
[GCC 9.2.0] on linux4
Type "help", "copyright", "credits" or "license" for more information.
>>> 'Not a format string' % [1, 2, 3]
'Not a format string'
