python - Possible to use wide-character members in Python extension objects?

Question

It's simple to create a member for an object in a Python C extension with a base type of char *, using the T_STRING define in the PyMemberDef declaration.

Why does there not seem to be an equivalent for wchar_t *? And if there actually is one, what is it?

e.g.

struct object contains char *text

PyMemberDef array has {"text", T_STRING, offsetof(struct object, text), READONLY, "This is a normal character string."}

versus something like

struct object contains wchar_t *wtext

PyMemberDef array has {"wtext", T_WSTRING, offsetof(struct object, wtext), READONLY, "This is a wide character string"}

I understand that something like PyUnicode_AsUTF8String() and its related functions can be used to encode the data as UTF-8, store it in a plain char string, and decode it later. Doing it that way, though, would require wrapping the generic getattr and setattr functions with ones that account for the encoded text. It's also not very useful when you want character arrays with a fixed element size inside a struct and don't want the effective number of characters they can hold to vary.
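
To make the fixed-size concern concrete, here is a minimal sketch in plain C (no Python headers). The helper names utf8_encoded_len and chars_that_fit are hypothetical, made up for this illustration. It only counts bytes and does not reject surrogate code points.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes that UTF-8 needs for one code point; 0 if beyond U+10FFFF.
   (Illustrative helper, not part of any Python API.) */
static size_t utf8_encoded_len(uint32_t cp)
{
    if (cp < 0x80)      return 1;
    if (cp < 0x800)     return 2;
    if (cp < 0x10000)   return 3;
    if (cp <= 0x10FFFF) return 4;
    return 0;
}

/* How many leading code points of cps fit, UTF-8 encoded, into a char
   buffer of `capacity` bytes, keeping one byte for the trailing NUL. */
static size_t chars_that_fit(const uint32_t *cps, size_t n, size_t capacity)
{
    size_t used = 0;
    size_t i;

    if (capacity == 0)
        return 0;
    for (i = 0; i < n; i++) {
        size_t need = utf8_encoded_len(cps[i]);
        if (need == 0 || used + need > capacity - 1)
            break;
        used += need;
    }
    return i;
}
```

With a char[8] buffer, seven ASCII characters fit but only two euro signs (U+20AC, three bytes each) do, whereas a wchar_t[8] would hold seven of either on a platform where wchar_t is wide enough for them.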

score 2 · Accepted Answer

Using a wchar_t directly is not portable. Instead, Python defines the Py_UNICODE type as the storage unit for a Unicode character.

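Depending on the platform, Py_UNICODE is either defined as wchar_t, where that is available, or as an unsigned short/int/long, whose width varies with how Python was configured (UCS2 vs. UCS4) and with the architecture and C compiler in use. You can find the relevant definitions in unicodeobject.h.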
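
As a rough illustration of the portability point, the following plain-C sketch reports how wide wchar_t is for the current compiler. The function names wchar_bits and wchar_holds_any_code_point are hypothetical. It says nothing about how a particular Python build defines Py_UNICODE.

```c
#include <limits.h>
#include <wchar.h>

/* Width of wchar_t in bits for the current platform and compiler. */
static unsigned wchar_bits(void)
{
    return (unsigned)(sizeof(wchar_t) * CHAR_BIT);
}

/* Nonzero if one wchar_t can represent every code point up to U+10FFFF. */
static int wchar_holds_any_code_point(void)
{
    return WCHAR_MAX >= 0x10FFFF;
}
```

wchar_t is typically 16 bits on Windows and 32 bits on Linux and macOS, so the second function usually returns 0 on the former and 1 on the latter.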

For your use case, your object can have an attribute that holds a Unicode string, using T_OBJECT:

static struct PyMemberDef attr_members[] = {
  { "wtext", T_OBJECT, offsetof(PyAttrObject, wtext), READONLY, "wide string"},
  ...

You can perform the type check in your object's initializer:

...
if (!PyUnicode_CheckExact(arg)) {
    PyErr_Format(PyExc_ValueError, "arg must be a unicode string");
    return NULL;
}
Py_INCREF(arg);
self->wtext = arg;
...

If you need to iterate over the low-level characters in the Unicode string, there is a macro that returns a Py_UNICODE *:

Py_ssize_t i = 0;  /* same type as size, so the comparison below cannot truncate */
Py_ssize_t size = PyUnicode_GetSize(self->wtext);
Py_UNICODE *chars = PyUnicode_AS_UNICODE(self->wtext);
for (i = 0; i < size; i++) {
    // use chars[i]
    ...
