It's simple to create a member for an object in a Python C extension with a base type of char *, using the T_STRING define in the PyMemberDef declaration.

Why does there not seem to be an equivalent for wchar_t *? And if there actually is one, what is it?

e.g.

struct object contains char *text

PyMemberDef array has {"text", T_STRING, offsetof(struct object, text), READONLY, "This is a normal character string."}

versus something like

struct object contains wchar_t *wtext

PyMemberDef array has {"wtext", T_WSTRING, offsetof(struct object, wtext), READONLY, "This is a wide character string"}

I understand that something like PyUnicode_AsString() and its related functions can be used to encode the data as UTF-8, store it in a plain char string, and decode it later. But doing it that way would require wrapping the generic getattr and setattr functions with ones that account for the encoded text. It also isn't very useful when you want character arrays of fixed element size within a struct and don't want the number of characters that fit in them to vary.

1 Answer

Using wchar_t directly is not portable. Instead, Python defines the Py_UNICODE type as the storage unit for a Unicode character.

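Depending on the platform, Py_UNICODE may be defined as wchar_t where that is available, or as an unsigned short/int/long whose width varies with how Python was configured (UCS2 vs. UCS4) and with the architecture and C compiler in use. You can find the relevant definitions in unicodeobject.h.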
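As a rough illustration of the portability point (this sketch is not part of the original answer, and the function names are hypothetical), the plain C below reports how wide wchar_t is for the compiler in use. It is commonly 16 bits on Windows and 32 bits with glibc on Linux, so a fixed-size wchar_t array inside a struct changes its byte size from one platform to another.

```c
#include <assert.h>
#include <limits.h>   /* CHAR_BIT */
#include <stddef.h>   /* size_t, wchar_t */
#include <stdint.h>   /* SIZE_MAX */
#include <stdio.h>

/* Width of wchar_t in bits for the compiler/platform in use. */
size_t wchar_bits(void)
{
    return sizeof(wchar_t) * CHAR_BIT;
}

/* Hypothetical helper: bytes occupied by a wide buffer of n_chars elements.
 * The result depends on the platform's wchar_t width. */
size_t wide_buffer_bytes(size_t n_chars)
{
    /* Guard against overflow in the multiplication below. */
    assert(n_chars <= SIZE_MAX / sizeof(wchar_t));
    return n_chars * sizeof(wchar_t);
}

/* Print the layout so it can be compared across platforms. */
void report_wchar_layout(void)
{
    printf("wchar_t is %zu bits; a 16-character buffer takes %zu bytes\n",
           wchar_bits(), wide_buffer_bytes(16));
}
```

Calling report_wchar_layout() from a small main() on each target platform shows whether the struct layout you have in mind would stay the same there.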

For your use case, your object can have an attribute that holds a Unicode string, using T_OBJECT:

static struct PyMemberDef attr_members[] = {
  { "wtext", T_OBJECT, offsetof(PyAttrObject, wtext), READONLY, "wide string" },
  ...

You can perform the type check in the object's initializer:

...
if (!PyUnicode_CheckExact(arg)) {
    PyErr_Format(PyExc_ValueError, "arg must be a unicode string");
    return NULL;
}
Py_INCREF(arg);
self->wtext = arg;
...

If you need to iterate over the low-level characters in the Unicode string, there is a macro that returns a Py_UNICODE *:

Py_ssize_t i = 0;  /* same type as size, so the comparison below cannot truncate */
Py_ssize_t size = PyUnicode_GetSize(self->wtext);
Py_UNICODE *chars = PyUnicode_AS_UNICODE(self->wtext);
for (i = 0; i < size; i++) {
    // use chars[i]
    ...
Answered 2011-06-01T03:01:51.613