python - How can I parse a C format string in Python?

Question

I have this code in my C file:

printf("Worker name is %s and id is %d", worker.name, worker.id);

I would like to use Python to parse the format string and locate the "%s" and "%d".

So I want a function like this:

>>> my_function("Worker name is %s and id is %d")
[Out1]: ((15, "%s"), (28, "%d"))

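I tried to do this with libclang's Python bindings and with pycparser, but I could not see how to accomplish it with those tools.

I also tried to solve this with regular expressions, but it is not simple at all: consider cases where the printf string contains "%%s" and the like.

gcc and clang obviously both do this as part of compilation. Has nobody exported this logic to Python?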

score 9 · Accepted Answer

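You can certainly find properly formed candidates with a regex.

Take a look at the definition of the C format specification. (This uses Microsoft's; use whichever you prefer.)

It is: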

%[flags] [width] [.precision] [{h | l | ll | w | I | I32 | I64}] type

You also have the special case of %%, which printf outputs as a literal %.

You can translate that pattern into a regex:

(                                 # start of capture group 1
%                                 # literal "%"
(?:                               # first option
(?:[-+0 #]{0,5})                  # optional flags
(?:\d+|\*)?                       # width
(?:\.(?:\d+|\*))?                 # precision
(?:h|l|ll|w|I|I32|I64)?           # size
[cCdiouxXeEfgGaAnpsSZ]            # type
) |                               # OR
%%)                               # literal "%%"

Demo

Then as a Python regex:

import re

lines='''\
Worker name is %s and id is %d
That is %i%%
%c
Decimal: %d  Justified: %.6d
%10c%5hc%5C%5lc
The temp is %.*f
%ss%lii
%*.*s | %.3d | %lC | %s%%%02d'''

cfmt='''\
(                                  # start of capture group 1
%                                  # literal "%"
(?:                                # first option
(?:[-+0 #]{0,5})                   # optional flags
(?:\d+|\*)?                        # width
(?:\.(?:\d+|\*))?                  # precision
(?:h|l|ll|w|I|I32|I64)?            # size
[cCdiouxXeEfgGaAnpsSZ]             # type
) |                                # OR
%%)                                # literal "%%"
'''

for line in lines.splitlines():
    print('"{}"\n\t{}\n'.format(line,
          tuple((m.start(1), m.group(1)) for m in re.finditer(cfmt, line, flags=re.X))))

Prints:

"Worker name is %s and id is %d"
    ((15, '%s'), (28, '%d'))

"That is %i%%"
    ((8, '%i'), (10, '%%'))

"%c"
    ((0, '%c'),)

"Decimal: %d  Justified: %.6d"
    ((9, '%d'), (24, '%.6d'))

"%10c%5hc%5C%5lc"
    ((0, '%10c'), (4, '%5hc'), (8, '%5C'), (11, '%5lc'))

"The temp is %.*f"
    ((12, '%.*f'),)

"%ss%lii"
    ((0, '%s'), (3, '%li'))

"%*.*s | %.3d | %lC | %s%%%02d"
    ((0, '%*.*s'), (8, '%.3d'), (15, '%lC'), (21, '%s'), (23, '%%'), (25, '%02d'))
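To get the exact call shape the question asks for, the pattern above can be wrapped in a function. This is only a sketch: the name `my_function` comes from the question, while `C_FORMAT` and the `include_literal_percent` parameter are hypothetical names introduced here. By default it drops "%%", since that is a literal percent sign rather than a conversion.

```python
import re

# Same pattern as above, compiled once in verbose mode.
# C_FORMAT is a hypothetical name chosen for this sketch.
C_FORMAT = re.compile(r'''
(                                  # start of capture group 1
%                                  # literal "%"
(?:                                # first option
(?:[-+0 #]{0,5})                   # optional flags
(?:\d+|\*)?                        # width
(?:\.(?:\d+|\*))?                  # precision
(?:h|l|ll|w|I|I32|I64)?            # size
[cCdiouxXeEfgGaAnpsSZ]             # type
) |                                # OR
%%)                                # literal "%%"
''', re.VERBOSE)


def my_function(fmt, include_literal_percent=False):
    """Return ((offset, specifier), ...) for each conversion found in fmt.

    include_literal_percent is an assumed option: when True, "%%" is
    reported as well; when False (the default), it is skipped.
    """
    return tuple(
        (m.start(1), m.group(1))
        for m in C_FORMAT.finditer(fmt)
        if include_literal_percent or m.group(1) != "%%"
    )


print(my_function("Worker name is %s and id is %d"))
# ((15, '%s'), (28, '%d'))
print(my_function("That is %i%%", include_literal_percent=True))
# ((8, '%i'), (10, '%%'))
```

Matching "%%" in the regex and filtering it afterwards, rather than leaving it out of the pattern, is what keeps a string such as "%%s" from being misread as "%" followed by "%s".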

score 1 · Accepted Answer

A simple implementation might be the following generator:

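def find_format_specifiers(s):
    # True if the previous character was a "%" that began a specifier or a "%%" pair
    last_percent = False
    for i in range(len(s)):
        if s[i] == "%" and not last_percent:
            # A trailing "%" has no following character, so guard the lookahead
            if i + 1 < len(s) and s[i+1] != "%":
                yield (i, s[i:i+2])
            last_percent = True
        else:
            last_percent = False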

>>> list(find_format_specifiers("Worker name is %s and id is %d but %%q"))
[(15, '%s'), (28, '%d')]

If needed, this can easily be extended to handle additional format specifier information such as width and precision.
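One way such an extension might look is a small hand-written scanner that walks past flags, width, precision, and a length modifier before the conversion character. This is a sketch, not the answerer's code: `find_format_specifiers_ext` and the constants are hypothetical names, and the character sets assume the ISO C `printf` syntax rather than the Microsoft variant used in the regex answer.

```python
# Assumed character sets, following ISO C printf syntax.
FLAGS = "-+ #0"
DIGITS = "0123456789"
LENGTHS = ("hh", "ll", "h", "l", "j", "z", "t", "L")  # longer modifiers first
TYPES = "diouxXeEfFgGaAcspn"


def find_format_specifiers_ext(s):
    """Yield (offset, specifier) for each well-formed conversion in s."""
    i, n = 0, len(s)
    while i < n:
        if s[i] != "%":
            i += 1
            continue
        start = i
        i += 1
        if i < n and s[i] == "%":      # "%%" is a literal percent sign
            i += 1
            continue
        while i < n and s[i] in FLAGS:  # flags
            i += 1
        if i < n and s[i] == "*":       # width: "*" or digits
            i += 1
        else:
            while i < n and s[i] in DIGITS:
                i += 1
        if i < n and s[i] == ".":       # precision: "." then "*" or digits
            i += 1
            if i < n and s[i] == "*":
                i += 1
            else:
                while i < n and s[i] in DIGITS:
                    i += 1
        for mod in LENGTHS:             # optional length modifier
            if s.startswith(mod, i):
                i += len(mod)
                break
        if i < n and s[i] in TYPES:     # conversion character
            i += 1
            yield (start, s[start:i])
        # Otherwise the sequence is malformed: yield nothing and resume
        # scanning at the current position.


print(list(find_format_specifiers_ext("Worker name is %s and id is %d but %%q")))
# [(15, '%s'), (28, '%d')]
print(list(find_format_specifiers_ext("%*.*s | %.3d | %lld | %s%%%02d")))
# [(0, '%*.*s'), (8, '%.3d'), (15, '%lld'), (22, '%s'), (26, '%02d')]
```

Listing the length modifiers longest first means "ll" is tried before "l", so no backtracking is needed.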

score 0 · Accepted Answer

Here is iterative code I wrote that prints the index of each %s, %d, or any similar format specifier:

import re

def myfunc(text):
    # Grab the first parenthesized group, e.g. the argument list of printf(...)
    match = re.search(r'\(.*?\)', text)
    if match:
        # Remove the parentheses and double quotes
        new_str = match.group().translate(str.maketrans('', '', '()"'))
        print(new_str)
        parse(new_str)
    else:
        print("No match")

def parse(text):
    # Report every "%" with the character that follows it and its index
    g = text.find('%')
    while g != -1:
        print(" %", text[g+1:g+2], " = ", g)
        g = text.find('%', g + 1)

myfunc(input())

Hope this helps.
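Stripping every parenthesis and quote from the line shifts offsets whenever the format string itself contains those characters. A slightly more careful sketch pulls out just the first string literal that follows `printf(`. The names `PRINTF_LITERAL` and `printf_format` are hypothetical, and the approach assumes the call and its literal sit on a single line; it does not handle adjacent-literal concatenation, macros, or comments.

```python
import re

# Hypothetical helper pattern: "printf(" followed by one double-quoted C
# string literal. Backslash escapes such as \" are consumed as pairs so an
# escaped quote does not end the literal early.
PRINTF_LITERAL = re.compile(r'printf\s*\(\s*"((?:[^"\\]|\\.)*)"')


def printf_format(line):
    """Return the raw text of the format literal in a printf call, or None."""
    m = PRINTF_LITERAL.search(line)
    return m.group(1) if m else None


src = 'printf("Worker name is %s and id is %d", worker.name, worker.id);'
print(printf_format(src))
# Worker name is %s and id is %d
```

The returned text is the literal as written in the source, with escape sequences left unprocessed, so any offsets computed on it are relative to the opening quote in the C file.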
