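OK, so I have a bunch of C and C++ code that I need to filter through to find a function definition. I don't know the function's return type, nor the number of parameters in the definition or in calls to it.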

So far I have:

import re, sys
from os.path import abspath, join
from os import walk

function = 'msg'
# Intended to match the function name followed by anything up to a "{"
# that is not directly preceded by ";"
regexp = r"(" + function + ".*[^;]){"

for root, folders, files in walk('C:\\codepath\\'):
    for filename in files:
        with open(abspath(join(root, filename))) as fh:
            data = fh.read()
            result = re.findall(regexp, data)
            if len(result) > 0:
                sys.stdout.write('\n Found function "' + function + '" in ' + filename + ':\n\t' + str(result))
                sys.stdout.flush()
    break  # only look at the top-level directory

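However, this produces some unwanted results. The regex must be wrong for combinations such as the ones below.

It should find the "msg" definition, but not the "msg()" call, in all variations of, say: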

void
shapex_msg (struct shaper *s)
{
  msg (M_INFO, "Output Traffic Shaping initialized at %d bytes per second",
       s->bytes_per_second);
}

or

void shapex_msg (struct shaper *s)
{
  msg (M_INFO, "Output Traffic Shaping initialized at %d bytes per second",
       s->bytes_per_second);
}

or

void shapex_msg (struct shaper *s) {
  msg (M_INFO, "Output Traffic Shaping initialized at %d bytes per second",
       s->bytes_per_second);
}
1 Answer

Probably something like the following regex:

def make_regex(name):
    # name, optional whitespace, "(...)" with no ";" or ")" inside, then "{"
    return re.compile(r'\s*%s\s*\([^;)]*\)\s*\{' % re.escape(name))

Testing it on your examples:

>>> text = '''
... void
... shapex_msg (struct shaper *s)
... {
...   msg (M_INFO, "Output Traffic Shaping initialized at %d bytes per second",
...        s->bytes_per_second);
... }
... 
... void shapex_msg (struct shaper *s)
... {
...   msg (M_INFO, "Output Traffic Shaping initialized at %d bytes per second",
...        s->bytes_per_second);
... }
... 
... void shapex_msg (struct shaper *s) {
...   msg (M_INFO, "Output Traffic Shaping initialized at %d bytes per second",
...        s->bytes_per_second);
... }'''
>>> shapex_msg = make_regex('shapex_msg')
>>> shapex_msg.findall(text)
['\nshapex_msg (struct shaper *s)\n{', ' shapex_msg (struct shaper *s)\n{', ' shapex_msg (struct shaper *s) {']

It also works with multi-line definitions:

>>> shapex_msg.findall('''int
...    \tshapex_msg   \t(
... int a,
... int b
... )  
... 
... \t{''')
['\n   \tshapex_msg   \t(\nint a,\nint b\n)  \n\n\t{']

Function calls, on the other hand, are not matched:

>>> shapex_msg.findall('shapex_msg(1,2,3);')
[]
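
One caveat worth checking: the pattern begins with \s*, which may match nothing, so a short name such as msg is also found at the end of a longer identifier such as shapex_msg. If only the exact name is wanted, a word boundary could be used instead. A minimal sketch, where make_exact_regex is a hypothetical helper name and not part of the answer above:

```python
import re

def make_regex(name):
    # The pattern from the answer above, unchanged.
    return re.compile(r'\s*%s\s*\([^;)]*\)\s*\{' % re.escape(name))

def make_exact_regex(name):
    # Hypothetical variant: \b requires the name to start at a word boundary,
    # so "msg" no longer matches inside "shapex_msg".
    return re.compile(r'\b%s\s*\([^;)]*\)\s*\{' % re.escape(name))

text = '''void shapex_msg (struct shaper *s) {
  msg (M_INFO, "Output Traffic Shaping initialized");
}'''

# Matches the tail of shapex_msg, which is probably not what was asked for.
print(make_regex('msg').findall(text))
# No match: msg is only called here, never defined.
print(make_exact_regex('msg').findall(text))
# Matches the real definition.
print(make_exact_regex('shapex_msg').findall(text))
```

Whether the suffix match is wanted depends on how the question is read, so treat the boundary as optional.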

Note that your regex does not work because .* is greedy, so it does not stop at the correct closing parenthesis.
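
To make the difference concrete, here is a small sketch comparing the two patterns on a single line where a call to msg is followed by an unrelated opening brace. The sample line is made up for illustration.

```python
import re

# The pattern from the question, built for function = 'msg'.
question_regex = re.compile(r"(msg.*[^;]){")
# The pattern from this answer, built for the same name.
answer_regex = re.compile(r"\s*msg\s*\([^;)]*\)\s*\{")

# A call to msg followed by an unrelated opening brace on the same line.
line = 'msg (M_INFO, "starting"); if (ready) {'

# .* runs to the end of the line and backtracks only as far as the last "{",
# so the call is reported even though its argument list ended at ");".
print(question_regex.findall(line))

# [^;)]* cannot cross the first ")", and only whitespace may separate that
# ")" from "{", so the call is rejected.
print(answer_regex.findall(line))
```

The first call prints a one-element list containing everything from msg up to the last brace; the second prints an empty list.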

answered 2013-04-23T14:47:13.630
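
As a rough sketch of how the answer's pattern might be plugged back into the directory walk from the question: find_definitions and the list of file extensions are assumptions made for illustration, and unlike the question's script this version descends into subdirectories.

```python
import os
import re

def make_regex(name):
    # The pattern from the answer above, unchanged.
    return re.compile(r'\s*%s\s*\([^;)]*\)\s*\{' % re.escape(name))

def find_definitions(top, name, extensions=('.c', '.h', '.cpp', '.hpp')):
    """Return (path, line_number) pairs for apparent definitions of name."""
    pattern = make_regex(name)
    hits = []
    for root, _folders, files in os.walk(top):
        for filename in files:
            if not filename.endswith(extensions):
                continue  # skip files that do not look like C or C++ sources
            path = os.path.join(root, filename)
            # errors='replace' keeps undecodable bytes from aborting the scan
            with open(path, errors='replace') as fh:
                data = fh.read()
            for match in pattern.finditer(data):
                # The match may start with whitespace, so locate the name itself
                name_pos = data.index(name, match.start())
                hits.append((path, data.count('\n', 0, name_pos) + 1))
    return hits
```

Calling find_definitions('C:\\codepath\\', 'shapex_msg') would then return one (path, line) pair per apparent definition. Being regex-based, it can still be fooled by comments, string literals and macros.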