python - Remove C function bodies with Python

Question

I'm looking for a way to remove the entire body of every function in some C source files.

For example, I have a file with the following content:

1.  int func1 (int para) {
2.    return para;
3.  }
4.
5.  int func2 (int para) {
6.    if (1) {
7.      return para;
8.    }
9.    return para;
10. }

I tried this regular expression:

content = re.sub(r'(\{.*?\})', '', content, flags=re.DOTALL)

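But nested { } blocks are a problem: this regex only matches up to the first }, so lines 9 and 10 remain in the content. I think the solution is to count the { and } braces and stop replacing when the counter reaches 0: { found => counter++, } found => counter--. But I don't know how to implement that in Python. Can someone give me a push in the right direction?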
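The counter idea described above can be written in Python roughly as follows. This is a minimal sketch, assuming braces never occur inside string or character literals or comments; it also removes any other top-level { ... } block (a struct definition or an array initializer, for example), not only function bodies. The name remove_bodies is hypothetical.

```python
def remove_bodies(content):
    """Drop every top-level { ... } block (braces included) from C source.

    Assumes no braces inside string/char literals or comments.
    """
    out = []
    depth = 0                      # current nesting level of { }
    for ch in content:
        if ch == '{':
            depth += 1             # '{' found => counter++
        elif ch == '}':
            depth -= 1             # '}' found => counter--
            if depth < 0:
                raise ValueError("unbalanced '}'")
        elif depth == 0:
            out.append(ch)         # keep only text outside any block
    if depth != 0:
        raise ValueError("unbalanced '{'")
    return ''.join(out)
```

Run on the file from the question, it keeps only the two signature lines (each followed by the space that preceded its {) and the newlines between them.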

score 9 · Accepted Answer

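I think you are trying to reinvent a wheel that has already been implemented many times. If all you want is to extract the signature of each function in a C file, there are much simpler ways to do it.

The ctags utility will do this for you: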

~/test$ ctags -x --c-types=f ./test.c
func1            function      1 ./test.c         int func1 (int para) {
func2            function      5 ./test.c         int func2 (int para) {
~/test$ # Clean up the output a little bit
~/test$ ctags -x --c-types=f ./test.c | sed -e 's/\s\+/ /g' | cut -d ' ' -f 5-
int func1 (int para) {
int func2 (int para) {
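If you would rather do the clean-up step in Python than with sed and cut, a sketch like the following parses the ctags -x cross-reference output. It assumes the column layout shown above (name, kind, line number, file, source line), file names without whitespace, and a ctags on your PATH that accepts the same flags as in this answer; parse_xref and function_signatures are hypothetical names, and subprocess.run(..., capture_output=True) needs Python 3.7 or later.

```python
import subprocess


def parse_xref(xref_text):
    """Turn `ctags -x` output into (name, line number, source line) tuples."""
    result = []
    for row in xref_text.splitlines():
        fields = row.split(None, 4)     # name, kind, line, file, rest of the line
        if len(fields) < 5:
            continue                    # skip anything that is not a full row
        name, kind, line, _path, source = fields
        if kind == 'function':
            result.append((name, int(line), source.strip()))
    return result


def function_signatures(path):
    """Run ctags on one C file and return its function tags (needs ctags installed)."""
    xref = subprocess.run(['ctags', '-x', '--c-types=f', path],
                          capture_output=True, text=True, check=True).stdout
    return parse_xref(xref)
```

Unlike the sed 's/\s\+/ /g' step, this keeps the original spacing inside each source line.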

score 0 · Accepted Answer

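Here is one of my scripts for removing function bodies from C source files. The only requirement is ctags installed with Homebrew on Mac OS X; it does not work with the ctags that ships with Mac OS X, though I'm not sure why. You can install ctags with Homebrew by typing: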

$ brew install ctags

Then run the following Perl script, dummyc.pl, on your C source file. For example, given this C input:

int
func1 (int para)
{
  return para;
}

int
func2 (int para)
{
  if (1)
    {
      return para;
    }
  return para;
}

Here is the output:

int
func1 (int para)
{
  return 0;
}

int
func2 (int para)
{
  return 0;
}

Here is the Perl script:

#!/usr/bin/env perl
use strict;
use warnings;

unless ( @ARGV == 1 )
{
  print "Filter out the body of C functions.
Usage: dummyc.pl file.c
Required: ctags (e.g., \$ brew install ctags)\n";
  exit;
}

my $cfile = $ARGV[0];
my $lc = 1;
my $kindPrev = "comment";
my $lnPrev = 1;
my $lsPrev = "comment";
my $namePrev = "comment";
my $line = 1;
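open(CFILE, '<', $cfile) or die "could not open $cfile: $!";
# List-form pipe open: no shell is involved, so unusual characters in $cfile are safe.
# Path to Homebrew's ctags 5.8; adjust it for your installation.
open(PIPE, '-|', '/usr/local/Cellar/ctags/5.8/bin/ctags', '-xu', $cfile) or die "couldn't start pipe: $!";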
while ($line)
{
  last unless $line;
  # R_USE_SIGNALS    macro        24 errors.c         #define R_USE_SIGNALS 1
  $line = <PIPE>;
  my $name;
  my $kind;
  my $ln;
  my $ls;
  if ($line)
  {
    # \Q...\E matches the file name literally (e.g. the '.' in test.c)
    $line =~ /^(\S+)\s+(\w+)\s+(\d+)\s+\Q$cfile\E\s+(.+)/;
    $name = $1;
    $kind = $2;
    $ln = $3;
    $ls = $4;
  }
  else
  {
    $ln = 1000000;
  }

  if ($kindPrev eq "function") 
  {
    my $isFunctionBody = 0;
    my $hasStartBrace = 0;
    my $hasReturnValue = 1;
    my $noReturn = 0;
    for (my $i = $lnPrev; $i < $ln; $i++)
    {
      my $cline = <CFILE>;
      last unless $cline;

      if ($cline =~ /void.+$namePrev/)
      {
        $hasReturnValue = 0;  
      }
      if ($cline =~ /NORET.+$namePrev/)
      {
        $noReturn = 1;  
      }
      if ($isFunctionBody == 0 and $cline =~ /\{/)
      {
        $isFunctionBody = 1;
        unless ($cline =~ /^\{/)
        {
          $hasStartBrace = 1;
          print $cline;
        }
      }
      elsif ($cline =~ /^\}/)
      {
        $isFunctionBody = 0;
        print "{\n" if $hasStartBrace == 0;
        if ($noReturn == 0)
        {
          if ($hasReturnValue == 1)
          {
            print "  return 0;\n";
          }
          else
          {
            print "  return;\n";
          }
        }
      }
      unless ($isFunctionBody == 1)
      {
        print $cline;
      }
    }
  }
  else
  {
    for (my $i = $lnPrev; $i < $ln; $i++)
    {
      my $cline = <CFILE>;
      last unless $cline;
      print $cline;
    }
  }
  $kindPrev = $kind;
  $lnPrev = $ln;
  $lsPrev = $ls;
  $namePrev = $name;
}
close(PIPE) or die "couldn't close pipe: $! $?";
close(CFILE) or die "couldn't close $cfile: $! $?";

You may need to adapt the Perl script to your code, though.
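For comparison, here is a rough Python sketch of the same idea (keep each signature, replace the body with a return stub) that matches braces itself instead of relying on ctags line numbers. It is not a port of dummyc.pl and rests on several assumptions: a top-level { counts as a function body only when it follows "name ( ... )"; the stub is return; when the text before the function name contains the word void and no *, otherwise return 0; (the script's NORET case is not handled); and braces inside strings or comments are ignored, as in the other answers. stub_bodies and HEADER_END are hypothetical names.

```python
import re

# "name ( params )" immediately before a '{' (assumed to mark a function body)
HEADER_END = re.compile(r'(\w+)\s*\([^{};]*\)\s*$')


def stub_bodies(src):
    """Keep every function signature but replace its body with a return stub."""
    out = []
    copied = 0          # src[:copied] has already been written to out
    i = 0
    while i < len(src):
        m = HEADER_END.search(src, copied, i) if src[i] == '{' else None
        if not m:
            i += 1
            continue
        # Walk forward to the '}' that closes this '{' by counting depth.
        depth, j = 0, i
        while j < len(src):
            if src[j] == '{':
                depth += 1
            elif src[j] == '}':
                depth -= 1
                if depth == 0:
                    break
            j += 1
        else:
            raise ValueError("unbalanced '{'")
        # Return type: declaration text before the name, minus preprocessor lines.
        decl = src[copied:m.start(1)].rsplit(';', 1)[-1]
        decl = ' '.join(l for l in decl.splitlines() if not l.lstrip().startswith('#'))
        is_void = re.search(r'\bvoid\b', decl) and '*' not in decl
        out.append(src[copied:i + 1])                       # up to and including '{'
        out.append('\n  return;\n' if is_void else '\n  return 0;\n')
        out.append('}')
        copied = i = j + 1                                  # continue after the body
    out.append(src[copied:])
    return ''.join(out)
```

On the GNU-style input above, this produces the same text as the output shown for the Perl script.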

score 0 · Accepted Answer

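Here is a pure Python solution that is quite simple to implement.

A function to extract the body

Basically, you try to match each { with its corresponding }:

If another { comes before the next }, you are entering a nested scope.
Conversely, if a } comes before the next {, you are leaving a scope.

The implementation is then straightforward:

- Find the indices of all { and all }, and keep them in two separate lists.
- Maintain a scope-depth variable:
  - if the current { position is lower than the current } position, you are entering a scope: add 1 to the depth and move on to the next { position;
  - if the current { position is higher than the current } position, you are leaving a scope: subtract 1 from the depth and move on to the next } position.
- When the scope depth reaches 0, you have found the closing brace of the function body.

Assuming your string starts just after the opening brace of the function body (the brace itself excluded), calling the following function on that substring gives you the position of the closing brace: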

import re

def find_ending_brace(string_from_first_brace):
    # Positions of every { and } after the body's opening brace
    starts = [m.start() for m in re.finditer(r'\{', string_from_first_brace)]
    ends = [m.start() for m in re.finditer(r'\}', string_from_first_brace)]

    i = 0
    j = 0
    current_scope_depth = 1  # we are already inside the body

    while j < len(ends):
        if i < len(starts) and starts[i] < ends[j]:
            # the next brace is a {: entering a nested scope
            current_scope_depth += 1
            i += 1
        else:
            # the next brace is a }: leaving a scope
            current_scope_depth -= 1
            if current_scope_depth == 0:
                return ends[j]
            j += 1

    raise ValueError("no matching } found")

Extracting candidate function definitions

Now, if the raw content of the file is in the variable my_content,

find_func_begins = list(re.finditer(r"\w+\s+(\w+)\s*\((.*?)\)\s*\{", my_content))

gives you the prototype of each function (find_func_begins[0].group(1) == 'func1' and find_func_begins[0].group(2) == 'int para'), and

my_content[
  find_func_begins[0].start():
    find_func_begins[0].end() +
    find_ending_brace(my_content[find_func_begins[0].end():])]

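gives you the whole definition of the first function, from its return type up to (but not including) its closing brace.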
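If you want only the text between the outer braces, slice from the match's end() instead of its start(). A self-contained sketch that does this for every function and maps each name to its body might look like the following. It reuses the answer's loose regex, writes the brace matching as a plain depth counter instead of two index lists, and assumes no braces inside strings or comments; function_bodies, matching_brace and FUNC_BEGIN are hypothetical names.

```python
import re

FUNC_BEGIN = re.compile(r"\w+\s+(\w+)\s*\((.*?)\)\s*\{", re.DOTALL)


def matching_brace(text, open_idx):
    """Index of the '}' that matches the '{' at text[open_idx]."""
    depth = 0
    for k in range(open_idx, len(text)):
        if text[k] == '{':
            depth += 1
        elif text[k] == '}':
            depth -= 1
            if depth == 0:
                return k
    raise ValueError("unbalanced '{'")


def function_bodies(src):
    """Return {function name: text between its outer braces}."""
    bodies = {}
    pos = 0
    while True:
        m = FUNC_BEGIN.search(src, pos)
        if not m:
            return bodies
        close = matching_brace(src, m.end() - 1)   # m.end() - 1 is the '{'
        bodies[m.group(1)] = src[m.end():close]
        pos = close + 1                             # resume after this body
```

Resuming the search after each closing brace means nothing inside a body is ever matched as a new definition.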

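Extracting the prototypes

Because the regex for find_func_begins is rather loose, you should only look for the next function definition after reaching the closing brace of the current one; otherwise a line such as "else if (x) {" inside a body could be mistaken for a definition. Iterating over the function definitions and matching braces gives the following iterative algorithm: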

reg_ex = re.compile(r"\w+\s+(\w+)\s*\((.*?)\)\s*\{", re.DOTALL)
last = 0
protos = ""
function_begin = reg_ex.search(my_content, last)
while function_begin:
    function_proto_end = function_begin.end()  # index just past the opening {
    # the signature without its {, turned into a declaration
    protos += my_content[function_begin.start(): function_proto_end - 1].strip() + ";\n\n"
    # skip the whole body so nothing inside it is mistaken for a definition
    last = function_proto_end + find_ending_brace(my_content[function_proto_end:]) + 1
    function_begin = reg_ex.search(my_content, last)

You should then have the prototypes you want in protos. Hope this helps!
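A self-contained version of the loop above, runnable on the file from the question, could look like this. It is a sketch under the same assumptions (the loose regex from this answer, no braces inside strings or comments) and replaces the two index lists with a simple depth counter; prototypes and FUNC_BEGIN are hypothetical names.

```python
import re

FUNC_BEGIN = re.compile(r"\w+\s+(\w+)\s*\((.*?)\)\s*\{", re.DOTALL)


def prototypes(src):
    """Return a 'signature;' declaration for each function defined in src."""
    protos = []
    pos = 0
    while True:
        m = FUNC_BEGIN.search(src, pos)
        if not m:
            return protos
        protos.append(src[m.start():m.end() - 1].strip() + ';')  # drop '{', add ';'
        depth = 0
        for k in range(m.end() - 1, len(src)):   # skip to the matching '}'
            if src[k] == '{':
                depth += 1
            elif src[k] == '}':
                depth -= 1
                if depth == 0:
                    break
        else:
            raise ValueError("unbalanced '{'")
        pos = k + 1                               # search again after the body
```

Joining the returned list with "\n\n" gives roughly the protos string built by the loop above.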

score 0 · Accepted Answer

I still need to clean this up :)

import re
from linecache import getline


def get_line_count(filepath):
    # get_line_count is not defined in the original snippet; this is an assumed helper
    with open(filepath) as f:
        return sum(1 for _ in f)


class FuncBody(object):

    def __init__(self):
        self.stack = []

    def stack_insert(self, sym_list):
        # Process one line's braces in column order; a } cancels the { on top of the stack
        sym_list.sort(key=lambda x: x[1])
        for sym, idx in sym_list:
            if self.stack and self.stack[-1] == '{' and sym == '}':
                self.stack.pop()
            else:
                self.stack.append(sym)

    @staticmethod
    def braces(line):
        # All { and } on a line, tagged with their column
        return ([('{', m.start()) for m in re.finditer(r'\{', line)] +
                [('}', m.start()) for m in re.finditer(r'\}', line)])

    def get_body(self, filepath, start):
        # Yield lines from `start` (1-based) until the first { ... } block is closed
        begin = False
        self.stack = []
        # + 1 so that the last line of the file is read too (getline is 1-based)
        for lineno in range(start, get_line_count(filepath) + 1):
            line = getline(filepath, lineno)
            if not begin and '{' in line:
                self.stack_insert(self.braces(line))
                begin = True
                yield line
                continue
            if begin and self.stack == []:
                break  # the block was closed on the previous line
            elif begin:
                self.stack_insert(self.braces(line))
            yield line
