c - Regular expressions with matching braces

Question

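I am trying to extract specific hard-coded variables from C source code. My remaining problem is that I want to parse array initializations such as: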

#define SOMEVAR { {T_X, {1, 2}}, {T_Y, {3, 4}} }

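Parsing this example into "{T_X, {1, 2}}" and "{T_Y, {3, 4}}" would be enough, since the full structure can then be obtained recursively. However, it needs to be generic enough to parse any user-defined type.

Even better would be a list of regular expressions that can be used to extract values from common C constructs such as #define, enums, and global variables.

The C code is provided to me, so I have no control over it. I would rather not write a function that parses it one character at a time. A series of regular expressions, however, would be fine.

This is not a question about importing files into MATLAB or about basic regular expressions. I am after a specific regular expression that preserves grouping by braces.

Edit: It looks like regular expressions do not do recursion or arbitrary-depth matching, according to here and here.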

score 1 · Accepted Answer

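Maybe vim's syntax files would help here. I'm not sure they cover the elements you want (I don't do C), but they cover a lot, so they are definitely a starting point. Download vim (www.vim.org) and take a look at vim/syntax/c.vim.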

score 1 · Accepted Answer

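I don't think regular expressions are suited to arbitrary C code. Clang lets you build a syntax tree from C code and work with it programmatically.

This is easy to use for global variables, but #defines are handled by the preprocessor, so I'm not sure how they would work.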

cristi:tmp diciu$ cat test.c
#define t 1
int m=5;


int fun(char * y)
{
    float g;

    return t;
}

int main()
{
    int g=7;
    return t;
}


cristi:tmp diciu$ ~/Downloads/checker-137/clang -ast-dump test.c
(CompoundStmt 0xc01ec0 <test.c:6:1, line:10:1>
  (DeclStmt 0xc01e70 <line:7:2>
    0xc01e30 "float g"
  (ReturnStmt 0xc01eb0 <line:9:2, line:1:11>
        (IntegerLiteral 0xc01e90 <col:11> 'int' 1)))
(CompoundStmt 0xc020a0 <test.c:13:1, line:16:1>
  (DeclStmt 0xc02060 <line:14:2>
    0xc02010 "int g =
      (IntegerLiteral 0xc02040 <col:8> 'int' 7)"
  (ReturnStmt 0xc01b50 <line:15:2, line:1:11>
    (IntegerLiteral 0xc02080 <col:11> 'int' 1)))
typedef char *__builtin_va_list;
Read top-level variable decl: 'm'

int fun(char *y)


int main()

score 1 · Accepted Answer

I'm assuming you have access to the C code in question. If so, define two macros:

#define BEGIN_MATLAB_DATA
#define END_MATLAB_DATA

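Wrap all the data you want to extract between these macros. When the C code is compiled they expand to nothing, so they do no harm there.

Now you can use a very simple regular expression to get the data.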
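As a rough sketch of that last step, here is how the marked region could be pulled out in Python. The answer names no language, so Python and the names `C_SOURCE`, `MARKED_REGION`, and `extract_marked_regions` are assumptions made for illustration. The pattern only accepts a marker that sits alone on its line, so the two #define lines that declare the markers are skipped.

```python
import re

# Hypothetical C source that uses the two marker macros from the answer.
C_SOURCE = """
#define BEGIN_MATLAB_DATA
#define END_MATLAB_DATA

int unrelated = 0;

BEGIN_MATLAB_DATA
#define SOMEVAR { {T_X, {1, 2}}, {T_Y, {3, 4}} }
END_MATLAB_DATA
"""

# A marker counts only when it is the sole token on its line, which
# excludes the "#define BEGIN_MATLAB_DATA" declaration itself.
MARKED_REGION = re.compile(
    r"^[ \t]*BEGIN_MATLAB_DATA[ \t]*\n(.*?)^[ \t]*END_MATLAB_DATA[ \t]*$",
    re.DOTALL | re.MULTILINE,
)


def extract_marked_regions(source):
    """Return the text found between each BEGIN/END marker pair."""
    return [m.group(1).strip() for m in MARKED_REGION.finditer(source)]


if __name__ == "__main__":
    for region in extract_marked_regions(C_SOURCE):
        print(region)
```

This only isolates the data; the nested braces inside the region still have to be taken apart separately.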

score 1 · Accepted Answer

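Edit: Now that the question has been updated, it seems my earlier answer missed the point. I don't know whether you have already searched Stack Overflow for other regex-related questions. In case you haven't, I came across two that may offer some guidance (the problem seems to be, at least in part, one of matching and tracking opening and closing braces): this one and this one. Good luck!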

score 1 · Accepted Answer

This regular expression:

(\{\s*[A-Za-z_]+)\s*,\s*\{\s*\d+\s*,\s*\d+\s*\}\s*\}

seems reasonable, but I don't know if it's enough for you. It's littered with \s* to allow arbitrary whitespace between tokens, which C permits. It will match things that look more or less like your examples: an identifier followed by exactly two digit strings.
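To see what the pattern does and does not catch, here is a small check in Python. The language choice and the names `ELEMENT` and `find_elements` are assumptions for illustration; the pattern itself is copied verbatim from above. Because the capturing group covers only the leading part of each element, the sketch reads the whole match rather than group 1.

```python
import re

# The pattern from the answer, unchanged.
ELEMENT = re.compile(r"(\{\s*[A-Za-z_]+)\s*,\s*\{\s*\d+\s*,\s*\d+\s*\}\s*\}")


def find_elements(text):
    """Return every substring that matches the whole pattern."""
    return [m.group(0) for m in ELEMENT.finditer(text)]


if __name__ == "__main__":
    line = "#define SOMEVAR { {T_X, {1, 2}}, {T_Y, {3, 4}} }"
    print(find_elements(line))

    # One extra nesting level is already outside what the pattern accepts.
    deeper = "#define OTHER { {T_Z, {1, {2, 3}}} }"
    print(find_elements(deeper))
```

The second call finds nothing, which illustrates why a fixed pattern like this covers only one known shape of initializer and not arbitrary user-defined types.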

score 1 · Accepted Answer

The formal language that defines brace matching is not a regular language. Therefore, you cannot use a regular expression to solve your problem.

The problem is that you need some way to count the number of opening braces you have already encountered. Some regular expression engines support extended features, such as peeking, which could be used to solve your problem, but these can be tough to deal with. You might be better off writing a simple parser for this task.
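A minimal sketch of such a parser, written in Python as an assumption (the answer names no language, and `split_top_level` is a hypothetical helper). It keeps a running count of open braces and cuts the text at commas seen at depth 1. It deliberately ignores braces or commas inside string literals, character literals, and comments, so the input is assumed to be free of those.

```python
def split_top_level(initializer):
    """Split a brace-enclosed initializer into its top-level elements."""
    text = initializer.strip()
    if not (text.startswith("{") and text.endswith("}")):
        raise ValueError("expected a brace-enclosed initializer")

    elements = []
    depth = 0   # number of braces currently open
    start = 1   # index just past the outermost opening brace
    for i, ch in enumerate(text):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced closing brace")
            if depth == 0 and i != len(text) - 1:
                raise ValueError("text after the outermost closing brace")
        elif ch == "," and depth == 1:
            # A comma directly inside the outermost braces ends an element.
            elements.append(text[start:i].strip())
            start = i + 1
    if depth != 0:
        raise ValueError("unbalanced opening brace")

    last = text[start:-1].strip()
    if last:  # empty when the initializer ends with a trailing comma
        elements.append(last)
    return elements


if __name__ == "__main__":
    outer = split_top_level("{ {T_X, {1, 2}}, {T_Y, {3, 4}} }")
    print(outer)
    # Applying the same function to an element descends one level.
    print(split_top_level(outer[0]))
```

Calling the function again on each returned element gives the recursive breakdown the question asks for, without depending on any particular regex engine's extensions.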
