function - Sed: snake case functions

Question

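I need a sed script that automatically converts C function names to lower snake case.

What I have so far is the following. It separates camel-case words with underscores, but it does not lowercase them, and it affects everything, not just functions.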

sed -i -e 's/\([a-z0-9]\)\([A-Z]\)/\1_\L\2/g' `find source/ -type f`

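How can I make it apply only to functions, i.e. only when the string is followed by the character "("?

Also, what do I need to make the string lowercase?

For example, if I have this code: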

void destroyPoolLender(PoolLender *lender)
{
    while (!isListEmpty(&lender->pools)) {
        MemoryPool *myPool = listPop(&this->pool);

        if (pool->inUse) {
            logError("%s memory pool still in use. Pool not released.", pool->lenderName);
        } else {
            free(pool);
        }
    }
    listDestroy(&this->pool);
}

After the conversion it should look like this:

void destroy_pool_lender(PoolLender *lender)
{
    while (!is_list_empty(&lender->pools)) {
        MemoryPool *myPool = list_pop(&this->pool);

        if (pool->inUse) {
            log_error("%s memory pool still in use. Pool not released.", pool->lenderName);
        } else {
            free(pool);
        }
    }
    list_destroy(&this->pool);
}

Note how myPool is left unchanged, because it is not a function name.

score 1 · Accepted Answer

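A bash solution. It uses information from the object files, obtained with the nm command. See man nm.

To create the object files from the sources, run gcc with the -c option for each source file (you may already have them, created by make; in that case you can skip this step):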

gcc -c one.c -o one.o
gcc -c two.c -o two.o

Usage: ./convert.sh one.o two.o

#!/bin/bash

# store original function names to the variable.
orig_func_names=$(
    # get list symbols from all object files
    nm -f sysv "$@" |
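    # pick the functions and remove everything except the name.
    # sysv format pads short names with spaces and puts '|' directly after long ones,
    # so cut at the first space or '|'.
    sed -n '/FUNC/s/[[:space:]|].*//p' |
    # select only the functions whose name contains an uppercase letter.
    sed -n '/[A-Z]/p'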
);

# convert camel case names to snake case names and store new names to the variable.
new_func_names=$(sed 's/[A-Z]/_\l&/g' <<< "$orig_func_names")

# create a file containing substitute commands for 'sed'.
# Example of commands from this file:
# s/\boneTwo\b/one_two/g
# s/\boneTwoThree\b/one_two_three/g
# etc. One line per function name.
paste -d'/' <(printf 's/\\b%s\\b\n' ${orig_func_names}) <(printf '%s/g\n' ${new_func_names}) > command_file.txt

# do the conversion
# change the object file extension '.o' to the C source extension '.c'
# (the '%' anchors the match at the end of each filename).
# the filenames were: one.o two.o three.o
# now they are: one.c two.c three.c
# this 'sed' command creates a backup of each file and changes the source files in place.
sed -i_backup -f command_file.txt "${@/%.o/.c}"

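Note that the running time of this solution grows with the product of the number of lines and the number of functions, not just with the size of the source. For example, with 70,000 lines and 1,000 functions, it has to make 70 million checks (70,000 lines * 1,000 functions). It would be interesting to know how long that takes.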
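The command-file step can be tried on its own, without compiling anything. The following is a minimal sketch, assuming bash (for process substitution) and GNU sed (for \l). The two function names are hard-coded samples taken from the comment in the script, not real nm output, and the result is kept in a variable instead of command_file.txt.

```shell
#!/bin/bash
# Hypothetical sample names; in the real script these come from nm.
orig_func_names='oneTwo
oneTwoThree'

# Same conversion as in the script: put '_' before every uppercase
# letter and lowercase that letter (\l is a GNU sed extension).
new_func_names=$(printf '%s\n' "$orig_func_names" | sed 's/[A-Z]/_\l&/g')

# Build one 's/\bOLD\b/NEW/g' command per name by joining the two
# columns with '/'. The variables are left unquoted on purpose so that
# printf sees one argument per name.
commands=$(paste -d'/' \
    <(printf 's/\\b%s\\b\n' ${orig_func_names}) \
    <(printf '%s/g\n' ${new_func_names}))

printf '%s\n' "$commands"
```

Each printed line is one sed substitution. The \b anchors keep a short name such as oneTwo from matching inside a longer one such as oneTwoThree.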

Test

Input

File one.c

#include <stdio.h>

int one(); 
int oneTwo(); 
int oneTwoThree();
int oneTwoThreeFour();

int one() {
    puts("");
    return 0;
}

int oneTwo() {
    printf("%s", "hello");
    one();
    return 0;
}

int oneTwoThree() {
    oneTwo();
    return 0;   
}

int oneTwoThreeFour() {
    oneTwoThree();
    return 0;   
}

int main() {

    return 0;
}

File two.c

#include <stdio.h>

int two() {
    return 0; 
}

int twoThree() {
    two();
    return 0;
}   

int twoThreeFour() {
    twoThree();
    return 0;    
}

Output

File one.c

#include <stdio.h>

int one(); 
int one_two(); 
int one_two_three(); 
int one_two_three_four(); 

int one() {
    puts("");
    return 0;   
}

int one_two() {
    printf("%s", "hello");
    one();
    return 0;   
}

int one_two_three() {
    one_two();
    return 0;   
}

int one_two_three_four() {
    one_two_three();
    return 0;   
}

int main() {

    return 0;
}

File two.c

#include <stdio.h>

int two() {
    return 0;  
}

int two_three() {
    two();
    return 0;
}   

int two_three_four() {
    two_three();
    return 0;    
}

score 1 · Accepted Answer

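We can do this in sed alone. The trick is to match everything up to and including the ( as capture group 2, and to use \l instead of \L, so that only the first matched character is lowercased: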

s/\([a-z0-9]\)\([A-Z][A-Za-z0-9]*(\)/\1_\l\2/

We can't just use the /g modifier, because later replacements may overlap with earlier ones, so we use a loop instead:

#!/bin/sed -rf

:loop
s/([a-z0-9])([A-Z][A-Za-z0-9]*\()/\1_\l\2/
tloop
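The overlap problem is easy to see on a single name. Here is a small comparison, assuming GNU sed; the input string callMyFunction(x) is only an illustrative sample.

```shell
#!/bin/bash
# Illustrative input: one camel-case call with two inner capitals.
input='callMyFunction(x)'

# With /g, the first match consumes "lMyFunction(" in one go, so the
# scan resumes after the "(" and never sees the "F" of "Function".
with_g=$(printf '%s\n' "$input" |
    sed -r 's/([a-z0-9])([A-Z][A-Za-z0-9]*\()/\1_\l\2/g')

# With the loop, the line is rescanned from the start after every
# successful substitution until nothing matches any more.
with_loop=$(printf '%s\n' "$input" |
    sed -r ':loop;s/([a-z0-9])([A-Z][A-Za-z0-9]*\()/\1_\l\2/;tloop')

echo "with /g:   $with_g"
echo "with loop: $with_loop"
```

The /g version stops at call_myFunction(x), while the loop reaches call_my_function(x).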

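(I used -r with GNU sed to reduce the number of backslashes needed.)

A further simplification is to match at a non-word-boundary; this removes the need for the two capture groups: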

#!/bin/sed -rf

:loop
s/\B[A-Z]\w*\(/_\l&/
tloop

Demo:

$ sed -r ':loop;s/\B[A-Z]\w*\(/_\l&/;tloop' \
          <<<'SomeType *myFoo = callMyFunction(myBar, someOtherFunction());'

SomeType *myFoo = call_my_function(myBar, some_other_function());
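To run the same loop over a source tree, as in the question, something like the following may work. It is a sketch, assuming GNU sed and GNU find. The scratch directory and the file pool.c are made up for the illustration, and the -i.bak suffix keeps a backup of each file that is changed.

```shell
#!/bin/bash
# Hypothetical scratch directory and file, created only for this example.
workdir=$(mktemp -d)
cat > "$workdir/pool.c" <<'EOF'
void destroyPoolLender(PoolLender *lender)
{
    MemoryPool *myPool = listPop(&lender->pools);
    free(myPool);
}
EOF

# Apply the loop in place to every .c file, keeping a .bak copy of each.
find "$workdir" -type f -name '*.c' \
    -exec sed -r -i.bak ':loop;s/\B[A-Z]\w*\(/_\l&/;tloop' {} +

cat "$workdir/pool.c"
```

Only names directly followed by ( are rewritten, so PoolLender, MemoryPool and myPool stay as they are, while destroyPoolLender and listPop become destroy_pool_lender and list_pop.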

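Note that this only modifies function calls and definitions - if you store or pass function pointers, it is hard to identify which names are functions. If you only have 70k lines to process, you might choose to fix those up by hand (reacting to the compile errors). If you're working with 1M+, you will probably want a proper refactoring tool.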
