c - Best practice for returning a variable-length string in C

Question

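I have a string function that takes a pointer to a source string and returns a pointer to a destination string. The function currently works, but I'm worried that I'm not following best practice regarding malloc, realloc and free.

What makes my function different is that the destination string does not have the same length as the source string, so realloc() has to be called inside my function. I know from reading the documentation...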

http://www.cplusplus.com/reference/cstdlib/realloc/

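that the memory address may change after the realloc. This means I can't "pass by reference" the way a C programmer might for other functions; I have to return the new pointer.

So the prototype for my function is: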

//decode a uri encoded string
char *net_uri_to_text(char *);

I don't like the way I'm doing this, because I have to free the pointer after running the function:

char * chr_output = net_uri_to_text("testing123%5a%5b%5cabc");
printf("%s\n", chr_output); //testing123Z[\abc
free(chr_output);

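This means that malloc() and realloc() are called inside my function, while free() is called outside it.

I have a background in higher-level languages (perl, plpgsql, bash), so my instinct is to encapsulate such things properly, but that might not be best practice in C.

The question: is my way best practice, or is there a better way I should follow?

Full example

Compiles and runs with two warnings about the unused argc and argv parameters; you can safely ignore both.

example.c: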

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

char *net_uri_to_text(char *);

int main(int argc, char ** argv) {
  char * chr_input = "testing123%5a%5b%5cabc";
  char * chr_output = net_uri_to_text(chr_input);
  printf("%s\n", chr_output);
  free(chr_output);
  return 0;
}

//decodes uri-encoded string
//send pointer to source string
//return pointer to destination string
//WARNING!! YOU MUST USE free(chr_result) AFTER YOU'RE DONE WITH IT OR YOU WILL GET A MEMORY LEAK!
char *net_uri_to_text(char * chr_input) {
  //define variables
  int int_length = strlen(chr_input);
  int int_new_length = int_length;
  char * chr_output = malloc(int_length);
  char * chr_output_working = chr_output;
  char * chr_input_working = chr_input;
  int int_output_working = 0;
  unsigned int uint_hex_working;
  //while not a null byte
  while(*chr_input_working != '\0') {
    //if %
    if (*chr_input_working == *"%") {
      //then put correct char in
      sscanf(chr_input_working + 1, "%02x", &uint_hex_working);
      *chr_output_working = (char)uint_hex_working;
      //printf("special char:%c, %c, %d<\n", *chr_output_working, (char)uint_hex_working, uint_hex_working);
      //realloc
      chr_input_working++;
      chr_input_working++;
      int_new_length -= 2;
      chr_output = realloc(chr_output, int_new_length);
      //output working must be the new pointer plus how many chars we've done
      chr_output_working = chr_output + int_output_working;
    } else {
      //put char in
      *chr_output_working = *chr_input_working;
    }
    //increment pointers and number of chars in output working
    chr_input_working++;
    chr_output_working++;
    int_output_working++;
  }
  //last null byte
  *chr_output_working = '\0';
  return chr_output;
}

score 8 · Accepted Answer

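It is perfectly fine to return malloc'd buffers from functions in C, as long as you document the fact that they do so. Lots of libraries do that, even though no function in the standard library does.

If you can cheaply compute a (not too pessimistic) upper bound on the number of characters that need to be written to the buffer, you can offer a function that does that and let the user call it.

It's also possible, but much less convenient, to accept a buffer to be filled in. I've seen quite a few libraries that do that, like so: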

/*
 * Decodes the uri-encoded string `encoded` into buf of length len
 * (including NUL). Returns the number of characters needed for the full
 * result. If that number is greater than len, the result did not fit and
 * you should try again with a larger buffer.
 */
size_t net_uri_to_text(char const *encoded, char *buf, size_t len)
{
    size_t space_needed = 0;

    while (decoding_needs_to_be_done()) {
        // decode characters, but only write them to buf
        // if it wouldn't overflow;
        // increment space_needed regardless
    }
    return space_needed;
}

Now the caller is responsible for the allocation, and would do something like

size_t len = SOME_VALUE_THAT_IS_USUALLY_LONG_ENOUGH;
char *result = xmalloc(len);

len = net_uri_to_text(input, result, len);
if (len > SOME_VALUE_THAT_IS_USUALLY_LONG_ENOUGH) {
    // try again with a buffer of the size the first call reported
    result = xrealloc(result, len);
    net_uri_to_text(input, result, len);
}

(Here, xmalloc and xrealloc are "safe" allocating functions that I made up to skip the NULL checks.)
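The caller-allocates pattern above can be made concrete. The following is only a sketch under stated assumptions, not the answerer's actual code: it assumes the function returns the number of bytes needed including the NUL (in the style of snprintf), copies malformed escapes through unchanged, and uses plain malloc/realloc with explicit checks in place of xmalloc/xrealloc. The names hex_value and decode_alloc, and the first-guess size of 16, are hypothetical.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: value of one hex digit (already checked with isxdigit). */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    return tolower(c) - 'a' + 10;
}

/*
 * Decodes `encoded` into buf, which holds len bytes (including the NUL).
 * Returns the number of bytes needed, including the NUL. If the return
 * value is greater than len, the output was truncated; retry with a
 * buffer of the returned size. Malformed escapes are copied verbatim.
 */
size_t net_uri_to_text(const char *encoded, char *buf, size_t len)
{
    size_t needed = 0;

    while (*encoded != '\0') {
        char c = *encoded++;
        if (c == '%' && isxdigit((unsigned char)encoded[0])
                     && isxdigit((unsigned char)encoded[1])) {
            c = (char)(hex_value((unsigned char)encoded[0]) * 16
                     + hex_value((unsigned char)encoded[1]));
            encoded += 2;
        }
        if (needed + 1 < len)          /* keep one byte free for the NUL */
            buf[needed] = c;
        needed++;
    }
    if (len > 0)
        buf[needed < len ? needed : len - 1] = '\0';
    return needed + 1;
}

/* Caller side: guess a size, grow once if the guess was too small. */
char *decode_alloc(const char *encoded)
{
    size_t len = 16;                   /* arbitrary first guess (assumption) */
    char *result = malloc(len);
    if (result == NULL)
        return NULL;

    size_t needed = net_uri_to_text(encoded, result, len);
    if (needed > len) {
        char *bigger = realloc(result, needed);
        if (bigger == NULL) {
            free(result);
            return NULL;
        }
        result = bigger;
        net_uri_to_text(encoded, result, needed);
    }
    return result;
}
```

Returning the required size even when the buffer is too small means the caller needs at most one retry, at the cost of decoding the input twice in that case.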

score 2 · Accepted Answer

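It's perfectly fine to return a newly malloc'd (and possibly internally realloc'd) value from a function; you just need to document that you are doing so (as you do here).

Other items of note:

Instead of int int_length you probably want to use size_t. This is an unsigned type (usually unsigned int or unsigned long) that is the appropriate type for string lengths and for arguments to malloc.
You need to allocate n+1 bytes initially, where n is the length of the string, because strlen does not include the terminating 0 byte.
You should check for malloc failing (returning NULL). If your function will pass the failure on, document that in the function-description comment.
sscanf is pretty heavyweight for converting two hex digits. It is not wrong, except that you're not checking whether the conversion succeeds (what if the input is malformed? You can of course decide that this is the caller's problem, but in general you might want to handle it). You can use isxdigit from <ctype.h> to check for hexadecimal digits, and/or strtoul to do the conversion.
Rather than doing one realloc for every % conversion, you might want to do a final "shrink realloc" if desired. Note that if you allocate (say) 50 bytes for a string and find it needs only 49 including the final 0 byte, the realloc may not be worth doing after all.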
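The points in this list can be combined into one function. This is a minimal sketch, not the asker's code: the name net_uri_to_text_checked is hypothetical, and treating a malformed escape as an error that returns NULL is one possible policy among several.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Returns a newly malloc'd decoded copy of `input`, or NULL if allocation
 * fails or an escape is malformed. The caller must free() the result.
 */
char *net_uri_to_text_checked(const char *input)
{
    size_t n = strlen(input);
    char *out = malloc(n + 1);         /* +1 for the terminating 0 byte */
    if (out == NULL)
        return NULL;

    size_t used = 0;
    while (*input != '\0') {
        if (*input == '%') {
            if (!isxdigit((unsigned char)input[1]) ||
                !isxdigit((unsigned char)input[2])) {
                free(out);             /* malformed escape: report failure */
                return NULL;
            }
            char hex[3] = { input[1], input[2], '\0' };
            out[used++] = (char)strtoul(hex, NULL, 16);
            input += 3;
        } else {
            out[used++] = *input++;
        }
    }
    out[used] = '\0';

    /* One optional shrink at the end instead of a realloc per escape. */
    if (used < n) {
        char *smaller = realloc(out, used + 1);
        if (smaller != NULL)
            out = smaller;
    }
    return out;
}
```

Because the decoded string is never longer than the input, the initial n + 1 bytes are always sufficient and the final realloc can be dropped entirely if the few saved bytes do not matter.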

score 2 · Accepted Answer

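The thing is, C is low-level enough to force the programmer to get memory management right. In particular, there's nothing wrong with returning a malloc()ated string. Returning malloc()ated objects and having the caller free() them is a common idiom.

And anyway, if you don't like this approach, you can always take a pointer to the string and modify it from inside the function (although it will still need to be free()d after its last use).

One thing I don't think is necessary, however, is explicitly shrinking the string. If the new string is shorter than the old one, there is obviously enough room for it in the memory block of the old string, so you don't need to realloc().

(Apart from the fact that you forgot to allocate one extra byte for the terminating NUL character, of course...)

And, as always, you can simply return a different pointer each time the function is called, so you don't need to call realloc() at all.

If you'll accept one last piece of advice: const-qualify your input strings, so the caller can be sure that you don't modify them. With this approach you can, for example, safely call the function on string literals.

All in all, I'd rewrite your function like this: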

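#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char *unescape(const char *s)
{
    size_t l = strlen(s);
    char *p = malloc(l + 1), *r = p;   // output is never longer than the input

    if (p == NULL)
        return NULL;                   // allocation failed; caller must check

    while (*s) {
        // decode only when '%' is followed by two hex digits,
        // so a truncated escape cannot read past the end of s
        if (*s == '%' && isxdigit((unsigned char)s[1]) && isxdigit((unsigned char)s[2])) {
            char buf[3] = { s[1], s[2], 0 };
            *p++ = (char)strtol(buf, NULL, 16); // yes, I prefer this over scanf()
            s += 3;
        } else {
            *p++ = *s++;
        }
    }

    *p = 0;
    return r;
}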

And call it like this:

int main(void)
{
    const char *in = "testing123%5a%5b%5cabc";
    char *out = unescape(in);
    if (out == NULL)
        return 1;
    printf("%s\n", out);
    free(out);

    return 0;
}
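The alternative mentioned earlier in this answer, modifying the caller's string from inside the function, could look roughly like the sketch below. It relies on the decoded text never being longer than the encoded text. The name unescape_in_place is hypothetical, and leaving malformed escapes unchanged is an assumption.

```c
#include <ctype.h>
#include <stdlib.h>

/*
 * Decodes s in place and returns s. s must point to writable memory
 * (not a string literal). Malformed escapes are left unchanged.
 */
char *unescape_in_place(char *s)
{
    char *src = s, *dst = s;

    while (*src != '\0') {
        if (src[0] == '%' && isxdigit((unsigned char)src[1])
                          && isxdigit((unsigned char)src[2])) {
            char hex[3] = { src[1], src[2], '\0' };
            *dst++ = (char)strtol(hex, NULL, 16);
            src += 3;
        } else {
            *dst++ = *src++;           /* dst never overtakes src */
        }
    }
    *dst = '\0';
    return s;
}
```

No allocation happens here, so ownership stays entirely with the caller; the trade-off is that the original encoded text is destroyed.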

score 0 · Accepted Answer

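I would approach the problem slightly differently. Personally, I would split your function in two. The first function calculates the size you need to malloc. The second writes the output string to a given pointer (which has been allocated outside the function). This saves the repeated calls to realloc and keeps the complexity the same. A possible function for finding the size of the new string is: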

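// Returns the decoded length, not counting the terminating NUL.
// Assumes every '%' starts a well-formed %XX escape.
int getNewSize (const char *string) {
    const char *i = string;
    int size = 0, percent = 0;
    for (; *i != '\0'; i++, size++) {
        if (*i == '%')
            percent++;
    }
    return size - percent * 2;
}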

But, as mentioned in the other answers, there is no problem with returning a malloc'd buffer as long as you document it!
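A sketch of the full two-function split might look like this. The names is_escape, decoded_size and decode_into are hypothetical; unlike getNewSize above, this version counts the terminating NUL and treats a '%' that is not followed by two hex digits as an ordinary character.

```c
#include <ctype.h>
#include <stdlib.h>

/* True if p points at '%' followed by two hex digits. */
static int is_escape(const char *p)
{
    return p[0] == '%' && isxdigit((unsigned char)p[1])
                       && isxdigit((unsigned char)p[2]);
}

/* Pass 1: bytes needed for the decoded string, including the NUL. */
size_t decoded_size(const char *in)
{
    size_t size = 1;                   /* the terminating NUL */
    while (*in != '\0') {
        in += is_escape(in) ? 3 : 1;
        size++;
    }
    return size;
}

/* Pass 2: decode into out, which must hold decoded_size(in) bytes. */
void decode_into(const char *in, char *out)
{
    while (*in != '\0') {
        if (is_escape(in)) {
            char hex[3] = { in[1], in[2], '\0' };
            *out++ = (char)strtoul(hex, NULL, 16);
            in += 3;
        } else {
            *out++ = *in++;
        }
    }
    *out = '\0';
}
```

A caller would then write something like `char *out = malloc(decoded_size(in));`, check the result for NULL, call `decode_into(in, out)`, and free `out` in the same scope, so that allocation and release sit side by side.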

score 0 · Accepted Answer

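In addition to what has already been mentioned in the other posts, you should also document the fact that the string is reallocated. If your code is called with a static string, or with a string allocated with alloca, you must not reallocate it.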

score 0 · Accepted Answer

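I think you are right to be concerned about splitting up the mallocs and the frees. As a rule, whatever makes it, owns it and should free it.

In this case, where the strings are relatively small, one good procedure is to make the string buffer larger than any string it could possibly contain. For example, URLs have a practical limit of about 2000 characters, so if you malloc 10000 characters you can store any possible URL.

Another trick is to store both the length and the capacity of the string at its front, so that *(int *)mystring == length of string and *(int *)(mystring + 4) == capacity (assuming a 4-byte int). The string itself then only starts at the 8th position, *(mystring+8). By doing this you can pass around a single pointer to the string and always know how long it is and how much memory capacity it has. You can write macros that generate these offsets automatically and make "pretty code".

The value of using buffers this way is that you do not need to do a realloc. The new value overwrites the old value, and you update the length at the beginning of the string.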
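As an illustration of the length-and-capacity idea, here is a sketch that swaps the raw byte offsets and macros for a struct with a C99 flexible array member, which lets the compiler compute the offsets. The names struct lstring, lstring_new and lstring_set are hypothetical, and the fields are size_t rather than int.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout: a header with length and capacity, then the bytes. */
struct lstring {
    size_t length;     /* bytes in use, not counting the NUL */
    size_t capacity;   /* bytes available in data, including the NUL */
    char data[];       /* C99 flexible array member */
};

/* Allocates an empty string able to hold capacity - 1 characters. */
struct lstring *lstring_new(size_t capacity)
{
    if (capacity == 0)
        return NULL;
    struct lstring *s = malloc(sizeof *s + capacity);
    if (s == NULL)
        return NULL;
    s->length = 0;
    s->capacity = capacity;
    s->data[0] = '\0';
    return s;
}

/* Overwrites the contents; returns 0 on success, -1 if text does not fit. */
int lstring_set(struct lstring *s, const char *text)
{
    size_t n = strlen(text);
    if (n + 1 > s->capacity)
        return -1;                     /* no realloc: the caller decides */
    memcpy(s->data, text, n + 1);      /* copies the NUL as well */
    s->length = n;
    return 0;
}
```

The whole object is still a single malloc'd block, so one free() on the struct pointer releases the header and the text together.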
