c - Get all integers from an irregular string in C

Question

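I'm looking for a (relatively) simple way to parse a random string, extract all the integers from it, and put them into an array. This differs from some similar questions because my strings have no standard format.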

Example:

pt112parah salin10n m5:isstupid::42$%&%^*%7first3

I eventually need to end up with an array containing:

112 10 5 42 7 3

And I would like a more efficient approach than going through the string character by character.

Thanks for your help.

score 2 · Accepted Answer

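A quick solution. I'm assuming that no number exceeds the range of long, and that there are no minus signs to worry about. If either is a problem, you need to do more work analyzing the result of strtol(), and you need to detect a '-' followed by a digit.

The code does loop over all the characters; I don't think you can avoid that. But it does use strtol() to process each sequence of digits (once the first digit is found), and it resumes from where strtol() left off (strtol() is kind enough to tell us exactly where it stopped converting).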

#include <stdlib.h>
#include <stdio.h>
#include <ctype.h>

int main(void)
{
    const char data[] = "pt112parah salin10n m5:isstupid::42$%&%^*%7first3";
    long results[100];
    int  nresult = 0;

    const char *s = data;
    char c;

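    /* Stop at the end of the string or when results[] is full */
    while ((c = *s++) != '\0' && nresult < 100)
    {
        if (isdigit((unsigned char)c))   /* cast avoids undefined behavior for negative char values */
        {
            char *end;
            /* s was already advanced past the digit, so parse from s-1 */
            results[nresult++] = strtol(s-1, &end, 10);
            s = end;   /* resume where strtol() stopped */
        }
    }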

    for (int i = 0; i < nresult; i++)
        printf("%d: %ld\n", i, results[i]);
    return 0;
}

Output:

0: 112
1: 10
2: 5
3: 42
4: 7
5: 3
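The answer notes that minus signs and out-of-range values would need extra work. A minimal sketch of that extra work follows, under two assumptions that are mine and not the answer's: a '-' counts as a sign only when a digit follows it immediately, and values that overflow long are skipped. The helper name extract_signed is hypothetical.

```c
#include <ctype.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper (not from the answer): extracts up to `max` integers
 * from `s` into `out`. A '-' directly before a digit is treated as a sign.
 * Values that do not fit in a long are skipped. Returns the count stored. */
size_t extract_signed(const char *s, long *out, size_t max)
{
    size_t n = 0;

    while (*s != '\0' && n < max)
    {
        int is_digit = isdigit((unsigned char)s[0]);
        int is_neg   = (s[0] == '-' && isdigit((unsigned char)s[1]));

        if (is_digit || is_neg)
        {
            char *end;
            errno = 0;
            long value = strtol(s, &end, 10);
            if (errno != ERANGE)   /* ERANGE: the value did not fit in a long */
                out[n++] = value;
            s = end;               /* resume right after the digit run */
        }
        else
        {
            s++;                   /* not the start of a number, move on */
        }
    }
    return n;
}
```

Called as extract_signed(data, results, 100), it would fill results the same way as the loop in the answer for input without minus signs. Note that with this policy "10-5" yields 10 and -5; whether that is the right reading depends on the data.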

score 1 · Accepted Answer

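Just because I've been writing Python all day and wanted a break. Declaring the array is the tricky part: either you run through the string twice to work out how many numbers there are (and then allocate the array), or you just consume the numbers one at a time, as in this example.

Note that the ASCII characters '0' through '9' are 48 through 57 (i.e. contiguous).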

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <stdbool.h>

int main(int argc, char **argv)
{
    const char *input = "pt112par0ah salin10n m5:isstupid::42$%&%^*%7first3";

    size_t length = strlen(input);
    int value = 0;
    size_t i;
    bool gotnumber = false;
    // i <= length: also visit the terminating '\0' so that a number at the
    // very end of the string (the trailing 3 here) is printed as well
    for (i = 0; i <= length; i++)
    {
        if (input[i] >= '0' && input[i] <= '9')
        {
            gotnumber = true;
            value = value * 10; // shift up a column
            value += input[i] - '0'; // convert the digit character to its numeric value
        }
        else if (gotnumber) // we hit this the first time we encounter a non-number after we've had numbers
        {
            printf("Value: %d \n", value);
            value = 0;
            gotnumber = false;
        }
    }

    return 0;
}

Edit: the previous version did not handle 0.
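The other option mentioned above, running over the string twice so the array can be allocated at the right size, could look roughly like this. It is only a sketch: the names count_numbers and collect_numbers are made up for illustration, and it assumes every value fits in an int.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: counts the runs of consecutive digits in `s`. */
static size_t count_numbers(const char *s)
{
    size_t count = 0;
    bool in_number = false;

    for (; *s != '\0'; s++)
    {
        bool digit = (*s >= '0' && *s <= '9');
        if (digit && !in_number)
            count++;                  /* first digit of a new run */
        in_number = digit;
    }
    return count;
}

/* Hypothetical helper: returns a malloc'd array with every number in `s`
 * and stores its length in *count. The caller frees the array. Returns NULL
 * (with *count set to 0) if there are no numbers or allocation fails.
 * Assumption: every value fits in an int. */
int *collect_numbers(const char *s, size_t *count)
{
    *count = count_numbers(s);        /* pass 1: size the array */
    if (*count == 0)
        return NULL;

    int *values = malloc(*count * sizeof *values);
    if (values == NULL)
    {
        *count = 0;
        return NULL;
    }

    size_t n = 0;
    bool in_number = false;
    for (; *s != '\0'; s++)           /* pass 2: fill the array */
    {
        if (*s >= '0' && *s <= '9')
        {
            if (!in_number)
                values[n++] = 0;      /* start a new value */
            values[n - 1] = values[n - 1] * 10 + (*s - '0');
            in_number = true;
        }
        else
        {
            in_number = false;
        }
    }
    return values;
}
```

The trade-off is a second scan of the string in exchange for an exactly sized array, instead of guessing an upper bound or growing the array with realloc().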

score 1 · Accepted Answer

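More efficient than character by character?

Not possible, because you have to look at every character to know that it is not part of an integer.

Now, given that you have to walk the string character by character anyway, I suggest simply converting each character to an int and checking it: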

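// string tmp = ""; declared outside of the loop.
// pseudocode for the loop body; c is the current character:
int intVal = (int)c;
if (intVal >= 48 && intVal <= 57) { // '0'-'9' are 48-57 when the char is cast to int
    tmp += c;                       // append the digit to the current run
}
else if (tmp.length > 0) {
    array[n++] = parseInt(tmp);     // n is the next free index in the array
    tmp = "";
}
// after the loop: if tmp is not empty, store parseInt(tmp) as well,
// otherwise a number at the very end of the string is lost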

The array will then contain your result.
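Since the pseudocode above is not C, here is one way the same idea might be written in C: buffer each run of digits, then convert the buffered text once the run ends. This is a sketch, not the answerer's code. The name extract_buffered and the MAX_DIGITS limit are assumptions, and digits beyond that limit are simply dropped.

```c
#include <stddef.h>
#include <stdlib.h>

#define MAX_DIGITS 9   /* assumption: 9 digits always fit in a long */

/* Hypothetical helper: copies each run of digits into `tmp`, then converts
 * the buffered text with strtol() once the run ends. Stores up to `max`
 * values in `out` and returns how many were stored. */
size_t extract_buffered(const char *s, long *out, size_t max)
{
    char tmp[MAX_DIGITS + 1];
    size_t len = 0;                 /* digits currently buffered */
    size_t n = 0;

    for (;; s++)
    {
        char c = *s;
        if (c >= '0' && c <= '9')
        {
            if (len < MAX_DIGITS)
                tmp[len++] = c;     /* digits beyond MAX_DIGITS are dropped */
        }
        else
        {
            if (len > 0 && n < max)
            {
                tmp[len] = '\0';
                out[n++] = strtol(tmp, NULL, 10);
            }
            len = 0;
            if (c == '\0')
                break;              /* the terminator also ends a run */
        }
    }
    return n;
}
```

Treating the terminating '\0' like any other non-digit is what makes a number at the very end of the string (the trailing 3 in the example) get stored.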

score 0 · Accepted Answer

Another solution is to use the strtok function.

/* strtok example */
#include <stdio.h>
#include <string.h>

int main ()
{
  char str[] = "pt112parah salin10n m5:isstupid::42$%&%^*%7first3";
  char * pch;
  printf ("Splitting string \"%s\" into tokens:\n",str);
  pch = strtok (str," abcdefghijklmnopqrstuvwxyz:$%&^*");
  while (pch != NULL)
  {
    printf ("%s\n",pch);
    pch = strtok (NULL, " abcdefghijklmnopqrstuvwxyz:$%&^*");
  }
  return 0;
}

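Gives:

Splitting string "pt112parah salin10n m5:isstupid::42$%&%^*%7first3" into tokens:
112
10
5
42
7
3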

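Perhaps not the best solution for this task, since you need to specify every character that should be treated as a delimiter. But it is an alternative to the other solutions.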
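One way around having to list every delimiter is to describe the digits instead and let strcspn() and strspn() find the run boundaries. The following is a sketch of that alternative, not part of the answer above. The name extract_by_span is hypothetical, and values are assumed to fit in a long.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define DIGITS "0123456789"

/* Hypothetical helper: stores up to `max` integers found in `s` into `out`
 * and returns how many were stored. Unlike strtok(), it does not modify `s`
 * and needs no list of separator characters. */
size_t extract_by_span(const char *s, long *out, size_t max)
{
    size_t n = 0;

    s += strcspn(s, DIGITS);              /* skip leading non-digits */
    while (*s != '\0' && n < max)
    {
        out[n++] = strtol(s, NULL, 10);   /* convert the run starting at s */
        s += strspn(s, DIGITS);           /* step over the digits */
        s += strcspn(s, DIGITS);          /* step over the next separator run */
    }
    return n;
}
```

Because the input is only read, this also works on string literals and const data, which strtok() cannot handle.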

score 0 · Accepted Answer

#include <stdio.h>
#include <string.h>
#include <math.h>

int main(void)
{
    const char *input = "pt112par0ah salin10n m5:isstupid::42$%&%^*%7first3";
    const char *pos = input;
    // At most (length + 1) / 2 integers fit in the string: each integer needs
    // at least 1 digit, and two integers are separated by at least 1 character
    int integers[(strlen(input) + 1) / 2];
    unsigned int numInts = 0;

    while ((pos = strpbrk(pos, "0123456789")) != NULL) // strpbrk() prototype in string.h
    {
        sscanf(pos, "%d", &integers[numInts]);   // %d matches the int element type

        if (integers[numInts] == 0)
            pos++;
        else
            pos += (int) log10(integers[numInts]) + 1;        // requires math.h

        numInts++;
    }

    for (unsigned int i = 0; i < numInts; i++)
        printf("%d ", integers[i]);

    return 0;
}

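The integers are found by repeatedly calling strpbrk() on an offset pointer. After each match the pointer is advanced again by the number of digits in the integer, computed by taking the base-10 logarithm of the integer and adding 1 (with a special case for when the integer is 0). Because the digit count is derived from the value, this assumes the numbers are written without leading zeros. There is no need to call abs() on the integer when computing the logarithm, since you said the integers will be non-negative. If you want to be more space-efficient, you can use unsigned char integers[] instead of int integers[], since you said the integers are all less than 256, but that is not necessary.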
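As a variation on the same strpbrk() loop, the number of characters consumed can be taken from sscanf() itself through the %n conversion instead of being recomputed with log10(). This is a sketch of that swap, not the answerer's method. The name extract_with_scanf is hypothetical, and values are assumed to fit in an unsigned int.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: stores up to `max` integers found in `s` into `out`
 * and returns how many were stored. */
size_t extract_with_scanf(const char *s, unsigned int *out, size_t max)
{
    size_t n = 0;
    const char *pos = s;

    while (n < max && (pos = strpbrk(pos, "0123456789")) != NULL)
    {
        int consumed = 0;

        /* %n records how many characters sscanf() has read so far;
         * it does not count toward the return value */
        if (sscanf(pos, "%u%n", &out[n], &consumed) != 1)
            break;          /* not expected: pos points at a digit */

        pos += consumed;    /* covers leading zeros as well */
        n++;
    }
    return n;
}
```

With this version an input such as "a007b0c12" yields 7, 0 and 12, and math.h is no longer needed.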

score 0 · Accepted Answer

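If you don't mind using C++ instead of C (there is usually no good reason not to), you can reduce the solution to just two lines of code (using the AXE parser generator):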

vector<int> numbers;
auto number_rule = *(*(axe::r_any() - axe::r_num()) 
   & *axe::r_num() >> axe::e_push_back(numbers));

Now test it:

std::string str = "pt112parah salin10n m5:isstupid::42$%&%^*%7first3";
number_rule(str.begin(), str.end());
std::for_each(numbers.begin(), numbers.end(), [](int i) { std::cout << "\ni=" << i; });

Sure enough, you get your numbers back.

As a bonus, you don't need to change anything when parsing Unicode wide strings:

std::wstring str = L"pt112parah salin10n m5:isstupid::42$%&%^*%7first3";
number_rule(str.begin(), str.end());
std::for_each(numbers.begin(), numbers.end(), [](int i) { std::cout << "\ni=" << i; });

Sure enough, you get the same numbers.
