
I don't think I understand how to return only the part of the string that a regular expression matched. I have a file that is a webpage, and I'm trying to get all the links in the page. The regex works fine, but if I printf it out, it prints the whole line in which the match occurs. I only want to display the match itself. I see you can do grouping, so I tried that, and I get back an int value from my second printf call. According to the docs it is an offset, but an offset to what? It doesn't seem accurate either: it says 32 when character 32 on that line has nothing to do with the regex. I put in an exit just to see the first match. Where am I going wrong?

  char line[1000];
  FILE *fp_original;
  fp_original = fopen (file_original_page, "r");

  regex_t re_links;
  regmatch_t group[2];
  regcomp (&re_links, "(href|src)=[\"|'][^\"']*[\"|']", REG_EXTENDED);

  while (fgets (line, sizeof line, fp_original) != NULL) {
    if (regexec (&re_links, line, 2, group, 0) == 0) {
      printf ("%s", line);
      printf ("%u\n", line[group[1].rm_so]);
      exit (1);
    }
  }

  fclose (fp_original);

2 Answers


The regmatch_t array

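regmatch_t is the element type of the match array you pass to regexec(). If you pass 2 as the number of matches (nmatch), regexec() stores the whole match in element [0] and the first parenthesized subexpression in element [1].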

For example:

size_t nmatch = 2;
regmatch_t pmatch[2];

int rc = regexec(&re_links, line, nmatch, pmatch, 0);

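If it succeeds (rc == 0), you can print the subexpression like this:

printf("a matched substring \"%.*s\" is found at position %d to %d.\n",
       (int)(pmatch[1].rm_eo - pmatch[1].rm_so), &line[pmatch[1].rm_so],
       (int)pmatch[1].rm_so, (int)(pmatch[1].rm_eo - 1));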

Here is a complete example of how to apply this:

#include <regex.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    regex_t preg;

    const char *string = "I'm a link to somewhere";
    const char *pattern = ".*\\(link\\).*";   /* basic regex: \( \) marks a group */

    size_t     nmatch = 2;
    regmatch_t pmatch[2];

    if (regcomp(&preg, pattern, 0) != 0)
        return EXIT_FAILURE;

    if (regexec(&preg, string, nmatch, pmatch, 0) == 0) {
        /* %.*s needs an int precision; regoff_t may be wider than int */
        printf("a matched substring \"%.*s\" is found at position %d to %d.\n",
               (int)(pmatch[1].rm_eo - pmatch[1].rm_so), &string[pmatch[1].rm_so],
               (int)pmatch[1].rm_so, (int)(pmatch[1].rm_eo - 1));
    }

    regfree(&preg);

    return 0;
}

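The code above is certainly not safe; it is only meant as an example. If you swap pmatch for your group, it should work. Also, don't forget to put parentheses around the part of the regex you want to capture in your group: \\(.*\\) in the basic regex syntax used here, or plain ( ) when compiling with REG_EXTENDED as in the question.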

EDIT

To avoid compiler warnings about the field precision, you can replace the whole printf part with:

size_t len = (size_t)(pmatch[1].rm_eo - pmatch[1].rm_so);
char *result = malloc(len + 1);            /* +1 for the terminating '\0' */

if (result != NULL) {
    memcpy(result, &string[pmatch[1].rm_so], len);   /* needs <string.h> */
    result[len] = '\0';                    /* strncpy would not add this */

    printf("a matched substring \"%s\" is found at position %lld to %lld.\n",
           result, (long long)pmatch[1].rm_so, (long long)(pmatch[1].rm_eo - 1));
}

// later on ...
free(result);
answered 2013-07-20T17:22:35.857

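The resulting match (your group) gives you a start offset (rm_so) and an end offset (rm_eo), both counted in bytes from the start of the string you passed to regexec(). rm_eo points one past the last matched character, so you just need to print the characters from rm_so up to, but not including, rm_eo.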

group[0] will be the entire regex match. The following elements will be any captures (parenthesized groups) in your regex.

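/* group[0] is the whole match and re_nsub counts only the capture groups,
   so loop up to and including re_nsub, but never past the array size. */
for (size_t i = 0; i <= re_links.re_nsub && i < sizeof group / sizeof group[0]; ++i) {
    if (group[i].rm_so == -1)          /* this group did not take part */
        continue;
    printf("match %zu from index %lld to %lld: ", i,
           (long long)group[i].rm_so, (long long)group[i].rm_eo);

    for (regoff_t j = group[i].rm_so; j < group[i].rm_eo; ++j) {
        printf("%c", line[j]);
    }
    printf("\n");
}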

For a complete example, see my answer

answered 2013-07-20T17:01:53.803