c++ - Extract the domain between two words

Question

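I have some lines like these in a log file:

11-test.domain1.com Logged ...

37-user1.users.domain2.org Logged ...

48-me.server.domain3.net Logged ...

How can I extract each domain without the subdomains? It is the part between "-" and "Logged".

I have the following code in C++ (Linux), but it does not extract the domain correctly. A function that returns the extracted string would be great, if you have an example.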

       regex_t    preg;
       regmatch_t mtch[1];
       size_t     rm, nmatch;
       char tempstr[1024] = "";
       int start;
       rm=regcomp(&preg, "-[^<]+Logged", REG_EXTENDED);
       nmatch = 1;
       while(regexec(&preg, buffer+start, nmatch, mtch, 0)==0) /* Found a match */
               {
                 strncpy(host, buffer+start+mtch[0].rm_so+3, mtch[0].rm_eo-mtch[0].rm_so-7);
                 printf("%s\n", tempstr);
                 start +=mtch[0].rm_eo;
                 memset(host, '\0', strlen(host));
               }
       regfree(&preg);

Thanks!

PS: No, I can't use Perl for this, because this part lives inside a larger C program written by someone else.

Edit:

I replaced the code with this:

   const char *p1 = strstr(buffer, "-")+1;
   const char *p2 = strstr(p1, " Logged");
   size_t len = p2-p1;
   char *res = (char*)malloc(sizeof(char)*(len+1));
   strncpy(res, p1, len);
   res[len] = '\0';

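This extracts the whole domain fine, including the subdomains. How can I extract only domain.com or domain.net from abc.def.domain.com?

strtok looks like a good option, but how do I find the last dot?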

score 1 · Accepted Answer

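#include <algorithm>   // std::for_each
#include <iostream>    // std::cout
#include <string>
#include <vector>
#include <boost/regex.hpp>

int main()
{
    // The named group "domain" captures the host between '-' and "Logged".
    // \S+ (rather than .+) keeps the trailing space out of the capture.
    boost::regex re("-(?<domain>\\S+)\\s+Logged");
    std::string examples[] = 
    {
        "11-test.domain1.com Logged ...",
        "37-user1.users.domain2.org Logged ..."
    };
    std::vector<std::string> vec(examples, examples + sizeof(examples) / sizeof(*examples));
    std::for_each(vec.begin(), vec.end(), [&re](const std::string& s)
    {
        boost::smatch match;
        if (boost::regex_search(s, match, re))
        {
            // Prints the full host, subdomains included.
            std::cout << match["domain"] << std::endl;
        }
    });
}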

Something like this with boost::regex: http://liveworkspace.org/code/1983494e6e9e884b7e539690ebf98eb5. I don't know about PCRE.
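If Boost is not available, the same idea can probably be expressed with C++11 `std::regex`, letting the pattern itself drop the subdomains. This is only a sketch: the helper name `extract_domain_regex` is made up for illustration, and it assumes the registrable domain is always the last two dot-separated labels (so it would get something like `example.co.uk` wrong).

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not part of the answer above): returns the last two
// dot-separated labels of the host found between '-' and "Logged", or an
// empty string when the line does not match.
std::string extract_domain_regex(const std::string& line)
{
    // -                    the dash that ends the numeric prefix
    // (?:[^.\s]+\.)*       any number of subdomain labels, not captured
    // ([^.\s]+\.[^.\s]+)   capture group 1: "domain.tld"
    // \s+Logged            whitespace followed by the literal word
    static const std::regex re(R"(-(?:[^.\s]+\.)*([^.\s]+\.[^.\s]+)\s+Logged)");

    std::smatch match;
    if (std::regex_search(line, match, re))
        return match[1].str();
    return std::string();
}
```

Calling `extract_domain_regex("37-user1.users.domain2.org Logged ...")` should give `domain2.org`. The subdomain group is greedy, so the regex engine backtracks by one label to leave exactly two labels for the capture.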

score 0 · Accepted Answer

Is the log in a standard format? It appears so. Is there a split function?

Edit: Here is some logic. Iterate through each line to be parsed. Find the index of the first string, "-". Next, find the index of the second string, "Logged". The text between the end of the first string and the start of the second is the full domain.

Once you have the full domain, split it on "." into the container of your choice (I used an array). Then take the last two elements of the array and concatenate them with a "." to get only the domain.

NOTE: The code below is written in C#.

Main method, which defines the first value and the second value:

`static void Main(string[] args)
        {
            string firstValue ="-";
            string secondValue = "Logged";
            List<string> domains = new List<string> { "11-test.domain1.com Logged", "37-user1.users.domain2.org Logged","48-me.server.domain3.net Logged"};
            foreach (string dns in domains)
            {
                Debug.WriteLine(Utility.GetStringBetweenFirstAndSecond(dns, firstValue, secondValue));
            }
        }
`

Method to parse the string:

`public static string GetStringBetweenFirstAndSecond(string str, string firstStringToFind, string secondStringToFind)
        {
            string domain = string.Empty;

            if(string.IsNullOrEmpty(str))
            {
                //throw an exception, return gracefully, whatever you determine
            }
            else
            {
                //This can all be done in one line, but I broke it apart so it can be better understood.
                //IndexOf returns the first occurrence.
                //int start = str.IndexOf(firstStringToFind) + 1;

                //int end = str.IndexOf(secondStringToFind);

                //domain = str.Substring(start, end - start);

                //Single-expression version: less legible, but avoids the intermediate variables.
                domain = str.Substring((str.IndexOf(firstStringToFind) + 1), str.IndexOf(secondStringToFind) - (str.IndexOf(firstStringToFind) + 1));

                string[] dArray = domain.Split('.');

                if (dArray.Length > 0)
                {
                    if (dArray.Length > 2)
                    {
                        domain = string.Format("{0}.{1}", dArray[dArray.Length - 2], dArray[dArray.Length - 1]);
                    }
                }
            }

            return domain;
        }
`
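Since the question is about C++, the same steps might look roughly like this with `std::string`. This is a sketch under assumptions, not a drop-in replacement: the name `extract_domain` and its default markers are invented for illustration, and it treats the last two labels as the domain. It also speaks to the "last dot" part of the question: `rfind` searches from the end of the string, much like `strrchr` does for C strings.

```cpp
#include <string>

// Hypothetical helper: a C++ counterpart of the C# method above.
// Returns an empty string when either marker is missing.
std::string extract_domain(const std::string& line,
                           const std::string& first = "-",
                           const std::string& second = " Logged")
{
    // Locate the first marker; the host starts right after it.
    const std::string::size_type start = line.find(first);
    if (start == std::string::npos)
        return std::string();
    const std::string::size_type host_begin = start + first.size();

    // Locate the second marker after the first one; the host ends there.
    const std::string::size_type host_end = line.find(second, host_begin);
    if (host_end == std::string::npos)
        return std::string();

    const std::string host = line.substr(host_begin, host_end - host_begin);

    // Find the last dot, then the dot before it. Everything after the
    // second-to-last dot is "domain.tld".
    const std::string::size_type last_dot = host.rfind('.');
    if (last_dot == std::string::npos || last_dot == 0)
        return host;  // no dot (or a leading dot): nothing to strip
    const std::string::size_type prev_dot = host.rfind('.', last_dot - 1);
    if (prev_dot == std::string::npos)
        return host;  // already just two labels
    return host.substr(prev_dot + 1);
}
```

For example, `extract_domain("48-me.server.domain3.net Logged")` should return `domain3.net`. Unlike the `strstr` snippet in the question, both lookups are checked before the result is used, so a line without "-" or " Logged" yields an empty string instead of undefined behavior.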
