
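I need to sort a web log file by IP, so that lines with the same IP end up next to each other. I'm lazy, but I want to learn how to do this in C++, so I don't want to sort it in Excel. I've made some changes to the log: in every line the IP is now followed by 8 q's (qqqqqqqq) and then the other address. That way I can sort the lines by their characters, because IPs don't all have the same length, so I only need to take the first 16 characters of each line and compare them. At least I think that's a good idea.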

Log example:

85.xx.xx.58 qqqqqqqq    85.xx.xx.58.xxxxxxxxx   bla,bla,bla,bla,
105.216.xx.xx   qqqqqqqq    - bla,bla,bla,bla,bla,bla,bla,
85.xx.xx.58 qqqqqqqq    85.xx.xx.58.xxxxxxxxx   bla,bla,bla,bla,

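The log has more than 60 000 lines. I already used C++ to erase the lines containing robots.txt, .js, .gif, .jpg and so on, so I'd like to reuse that old code. Here is the example that removes the "robots.txt" lines: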

#include <iostream>
#include <string>
#include <fstream>

using namespace std;

int main()
{
    ifstream infile("C:\\ips.txt");
    ofstream myfile("C:\\ipout.txt");   // open the output file once, before the loop
    string line;

    // Read every line exactly once; copy it only if it does NOT mention robots.txt
    while (getline(infile, line)) {
        if (line.find("robots.txt") == string::npos)
            myfile << line << "\n";
    }

    infile.close();
    myfile.close();

    cout << " \n";
    cin.get();   // keep the console window open

    return 0;
}

I know this code looks terrible, but it does its job. I'm still learning, and of course I want to keep both the old file and a separate new file.

I found some help on this topic, but it didn't really match my case...

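I'm thinking of changing the "if" statement so that it reads only the first 16 characters of each line, compares them, and puts matching lines together (one below another). Of course each line should stay complete, if possible.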


2 Answers


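Thanks for your post and code, it was very helpful and I learned something new. You're right that my description of what I wanted was a bit odd, but I allowed myself to modify your code to fit my needs. So, for anyone looking to rearrange a web log like this, here is the code.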

#include <iostream>
#include <string>
#include <fstream>
#include <vector>
#include <sstream>
#include <unordered_map>

using namespace std;

// alias: IP -> the remaining part of every line that starts with that IP
using logmap = std::unordered_map<std::string, std::vector<std::string>>;

logmap readlog(std::istream& is) {
    logmap rv;
    std::string line;
    while (std::getline(is, line)) {
        // put the line in a stringstream to extract ip and the rest
        std::stringstream ss(line);
        std::string ip;
        std::string rest;
        ss >> ip >> std::ws;
        std::getline(ss, rest);
        // add your filtering here
        // put the entry in the map using ip as key
        rv[ip].push_back(rest);
    }
    return rv;
}

int main() {
    ifstream infile("C:\\ips.txt");
    ofstream myfile("C:\\ipout.txt");
    long nr = 0;   // group number, incremented once per distinct IP

    logmap lm = readlog(infile);
    for (const auto& m : lm) {
        nr++;
        for (const auto& l : m.second) {
            myfile << nr << " " << m.first << " " << l << "\n";
        }
    }
    infile.close();
    myfile.close();
    std::cout << "Enter ! \n";
    std::cin.get();

    return 0;
}

Input (ips.txt), the web log file:

1.2.3.4     qqqqqqqq    GET" line code, code,code,code,code,code,code,
5.6.7.8     qqqqqqqq    code,code,code,code,code,code,code,code,tygy
9.10.11.12  qqqqqqqq    all
1.2.3.4     qqqqqqqq    GET" line code, code,code,code,code,code,code,6fg
3.6.7.2     qqqqqqqq    GET" line code,
5.6.7.8     qqqqqqqq    code,code,code,code,code,code,code,code,s5
1.2.3.4     qqqqqqqq    GET" line code, code,code,code,code,code,code,
9.10.11.12  qqqqqqqq    all

Code output (ipout.txt):

1 5.6.7.8 qqqqqqqq  code,code,code,code,code,code,code,code,tygy
1 5.6.7.8 qqqqqqqq  code,code,code,code,code,code,code,code,s5
2 1.2.3.4 qqqqqqqq  GET" line code, code,code,code,code,code,code,
2 1.2.3.4 qqqqqqqq  GET" line code, code,code,code,code,code,code,6fg
2 1.2.3.4 qqqqqqqq  GET" line code, code,code,code,code,code,code,
3 9.10.11.12 qqqqqqqq   all
3 9.10.11.12 qqqqqqqq   all
4 3.6.7.2 qqqqqqqq  GET" line code,

My first piece of code, from the question, can help you remove unwanted lines.

So thanks again to my hero >> Ted Lyngmo <<, live long and prosper :-).

answered 2018-11-19T16:03:20.820

I'm not sure I really understood the log format, but I guess you can adapt this to fit your needs.

This assumes a line-based log format where each line starts with the key you want to group on (the IP address, for example). It uses a std::unordered_map, but you can try a plain std::map too, which keeps its keys in sorted order. The key in the map is the IP address, and the rest of the line is put in a vector of strings.

#include <iostream>
#include <string>
#include <vector>
#include <sstream>
#include <unordered_map>

// alias for the map
using logmap = std::unordered_map<std::string, std::vector<std::string>>;

logmap readlog(std::istream& is) {
    logmap rv;
    std::string line;
    while(std::getline(is, line)) {
        // put the line in a stringstream to extract ip and the rest
        std::stringstream ss(line);
        std::string ip;
        std::string rest;
        ss >> ip >> std::ws;
        std::getline(ss, rest);
        // add your filtering here 
        // put the entry in the map using ip as key
        rv[ip].push_back(rest);
    }
    return rv;
}

int main() {
    logmap lm = readlog(std::cin);
    for(const auto& m : lm) {
        std::cout << m.first << "\n";
        for(const auto& l : m.second) {
            std::cout << " " << l << "\n";
        }
    }
}

Given this input:

127.0.0.1 first ip first line
192.168.0.1 first line of second ip
127.0.0.1 this is the second for the first ip
192.168.0.1 second line of second ip
127.0.0.1 and here's the third for the first
192.168.0.1 third line of second ip

This is a possible output (the order of the IP groups depends on the unordered_map and is not guaranteed):

192.168.0.1
 first line of second ip
 second line of second ip
 third line of second ip
127.0.0.1
 first ip first line
 this is the second for the first ip
 and here's the third for the first
answered 2018-11-19T13:55:27.880