c++ - removing duplicate strings from an array of structs and counting them

Question

I'm trying to go through a list of structs, each of which consists of a string and an int. The strings are lines that contain URLs, and some of the URLs are duplicated. They are in alphabetical order, so all duplicates of a URL are right next to each other. The int is a counter used to count how many copies of a given URL there are. What I need to do is print out only a single instance of each URL, along with a count of how many instances of that URL were originally in the array. The thing I'm trying to figure out is how to remove all but one instance of each URL. I was wondering if someone might know a technique to do this.

Here is the code I have up to this point for this particular part of the program:

void histogram(const int MaxPages, istream& input, ostream& output)
{


    string temp;
    int current = 0;
    CountedLocation *dynamicArray = new CountedLocation[MaxPages];
    int toBeMoved = current - 1;

    getline(input, temp);

    while(!input.eof())
    {

        temp = extractTheRequest(temp);
        toBeMoved = current-1;
        dynamicArray[current].locator = temp;
        if(isAGet(temp))
        {

            temp = extractLocator(temp);
            while (toBeMoved >= 0 && temp < dynamicArray[toBeMoved].locator)
            {
                dynamicArray[toBeMoved+1].locator = dynamicArray[toBeMoved].locator;
                dynamicArray[toBeMoved+1].counter = 1;
                --toBeMoved;
            }
            dynamicArray[toBeMoved+1].locator = temp;
            dynamicArray[toBeMoved+1].counter = 1;
        }

        current++;
        getline(input, temp);

    }
    for(int i=0; i < MaxPages; i++)
    {
        string temp = dynamicArray[i].locator;
        temp = "\"" + temp + "\"";

        dynamicArray[i].locator = temp;
    }
    //int tempMax = MaxPages;
    for(int i=0; i < current; i++)
    {
        if(search(dynamicArray, MaxPages, dynamicArray[i].locator) == search(dynamicArray, MaxPages, dynamicArray[i+1].locator))
        {
            int toBeMoved = i;
            dynamicArray[i+1].counter = dynamicArray[i].counter + 1;
            while (toBeMoved < current-1)
            {
                dynamicArray[toBeMoved] = dynamicArray[toBeMoved+1];
                ++toBeMoved;
            }
            --current;
            if(search(dynamicArray, MaxPages, dynamicArray[i].locator) == search(dynamicArray, MaxPages, dynamicArray[i+1].locator))
                continue;

       }
    }

    for(int i=0; i < current+1; i++)
    {
        cerr << dynamicArray[i].locator<< ", " << dynamicArray[i].counter << endl;
        output << dynamicArray[i].locator<< ", " << dynamicArray[i].counter << endl;
    }
  delete [] dynamicArray;

}

score 2 · Accepted Answer

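Create a new vector of your structure. Start at the beginning of the stream. Traverse the stream, and if the current string is different from the string in the last element of the vector, push back onto the vector an element initialized to that string, with the counter set to 1. Otherwise, just increment the counter associated with the last element. Move to the next string in the stream. Assuming the input does consist of already sorted strings, then at the end the vector contains the unique strings along with their numbers of occurrences.

In untested pseudo-code: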

std::vector<MyStruct> love_to_count (istream &input) {
    std::string url;
    std::vector<MyStruct> v;
    if (! (input >> url)) return v;
    v.push_back(MyStruct(url, 1));
    while (input >> url) {
        if (url != v.back().url) {
            v.push_back(MyStruct(url, 1));
        } else {
            v.back().count += 1;
        }
    }
    return v;
}
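
A self-contained variant of the same idea might look like the sketch below. It rests on a few assumptions that are not in the answer: `MyStruct` is taken to be a plain struct with `url` and `count` members, the input is read one line per URL with `std::getline` (the question mentions lines), and the names `count_sorted_urls` and `check_example` are made up for illustration. It only produces correct totals if equal URLs are adjacent, i.e. the input is already sorted.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the answer's MyStruct.
struct MyStruct {
    std::string url;
    int count;
};

// Collapses each run of identical adjacent lines into one (url, count) entry.
// Assumes the lines arrive sorted, so duplicates are next to each other.
std::vector<MyStruct> count_sorted_urls(std::istream& input) {
    std::vector<MyStruct> v;
    std::string url;
    while (std::getline(input, url)) {
        if (v.empty() || url != v.back().url) {
            v.push_back(MyStruct{url, 1});  // first time this URL is seen
        } else {
            ++v.back().count;               // same URL as the previous line
        }
    }
    return v;
}

// Small usage check on an in-memory stream.
void check_example() {
    std::istringstream in("a.html\na.html\nb.html\n");
    std::vector<MyStruct> v = count_sorted_urls(in);
    assert(v.size() == 2);
    assert(v[0].url == "a.html" && v[0].count == 2);
    assert(v[1].url == "b.html" && v[1].count == 1);
}
```

Checking `v.empty()` inside the loop removes the need to read the first element separately before the loop starts.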

score 1 · Accepted Answer

Unless you desperately need the absolute maximum speed, I'd use a std::map.

std::map<std::string, int> URLs;

Read in the URLs and counts. Use the URL as the index, and add the count:

URLs[URL] += count;

When you've finished reading everything, you can write out the results:

for (auto const &u : URLs)
    std::cout << u.first << "\t" << u.second << "\n";

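While it's possible to do this with a vector, it's more work, and if you're reading the data from a file, the speed difference is likely to be negligible (the time spent processing will be small noise compared to the I/O time).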
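
Put together, the map approach could be sketched roughly as follows. This assumes each input line is one URL that counts as a single occurrence (rather than a URL paired with a precomputed count), and the function names `count_urls` and `print_counts` are invented for the example. Unlike the vector version, this does not need the input to be sorted, because `std::map` keeps its keys ordered.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <ostream>
#include <sstream>
#include <string>

// Counts how many times each line (URL) occurs in the input.
std::map<std::string, int> count_urls(std::istream& input) {
    std::map<std::string, int> urls;
    std::string url;
    while (std::getline(input, url)) {
        ++urls[url];  // operator[] inserts a zero count for a new key
    }
    return urls;
}

// Writes one "url<TAB>count" line per distinct URL, in key order.
void print_counts(const std::map<std::string, int>& urls, std::ostream& out) {
    for (const auto& u : urls) {
        out << u.first << '\t' << u.second << '\n';
    }
}

// Small usage check: unsorted input still yields sorted, merged output.
void check_map_example() {
    std::istringstream in("b.html\na.html\nb.html\n");
    std::ostringstream out;
    print_counts(count_urls(in), out);
    assert(out.str() == "a.html\t1\nb.html\t2\n");
}
```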
