
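I am trying to read a dictionary file in which each line contains a word ID, a word, and a frequency, separated by spaces. The problem is that every entry in the map I use to store the words turns out to have the same value. I would appreciate any help.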

typedef struct{
    int id;
    int count;
    char* word;
} WORD;

//read file
std::map<int, WORD*> readWordMap(char* file_name)
{
    std::ifstream infile(file_name, std::ifstream::in);
    std::cout<<"word map read file:"<<file_name<<std::endl;
    if (! infile) {
        std::cerr<<"oops! unable to open file "<<file_name<<std::endl;
        exit(-1);
    }
    std::map<int, WORD*> map;
    std::vector<std::string> tokens;
    std::string line;
    char word[100];
    int size;
    while (std::getline(infile, line)) {
        size =  (int)split(line, tokens, ' ');
        WORD* entry = (WORD*) malloc(sizeof(WORD*));
        entry->id = atoi(tokens[0].c_str());
        entry->count = atoi(tokens[2].c_str());
        strcpy(word, tokens[1].c_str());
        entry->word = word;

        map[entry->id] = entry;
        std::cout<< entry->id<<" "<<entry->word<<" "<<entry->count<<std::endl;

    }
    infile.close();
    std::cout<<map.size()<<std::endl;
    std::map<int, WORD*>::const_iterator it;
    for (it = map.begin(); it != map.end(); it++) {
        std::cout<<(it->first)<<" "<<(it->second->word)<<std::endl;

    }

    return map;
}

//split string by a delimiter
size_t split(const std::string &txt, std::vector<std::string> &strs, char ch)
{
    size_t pos = txt.find( ch );
    size_t initialPos = 0;
    strs.clear();

    while( pos != std::string::npos ) {
        strs.push_back( txt.substr( initialPos, pos - initialPos + 1 ) );
        initialPos = pos + 1;

        pos = txt.find( ch, initialPos );
    } 

    strs.push_back( txt.substr( initialPos, std::min( pos, txt.size() ) - initialPos + 1 ) );

    return strs.size();
}

Data file:

2 I  1
3 gave  1
4 him  1
5 the  3
6 book  3
7 .  3
8 He  2
9 read  1
10 loved  1

Result:

2 I  1
3 gave  1
4 him  1
5 the  3
6 book  3
7 .  3
8 He  2
9 read  1
10 loved  1
map size:9
2 loved 
3 loved 
4 loved 
5 loved 
6 loved 
7 loved 
8 loved 
9 loved 
10 loved 
4

2 Answers


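You forgot to allocate memory for WORD::word before the strcpy. Instead, you assign the address of char word[100] to every item in the map, and that address is the same for all of them. That array is also local to readWordMap, so those pointers dangle once the function returns.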
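To see why every entry prints the same word, here is a minimal sketch that contrasts storing the address of one shared buffer with copying the buffer's contents into a std::string. The helper names storeSharedPointer and storeCopies are hypothetical, and the caller owns the buffer so that the stored pointers stay valid for the demonstration.

```cpp
#include <cstring>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: mimics `strcpy(word, ...); entry->word = word;`
// Every entry receives the address of the SAME buffer.
std::map<int, const char*> storeSharedPointer(const std::vector<std::string>& words,
                                              char* buffer) {
    std::map<int, const char*> result;
    int id = 0;
    for (const std::string& w : words) {
        std::strcpy(buffer, w.c_str());  // overwrites the shared buffer each time
        result[id++] = buffer;           // stores the address, not the characters
    }
    return result;
}

// Hypothetical helper: same loop, but each entry copies the buffer's current
// contents into its own std::string.
std::map<int, std::string> storeCopies(const std::vector<std::string>& words,
                                       char* buffer) {
    std::map<int, std::string> result;
    int id = 0;
    for (const std::string& w : words) {
        std::strcpy(buffer, w.c_str());
        result[id++] = std::string(buffer);  // copies the characters out
    }
    return result;
}
```

Called with {"I", "gave", "him"}, every value in the first map reads "him" (the last word written into the buffer), while the second map holds "I", "gave", and "him".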

 

It is better to use std::string instead of C-style strings. You can also use std::stoi to convert a string to an integer. Try this:

struct WORD{
    int id;
    int count;
    std::string word;
};

std::map<int, WORD> readWordMap(const std::string &file_name)
{
     ...
     std::map<int, WORD> map;
     ...

     while (std::getline(infile, line)) {
         ...

         WORD entry;
         entry.id = std::stoi(tokens[0]);
         entry.count = std::stoi(tokens[2]);
         entry.word = tokens[1];

         map[entry.id] = entry;

         ...
      }
      infile.close();
      ...
}
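Filling in the elided parts, a complete version of this approach might look like the sketch below. Several pieces are assumptions rather than part of the answer: the split shown here drops the delimiter and skips empty tokens (the question's split keeps the delimiter because of the + 1 in substr, which is why the printed words carry a trailing space), the parsing is moved into a hypothetical readWordMapFrom that takes any std::istream, and a missing file throws instead of calling exit.

```cpp
#include <fstream>
#include <istream>
#include <map>
#include <sstream>   // std::istringstream, handy for feeding test input
#include <stdexcept>
#include <string>
#include <vector>

struct WORD {
    int id;
    int count;
    std::string word;
};

// Split on `ch`, excluding the delimiter and skipping empty tokens,
// so "2 I  1" yields {"2", "I", "1"}.
std::vector<std::string> split(const std::string& txt, char ch) {
    std::vector<std::string> out;
    std::string::size_type start = 0;
    while (start <= txt.size()) {
        std::string::size_type pos = txt.find(ch, start);
        if (pos == std::string::npos) pos = txt.size();
        if (pos > start) out.push_back(txt.substr(start, pos - start));
        start = pos + 1;
    }
    return out;
}

// Hypothetical helper: parses "id word count" lines from any stream.
std::map<int, WORD> readWordMapFrom(std::istream& in) {
    std::map<int, WORD> map;
    std::string line;
    while (std::getline(in, line)) {
        std::vector<std::string> tokens = split(line, ' ');
        if (tokens.size() < 3) continue;      // skip blank or short lines
        WORD entry;
        entry.id = std::stoi(tokens[0]);      // throws std::invalid_argument on non-numbers
        entry.count = std::stoi(tokens[2]);
        entry.word = tokens[1];               // std::string owns its own copy
        map[entry.id] = entry;
    }
    return map;
}

// File-based wrapper with the signature used in the answer.
std::map<int, WORD> readWordMap(const std::string& file_name) {
    std::ifstream infile(file_name);
    if (!infile) throw std::runtime_error("unable to open file " + file_name);
    return readWordMapFrom(infile);
}
```

Because the map stores WORD by value and each word is a std::string, every entry owns its text and nothing has to be freed by hand.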
answered 2013-05-16T20:49:13.570
WORD* entry = (WORD*) malloc(sizeof(WORD*));

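allocates space for a WORD pointer, not for a whole WORD struct.

The block is only sizeof(WORD*) bytes, so writing id, count, and word into it runs past the end of the allocation into memory that does not belong to that object, which is undefined behavior. On top of that, you set entry->word to the same buffer every time before adding the entry to the map, so all the words in your map point to the same location. It should be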

WORD* entry = new WORD;
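If you keep the question's C-style struct with a char* member, malloc also works as long as it is sized by the struct rather than by a pointer. A minimal sketch, where allocWord is a hypothetical helper name:

```cpp
#include <cstdlib>

// The question's struct, unchanged.
typedef struct {
    int id;
    int count;
    char* word;
} WORD;

// Allocates one WORD. `sizeof *entry` is the size of the struct, not of a
// pointer, so the block is large enough for id, count and word.
// The fields are left uninitialized; the caller must set them and std::free() the block.
WORD* allocWord() {
    WORD* entry = static_cast<WORD*>(std::malloc(sizeof *entry));
    return entry;
}
```

This only applies to the C-style struct. Once word becomes a std::string, malloc must not be used at all, because it does not run the std::string constructor; use new WORD (or, better, store WORD by value).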

Here is a cleaner way:

struct WORD{
    int id;
    int count;
    std::string word;
};

while (std::getline(infile, line)) {
    WORD* entry = new WORD;          // map here is a std::map<int, WORD*>
    std::istringstream iss(line);    // needs #include <sstream>

    iss >> entry->id >> entry->word >> entry->count;
    map[entry->id] = entry;
    std::cout<< entry->id<<" "<<entry->word<<" "<<entry->count<<std::endl;
}
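The loop above creates each WORD with new, so every entry has to be deleted eventually. A minimal sketch of the same istringstream parsing that stores WORD by value instead and skips lines that fail to parse; parseWords is a hypothetical name, and reading from an std::istream rather than directly from a file is an assumption so that any stream can be passed in:

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

struct WORD {
    int id;
    int count;
    std::string word;
};

// Parses "id word count" lines with operator>>, which skips any amount of
// whitespace between fields. Entries are stored by value, so nothing needs delete.
std::map<int, WORD> parseWords(std::istream& in) {
    std::map<int, WORD> map;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream iss(line);
        WORD entry;
        if (iss >> entry.id >> entry.word >> entry.count) {  // false for blank/malformed lines
            map[entry.id] = entry;
        }
    }
    return map;
}
```

Passing an std::ifstream opened on the dictionary file works the same way, since std::ifstream derives from std::istream.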
answered 2013-05-16T20:49:31.217