
So I'm trying to read all the words from a file and strip the punctuation as I go. Here is the logic for stripping punctuation:

Edit: the program actually stops running completely; just wanted to make that clear.

ifstream file("text.txt");

string              str;
string::iterator    cur;

for(file>>str; !file.eof(); file>>str){
    for(cur = str.begin(); cur != str.end(); cur++){
        if (!(isalnum(*cur))){
            cur = str.erase(cur);
        }
    }
    cout << str << endl;
    ...
}

Say I have a text file with the following contents:

This is a program. It has trouble with (non alphanumeric chars)

But it's my own and I love it...

When I print each string with cout << str << endl; after this logic, I get:

This
is
a
program
It
has
trouble
with
non
alphanumeric

That's all, folks. Is there something wrong with my iterator logic? How do I fix this?

Thank you.


3 Answers


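The main logical problem I see with the iterator is that for a non-alphanumeric character it gets advanced twice: erase moves it to the next character, and then cur++ in the for loop advances it again, so it skips every character that follows a non-alphanumeric one. Worse, when the erased character is the last one in the string (as with "program." or "chars)"), erase returns str.end(), and cur++ then moves the iterator past the end, which is undefined behavior; that is a likely reason the program stops.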

So it could be something like this:

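// needs <string> and <cctype>
string              next;
string::iterator    cur;

cur = next.begin();
while(cur != next.end()){
    // cast to unsigned char: passing a negative char to isalnum is undefined
    if (!isalnum(static_cast<unsigned char>(*cur))){
        cur = next.erase(cur);   // erase already returns the next position
    } else {
        ++cur;                   // advance only when nothing was erased
    }
}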

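This only removes the non-alphanumeric characters. If you need to tokenize your input, you will have to do more: remember whether you are inside a word (that is, have read at least one alphanumeric character) and act accordingly.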

answered 2013-04-09T05:15:49.290

How about just not copying the punctuation while building the transformed list? OK, probably overkill.

#include <iostream>
#include <fstream>
#include <iterator>
#include <vector>
#include <string>
#include <algorithm>
#include <cctype>
#include <cstdlib>
using namespace std;

// takes the file being processed as the only command-line parameter
int main(int argc, char *argv[])
{
    if (argc != 2)
        return EXIT_FAILURE;

    ifstream inf(argv[1]);
    vector<string> res;
    // read whitespace-separated words, keeping only their alphanumeric characters
    std::transform(istream_iterator<string>(inf),
        istream_iterator<string>(),
        back_inserter(res),
        [](const string& s) {
            string tmp;
            // unsigned char parameter: isalnum is undefined for negative values
            copy_if(s.begin(), s.end(), back_inserter(tmp),
                [](unsigned char c) { return std::isalnum(c) != 0; });
            return tmp;
        });

    // optional dump to output
    copy(res.begin(), res.end(), ostream_iterator<string>(cout, "\n"));

    return EXIT_SUCCESS;
}

Input

All the world's a stage,
And all the men and women merely players:
They have their exits and their entrances;
And one man in his time plays many parts,
His acts being seven ages. At first, the infant,
Mewling and puking in the nurse's arms.

Output

All
the
worlds
a
stage
And
all
the
men
and
women
merely
players
They
have
their
exits
and
their
entrances
And
one
man
in
his
time
plays
many
parts
His
acts
being
seven
ages
At
first
the
infant
Mewling
and
puking
in
the
nurses
arms
answered 2013-04-09T05:28:40.463

You should use ispunct to test for punctuation. If you also want to filter out control characters, use iscntrl.

After filtering out the punctuation, you can split on spaces and newlines to get the words.

answered 2013-04-09T05:15:58.433