
This is another question that I can't seem to find an answer to because every example I can find uses vectors and my teacher won't let us use vectors for this class.

I need to read in a plain-text version of a book one word at a time, using (any number of) blank spaces ' ' and (any number of) non-letter characters as delimiters, so any run of spaces or punctuation needs to separate words. Here's how I did it when blank spaces were the only delimiter:

while(getline(inFile, line)) {
    istringstream iss(line);

    while (iss >> word) {
        table1.addItem(word);
    }
}

EDIT: An example of text read in, and how I need to separate it.

"If they had known;; you wished it, the entertainment.would have"

Here's how the first line would need to be separated:

If

they

had

known

you

wished

it

the

entertainment

would

have

The text will contain at the very least all standard punctuation, but also such things as ellipses ... double dashes -- etc.

As always, thanks in advance.

EDIT:

So using a second stringstream would look something like this?

while(getline(inFile, line)) {
    istringstream iss(line);

    while (iss >> word) {
        istringstream iss2(word);

        while(iss2 >> letter)  {
            if(!isalpha(letter))
                // do something?
        }
        // do something else?
        table1.addItem(word);
    }
}

2 Answers

I haven't tested this because I don't have a g++ compiler in front of me right now, but it should work (apart from minor C++ syntax errors):

while (getline(inFile, line))
{
    istringstream iss(line);

    while (iss >> word)
    {
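        // Strip every non-alphanumeric character from the token (needs
        // <algorithm> and <cctype>). Note: this removes punctuation but does
        // not split on it, so "entertainment.would" becomes "entertainmentwould".
        word.erase(std::remove_if(word.begin(), word.end(),
                                  [](unsigned char c){ return !isalnum(c); }),
                   word.end());
        if (!word.empty())
            table1.addItem(word);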
    }
}
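
The erase/remove_if loop above deletes punctuation but never splits on it, so a token such as "entertainment.would" comes out as a single word. One possible alternative, sketched here under assumptions, is to overwrite every non-letter character in the line with a space before handing the line to the istringstream. The helper names blankOutNonLetters and forEachWord are hypothetical (they are not from the question), and the sink parameter merely stands in for table1.addItem. No vectors are used. Be aware that apostrophes count as non-letters here, so "don't" would be read as "don" and "t".

```cpp
#include <cctype>
#include <sstream>
#include <string>

// Hypothetical helper (not from the original post): returns a copy of `line`
// in which every character that is not a letter has been replaced by a space.
std::string blankOutNonLetters(std::string line)
{
    for (char& c : line) {
        // Cast to unsigned char: passing a negative char to isalpha is undefined.
        if (!std::isalpha(static_cast<unsigned char>(c)))
            c = ' ';
    }
    return line;
}

// Hypothetical helper: extracts each word from `line` and passes it to `sink`,
// which stands in for table1.addItem from the question.
template <typename Sink>
void forEachWord(const std::string& line, Sink sink)
{
    std::istringstream iss(blankOutNonLetters(line));
    std::string word;
    while (iss >> word)   // operator>> skips any run of whitespace
        sink(word);
}
```

Inside the getline loop from the question, this could be called as forEachWord(line, [&](const std::string& w) { table1.addItem(w); });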
answered 2014-11-23T23:31:38.330

If you are free to use Boost, you can do the following:

$ cat kk.txt
If they had known;; you ... wished it, the entertainment.would have

You can customize the behavior of tokenizer if needed, but the defaults should be enough.

#include <iostream>
#include <fstream>
#include <string>

#include <boost/tokenizer.hpp>

int main()
{
  std::ifstream is("./kk.txt");
  std::string line;

  while (std::getline(is, line)) {
    boost::tokenizer<> tokens(line);

    for (const auto& word : tokens)
      std::cout << word << '\n';
  }

  return 0;
}

And finally:

$ ./a.out
If
they
had
known
you
wished
it
the
entertainment
would
have
answered 2014-11-23T23:29:46.933