I would like to offer a completely different approach:
Take your input string, tokenise it yourself, and then convert the individual fields using `boost::lexical_cast<T>`.
Reason: I wasted an afternoon on parsing a string containing two `int` and two `double` fields separated by spaces. The following code:
int i, j;
double x, y;
std::istringstream ins{str};
ins >> i >> j >> x >> y;
// how to check errors???...
parses valid input such as `"5 3 9.9e+01 5.5e+02"` correctly, but does not detect the problem with this:
`"5 9.6e+01 5.5e+02"`
What happens is that `i` will be set to 5 (OK), `j` will be set to 9 (??), `x` to 6.0 (= 0.6e+01), and `y` to 550 (OK): the stream extracts the leading "9" of "9.6e+01" into the `int`, then consumes the remaining ".6e+01" as the first `double`. I was quite surprised to see `failbit` not being set... (platform info: OS X 10.9, Apple Clang++ 6.0, C++11 mode).
Of course you can now say, "But wait, the Standard states that it should be so", and you may be right; but knowing that it is a feature rather than a bug does not reduce the pain if you want to do proper error checking without writing miles of code.
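Here is a minimal, self-contained sketch demonstrating the silent failure described above (the input string and variable names mirror the snippet at the top):

#include <iostream>
#include <sstream>
#include <string>

int main() {
    const std::string str{"5 9.6e+01 5.5e+02"};  // malformed: only three fields
    int i, j;
    double x, y;
    std::istringstream ins{str};
    ins >> i >> j >> x >> y;
    // every extraction found *something* to consume, so no error flag is raised
    std::cout << "failed: " << ins.fail() << '\n';              // prints: failed: 0
    std::cout << i << ' ' << j << ' ' << x << ' ' << y << '\n'; // prints: 5 9 6 550
}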
OTOH, if you use Marius's excellent tokeniser function and split `str` on whitespace first, everything suddenly becomes very easy. Here is a slightly modified version of the tokeniser. I re-wrote it to return a vector of strings; the original is a template that puts the tokens into a container whose elements are convertible to strings. (For those who need such a generic approach, please consult the original link above.)
#include <string>
#include <vector>

// \param str: the input string to be tokenized
// \param delimiters: string of delimiter characters
// \param trimEmpty: if true then empty tokens will be trimmed
// \return a vector of strings containing the tokens
std::vector<std::string> tokenizer(
    const std::string& str,
    const std::string& delimiters = " ",
    const bool trimEmpty = false
) {
    std::vector<std::string> tokens;
    std::string::size_type pos, lastPos = 0;
    const char* strdata = str.data();
    while (true) {
        pos = str.find_first_of(delimiters, lastPos);
        if (pos == std::string::npos) {
            // no more delimiters: emit the final token and stop
            pos = str.length();
            if (pos != lastPos || !trimEmpty) {
                tokens.emplace_back(strdata + lastPos, pos - lastPos);
            }
            break;
        } else {
            // emit the token between the previous and the current delimiter
            if (pos != lastPos || !trimEmpty) {
                tokens.emplace_back(strdata + lastPos, pos - lastPos);
            }
        }
        lastPos = pos + 1;
    }
    return tokens;
}
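On the malformed input from above, this makes the problem trivially detectable, because only three tokens come back:

auto tokens = tokenizer("5 9.6e+01 5.5e+02", " \t", true);
// tokens == {"5", "9.6e+01", "5.5e+02"} -- only 3 fields, easy to reject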
You then just use it like this (`ParseError` is some exception object):
#include <boost/lexical_cast.hpp>

std::vector<std::string> tokens = tokenizer(str, " \t", true);
if (tokens.size() < 4)
    throw ParseError{"Too few fields in " + str};
try {
    unsigned int i{ boost::lexical_cast<unsigned int>(tokens[0]) },
                 j{ boost::lexical_cast<unsigned int>(tokens[1]) };
    double x{ boost::lexical_cast<double>(tokens[2]) },
           y{ boost::lexical_cast<double>(tokens[3]) };
    // print or process i, j, x, y ...
} catch (const boost::bad_lexical_cast& error) {
    // lexical_cast requires the *whole* token to convert, so a field
    // like "9.6e+01" in an unsigned int slot is rejected here
    throw ParseError{"Could not parse " + str};
}
Note: you can use `boost::split` or `boost::tokenizer` if you wish, but they were slower than Marius' tokeniser (at least in my environment).
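For reference, a minimal sketch of what the `boost::split` variant would look like (my reconstruction, not the code I benchmarked):

#include <iostream>
#include <string>
#include <vector>
#include <boost/algorithm/string.hpp>

int main() {
    const std::string str{"5 3 9.9e+01 5.5e+02"};
    std::vector<std::string> tokens;
    // token_compress_on merges runs of adjacent delimiters; note that a
    // leading or trailing delimiter can still produce an empty token
    boost::split(tokens, str, boost::is_any_of(" \t"), boost::token_compress_on);
    for (const auto& t : tokens)
        std::cout << '[' << t << "] ";   // prints: [5] [3] [9.9e+01] [5.5e+02]
    std::cout << '\n';
}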
Update: instead of `boost::lexical_cast<T>` you can use the C++11 `std::sto*` functions (e.g. `std::stoi` to convert a string token to an `int`). These throw two kinds of exceptions: `std::invalid_argument` if the conversion could not be performed, and `std::out_of_range` if the converted value cannot be represented. You can either catch these separately or via their common base `std::logic_error`. Modifying the example code above is left as an exercise to the reader :-)
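For those who would rather not do the exercise: a minimal sketch, assuming the `tokenizer` and `ParseError` from above (the helper names `toDouble`, `toUnsigned`, and `parseFields` are mine). One caveat: unlike `boost::lexical_cast`, the `std::sto*` functions stop at the first invalid character instead of failing (e.g. `std::stod("9.6e+01x")` happily returns 96), so the helpers check the `pos` output parameter to make sure each token was consumed entirely:

#include <stdexcept>
#include <string>
#include <vector>

// convert a whole token to double, rejecting partial parses like "9.6e+01x"
double toDouble(const std::string& tok) {
    std::size_t pos;
    const double v = std::stod(tok, &pos);
    if (pos != tok.size())
        throw std::invalid_argument{"trailing characters in " + tok};
    return v;
}

// same idea for unsigned values (std::stoul returns unsigned long)
unsigned long toUnsigned(const std::string& tok) {
    std::size_t pos;
    const unsigned long v = std::stoul(tok, &pos);
    if (pos != tok.size())
        throw std::invalid_argument{"trailing characters in " + tok};
    return v;
}

void parseFields(const std::string& str) {
    const std::vector<std::string> tokens = tokenizer(str, " \t", true);
    if (tokens.size() < 4)
        throw ParseError{"Too few fields in " + str};
    try {
        unsigned long i = toUnsigned(tokens[0]), j = toUnsigned(tokens[1]);
        double x = toDouble(tokens[2]), y = toDouble(tokens[3]);
        // print or process i, j, x, y ...
    } catch (const std::logic_error&) {  // common base of both exception types
        throw ParseError{"Could not parse " + str};
    }
}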