
I have a simple test program (error checks removed):

#include <iostream>
#include <iomanip>
#include <sstream>
#include <string>

int main() {
    std::string line;
    while(std::cin >> line) {
        int value;
        std::stringstream stream(line);

        stream >> std::setbase(0) >> value;

        std::cout << "You typed: " << value << std::endl;
    }

}

This works well for prefix-dependent integer parsing: it parses strings starting with "0x" or "0X" as hexadecimal and strings starting with '0' as octal. Several resources I have seen explain this behavior. What I haven't been able to find, though, is an indication in the C++ standard that it is guaranteed to work.

I imagine the extraction operators use strtol under the hood. Section 7.20.1.4.3 of the C standard, on strtol, says (6.4.4.1 is the syntax for integer constants):

If the value of base is zero, the expected form of the subject sequence is that of an integer constant as described in 6.4.4.1, optionally preceded by a plus or minus sign, but not including an integer suffix.

This works on the couple of versions of GCC that I've tried, but is it safe to use generally?


3 Answers


setbase is defined in C++98 [lib.std.manip]/5 (paraphrasing slightly):

smanip setbase(int base);

Returns: An object s of unspecified type such that [inserting or extracting s from a stream behaves as if the following function were called on that stream:]

ios_base& f(ios_base& str, int base)
{
    str.setf(base == 8 ? ios_base::oct :
             base == 10 ? ios_base::dec :
             base == 16 ? ios_base::hex :
             ios_base::fmtflags(0), ios_base::basefield);
    return str;
}

Okay, so, if base is not 8, 10, or 16, then the basefield flags are cleared. The effect of a cleared basefield for input is defined in [lib.facet.num.get.virtuals], table 55 ("Integer conversions") as equivalent to sscanf("%i") on the sequence of characters next available.
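To see the cleared-basefield behavior in practice, here is a minimal sketch. The helper name parse_prefixed is hypothetical (it is not part of the standard library); it just wraps the extraction from the question so the result can be checked per token.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper for illustration only.
// Extracts one int from text with basefield cleared by setbase(0),
// so the prefix of the token selects the base.
// Returns true if the extraction succeeded.
bool parse_prefixed(const std::string& text, int& out) {
    std::istringstream stream(text);
    stream >> std::setbase(0) >> out;  // basefield == 0 selects the "%i" row of the table
    return !stream.fail();
}
```

With this helper, "0x1A" should come back as 26, "017" as 15, and "42" as 42.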

C++98 refers to C89 for the definition of *scanf, naturally enough. I don't have a PDF copy of C89, but I do have C99, in which section 7.19.6.2 paragraph 12 [the C standard does not have the nice symbolic section names that the C++ standard has] defines "%i" to behave the same as strtol with base argument 0.

So the good news is, prefix-dependent integer scanning is guaranteed by the standard after setbase(0). The bad news is, iostream formatted input is defined in terms of *scanf, which means the dreadful sentence at the end of C99 7.19.6.2p10 applies:

If [the object that receives the result of scanning] does not have an appropriate type, or if the result of the conversion cannot be represented in the object, the behavior is undefined.

(Emphasis mine.) Put more plainly: input overflow triggers undefined behavior. The C(++) runtime is allowed to crash the program if input to *scanf has too many digits! This is one of several reasons why I and others keep saying *scanf should never be used, and now I have to start saying it about istream >> int as well. :-(

The advice that holds for C is even easier to apply in C++: Read entire lines with std::getline and parse them by hand. Use the strtol family of functions to convert numeric input to machine numbers. (Those functions have predictable behavior on overflow.)
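A rough sketch of that advice, assuming one number per line. The function name parse_int_line is made up for this example; the range check against INT_MIN/INT_MAX is there because strtol produces a long, which may be wider than int.

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <string>

// Hypothetical helper for illustration only.
// Converts a whole line to int using strtol with base 0
// ("0x"/"0X" means hex, a leading '0' means octal, otherwise decimal).
// Returns false for empty input, trailing characters, or out-of-range values.
bool parse_int_line(const std::string& line, int& out) {
    const char* begin = line.c_str();
    char* end = nullptr;
    errno = 0;
    long value = std::strtol(begin, &end, 0);
    if (end == begin) return false;   // nothing was converted
    if (*end != '\0') return false;   // unparsed characters remain
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX) return false;
    out = static_cast<int>(value);
    return true;
}
```

It would typically be called on each line obtained from std::getline(std::cin, line). Note that this sketch rejects trailing whitespace along with any other trailing characters; whether that is desirable depends on the input format.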

Answered 2012-11-02T14:27:26.453

§22.4.2.1.2/3, Table 85:

For conversion to an integral type, the function determines the integral conversion specifier as indicated in Table 85. The table is ordered. That is, the first line whose condition is true applies.

Table 85 — Integer conversions
State                    stdio equivalent
basefield == oct         %o
basefield == hex         %X
basefield == 0           %i
signed integral type     %d
unsigned integral type   %u

The %i conversion format for scanf and company does prefix-dependent conversion.
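For comparison, this is roughly what the stdio side looks like. The wrapper name scan_prefixed is hypothetical; only sscanf and the %i conversion come from the C library.

```cpp
#include <cstdio>

// Hypothetical helper for illustration only.
// Converts text with the %i conversion, which picks the base from the prefix.
// Returns true if exactly one item was converted.
bool scan_prefixed(const char* text, int& out) {
    return std::sscanf(text, "%i", &out) == 1;
}
```

Here "0x1A" should yield 26, "017" should yield 15, and "42" should yield 42, matching what the stream extraction does when basefield is 0.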

Answered 2012-11-02T14:24:47.467

Let's start with §27.6.3, "Standard manipulators", ¶5, "smanip setbase(int base)":

Returns: An object s of unspecified type such that if in is an (instance of) basic_istream then the expression in>>s behaves as if f(s) were called. Where f can be defined as:

ios_base& f(ios_base& str, int base)
{
  // set basefield
  str.setf(base == 8 ? ios_base::oct :
    base == 10 ? ios_base::dec :
    base == 16 ? ios_base::hex :
    ios_base::fmtflags(0), ios_base::basefield);
  return str;
}

We continue our quest with §27.4.2.2 ios_base fmtflags state functions, ¶6 fmtflags setf(fmtflags fmtfl, fmtflags mask);

Effects: Clears mask in flags(), sets fmtfl & mask in flags().

So, what is the effect of setting 0&basefield in flags()?
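As far as the flags themselves go, all three basefield bits end up cleared. A small sketch to check this on a given implementation (the helper name basefield_cleared_after is an assumption made for this example):

```cpp
#include <iomanip>
#include <ios>
#include <sstream>

// Hypothetical helper for illustration only.
// Applies setbase(base) to a fresh input stream and reports whether
// every bit of basefield is cleared afterwards.
bool basefield_cleared_after(int base) {
    std::istringstream stream;          // starts out with ios_base::dec set
    stream >> std::setbase(base);       // extracting a manipulator reads no characters
    return (stream.flags() & std::ios_base::basefield) == std::ios_base::fmtflags(0);
}
```

This should return true for 0 (and for any other value that is not 8, 10, or 16) and false for 8, 10, and 16.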

Consider §27.6.1.2.2 Arithmetic Extractors, which describes, among others, operator>>(int& val);:

these extractors depend on the locale’s num_get<> (22.2.2.1) object to perform parsing the input stream data.

§22.2.2.1, ¶4, Table 55 describes the conversion specifier selected in this case:

basefield == 0, `%i`

Finally, ¶11 says:

A sequence of chars ... is converted (according to the rules of scanf) to a value of the type of val.


So, the C++ Standard, 2003, says that std::cin >> setbase(0) >> i is equivalent to scanf(..., "%i", &i).

For what that means, you need to consult the C Standard.

Answered 2012-11-02T14:30:01.900