c++ - Explanation of UB when changing data

Question

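I was trying to show a coworker that you can change the value of a const-qualified variable if you really want to (and know how to) by using a few tricks. During my demonstration, I found that there are two "flavors" of const values: those that cannot be changed no matter what you do, and those that can be changed with dirty tricks.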

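A const value cannot be changed when the compiler uses the literal value instead of the value stored on the stack (read here). Here is a piece of code that illustrates what I mean: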

// TEST 1
#include <cstring>
#include <iostream>

#define LOG(index, cv, ncv) std::cout \
    << std::dec << index << ".- Address = " \
    << std::hex << &cv << "\tValue = " << cv << '\n' \
    << std::dec << index << ".- Address = " \
    << std::hex << &ncv << "\tValue = " << ncv << '\n'

int main()
{
    // Local const object, stored on the stack
    const unsigned int const_value = 0xcafe01e;

    // Try with no-const reference
    unsigned int &no_const_ref = const_cast<unsigned int &>(const_value);
    no_const_ref = 0xfabada;
    LOG(1, const_value, no_const_ref);

    // Try with no-const pointer
    unsigned int *no_const_ptr = const_cast<unsigned int *>(&const_value);
    *no_const_ptr = 0xb0bada;
    LOG(2, const_value, (*no_const_ptr));

    // Try with c-style cast
    no_const_ptr = (unsigned int *)&const_value;
    *no_const_ptr = 0xdeda1;
    LOG(3, const_value, (*no_const_ptr));

    // Try with memcpy
    unsigned int brute_force = 0xba51c;
    std::memcpy(no_const_ptr, &brute_force, sizeof(const_value));
    LOG(4, const_value, (*no_const_ptr));

    // Try with union (reading the inactive member is itself UB)
    union bad_idea
    {
        const unsigned int *const_ptr;
        unsigned int *no_const_ptr;
    } u;

    u.const_ptr = &const_value;
    *u.no_const_ptr = 0xbeb1da;
    LOG(5, const_value, (*u.no_const_ptr));

    return 0;
}

This produces the following output:

1.- Address = 0xbfffbe2c    Value = cafe01e
1.- Address = 0xbfffbe2c    Value = fabada
2.- Address = 0xbfffbe2c    Value = cafe01e
2.- Address = 0xbfffbe2c    Value = b0bada
3.- Address = 0xbfffbe2c    Value = cafe01e
3.- Address = 0xbfffbe2c    Value = deda1
4.- Address = 0xbfffbe2c    Value = cafe01e
4.- Address = 0xbfffbe2c    Value = ba51c
5.- Address = 0xbfffbe2c    Value = cafe01e
5.- Address = 0xbfffbe2c    Value = beb1da

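Since I'm relying on UB (changing the value of const data), I expected the program to behave strangely, but the strangeness goes beyond what I expected.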

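Assuming the compiler uses the literal value, then when the code reaches an instruction that changes the const value (via a reference, a pointer, or memcpy), the change has no effect on the literal (undefined behavior notwithstanding). This explains why the value stays unchanged, but: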

Why do the two variables have the same memory address but contain different values?

AFAIK the same memory address cannot hold different values, so one of the outputs is a lie:

What is really happening? Which memory address is the fake one (if any)?

With a few changes to the code above, we can try to avoid the use of literal values, so that the trick does its job (source here):

// TEST 2 (uses the LOG macro and the <cstring>/<iostream> includes from TEST 1)
// Try with no-const reference
void change_with_no_const_ref(const unsigned int &const_value)
{
    unsigned int &no_const_ref = const_cast<unsigned int &>(const_value);
    no_const_ref = 0xfabada;
    LOG(1, const_value, no_const_ref);    
}

// Try with no-const pointer
void change_with_no_const_ptr(const unsigned int &const_value)
{
    unsigned int *no_const_ptr = const_cast<unsigned int *>(&const_value);
    *no_const_ptr = 0xb0bada;
    LOG(2, const_value, (*no_const_ptr));
}

// Try with c-style cast
void change_with_cstyle_cast(const unsigned int &const_value)
{
    unsigned int *no_const_ptr = (unsigned int *)&const_value;
    *no_const_ptr = 0xdeda1;
    LOG(3, const_value, (*no_const_ptr));
}

// Try with memcpy
void change_with_memcpy(const unsigned int &const_value)
{
    unsigned int *no_const_ptr = const_cast<unsigned int *>(&const_value);
    unsigned int brute_force = 0xba51c;
    std::memcpy(no_const_ptr, &brute_force, sizeof(const_value));
    LOG(4, const_value, (*no_const_ptr));
}

void change_with_union(const unsigned int &const_value)
{
    // Try with union
    union bad_idea
    {
        const unsigned int *const_ptr;
        unsigned int *no_const_ptr;
    } u;

    u.const_ptr = &const_value;
    *u.no_const_ptr = 0xbeb1da;
    LOG(5, const_value, (*u.no_const_ptr));
}

int main(int argc, char **argv)
{
    unsigned int value = 0xcafe01e;
    change_with_no_const_ref(value);
    change_with_no_const_ptr(value);
    change_with_cstyle_cast(value);
    change_with_memcpy(value);
    change_with_union(value);

    return 0;
}

which produces the following output:

1.- Address = 0xbff0f5dc    Value = fabada
1.- Address = 0xbff0f5dc    Value = fabada
2.- Address = 0xbff0f5dc    Value = b0bada
2.- Address = 0xbff0f5dc    Value = b0bada
3.- Address = 0xbff0f5dc    Value = deda1
3.- Address = 0xbff0f5dc    Value = deda1
4.- Address = 0xbff0f5dc    Value = ba51c
4.- Address = 0xbff0f5dc    Value = ba51c
5.- Address = 0xbff0f5dc    Value = beb1da
5.- Address = 0xbff0f5dc    Value = beb1da

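As we can see, the const-qualified variable was changed on every change_with_* call, and apart from that the behavior is the same as before. So I'm tempted to assume that the weird memory-address behavior shows up when the const data is used as a literal instead of as a value.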

So, to check this assumption, I ran one last test, changing unsigned int value in main to const unsigned int value:

// TEST 3
const unsigned int value = 0xcafe01e;
change_with_no_const_ref(value);
change_with_no_const_ptr(value);
change_with_cstyle_cast(value);
change_with_memcpy(value);
change_with_union(value);

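Surprisingly, the output is the same as in TEST 2 (code here), so I think the data is passed as a variable rather than as a literal value because it is used as a parameter, which makes me wonder: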

What makes the compiler decide to optimize a const value into a literal value?

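In short, my questions are:

In TEST 1:
- Why do the const value and the non-const value share the same memory address but contain different values?
- What steps does the program follow to produce this output? Which memory address is the fake one (if any)?
In TEST 3:
- What makes the compiler decide to optimize a const value into a literal value?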

score 2 · Accepted Answer

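In general, there is little point in analyzing undefined behavior, because there is no guarantee that the results of your analysis carry over to a different program.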

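In this case, the behavior can be explained by assuming that the compiler applied an optimization technique called constant propagation. With this technique, when you use the value of a const variable whose value the compiler knows, the compiler replaces that use with the variable's value (as known at compile time). Other uses of the variable, such as taking its address, are not replaced.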

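This optimization is valid precisely because changing the value of a variable defined as const results in undefined behavior, and the compiler is allowed to assume that the program does not invoke undefined behavior.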

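So, in TEST 1, the addresses are the same because it is the same variable throughout, but the values differ because the first value of each pair reflects what the compiler (correctly) assumed the variable's value to be, while the second reflects the value actually stored there. In TEST 2 and TEST 3, the compiler cannot perform the optimization because it cannot be 100% sure that the function parameter refers to a constant value (and in TEST 2, it doesn't).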
