6

When optimizing, GCC seems to wrongly bypass a #define test.

First, I use my own link.ld linker script to provide a __foo__ symbol at address 0xFFF (actually the lowest bits of the address, not the whole address):

INCLUDE ./default.ld
__foo__ = 0xFFF;
  • Note: default.ld is the default linker script, obtained from the output of the gcc ... -Wl,-verbose command

Then, a foo.c source file checks the address of __foo__:

#include <stdint.h>
#include <stdio.h>

extern int __foo__;

#define EXPECTED_ADDR          ((intptr_t)(0xFFF))
#define FOO_ADDR               (((intptr_t)(&__foo__)) & EXPECTED_ADDR)
#define FOO_ADDR_IS_EXPECTED() (FOO_ADDR == EXPECTED_ADDR)

int main(void)
{
    printf("__foo__ at %p\n", &__foo__);
    printf("FOO_ADDR=0x%lx\n", FOO_ADDR);
    printf("EXPECTED_ADDR=0x%lx\n", EXPECTED_ADDR);
    if (FOO_ADDR_IS_EXPECTED())
    {
        printf("***Expected ***\n");
    }
    else
    {
        printf("### UNEXPECTED ###\n");
    }
    return 0;
}

I expect the ***Expected *** message to be printed, because FOO_ADDR_IS_EXPECTED() should be true.

Compiled with the -O0 option, it behaves as expected:

$ gcc -Wall -Wextra -Werror foo.c -O0 -o foo_O0 -T link.ld && ./foo_O0
__foo__ at 0x5603f4005fff
FOO_ADDR=0xfff
EXPECTED_ADDR=0xfff
***Expected ***

But with the -O1 option, it does not:

$ gcc -Wall -Wextra -Werror foo.c -O1 -o foo_O1 -T link.ld && ./foo_O1
__foo__ at 0x5580202d0fff
FOO_ADDR=0xfff
EXPECTED_ADDR=0xfff
### UNEXPECTED ###

Here is the disassembly with -O0:

$ objdump -d ./foo_O0
...
0000000000001169 <main>:
...
    11b5:       b8 00 00 00 00          mov    $0x0,%eax
    11ba:       e8 b1 fe ff ff          callq  1070 <printf@plt>
    11bf:       48 8d 05 39 fe ff ff    lea    -0x1c7(%rip),%rax        # fff <__foo__>
    11c6:       25 ff 0f 00 00          and    $0xfff,%eax
    11cb:       48 3d ff 0f 00 00       cmp    $0xfff,%rax
    11d1:       75 0e                   jne    11e1 <main+0x78>
    11d3:       48 8d 3d 5e 0e 00 00    lea    0xe5e(%rip),%rdi        # 2038 <_IO_stdin_used+0x38>
    11da:       e8 81 fe ff ff          callq  1060 <puts@plt>
    11df:       eb 0c                   jmp    11ed <main+0x84>
    11e1:       48 8d 3d 60 0e 00 00    lea    0xe60(%rip),%rdi        # 2048 <_IO_stdin_used+0x48>
    11e8:       e8 73 fe ff ff          callq  1060 <puts@plt>
    11ed:       b8 00 00 00 00          mov    $0x0,%eax
...

I'm no expert, but I can see a jne condition and two calls to puts, which match the if (FOO_ADDR_IS_EXPECTED()) statement.

Here is the disassembly with -O1:

$ objdump -d ./foo_O1
...
0000000000001169 <main>:
...
    11c2:       b8 00 00 00 00          mov    $0x0,%eax
    11c7:       e8 a4 fe ff ff          callq  1070 <__printf_chk@plt>
    11cc:       48 8d 3d 65 0e 00 00    lea    0xe65(%rip),%rdi        # 2038 <_IO_stdin_used+0x38>
    11d3:       e8 88 fe ff ff          callq  1060 <puts@plt>
...

This time, I see no condition, only a direct call to puts (for the printf("### UNEXPECTED ###\n"); statement).

Why does the -O1 optimization change the behavior? Why does it optimize FOO_ADDR_IS_EXPECTED() to false?

Some context that may help your analysis:

$ uname -rm
5.4.0-73-generic x86_64
$ gcc --version
gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

EDIT: Surprisingly, changing the 0xFFF value to 0xABC changes the behavior:

$ gcc -Wall -Wextra -Werror foo.c -O0 -o foo_O0 -T link.ld && ./foo_O0
__foo__ at 0x5653a7d4eabc
FOO_ADDR=0xabc
EXPECTED_ADDR=0xabc
***Expected ***

$ gcc -Wall -Wextra -Werror foo.c -O1 -o foo_O1 -T link.ld && ./foo_O1
__foo__ at 0x564323dddabc
FOO_ADDR=0xabc
EXPECTED_ADDR=0xabc
***Expected ***

As pointed out by Andrew Henle, the address alignment seems to matter: using 0xABF instead of 0xABC produces the same result as 0xFFF.

4 Answers

4

As suggested by @AndrewHenle and @chux-ReinstateMonica, this is an alignment problem.

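The type of the __foo__ variable is int: its address should be 32-bit aligned, meaning divisible by 4. 0xFFF is not divisible by 4, so the compiler assumes it cannot be a valid int address and optimizes the equality test to false.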
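
The arithmetic behind this can be checked directly. The sketch below is only an illustration and is not part of the original program: the helper name is_aligned_for_int is made up here, and it assumes a C11 compiler on which _Alignof(int) is 4, as on x86_64 Linux.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if `addr` is a multiple of the alignment of int.
   Assumes C11; _Alignof(int) is 4 on x86_64 Linux. */
static bool is_aligned_for_int(uintptr_t addr)
{
    return (addr % _Alignof(int)) == 0;
}
```

With a 4-byte alignment, is_aligned_for_int(0xABC) is true, while is_aligned_for_int(0xFFF) and is_aligned_for_int(0xABF) are false, which matches the three cases observed in the question.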

Changing the type of __foo__ to char removes the alignment constraint, and the behavior is then the same at -O0 and -O1:

// In foo.c
...
extern char __foo__;
...


$ gcc -Wall -Wextra -Werror foo.c -O0 -o foo_O0 -T link.ld && ./foo_O0
__foo__ at 0x55fbf8bedfff
FOO_ADDR=0xfff
EXPECTED_ADDR=0xfff
***Expected ***

$ gcc -Wall -Wextra -Werror foo.c -O1 -o foo_O1 -T link.ld && ./foo_O1
__foo__ at 0x5568d2debfff
FOO_ADDR=0xfff
EXPECTED_ADDR=0xfff
***Expected ***

answered 2021-07-15T16:41:03.107
3

(intptr_t)(&__foo__) is undefined behavior (UB) when the address of __foo__ is invalid.

OP's __foo__ = 0xFFF; may violate the alignment requirement of an int.

OP tried the less restrictive char with success:

// extern int __foo__;
extern char __foo__; 

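Greater optimizations tend to take advantage of UB.
I take "works without optimization, fails at high optimization" as a hint that UB is lurking somewhere. In this case, &__foo__ is invalid.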
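
One possible way to sidestep the invalid int address, sketched here under assumptions rather than taken from the answer, is to treat the location as bytes and only then look at the low bits. The names low12, page and last_byte_low12 are hypothetical. The page array merely stands in for the linker-provided symbol so that the example runs without a custom linker script, and it assumes the toolchain honors _Alignas(4096) for static objects.

```c
#include <stdint.h>

/* Hypothetical helper: low 12 bits of any object address.
   A const void * carries no int alignment promise. */
static uintptr_t low12(const void *p)
{
    return (uintptr_t)p & (uintptr_t)0xFFF;
}

/* Stand-in for the linker symbol: a page-aligned byte array, so the
   address of its last byte ends in 0xFFF. */
static _Alignas(4096) char page[4096];

/* Returns the low 12 bits of the address of the last byte of page. */
static uintptr_t last_byte_low12(void)
{
    return low12(&page[4095]);
}
```

In the original program the equivalent change is the one shown above: declare the symbol as extern char __foo__; so that every address is a valid address for its type.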

answered 2021-07-15T16:25:25.357
1

We know that -O produces the "behavior".

However, -O* turns on a number of finer-grained -f optimization options.

I was curious about exactly which -f option was to "blame".

A list of the -f options can be found at: https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html

The specific optimization that produces the behavior is:

-ftree-bit-ccp

Its documentation is:

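Perform sparse conditional bit constant propagation on trees and propagate pointer alignment information. This pass only operates on local scalar variables and is enabled by default at -O1 and higher, except for -Og. It requires that -ftree-ccp is enabled.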
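
To make "pointer alignment information" concrete, here is a small sketch. It is not from the GCC documentation, and the function name is invented for illustration. Because p has type int *, the compiler is allowed to assume that its low bits are zero, so with this pass enabled GCC may fold the whole test to a constant. Whether it actually does depends on the GCC version and target.

```c
#include <stdint.h>

/* Hypothetical example: for a validly aligned int pointer this is always 1.
   An optimizer that tracks alignment bits may reduce it to `return 1;`
   without looking at p at all. Assumes _Alignof(int) > 1. */
int int_pointer_is_aligned(int *p)
{
    return ((uintptr_t)p & (_Alignof(int) - 1)) == 0;
}
```

Comparing the output of gcc -O1 -S with and without -fno-tree-bit-ccp is one way to see what a particular compiler does with it.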


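At first, I didn't know which -f option was doing the optimization. So, I decided to apply the options one by one and rebuild/rerun the test program.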
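
The one-option-at-a-time idea can be sketched in C as below. This is an illustrative sketch and not the script shown later: negate_flag and build_compile_cmd are hypothetical helpers, and the command line only mirrors the gcc invocation used in the question.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: turn "-ffoo" into "-fno-foo".
   Returns 0 on success, -1 if flag is not an -f option or out is too small. */
static int negate_flag(const char *flag, char *out, size_t outsz)
{
    if (strncmp(flag, "-f", 2) != 0)
        return -1;
    int n = snprintf(out, outsz, "-fno-%s", flag + 2);
    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}

/* Hypothetical helper: compile command for one candidate option,
   added on top of a base optimization level such as "-Og". */
static int build_compile_cmd(const char *base, const char *flag,
                             char *out, size_t outsz)
{
    int n = snprintf(out, outsz,
                     "gcc -Wall -Wextra -Werror foo.c -c %s %s", base, flag);
    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

A driver would build one command per option, link the result with -T link.ld, run the program, and stop at the first option that changes the outcome.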

Being lazy, I didn't want to do this by hand. I wrote a [perl] script to fetch the .html file above, parse it, and apply the individual -f options one at a time.

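Side note: Ironically, this probably took longer than manually editing the .html file to create a script, but it was fun ... And there have been times when I wanted to know which -f option was doing a given optimization in my own code, but I always punted.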

The script is a bit rough, but it could be adapted and reused for other test programs in the future.

#!/usr/bin/perl
# gccblame -- decide which -f option causes issues
#
# options:
#   "-A" -- specify __foo__ address (DEFAULT: FFF)
#   "-arr" -- define __foo__ as array
#   "-clean" -- clean generated files
#   "-doc" -- show documentation
#   "-f" -- preclean and force reload
#   "-no" -- apply -fno-foobar instead of -ffoobar
#   "-T<type>" -- specify __foo__ type (DEFAULT: int)
#   "-url" -- (DFT: https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html)

master(@ARGV);
exit(0);

# master -- master control
sub master
{
    my(@argv) = @_;

    # get command line options
    optdcd(\@argv,
        qw(opt_A opt_arr opt_clean opt_doc opt_f opt_no opt_T opt_url));
    $opt_T //= "int";
    $opt_A //= "FFF";
    $opt_A =~ s/^0x//;
    $opt_A = "0x" . $opt_A;
    $opt_arr = $opt_arr ? "[]" : "";
    $opt_url //= "https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html";

    $root = "fopturl";
    $fopt_ifile = clean("$root.html");
    $fopt_ofile = clean("$root.txt");

    $nul_c = clean("nul.c");
    $dftlink = clean("./default.ld");

    # compiled output
    clean("foo.o");
    clean("foo");

    $tmp = clean("tmp.txt");

    # clean generated files
    if ($opt_clean or $opt_f) {
        # get more files to clean
        sysall(0);

        foreach $file (sort(keys(%clean))) {
            if (-e $file) {
                printf("cleaning %s\n",$file);
                unlink($file);
            }
        }

        exit(0) if ($opt_clean);
    }

    # get the options documentation from the net
    $fopturl = fopturl();

    # parse it
    foptparse(@$fopturl);

    # create all static files
    sysall(1);

    # create linker scripts and test source file
    dftlink();

    ###exit(0);

    # start with just the -O option
    dopgm($opt_no ? "-O3" : "-Og");

    # test all -f options
    dolist();

    printf("\n");
    docstat()
        if ($opt_doc);
    printf("all options passed!\n");
}

# optdcd -- decode command line options
sub optdcd
{
    my(@syms) = @_;
    my($argv);
    my($arg);
    my($sym,$val,$match);

    $argv = shift(@syms);

    # get options
    while (@$argv > 0) {
        $arg = $argv->[0];
        last unless ($arg =~ /^-/);

        shift(@$argv);

        $match = 0;
        foreach $sym (@syms) {
            $opt = $sym;
            $opt =~ s/^opt_/-/;

            if ($arg =~ /^$opt(.*)$/) {
                $val = $1;

                $val =~ s/^=//;
                $val = 1
                    if ($val eq "");

                $$sym = $val;
                $match = 1;
                last;
            }
        }

        sysfault("optdcd: unknown option -- '%s'\n",$arg)
            unless ($match);
    }
}

# clean -- add to clean list
sub clean
{
    my($file) = @_;
    my($self,$tail);

    $self = filetail($0);
    $tail = filetail($file);

    sysfault("clean: attempt to clean script -- '%s'\n",$tail)
        if ($tail eq $self);

    $clean{$tail} = 1;

    $file;
}

# dftlink -- get default linker script
sub dftlink
{
    my($xfdst);
    my($buf,$body);
    my($grabflg);
    my($lno);

    # build it to get default link file
    $code = doexec("gcc","-o","/dev/null",$nul_c,
        "-v","-Wl,--verbose",">$dftlink","2>&1");
    exit(1) if ($code);

    # get all messages
    $body = fileload($dftlink);

    # split off the linker script from all the verbose messages
    open($xfdst,">$dftlink");
    while (1) {
        $buf = shift(@$body);
        last unless (defined($buf));

        if ($grabflg) {
            last if ($buf =~ /^=======/);
            print($xfdst $buf,"\n");
            ++$lno;
        }

        # get starting section and skip the "=======" line following
        if ($buf =~ /^using internal linker script:/) {
            $grabflg = 1;
            shift(@$body);
        }
    }
    close($xfdst);

    printf("dftlink: got %d lines\n",$lno);
    exit(1) if ($lno <= 0);
}

# sysall -- extract all files
sub sysall
{
    my($goflg) = @_;
    my($xfsrc,$xfdst,$buf);
    my($otail,$ofile);

    $xfsrc = sysdata("gccblame");

    while ($buf = <$xfsrc>) {
        chomp($buf);

        # apply variable substitution
        $buf = subenv($buf);

        # start new file
        if ($buf =~ /^%\s+(\S+)$/) {
            $otail = $1;

            # add to list of files to clean
            clean($otail);
            next unless ($goflg);

            close($xfdst)
                if (defined($ofile));
            $ofile = $otail;

            printf("sysall: creating %s ...\n",$ofile);
            open($xfdst,">$ofile") or
                sysfault("sysall: unable to open '%s' -- $!\n",$ofile);

            next;
        }

        print($xfdst $buf,"\n")
            if (defined($ofile));
    }

    close($xfdst)
        if (defined($ofile));
}

# fileload -- load up file contents
sub fileload
{
    my($file) = @_;
    my($xf);
    my(@data);

    open($xf,"<$file") or
        sysfault("fileload: unable to open '%s' -- $!\n",$file);

    @data = <$xf>;
    chomp(@data);

    close($xf);

    \@data;
}

# fopturl -- fetch and convert remote documentation file
sub fopturl
{
    my($sti,$sto);
    my($data);

    # get GCC's optimization options from remote server
    $sti = _fopturl($fopt_ifile,"curl","-s",$opt_url);

    # convert it to text
    $sto = _fopturl($sti,$fopt_ofile,"html2text",$fopt_ifile);

    # read in the semi-clean data
    $data = fileload($fopt_ofile);

    $data;
}

# _fopturl -- grab data
sub _fopturl
{
    my(@argv) = @_;
    my($sti);
    my($ofile);
    my($sto);

    $ofile = shift(@argv);
    if (ref($ofile)) {
        $sti = $ofile;
        $ofile = shift(@argv);
    }
    else {
        $sti = {};
    }

    while (1) {
        $sto = sysstat($ofile);
        if (ref($sto)) {
            last if ($sto->{st_mtime} >= $sti->{st_mtime});
        }

        $code = doexec(@argv,">$tmp");
        exit(1) if ($code);

        msgv("fopturl: RENAME",$tmp,$ofile);
        rename($tmp,$ofile) or
            sysfault("fopturl: unable to rename '%s' to '%s' -- $!\n",
                $tmp,$ofile);
    }

    $sto;
}

# foptparse -- parse and cross reference the options
sub foptparse
{
    local(@argv) = @_;
    local($buf);
    local($env);
    my(%uniq);

    $env = "xO";

    while (1) {
        $buf = shift(@argv);
        last unless (defined($buf));

        if ($buf =~ /^`-f/) {
            $env = "xB";
        }

        # initial are:
        #   -ffoo -fbar
        if (($env eq "xO") and ($buf =~ /^\s*-f/)) {
            _foptparse(0);
            next;
        }

        # later we have:
        # `-ffoo`
        # doclines
        if (($env eq "xB") and ($buf =~ /^`-f/)) {
            _foptparse(1);
            next;
        }

        if ($buf =~ /^`-O/) {
            printf("foptparse: OLVL %s\n",$buf);
            next;
        }
    }

    xrefuniq("xO","xB");
    xrefuniq("xB","xO");

    foreach $opt (@xO,@xB) {
        next if ($uniq{$opt});
        $uniq{$opt} = 1;
        push(@foptall,$opt);
    }
}

sub _foptparse
{
    my($fix) = @_;
    my($docsym,$docptr);

    $buf =~ s/^\s+//;
    $buf =~ s/\s+$//;

    if ($fix) {
        $buf =~ s/`//g;
    }

    printf("foptparse: %s %s\n",$env,$buf);

    @rhs = split(" ",$buf);
    foreach $buf (@rhs) {
        next if ($env->{$buf});
        $env->{$buf} = 1;

        push(@$env,$buf);

        $docsym //= $buf;
    }

    # get documentation for option
    if ($fix) {
        $docptr = [];
        $foptdoc{$docsym} = $docptr;

        while (1) {
            $buf = shift(@argv);
            last unless (defined($buf));

            # put back _next_ option
            if ($buf =~ /^`/) {
                unshift(@argv,$buf);
                last;
            }

            push(@$docptr,$buf);
        }

        # strip leading whitespace lines
        while (@$docptr > 0) {
            $buf = $docptr->[0];
            last if ($buf =~ /\S/);
            shift(@$docptr);
        }

        # strip trailing whitespace lines
        while (@$docptr > 0) {
            $buf = $docptr->[$#$docptr];
            last if ($buf =~ /\S/);
            pop(@$docptr);
        }
    }
}

# xrefuniq -- get unique set of options
sub xrefuniq
{
    my($envlhs,$envrhs) = @_;
    my($sym,$lhs,$rhs);

    while (($sym,$lhs) = each(%$envlhs)) {
        $rhs = $envrhs->{$sym};
        next if ($rhs);
        printf("xrefuniq: %s %s\n",$envlhs,$sym);
    }
}

# dolist -- process all -f options
sub dolist
{
    my($foptnew);

    foreach $foptnew (@foptall) {
        dopgm($foptnew);
    }
}

# dopgm -- compile, link, and run the "foo" program
sub dopgm
{
    my($foptnew) = @_;
    my($code);

    $foptnew =~ s/^-f/-fno-/
        if ($opt_no);

    printf("\n");
    printf("NEWOPT: %s\n",$foptnew);

    # show documentation
    docshow($foptnew);

    {
        # compile to .o -- this proves that the compiler is changing things
        # and _not_ some link time optimization
        $code = doexec(qw(gcc -Wall -Wextra -Werror foo.c -c),
            @foptlhs,$foptnew);

        # the source should always compile cleanly -- if not, the option is
        # just bad/unknown
        if ($code) {
            printf("IGNORING: %s\n",$foptnew);
            ###pop(@foptlhs);
            last;
        }
        push(@foptlhs,$foptnew);

        # build the final program
        $code = doexec(qw(gcc -Wall -Wextra -Werror foo.o -o foo),
            "-T","link.ld");
        exit(1) if ($code);

        # run the program
        $code = doexec("./foo");

        # if it runs cleanly, we have the bad option
        if ($opt_no) {
            $code = ! $code;
        }

        if ($code) {
            printf("\n");
            printf("BADOPT: %s\n",$foptnew);
            exit(1);
        }
    }
}

# docshow -- show documentation
sub docshow
{
    my($foptnew) = @_;
    my($docptr,$docrhs,$doclhs,$doclen);
    my(@opt);

    {
        last unless ($opt_doc);

        $docptr = $foptdoc{$foptnew};
        last unless (ref($docptr));

        push(@opt,"-pre=#","#");

        foreach $docrhs (@$docptr) {
            $doclen = length($docrhs);

            # remember max length
            if ($doclen > $docmax) {
                $docmax = $doclen;
                printf("NEWMAX: %d\n",$docmax);
            }

            $dochisto[$doclen] += 1;

            if ($doclen > 78) {
                msgv(@opt,split(" ",$docrhs));
            }
            else {
                msgv(@opt,$docrhs);
            }
        }
    }
}

# docstat -- show documentations statistics
sub docstat
{
    my($curlen);
    my($cnt);

    printf("DOCMAX: %d\n",$docmax);

    $curlen = -1;
    foreach $cnt (@dochisto) {
        ++$curlen;
        next if ($cnt <= 0);

        $ptr = $lookup[$cnt];
        $ptr //= [];
        $lookup[$cnt] = $ptr;

        push(@$ptr,$curlen);
    }

    $cnt = -1;
    foreach $ptr (@lookup) {
        ++$cnt;
        next unless (ref($ptr));
        msgv("DOCLEN: $cnt",@$ptr);
    }
}

# doexec -- execute a program
sub doexec
{
    my(@argv) = @_;
    my($cmd);
    my($code);

    msgv("doexec: EXEC",@argv);

    $cmd = join(" ",@argv);
    system($cmd);

    $code = ($? >> 8) & 0xFF;

    $code;
}

# filetail -- get file tail
sub filetail
{
    my($file) = @_;

    $file =~ s,.*/,,g;

    $file;
}

# msgv -- output a message
sub msgv
{
    my(@argv) = @_;
    local($opt_pre);
    my($seplen);
    my($rhs);
    my($prenow);
    my($lhs);
    my($lno);

    optdcd(\@argv,qw(opt_pre));
    $opt_pre //= "+";
    $opt_pre .= " ";

    foreach $rhs (@argv) {
        $seplen = (length($lhs) > 0);

        if ((length($prenow) + length($lhs) + $seplen + length($rhs)) > 80) {
            printf("%s%s\n",$prenow,$lhs);
            undef($lhs);
            $prenow = $opt_pre;
            ++$lno;
        }

        $lhs .= " "
            if (length($lhs) > 0);

        $lhs .= $rhs;
    }

    if (length($lhs) > 0) {
        printf("%s%s\n",$prenow,$lhs);
        ++$lno;
    }

    $lno;
}

# subenv -- substitute environment
sub subenv
{
    my($rhs) = @_;
    my($ix);
    my($sym,$val);
    my($lhs);

    while (1) {
        $ix = index($rhs,'${');
        last if ($ix < 0);

        $lhs .= substr($rhs,0,$ix);
        $rhs = substr($rhs,$ix + 2);

        $ix = index($rhs,"}");
        $sym = substr($rhs,0,$ix);
        $rhs = substr($rhs,$ix + 1);

        $val = $$sym;
        sysfault("subenv: unknown symbol -- '%s'\n",$sym)
            unless (defined($val));

        $lhs .= $val;
    }

    $lhs .= $rhs;

    $lhs;
}

# sysdata -- locate the __DATA__ unit
sub sysdata
{
    my($pkgsrc) = @_;
    my($xfsrc,$sym,$pos);

    $pkgsrc //= caller();

    {
        $sym = $pkgsrc . "::DATA";

        $xfsrc = \*$sym;

        # remember the starting position -- since perl doesn't :-(
        $pos = \$sysdata_rewind{$pkgsrc};

        $$pos = tell($xfsrc)
            unless (defined($$pos));

        last if (seek($xfsrc,$$pos,0));

        sysfault("sysdata: seek fault pkgsrc='$pkgsrc' pos=$$pos -- $!\n");
    }

    return wantarray ? ($xfsrc,$sym,$$pos) : $xfsrc;
}

# sysfault -- fault
sub sysfault
{

    printf(@_);
    exit(1);
}

# sysstat -- get file status
sub sysstat
{
    my($file) = @_;
    my(@st);
    my($st);

    @st = stat($file);

    if (@st > 0) {
        $st = {};

        ($st->{st_dev},
        $st->{st_ino},
        $st->{st_mode},
        $st->{st_nlink},
        $st->{st_uid},
        $st->{st_gid},
        $st->{st_rdev},
        $st->{st_size},
        $st->{st_atime},
        $st->{st_mtime},
        $st->{st_ctime},
        $st->{st_blksize},
        $st->{st_blocks}) = @st;
    }

    $st;
}

package gccblame;
__DATA__
% foo.c
#include <stdint.h>
#include <stdio.h>

extern ${opt_T} __foo__${opt_arr};

#define IPTR(_adr)              ((intptr_t) _adr)
#define ADDR_MASK               IPTR(0xFFF)
#define EXPECTED_ADDR           IPTR(${opt_A})
#define FOO_ADDR                (IPTR(&__foo__) & ADDR_MASK)
#define FOO_ADDR_IS_EXPECTED()  (FOO_ADDR == EXPECTED_ADDR)

int
main(void)
{

    printf("__foo__ at %p\n", &__foo__);
    printf("FOO_ADDR=0x%lx\n", FOO_ADDR);
    printf("EXPECTED_ADDR=0x%lx\n", EXPECTED_ADDR);

    int ok = FOO_ADDR_IS_EXPECTED();

    if (ok) {
        printf("***Expected ***\n");
    }
    else {
        printf("### UNEXPECTED ###\n");
    }

    return ! ok;
}
% ${nul_c}
int
main(void)
{
    return 0;
}
% link.ld
INCLUDE ${dftlink}
__foo__ = ${opt_A};
answered 2021-07-17T03:19:54.477
1

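Unless optimizations are completely disabled, both gcc and clang are prone to behave nonsensically if code compares an address that is based on an external symbol with an address that is not based on that same symbol. The problem goes beyond treating such comparisons as yielding an unspecified result: it may cause code to behave in a way that is consistent neither with the comparison yielding true nor with it yielding false.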

extern int x[1],y[1];
int test(int *p)
{
    y[0] = 1;
    if (p == x+1)
        *p = 2;
    return y[0];
}

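Both clang and gcc will generate code which, if test is passed the address of y and y happens to immediately follow x, will set y[0] to 2 and then return 1. This behavior was reported years ago, but I don't know of any option other than -O0 that makes either compiler process such a function in a Standard-conforming way.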

answered 2021-07-15T16:14:12.060