
I want to use arm-none-eabi-gcc 9.2.1 with the libopencm3 project to compile a program and run it on an ARM Cortex-M4 processor. My program consists of two files:

main.c

#include "../common/stm32wrapper.h"
#include "test.h"
#include <stdio.h>
#include <string.h>

typedef unsigned char u8;
typedef unsigned int  u32;
typedef unsigned long long u64;

int main(void)
{
    clock_setup();
    gpio_setup();
    usart_setup(115200);
    flash_setup();

    SCS_DEMCR |= SCS_DEMCR_TRCENA;
    DWT_CYCCNT = 0;
    DWT_CTRL |= DWT_CTRL_CYCCNTENA;

    u32 oldcount, newcount;
    u32 a = 0x75;
    u32 b = 0x14;
    char buffer[36];
    oldcount = DWT_CYCCNT;
    u32 c = test(a,b);
    newcount = DWT_CYCCNT-oldcount;
    sprintf(buffer, "cycles: %d, %08x", newcount, c);
    send_USART_str(buffer);
    return 0;
}

test.c

uint32_t test(uint32_t a, uint32_t b) {
    uint32_t tmp0, tmp1;
    uint32_t c;

    for(int i = 0; i< 4096; i++) {
        tmp0 = a & 0xff;
        tmp1 = b & 0xff;
        c = tmp0 ^ tmp1 ^ (a>>(i/512)) ^ (b >> (i/1024));
    }
    return c;
}

To compile my program, I use the following makefile:

.PHONY: all clean

PREFIX  ?= arm-none-eabi
CC      = $(PREFIX)-gcc -v
LD      = $(PREFIX)-gcc -v
OBJCOPY = $(PREFIX)-objcopy
OBJDUMP = $(PREFIX)-objdump
GDB     = $(PREFIX)-gdb

OPENCM3DIR = ../libopencm3
ARMNONEEABIDIR = /usr/arm-none-eabi
COMMONDIR = ../common

all: test_m4.bin

test_m4.%: ARCH_FLAGS = -mthumb -mcpu=cortex-m4 -mfloat-abi=hard -mfpu=fpv4-sp-d16
test_m4.o: CFLAGS += -DSTM32F4
$(COMMONDIR)/stm32f4_wrapper.o: CFLAGS += -DSTM32F4
test_m4.elf: LDSCRIPT = $(COMMONDIR)/stm32f4-discovery.ld
test_m4.elf: LDFLAGS += -L$(OPENCM3DIR)/lib/ -lopencm3_stm32f4
test_m4.elf: OBJS += $(COMMONDIR)/stm32f4_wrapper.o 
test_m4.elf: $(COMMONDIR)/stm32f4_wrapper.o $(OPENCM3DIR)/lib/libopencm3_stm32f4.a

CFLAGS      += -O3 \
           -Wall -Wextra -Wimplicit-function-declaration \
           -Wredundant-decls -Wmissing-prototypes -Wstrict-prototypes \
           -Wundef -Wshadow \
           -I$(ARMNONEEABIDIR)/include -I$(OPENCM3DIR)/include \
           -fno-common $(ARCH_FLAGS) -MD \
           -ftime-report
LDFLAGS     += --static -Wl,--start-group -lc -lgcc -lnosys -Wl,--end-group \
           -T$(LDSCRIPT) -nostartfiles -Wl,--gc-sections,--no-print-gc-sections \
           $(ARCH_FLAGS)

OBJS        += test.c

%.bin: %.elf
    $(OBJCOPY) -Obinary $^ $@

%.elf: %.o $(OBJS) $(LDSCRIPT)
    $(LD) -o $@ $< $(OBJS) $(LDFLAGS)

test%.o: main.c
    $(CC) $(CFLAGS) -o $@ -c $^

%.o: %.c 
    $(CC) $(CFLAGS) -o $@ -c $^

clean:
    rm -f *.o *.d *.elf *.bin

I can compile and run my code with this makefile. Running make gives me the following output:

arm-none-eabi-gcc -v -O3 -Wall -Wextra -Wimplicit-function-declaration -Wredundant-decls -Wmissing-prototypes -Wstrict-prototypes -Wundef -Wshadow -I/usr/arm-none-eabi/include -I../libopencm3/include -fno-common -mthumb -mcpu=cortex-m4 -mfloat-abi=hard -mfpu=fpv4-sp-d16 -MD -ftime-report -DSTM32F4 -o test_m4.o -c main.c
Using built-in specs.
COLLECT_GCC=arm-none-eabi-gcc
Target: arm-none-eabi
Configured with: /mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/src/gcc/configure --target=arm-none-eabi --prefix=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/install-native --libexecdir=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/install-native/lib --infodir=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/install-native/share/doc/gcc-arm-none-eabi/info --mandir=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/install-native/share/doc/gcc-arm-none-eabi/man --htmldir=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/install-native/share/doc/gcc-arm-none-eabi/html --pdfdir=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/install-native/share/doc/gcc-arm-none-eabi/pdf --enable-languages=c,c++ --enable-plugins --disable-decimal-float --disable-libffi --disable-libgomp --disable-libmudflap --disable-libquadmath --disable-libssp --disable-libstdcxx-pch --disable-nls --disable-shared --disable-threads --disable-tls --with-gnu-as --with-gnu-ld --with-newlib --with-headers=yes --with-python-dir=share/gcc-arm-none-eabi --with-sysroot=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/install-native/arm-none-eabi --build=x86_64-linux-gnu --host=x86_64-linux-gnu --with-gmp=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/build-native/host-libs/usr --with-mpfr=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/build-native/host-libs/usr --with-mpc=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/build-native/host-libs/usr --with-isl=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/build-native/host-libs/usr 
--with-libelf=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/build-native/host-libs/usr --with-host-libstdcxx='-static-libgcc -Wl,-Bstatic,-lstdc++,-Bdynamic -lm' --with-pkgversion='GNU Tools for Arm Embedded Processors 9-2019-q4-major' --with-multilib-list=rmprofile
Thread model: single
gcc version 9.2.1 20191025 (release) [ARM/arm-9-branch revision 277599] (GNU Tools for Arm Embedded Processors 9-2019-q4-major) 
COLLECT_GCC_OPTIONS='-v' '-O3' '-Wall' '-Wextra' '-Wimplicit-function-declaration' '-Wredundant-decls' '-Wmissing-prototypes' '-Wstrict-prototypes' '-Wundef' '-Wshadow' '-I' '/usr/arm-none-eabi/include' '-I' '../libopencm3/include' '-fno-common' '-mthumb' '-mcpu=cortex-m4' '-mfloat-abi=hard' '-mfpu=fpv4-sp-d16' '-MD' '-ftime-report' '-D' 'STM32F4' '-o' 'test_m4.o' '-c' '-march=armv7e-m+fp'
 /usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/cc1 -quiet -v -I /usr/arm-none-eabi/include -I ../libopencm3/include -imultilib thumb/v7e-m+fp/hard -iprefix /usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/ -isysroot /usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../arm-none-eabi -MD test_m4.d -MQ test_m4.o -D__USES_INITFINI__ -D STM32F4 main.c -quiet -dumpbase main.c -mthumb -mcpu=cortex-m4 -mfloat-abi=hard -mfpu=fpv4-sp-d16 -march=armv7e-m+fp -auxbase-strip test_m4.o -O3 -Wall -Wextra -Wimplicit-function-declaration -Wredundant-decls -Wmissing-prototypes -Wstrict-prototypes -Wundef -Wshadow -version -fno-common -ftime-report -o /tmp/ccm5h1i9.s
GNU C17 (GNU Tools for Arm Embedded Processors 9-2019-q4-major) version 9.2.1 20191025 (release) [ARM/arm-9-branch revision 277599] (arm-none-eabi)
    compiled by GNU C version 4.8.4, GMP version 6.1.0, MPFR version 3.1.4, MPC version 1.0.3, isl version isl-0.18-GMP

GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
ignoring duplicate directory "/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/../../lib/gcc/arm-none-eabi/9.2.1/include"
ignoring nonexistent directory "/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../arm-none-eabi/usr/local/include"
ignoring duplicate directory "/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/../../lib/gcc/arm-none-eabi/9.2.1/include-fixed"
ignoring duplicate directory "/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/../../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/include"
ignoring nonexistent directory "/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../arm-none-eabi/usr/include"
ignoring nonexistent directory "/usr/arm-none-eabi/include"
#include "..." search starts here:
#include <...> search starts here:
 ../libopencm3/include
 /usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/include
 /usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/include-fixed
 /usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/include
End of search list.
GNU C17 (GNU Tools for Arm Embedded Processors 9-2019-q4-major) version 9.2.1 20191025 (release) [ARM/arm-9-branch revision 277599] (arm-none-eabi)
    compiled by GNU C version 4.8.4, GMP version 6.1.0, MPFR version 3.1.4, MPC version 1.0.3, isl version isl-0.18-GMP

GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: 4381e146d4f016ae8e44a645dba65184

Time variable                                   usr           sys          wall               GGC
 phase setup                        :   0.01 (  8%)   0.01 ( 20%)   0.03 ( 17%)    3569 kB ( 62%)
 phase parsing                      :   0.10 ( 83%)   0.04 ( 80%)   0.14 ( 78%)    2069 kB ( 36%)
 phase opt and generate             :   0.01 (  8%)   0.00 (  0%)   0.01 (  6%)     120 kB (  2%)
 preprocessing                      :   0.03 ( 25%)   0.03 ( 60%)   0.03 ( 17%)     889 kB ( 15%)
 lexical analysis                   :   0.04 ( 33%)   0.00 (  0%)   0.05 ( 28%)       0 kB (  0%)
 parser (global)                    :   0.02 ( 17%)   0.00 (  0%)   0.04 ( 22%)    1063 kB ( 18%)
 parser struct body                 :   0.00 (  0%)   0.00 (  0%)   0.01 (  6%)      41 kB (  1%)
 parser enumerator list             :   0.01 (  8%)   0.01 ( 20%)   0.01 (  6%)      54 kB (  1%)
 tree gimplify                      :   0.00 (  0%)   0.00 (  0%)   0.01 (  6%)       8 kB (  0%)
 initialize rtl                     :   0.01 (  8%)   0.00 (  0%)   0.00 (  0%)       7 kB (  0%)
 TOTAL                              :   0.12          0.05          0.18           5767 kB
COLLECT_GCC_OPTIONS='-v' '-O3' '-Wall' '-Wextra' '-Wimplicit-function-declaration' '-Wredundant-decls' '-Wmissing-prototypes' '-Wstrict-prototypes' '-Wundef' '-Wshadow' '-I' '/usr/arm-none-eabi/include' '-I' '../libopencm3/include' '-fno-common' '-mthumb' '-mcpu=cortex-m4' '-mfloat-abi=hard' '-mfpu=fpv4-sp-d16' '-MD' '-ftime-report' '-D' 'STM32F4' '-o' 'test_m4.o' '-c' '-march=armv7e-m+fp'
 /usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/bin/as -v -I /usr/arm-none-eabi/include -I ../libopencm3/include -march=armv7e-m -mfloat-abi=hard -mfpu=fpv4-sp-d16 -meabi=5 -o test_m4.o /tmp/ccm5h1i9.s
GNU assembler version 2.33.1 (arm-none-eabi) using BFD version (GNU Tools for Arm Embedded Processors 9-2019-q4-major) 2.33.1.20191025
COMPILER_PATH=/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/:/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/:/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/bin/
LIBRARY_PATH=/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/thumb/v7e-m+fp/hard/:/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/lib/thumb/v7e-m+fp/hard/:/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../arm-none-eabi/lib/thumb/v7e-m+fp/hard/:/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/:/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/:/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/lib/:/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../arm-none-eabi/lib/
COLLECT_GCC_OPTIONS='-v' '-O3' '-Wall' '-Wextra' '-Wimplicit-function-declaration' '-Wredundant-decls' '-Wmissing-prototypes' '-Wstrict-prototypes' '-Wundef' '-Wshadow' '-I' '/usr/arm-none-eabi/include' '-I' '../libopencm3/include' '-fno-common' '-mthumb' '-mcpu=cortex-m4' '-mfloat-abi=hard' '-mfpu=fpv4-sp-d16' '-MD' '-ftime-report' '-D' 'STM32F4' '-o' 'test_m4.o' '-c' '-march=armv7e-m+fp'
arm-none-eabi-gcc -v -o test_m4.elf test_m4.o test.c ../common/stm32f4_wrapper.o  --static -Wl,--start-group -lc -lgcc -lnosys -Wl,--end-group -T../common/stm32f4-discovery.ld -nostartfiles -Wl,--gc-sections,--no-print-gc-sections -mthumb -mcpu=cortex-m4 -mfloat-abi=hard -mfpu=fpv4-sp-d16 -L../libopencm3/lib/ -lopencm3_stm32f4
Using built-in specs.
COLLECT_GCC=arm-none-eabi-gcc
COLLECT_LTO_WRAPPER=/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/lto-wrapper
Target: arm-none-eabi
Configured with: /mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/src/gcc/configure --target=arm-none-eabi --prefix=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/install-native --libexecdir=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/install-native/lib --infodir=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/install-native/share/doc/gcc-arm-none-eabi/info --mandir=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/install-native/share/doc/gcc-arm-none-eabi/man --htmldir=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/install-native/share/doc/gcc-arm-none-eabi/html --pdfdir=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/install-native/share/doc/gcc-arm-none-eabi/pdf --enable-languages=c,c++ --enable-plugins --disable-decimal-float --disable-libffi --disable-libgomp --disable-libmudflap --disable-libquadmath --disable-libssp --disable-libstdcxx-pch --disable-nls --disable-shared --disable-threads --disable-tls --with-gnu-as --with-gnu-ld --with-newlib --with-headers=yes --with-python-dir=share/gcc-arm-none-eabi --with-sysroot=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/install-native/arm-none-eabi --build=x86_64-linux-gnu --host=x86_64-linux-gnu --with-gmp=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/build-native/host-libs/usr --with-mpfr=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/build-native/host-libs/usr --with-mpc=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/build-native/host-libs/usr --with-isl=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/build-native/host-libs/usr 
--with-libelf=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/build-native/host-libs/usr --with-host-libstdcxx='-static-libgcc -Wl,-Bstatic,-lstdc++,-Bdynamic -lm' --with-pkgversion='GNU Tools for Arm Embedded Processors 9-2019-q4-major' --with-multilib-list=rmprofile
Thread model: single
gcc version 9.2.1 20191025 (release) [ARM/arm-9-branch revision 277599] (GNU Tools for Arm Embedded Processors 9-2019-q4-major) 
COLLECT_GCC_OPTIONS='-v' '-o' 'test_m4.elf' '-static' '-T' '../common/stm32f4-discovery.ld' '-nostartfiles' '-mthumb' '-mcpu=cortex-m4' '-mfloat-abi=hard' '-mfpu=fpv4-sp-d16' '-L../libopencm3/lib/' '-march=armv7e-m+fp'
 /usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/cc1 -quiet -v -imultilib thumb/v7e-m+fp/hard -iprefix /usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/ -isysroot /usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../arm-none-eabi -D__USES_INITFINI__ test.c -quiet -dumpbase test.c -mthumb -mcpu=cortex-m4 -mfloat-abi=hard -mfpu=fpv4-sp-d16 -march=armv7e-m+fp -auxbase test -version -o /tmp/cc3yny6o.s
GNU C17 (GNU Tools for Arm Embedded Processors 9-2019-q4-major) version 9.2.1 20191025 (release) [ARM/arm-9-branch revision 277599] (arm-none-eabi)
    compiled by GNU C version 4.8.4, GMP version 6.1.0, MPFR version 3.1.4, MPC version 1.0.3, isl version isl-0.18-GMP

GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
ignoring duplicate directory "/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/../../lib/gcc/arm-none-eabi/9.2.1/include"
ignoring nonexistent directory "/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../arm-none-eabi/usr/local/include"
ignoring duplicate directory "/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/../../lib/gcc/arm-none-eabi/9.2.1/include-fixed"
ignoring duplicate directory "/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/../../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/include"
ignoring nonexistent directory "/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../arm-none-eabi/usr/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/include
 /usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/include-fixed
 /usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/include
End of search list.
GNU C17 (GNU Tools for Arm Embedded Processors 9-2019-q4-major) version 9.2.1 20191025 (release) [ARM/arm-9-branch revision 277599] (arm-none-eabi)
    compiled by GNU C version 4.8.4, GMP version 6.1.0, MPFR version 3.1.4, MPC version 1.0.3, isl version isl-0.18-GMP

GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: 4381e146d4f016ae8e44a645dba65184
COLLECT_GCC_OPTIONS='-v' '-o' 'test_m4.elf' '-static' '-T' '../common/stm32f4-discovery.ld' '-nostartfiles' '-mthumb' '-mcpu=cortex-m4' '-mfloat-abi=hard' '-mfpu=fpv4-sp-d16' '-L../libopencm3/lib/' '-march=armv7e-m+fp'
 /usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/bin/as -v -march=armv7e-m -mfloat-abi=hard -mfpu=fpv4-sp-d16 -meabi=5 -o /tmp/ccfflDpW.o /tmp/cc3yny6o.s
GNU assembler version 2.33.1 (arm-none-eabi) using BFD version (GNU Tools for Arm Embedded Processors 9-2019-q4-major) 2.33.1.20191025
COMPILER_PATH=/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/:/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/:/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/bin/
LIBRARY_PATH=/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/thumb/v7e-m+fp/hard/:/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/lib/thumb/v7e-m+fp/hard/:/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../arm-none-eabi/lib/thumb/v7e-m+fp/hard/:/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/:/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/:/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/lib/:/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../arm-none-eabi/lib/
COLLECT_GCC_OPTIONS='-v' '-o' 'test_m4.elf' '-static' '-T' '../common/stm32f4-discovery.ld' '-nostartfiles' '-mthumb' '-mcpu=cortex-m4' '-mfloat-abi=hard' '-mfpu=fpv4-sp-d16' '-L../libopencm3/lib/' '-march=armv7e-m+fp'
 /usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/collect2 -plugin /usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/liblto_plugin.so -plugin-opt=/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/lto-wrapper -plugin-opt=-fresolution=/tmp/cc4qN1Kt.res -plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lc --sysroot=/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../arm-none-eabi -Bstatic -X -o test_m4.elf -L../libopencm3/lib/ -L/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/thumb/v7e-m+fp/hard -L/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/lib/thumb/v7e-m+fp/hard -L/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../arm-none-eabi/lib/thumb/v7e-m+fp/hard -L/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1 -L/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc -L/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/lib -L/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../arm-none-eabi/lib test_m4.o /tmp/ccfflDpW.o ../common/stm32f4_wrapper.o --start-group -lc -lgcc -lnosys --end-group --gc-sections --no-print-gc-sections -lopencm3_stm32f4 --start-group -lgcc -lc --end-group -T ../common/stm32f4-discovery.ld
COLLECT_GCC_OPTIONS='-v' '-o' 'test_m4.elf' '-static' '-T' '../common/stm32f4-discovery.ld' '-nostartfiles' '-mthumb' '-mcpu=cortex-m4' '-mfloat-abi=hard' '-mfpu=fpv4-sp-d16' '-L../libopencm3/lib/' '-march=armv7e-m+fp'
arm-none-eabi-objcopy -Obinary test_m4.elf test_m4.bin

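It seems that the optimization flags are not taken into account: whichever one I use, the generated binary is always the same, and the program always prints cycles: 196645, 00000063. Disassembling the binary gives the following output for both -Os and -O3: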

080001ac <main>:
 80001ac:   b570        push    {r4, r5, r6, lr}
 80001ae:   b08a        sub sp, #40 ; 0x28
 80001b0:   f006 fc06   bl  80069c0 <clock_setup>
 80001b4:   f006 fc1c   bl  80069f0 <gpio_setup>
 80001b8:   f44f 30e1   mov.w   r0, #115200 ; 0x1c200
 80001bc:   f006 fc32   bl  8006a24 <usart_setup>
 80001c0:   f006 fc52   bl  8006a68 <flash_setup>
 80001c4:   490e        ldr r1, [pc, #56]   ; (8000200 <main+0x54>)
 80001c6:   4c0f        ldr r4, [pc, #60]   ; (8000204 <main+0x58>)
 80001c8:   680b        ldr r3, [r1, #0]
 80001ca:   4a0f        ldr r2, [pc, #60]   ; (8000208 <main+0x5c>)
 80001cc:   2500        movs    r5, #0
 80001ce:   f043 7380   orr.w   r3, r3, #16777216   ; 0x1000000
 80001d2:   600b        str r3, [r1, #0]
 80001d4:   6025        str r5, [r4, #0]
 80001d6:   6813        ldr r3, [r2, #0]
 80001d8:   f043 0301   orr.w   r3, r3, #1
 80001dc:   6013        str r3, [r2, #0]
 80001de:   6826        ldr r6, [r4, #0]
 80001e0:   f000 f816   bl  8000210 <test>
 80001e4:   6822        ldr r2, [r4, #0]
 80001e6:   4909        ldr r1, [pc, #36]   ; (800020c <main+0x60>)
 80001e8:   4603        mov r3, r0
 80001ea:   1b92        subs    r2, r2, r6
 80001ec:   a801        add r0, sp, #4
 80001ee:   f006 fca5   bl  8006b3c <sprintf>
 80001f2:   a801        add r0, sp, #4
 80001f4:   f006 fc48   bl  8006a88 <send_USART_str>
 80001f8:   4628        mov r0, r5
 80001fa:   b00a        add sp, #40 ; 0x28
 80001fc:   bd70        pop {r4, r5, r6, pc}
 80001fe:   bf00        nop
 8000200:   e000edfc    .word   0xe000edfc
 8000204:   e0001004    .word   0xe0001004
 8000208:   e0001000    .word   0xe0001000
 800020c:   0800c1e8    .word   0x0800c1e8

08000210 <test>:
 8000210:   b480        push    {r7}
 8000212:   b087        sub sp, #28
 8000214:   af00        add r7, sp, #0
 8000216:   2375        movs    r3, #117    ; 0x75
 8000218:   60fb        str r3, [r7, #12]
 800021a:   2314        movs    r3, #20
 800021c:   60bb        str r3, [r7, #8]
 800021e:   2300        movs    r3, #0
 8000220:   613b        str r3, [r7, #16]
 8000222:   e020        b.n 8000266 <test+0x56>
 8000224:   68fb        ldr r3, [r7, #12]
 8000226:   b2db        uxtb    r3, r3
 8000228:   607b        str r3, [r7, #4]
 800022a:   68bb        ldr r3, [r7, #8]
 800022c:   b2db        uxtb    r3, r3
 800022e:   603b        str r3, [r7, #0]
 8000230:   687a        ldr r2, [r7, #4]
 8000232:   683b        ldr r3, [r7, #0]
 8000234:   405a        eors    r2, r3
 8000236:   693b        ldr r3, [r7, #16]
 8000238:   2b00        cmp r3, #0
 800023a:   da01        bge.n   8000240 <test+0x30>
 800023c:   f203 13ff   addw    r3, r3, #511    ; 0x1ff
 8000240:   125b        asrs    r3, r3, #9
 8000242:   4619        mov r1, r3
 8000244:   68fb        ldr r3, [r7, #12]
 8000246:   40cb        lsrs    r3, r1
 8000248:   405a        eors    r2, r3
 800024a:   693b        ldr r3, [r7, #16]
 800024c:   2b00        cmp r3, #0
 800024e:   da01        bge.n   8000254 <test+0x44>
 8000250:   f203 33ff   addw    r3, r3, #1023   ; 0x3ff
 8000254:   129b        asrs    r3, r3, #10
 8000256:   4619        mov r1, r3
 8000258:   68bb        ldr r3, [r7, #8]
 800025a:   40cb        lsrs    r3, r1
 800025c:   4053        eors    r3, r2
 800025e:   617b        str r3, [r7, #20]
 8000260:   693b        ldr r3, [r7, #16]
 8000262:   3301        adds    r3, #1
 8000264:   613b        str r3, [r7, #16]
 8000266:   693b        ldr r3, [r7, #16]
 8000268:   f5b3 5f80   cmp.w   r3, #4096   ; 0x1000
 800026c:   dbda        blt.n   8000224 <test+0x14>
 800026e:   697b        ldr r3, [r7, #20]
 8000270:   4618        mov r0, r3
 8000272:   371c        adds    r7, #28
 8000274:   46bd        mov sp, r7
 8000276:   f85d 7b04   ldr.w   r7, [sp], #4
 800027a:   4770        bx  lr

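This seems odd to me, because the code could clearly be made faster. For example, a single uxtb could be computed instead of two (if it were performed after the eor), so I believe something is wrong here. Why are the optimization flags not taken into account? Is something wrong with my makefile?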

3 Answers

typedef unsigned int uint32_t;

uint32_t test(uint32_t a, uint32_t b) {
    uint32_t tmp0, tmp1;
    uint32_t c;

    for(int i = 0; i< 4096; i++) {
        tmp0 = a & 0xff;
        tmp1 = b & 0xff;
        c = tmp0 ^ tmp1 ^ (a>>(i/512)) ^ (b >> (i/1024));
    }
    return c;
}

unsigned int hello ( void )
{
    return(test(0x75,0x14));
}

9.3.0 and 9.2.1 are not that different. I could get a 9.2.1 specifically if you want to see it, but you can do that yourself.

arm-none-eabi-gcc --version
arm-none-eabi-gcc (GCC) 9.3.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

-O0

arm-none-eabi-gcc -O0 so.c -c -mthumb -mcpu=cortex-m4 -o so.o

Disassembly of section .text:

00000000 <test>:
   0:   b480        push    {r7}
   2:   b087        sub sp, #28
   4:   af00        add r7, sp, #0
   6:   6078        str r0, [r7, #4]
   8:   6039        str r1, [r7, #0]
   a:   2300        movs    r3, #0
   c:   613b        str r3, [r7, #16]
   e:   e020        b.n 52 <test+0x52>
  10:   687b        ldr r3, [r7, #4]
  12:   b2db        uxtb    r3, r3
  14:   60fb        str r3, [r7, #12]
  16:   683b        ldr r3, [r7, #0]
  18:   b2db        uxtb    r3, r3
  1a:   60bb        str r3, [r7, #8]
  1c:   68fa        ldr r2, [r7, #12]
  1e:   68bb        ldr r3, [r7, #8]
  20:   405a        eors    r2, r3
  22:   693b        ldr r3, [r7, #16]
  24:   2b00        cmp r3, #0
  26:   da01        bge.n   2c <test+0x2c>
  28:   f203 13ff   addw    r3, r3, #511    ; 0x1ff
  2c:   125b        asrs    r3, r3, #9
  2e:   4619        mov r1, r3
  30:   687b        ldr r3, [r7, #4]
  32:   40cb        lsrs    r3, r1
  34:   405a        eors    r2, r3
  36:   693b        ldr r3, [r7, #16]
  38:   2b00        cmp r3, #0
  3a:   da01        bge.n   40 <test+0x40>
  3c:   f203 33ff   addw    r3, r3, #1023   ; 0x3ff
  40:   129b        asrs    r3, r3, #10
  42:   4619        mov r1, r3
  44:   683b        ldr r3, [r7, #0]
  46:   40cb        lsrs    r3, r1
  48:   4053        eors    r3, r2
  4a:   617b        str r3, [r7, #20]
  4c:   693b        ldr r3, [r7, #16]
  4e:   3301        adds    r3, #1
  50:   613b        str r3, [r7, #16]
  52:   693b        ldr r3, [r7, #16]
  54:   f5b3 5f80   cmp.w   r3, #4096   ; 0x1000
  58:   dbda        blt.n   10 <test+0x10>
  5a:   697b        ldr r3, [r7, #20]
  5c:   4618        mov r0, r3
  5e:   371c        adds    r7, #28
  60:   46bd        mov sp, r7
  62:   bc80        pop {r7}
  64:   4770        bx  lr

00000066 <hello>:
  66:   b580        push    {r7, lr}
  68:   af00        add r7, sp, #0
  6a:   2114        movs    r1, #20
  6c:   2075        movs    r0, #117    ; 0x75
  6e:   f7ff fffe   bl  0 <test>
  72:   4603        mov r3, r0
  74:   4618        mov r0, r3
  76:   bd80        pop {r7, pc}

-O1

arm-none-eabi-gcc -O1 so.c -c -mthumb -mcpu=cortex-m4 -o so.o
arm-none-eabi-objdump -D so.o

so.o:     file format elf32-littlearm


Disassembly of section .text:

00000000 <test>:
   0:   f44f 5380   mov.w   r3, #4096   ; 0x1000
   4:   3b01        subs    r3, #1
   6:   d1fd        bne.n   4 <test+0x4>
   8:   08ca        lsrs    r2, r1, #3
   a:   ea82 12d0   eor.w   r2, r2, r0, lsr #7
   e:   ea80 0301   eor.w   r3, r0, r1
  12:   b2db        uxtb    r3, r3
  14:   ea82 0003   eor.w   r0, r2, r3
  18:   4770        bx  lr

0000001a <hello>:
  1a:   b508        push    {r3, lr}
  1c:   2114        movs    r1, #20
  1e:   2075        movs    r0, #117    ; 0x75
  20:   f7ff fffe   bl  0 <test>
  24:   bd08        pop {r3, pc}

-O2

Disassembly of section .text:

00000000 <test>:
   0:   ea80 0301   eor.w   r3, r0, r1
   4:   08ca        lsrs    r2, r1, #3
   6:   ea82 10d0   eor.w   r0, r2, r0, lsr #7
   a:   b2db        uxtb    r3, r3
   c:   4058        eors    r0, r3
   e:   4770        bx  lr

00000010 <hello>:
  10:   2063        movs    r0, #99 ; 0x63
  12:   4770        bx  lr

-O3

00000000 <test>:
   0:   ea80 0301   eor.w   r3, r0, r1
   4:   08ca        lsrs    r2, r1, #3
   6:   ea82 10d0   eor.w   r0, r2, r0, lsr #7
   a:   b2db        uxtb    r3, r3
   c:   4058        eors    r0, r3
   e:   4770        bx  lr

00000010 <hello>:
  10:   2063        movs    r0, #99 ; 0x63
  12:   4770        bx  lr

-Os

00000000 <test>:
   0:   08cb        lsrs    r3, r1, #3
   2:   ea83 13d0   eor.w   r3, r3, r0, lsr #7
   6:   4048        eors    r0, r1
   8:   b2c0        uxtb    r0, r0
   a:   4058        eors    r0, r3
   c:   4770        bx  lr

0000000e <hello>:
   e:   2114        movs    r1, #20
  10:   2075        movs    r0, #117    ; 0x75
  12:   f7ff bffe   b.w 0 <test>

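If all of these execute in the same amount of time, then yes, you clearly have either a build problem or a problem with your test. If you are claiming that -O1, -O2, -O3 and so on all produce the same output, then you are not actually using those optimization levels.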
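
The -O2 and -O3 listings above collapse the loop to one expression, and that is easy to check on a host machine. The sketch below is illustrative only: test_loop is the function from the question, and test_closed_form is a hypothetical helper name that is not part of the original code. It relies on every iteration overwriting c, so only i = 4095 matters, where 4095 / 512 is 7 and 4095 / 1024 is 3.

```c
#include <stdint.h>

/* The loop from the question: every iteration overwrites c, so only the
 * last one (i = 4095) determines the return value. */
uint32_t test_loop(uint32_t a, uint32_t b) {
    uint32_t tmp0, tmp1;
    uint32_t c = 0;

    for (int i = 0; i < 4096; i++) {
        tmp0 = a & 0xff;
        tmp1 = b & 0xff;
        c = tmp0 ^ tmp1 ^ (a >> (i / 512)) ^ (b >> (i / 1024));
    }
    return c;
}

/* Hypothetical helper: the expression the -O2 listing computes.
 * (a & 0xff) ^ (b & 0xff) equals (a ^ b) & 0xff, which is the single uxtb;
 * the shifts by 7 and 3 match "lsr #7" and "lsrs ..., #3". */
uint32_t test_closed_form(uint32_t a, uint32_t b) {
    return ((a ^ b) & 0xff) ^ (a >> 7) ^ (b >> 3);
}
```

For a = 0x75 and b = 0x14 both functions return 0x63, which is the constant that hello returns in the -O2 listing and the value the question's program prints.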

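There is no reason to assume that -Os produces a smaller binary than -O2 or -O3. You are only hinting at that desire, and exceptions can be constructed.

Nor is there any reason to assume that code compiled for size executes faster, or that -O3 and so on executes faster. That is especially true on a platform like this one (and on all modern platforms), where some percentage of the performance is not directly tied to the number or sequence of instructions but to the system as a whole.

You are on an STM32 with a Cortex-M4, so you have the ST flash cache, which you cannot turn off. That will help in every test, but it will also hide things. You have a clock init followed by a flash setup. If you are raising the clock, you have to slow the flash down first, not afterwards, or you can crash. For a test like this there is generally no reason to raise the clock at all: ideally you want timer clock cycles to be system (that is, CPU) clock cycles. Then, at the slower clock speed, you can experiment with things like flash wait states. Start with the minimum number of wait states and simply raise them for different tests, without raising the clock, to see how the flash affects the result (this works on some parts across families, but not all). Unfortunately, this is an STM32.

Depending on the compile-time options chosen for the core, some cores have different fetch sizes and other features, and you may be able to adjust some of them. A simple change in the alignment of a tight loop can make a dramatic difference: the same machine code starting at a different address lines up differently within fetch lines and cache lines, and that changes the benchmark result.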
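
One way to keep alignment from varying between builds is to pin the start address of the code under test. This is a minimal sketch, assuming GCC or a compatible compiler: the aligned and noinline function attributes are GCC extensions, the value 16 is an arbitrary illustration rather than a figure from any datasheet, and the names aligned_kernel and kernel_alignment_offset are hypothetical. The function body is just the expression from the -O2 listing, used as a stand-in.

```c
#include <stdint.h>

/* ASSUMPTION: 16 bytes is an illustrative alignment only. */
#define KERNEL_ALIGN 16

/* Force the function to start on a KERNEL_ALIGN boundary and keep it out
 * of line, so its address does not shift when surrounding code changes. */
__attribute__((aligned(KERNEL_ALIGN), noinline))
uint32_t aligned_kernel(uint32_t a, uint32_t b) {
    return ((a ^ b) & 0xff) ^ (a >> 7) ^ (b >> 3);
}

/* Offset of the function's first instruction within its alignment block.
 * On Thumb targets bit 0 of a function pointer is set, so clear it first. */
uint32_t kernel_alignment_offset(void) {
    uintptr_t addr = (uintptr_t)&aligned_kernel;
    addr &= ~(uintptr_t)1;
    return (uint32_t)(addr % KERNEL_ALIGN);
}
```

Changing KERNEL_ALIGN, or deliberately padding in front of the function, then becomes a controlled experiment rather than an accident of whatever code was linked before it.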

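Note that you can get the same results with the SysTick timer as with the debug timer. You can also wrap the time collection directly around the code under test instead of around a function call: if you write the code under test in assembly language, you can add the time collection before and after it without the function call overhead, which can itself vary from test to test.

If you are seeing the same machine code from the compiler for different settings, then you are not actually building with those settings, or not actually rebuilding the application, or there is some other form of user error (building here and using a binary from there). In that case the same binary should ideally give the same time, plus or minus a clock. That also depends on how you run or re-run the test: whether you want to see cache effects, whether you prime the cache and then run the test, and so on.

If you do start seeing different machine code, or you already see different machine code but get the same time, then the error is in the time measurement, which is an often overlooked problem in benchmarking. Your approach looks fine so long as you are really looking at that timer and have tested that it is counting and that it runs in the direction you expect. If this is some executed-instruction counter rather than a time, you can still test it to see whether it does what you think. I have no use for those debug tools, so I do not dabble in them and am not as familiar with them as I am with other things on these systems.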
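
The counting direction matters when turning two timer reads into an elapsed count. A minimal sketch, with hypothetical function names: DWT_CYCCNT is a 32-bit counter that counts up, while the SysTick current-value register is a 24-bit counter that counts down. The SysTick case assumes the reload value is 0xFFFFFF and that at most one wraparound happened between the two reads.

```c
#include <stdint.h>

/* 32-bit up counter (DWT_CYCCNT): unsigned subtraction stays correct
 * across a single wraparound. */
uint32_t elapsed_up32(uint32_t start, uint32_t end) {
    return end - start;
}

/* 24-bit down counter (SysTick): the operands swap and the result is
 * masked to 24 bits. ASSUMPTION: reload value is 0xFFFFFF. */
uint32_t elapsed_down24(uint32_t start, uint32_t end) {
    return (start - end) & 0x00FFFFFFu;
}

/* Remove the cost of the measurement itself, found by timing two
 * back-to-back reads with nothing in between. */
uint32_t net_cycles(uint32_t raw, uint32_t overhead) {
    return raw > overhead ? raw - overhead : 0u;
}
```

Timing an empty region first gives the overhead to pass to net_cycles, and doubles as the sanity check that the counter is running at all.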

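Being an M4, you may have other features you can turn on or off to see performance differences in the generated code: branch prediction, caches, MMU-like things, and so on.

It may also be the order of the flags you are using relative to -O3 (why each flag is there at all is the first question); some may negate other optimization features.

I am curious what the real goal is here. Understand that benchmarks are nonsense because they are so easy to manipulate: for various reasons the same high-level code will not produce the same results on the same target, with the same or different tools. Pare down the command line and try clang/llvm vs gnu, or try gcc 4.x.x, 5.x.x and so on. After 4.x.x the output started to get bloated and the compiler did not do as good a job. For something like this they should be very close, but one instruction more or fewer, or a simple alignment difference, can make the results of two tests differ a lot.

Then, when you put the clock setup back in to change how things work: you can assume that no wait states are needed (the flash probably runs at the CPU rate, so the wait is built in) up to, for example, 25 MHz, then one wait state up to 50 MHz, and so on. Depending on the design, the flash on some newer parts can run faster than on older parts. At 25 MHz vs 8 MHz, the same number of clocks is less time overall, that is, less wall-clock time. At a boundary, if you create or modify the clock init code and stay just under it, you gain performance without touching the wait states; just over the boundary you take a performance hit from the extra flash wait state. So there is a performance balance to find there.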
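
The boundary effect can be put into numbers with a toy model. This is a sketch under loud assumptions: one extra wait state per 25 MHz, taken from the example figures above and not from any datasheet; an uncached flash access costs 1 + wait states CPU clocks; and the ST flash cache and prefetch are ignored. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: one extra flash wait state per 25 MHz, as in the example
 * figures in the text. Real limits depend on the part and supply voltage. */
#define HZ_PER_WAIT_STATE 25000000u

/* 0 wait states up to 25 MHz, 1 up to 50 MHz, and so on. */
uint32_t flash_wait_states(uint32_t hz) {
    assert(hz > 0);
    return (hz - 1u) / HZ_PER_WAIT_STATE;
}

/* Wall-clock cost in picoseconds of one uncached flash access,
 * modelled as (1 + wait states) CPU clocks. */
uint64_t flash_access_ps(uint32_t hz) {
    uint64_t clocks = 1u + flash_wait_states(hz);
    return clocks * 1000000000000ull / hz;
}
```

In this model an access costs 40000 ps at 25 MHz but 76923 ps at 26 MHz, so code that is bound by flash fetches gets slower in wall-clock terms just past the boundary even though the CPU clock went up.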

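Summary

If the same code comes out of the compiler, then it is your command line; you can easily simplify the command line to see that the tool will generate different code. If your comparison was wrong and the code does differ, then the problem is how you timed the code, which is usually where benchmarks go wrong, along with other factors unrelated to the compiler command line. Benchmarks are generally nonsense because they can be manipulated to show different results (even without changing the high-level source code of the test).

Try simplifying the command line. Examine every option on it and justify why it applies to your specific application. Validate the timer or instruction counter (whichever it is) as best you can, and understand that the number of instructions executed is not directly related to performance: you can have 100 times more instructions that still execute faster than some other solution.

There is no reason to expect -Os to produce smaller code; one hopes so, but there are exceptions. Likewise, -Os may execute faster than -O2 or -O3; there is no reason to expect a higher-numbered optimization level to produce "faster" code.

Answered 2020-05-23T20:41:35.353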

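You are compiling test.c with the -O0 flag, which is the default when no -O option is given. The build log in the question shows why: test.c is listed in OBJS, so it is compiled by the link command, which receives LDFLAGS but not CFLAGS, and the cc1 invocation for test.c carries no -O3.

This can be seen clearly here: https://godbolt.org/z/qZPYqJ

So the compiler is right, as always. No missed optimization was found.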
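
One way to catch this from inside the source, sketched here on the assumption that the compiler is GCC or compatible: GCC predefines __OPTIMIZE__ at every level above -O0, and additionally __OPTIMIZE_SIZE__ at -Os. The names build_opt_kind and REQUIRE_OPTIMIZED_BUILD are hypothetical and do not appear in the question's code.

```c
/* Hypothetical opt-in guard: define REQUIRE_OPTIMIZED_BUILD on the command
 * line to make an unoptimized compile of this file fail loudly. */
#if defined(REQUIRE_OPTIMIZED_BUILD) && !defined(__OPTIMIZE__)
#error "this file is being compiled without optimization (-O0)"
#endif

/* Reports how this translation unit was compiled:
 * 0 = no optimization, 1 = optimized for speed, 2 = optimized for size. */
int build_opt_kind(void) {
#if defined(__OPTIMIZE_SIZE__)
    return 2;
#elif defined(__OPTIMIZE__)
    return 1;
#else
    return 0;
#endif
}
```

Because the check is per translation unit, placing it in test.c would flag exactly the situation here, where main.c gets the optimization flags and test.c does not.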

Answered 2020-05-23T22:39:40.853

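Well, the real answer is not simple, but before disassembling anything one should know what optimization actually is and how the compiler achieves its goals. With gcc there is little difference between -Os and -O3, because they turn on nearly the same internal flags, apart from loop unrolling in the case of -Os.

Besides, CPUs nowadays keep everything in cache, which is faster anyway.

Answered 2021-07-06T18:53:11.317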