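We have a huge code base with more than 40 projects (in VS terms) that build a number of DLLs/SOs (~15) and one EXE.
Some utility projects are statically linked into the EXE and are also used by most of the DLLs. Ideally, we would like these utility projects to be DLLs as well, so that their code is not duplicated in every DLL that depends on them.
Is there any tool that can do a binary analysis of the DLLs to see how much duplication (code + data) there is? An estimate would help.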
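There is no tool, other than the one between your ears. You want to focus on the projects that link static libraries and find the ones that use the same static library more than once. That is the starting point for assuming that a function may be linked multiple times.
You can then use the linker's /VERBOSE option, which shows which functions get linked in from a static library. That option produces a lot of output, but each line is terse and easy to parse.
Alternatively, consider generating a .map file with the linker's /MAP option. It shows in detail which functions were linked into the final executable. Seeing the same function in several different .map files is your cue that moving it into a DLL could be beneficial. Writing a small program in your favorite scripting language to process the /VERBOSE output or the .map files and find the matches is quite doable.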
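As a rough sketch of such a script, the shell function below counts how many .map files each library-supplied symbol appears in. The function name count_shared_map_symbols is made up for illustration, and the parsing assumes the usual MSVC "Publics by Value" layout (address, symbol, Rva+Base, optional flags, then Lib:Object as the last field); check it against your own .map files before trusting the numbers.

```shell
#!/bin/sh
# count_shared_map_symbols MAPFILE...   (hypothetical helper name)
# Prints "<number of map files> <symbol> <lib:object>" for every symbol that
# came from a static library and shows up in more than one map file.
count_shared_map_symbols() {
    awk '
        # Assumed line layout: " 0001:00000000  symbol  rva+base  [f] [i]  lib:object"
        # Objects that are not from a library have no "lib:" prefix and are skipped.
        $1 ~ /^[0-9A-Fa-f]+:[0-9A-Fa-f]+$/ && $NF ~ /:/ {
            key = $2 " " $NF
            if (!((FILENAME, key) in seen)) {   # count each map file once per symbol
                seen[FILENAME, key] = 1
                files[key]++
            }
        }
        END {
            for (key in files)
                if (files[key] > 1)
                    print files[key], key
        }
    ' "$@" | sort -k1,1nr -k2
}
```

You would call it as count_shared_map_symbols *.map from the directory holding the map files. The symbols at the top of the output are the ones duplicated in the most binaries, which makes them the best candidates for moving into a shared DLL.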
Well, on a Unix/Linux/OSX system you'd do something like:
    for eachfile in *.exe *.dll ; do
        nm -P "$eachfile" | awk '$2 != "U" { print $1 }' | sort -u > "$eachfile.symbols.txt"   # defined symbol names only; addresses differ per file and would prevent matches
    done
    cat *.symbols.txt | sort | uniq -c > count-duplicate-symbols.txt
    sort -rn count-duplicate-symbols.txt | less
The first three lines say "Dump the symbols out of each .exe and .dll file in the current directory; store each dump in a separate file. By the way, if the same line appears multiple times in a single file, just store it once."
The line beginning with cat says "Count the number of times each line appears across all the files we just produced. Write a new file named count-duplicate-symbols.txt that contains every line with its count." Any line with a count above 1 is a duplicate.
The final line says "Sort this file by the number of duplicates (in decreasing order, so the most duplicated symbols come first), and pipe it to the terminal so I can read it."
If you wanted to see which source files contained the offending duplicate symbols, you could use grep for that.
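A minimal sketch of that lookup, applied to the per-binary symbol dumps rather than to source files. It assumes each *.symbols.txt file holds one bare symbol name per line; if your dumps still carry addresses, drop the -x flag. The helper name files_with_symbol is invented for this example.

```shell
#!/bin/sh
# files_with_symbol SYMBOL FILE...   (hypothetical helper name)
# Lists the symbol dumps that contain SYMBOL as a whole line.
files_with_symbol() {
    sym=$1
    shift
    # -l: print file names only, -x: match the whole line, -F: no regex
    grep -lxF -e "$sym" -- "$@"
}
```

For example, files_with_symbol log_msg *.symbols.txt prints the name of every dump that contains log_msg, which tells you which binaries each carry their own copy.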
Notice that this approach probably won't work for static symbols (functions and variables), and it may produce false positives for things like inline functions, which are supposed to appear everywhere. You could filter out symbols appearing in linkonce sections, pretty-print the output with c++filt, and so on.
Some of these tools are definitely available for Windows. I don't know if they all are.
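One way to do the filtering mentioned above, as a sketch only: keep just the strongly defined global symbols and drop local (static) and weak ones. It assumes GNU nm type letters in nm -P output, where T, D, B and R are global text, data, BSS and read-only data, lowercase letters are local, and W or V are weak (which is how inline functions usually show up in ELF files). The function name strong_globals is made up here.

```shell
#!/bin/sh
# strong_globals   (hypothetical helper name)
# Reads `nm -P` output ("name type value size") on stdin and prints the names
# of strongly defined global symbols, one per line, without repeats.
strong_globals() {
    # Assumption: GNU nm type letters; other platforms may use different codes.
    awk '$2 ~ /^[TDBR]$/ { print $1 }' | sort -u
}
```

A possible use is nm -P "$eachfile" | strong_globals > "$eachfile.symbols.txt" inside the loop, with c++filt applied to the final counts file to make C++ names readable.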