python - How to extract Unicode character sequences from an MZ executable file?

Question

I want to extract Unicode strings from a binary (".exe") file.

When I use code like this:

    `unicode_str = re.compile( u'[\u0020-\u007e]{1,}',re.UNICODE )`

It works, but it only returns individual characters, so I tried changing the quantifier to 3:

Python: unicode_str = re.compile( u'[\u0020-\u007e]{3,}',re.UNICODE )

Perl: my @a = ( $file =~ /[\x{0020}-\x{007e}]{3,}/gs );

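Now I get only ASCII characters; all the Unicode characters are gone.

Where did I go wrong, or is there something about Unicode that I don't understand?

Code from the comments:

Python: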

File = open( sys.argv[1], "rb" )
FileData = File.read()
File.close()
unicode_str = re.compile( u'[\u0020-\u007e]{3,}',re.UNICODE )
myList = unicode_str.findall(FileData)
for p in myList:
    print p

Perl:

$/ = "newline separator";
my $input = shift;
open( File, $input );
my $file = <File>;
close( File );
my @a = ( $file =~ /[\x{0020}-\x{007e}]{3,}/gs );
foreach ( @a ) { print "$_\n"; }

score 3 · Accepted Answer

Someone has already written a utility that does what you need:

http://technet.microsoft.com/en-us/sysinternals/bb897439.aspx

usage: strings [-a] [-f offset] [-b bytes] [-n length] [-o] [-q] [-s] [-u] <file or directory>

Strings takes wild-card expressions for file names, and additional command line parameters are defined as follows:

-a  Ascii-only search (Unicode and Ascii is default)
-b  Bytes of file to scan
-f  File offset at which to start scanning.
-o  Print offset in file string was located
-n  Minimum string length (default is 3)
-q  Quiet (no banner)
-s  Recurse subdirectories
-u  Unicode-only search (Unicode and Ascii is default)  

To search one or more files for the presence of a particular string using strings use a command like this:

strings * | findstr /i TextToSearchFor

Edit:

Try this if you want to implement it in Python, but you'll have to decide what range of Unicode characters you're looking for and search for it as UTF-16LE. Many pairs of characters look like valid printable Unicode, so expect some false positives. I don't know what algorithm strings uses.

import re
data = open('c:/users/metolone/util/windiff.exe','rb').read()

# Search for printable ASCII characters encoded as UTF-16LE.
pat = re.compile(ur'(?:[\x20-\x7E][\x00]){3,}')
words = [w.decode('utf-16le') for w in pat.findall(data)]
for w in words:
    print w
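
The snippet above uses Python 2 syntax (the ur'' prefix and the print statement). It also hints at why the pattern in the question finds only ASCII: [\u0020-\u007e] matches one byte per character, while UTF-16LE stores each printable ASCII character as two bytes, the character followed by \x00. Below is a minimal Python 3 sketch of the same idea, extended to one non-ASCII range. The name utf16le_strings and the choice of the Cyrillic block (U+0400 to U+04FF) are illustrative assumptions, not something taken from the strings utility.

```python
import re

# Each alternative matches one little-endian UTF-16 code unit (2 bytes):
#   printable ASCII U+0020..U+007E -> low byte 0x20..0x7E, high byte 0x00
#   Cyrillic block  U+0400..U+04FF -> any low byte,        high byte 0x04
# The Cyrillic range is only an illustrative assumption; swap in your own.
CODE_UNIT = rb"(?:[\x20-\x7e]\x00|[\x00-\xff]\x04)"


def utf16le_strings(data, min_len=3):
    """Return decoded runs of at least min_len matching code units."""
    pattern = re.compile(CODE_UNIT + b"{%d,}" % min_len)
    return [m.decode("utf-16-le") for m in pattern.findall(data)]


# In-memory sample: an MZ-like prefix, then "Hello" stored as UTF-16LE.
sample = b"MZ\x90\x00" + "Hello".encode("utf-16-le") + b"\xff\xff"
print(utf16le_strings(sample))  # ['Hello']
```

For a real file, read it with open(path, 'rb').read() and pass the bytes in. The regex does not check two-byte alignment, so unrelated byte pairs that happen to fit a range will still be reported; widening the ranges makes such false positives more likely.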

score 0 · Accepted Answer

use Win32::Exe;
my $exe = Win32::Exe->new('foo.exe');
my $inforef = $exe->get_version_info;
printf "%s: %s\n", $_, $inforef->{$_} for qw(Comments CompanyName
    FileDescription FileVersion InternalName LegalCopyright
    LegalTrademarks OriginalFilename ProductName ProductVersion);

When you are dealing with generic UTF16-BE data, use the Encode library:

use Encode qw(decode encode);
my $octets = # extracted from the exe
    "\x00\x73\x00\x6f\x00\x66\x00\x74\x00\x20\x00\x43\x00\x6f" .
    "\x00\x70\x00\x6f\x00\x72\x00\x61\x00\x74\x00\x69\x00\x6f";
my $characters = decode 'UTF16-BE', $octets, Encode::FB_CROAK;
# 'soft Coporatio'
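
For readers following the Python half of the question, a rough equivalent of the Encode call above might look like this. It is a sketch that reuses the same octets; errors="strict" plays roughly the role of Encode::FB_CROAK by raising on malformed input.

```python
# The same big-endian UTF-16 octets as in the Perl example above.
octets = (
    b"\x00\x73\x00\x6f\x00\x66\x00\x74\x00\x20\x00\x43\x00\x6f"
    b"\x00\x70\x00\x6f\x00\x72\x00\x61\x00\x74\x00\x69\x00\x6f"
)

# errors="strict" raises UnicodeDecodeError on malformed input.
characters = octets.decode("utf-16-be", errors="strict")
print(characters)  # soft Coporatio
```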
