6

这是我的第一个问题,如果代码示例方面有点长,我深表歉意。

作为工作申请的一部分,我被要求编写一个 Bit Torrent 文件解析器来公开一些字段。我编写了代码,并被告知我的代码“还没有达到我们对团队负责人要求的水平”。哎哟!

很好,自从我编码以来已经有好几年了,并且列表推导、生成器在当时并不存在(我从 COBOL 开始,但是用 C、C++ 等进行了编码)。对我来说,下面的代码非常干净。有时不需要使用更复杂的结构、语法或模式——“保持简单”。

我可以请一些 Python 大师来批评这段代码吗?我相信对其他人了解代码可以改进的地方很有用。还有更多评论等(bencode.py 来自http://wiki.theory.org/Decoding_bencoded_data_with_python

我能想到的领域:

  1. 在 display_* 方法中使用列表推导来避免字符串“if's”更好
  2. 列表理解/生成器使用
  3. 全局变量使用不当
  4. 标准输入/标准输出/管道?这是一个简单的任务,所以我认为没有必要。

我个人为这段代码感到自豪,所以想知道我需要改进的地方。谢谢。

    #!/usr/bin/env python2
    """Bit Torrent Parsing

    Parses a Bit Torrent file.


    A basic parser for Bit Torrent files.  Visit http://wiki.theory.org/BitTorrentSpecification for the BitTorrent specification.

    """

    __author__ = "...."
    __version__ = "$Revision: 1.0 $"
    __date__ = "$Date: 2012/10/26 11:08:46 $"
    __copyright__ = "Enjoy & Distribute"
    __license__ = "Python"

    import bencode
    import argparse
    from argparse import RawTextHelpFormatter
    import binascii
    import time
    import os
    import pprint

    torrent_files = 0
    torrent_pieces = 0

    def display_root(filename, root):
        """prints main (root) information on torrent"""
        global torrent_files
        global torrent_pieces

        print
        print "Bit Torrent Metafile Structure root nodes:"
        print "------------------------------------------"
        print "Torrent filename: ", filename
        print "    Info:           %d file(s), %d pieces, ~%d kb/pc" % (
                   torrent_files, 
                   torrent_pieces, 
                   root['info']['piece length'] / 1024)

        if 'private' in root['info']:
            if root['info']['private'] == 1:
                print "                      Publish presence: Private"

        print "    Announce:      ", root['announce']

        if 'announce-list' in root:
            print "    Announce List: "
            for i in root['announce-list']:
                print "                   ", i[0]  

        if 'creation date' in root:
            print "    Creation Date: ", time.ctime(root['creation date'])

        if 'comment' in root:
            print "    Comment:       ", root['comment']

        if 'created-by' in root:
            print "    Created-By:    ", root['created-by']

        print "    Encoding:      ", root['encoding']
        print



    def display_torrent_file(info):
        """prints file information (single or multifile)"""
        global torrent_files
        global torrent_pieces

        if 'files' in info:
            # multipart file mode
            # directory, followed by filenames

            print "Files:"

            max_files = args.maxfiles 
            display = max_files if (max_files < torrent_files) else torrent_files
            print "    %d File %d shown: " % (torrent_files, display)
            print "    Directory: ", info['name']
            print "    Filenames:"

            i = 0
            for files in info['files']:

                if i < max_files:

                    prefix = ''
                    if len(files['path']) > 1:
                        prefix = './'
                    filename = prefix + '/'.join(files['path'])

                    if args.filehash: 
                        if 'md5sum' in files:
                            md5hash = binascii.hexlify(files['md5sum'])
                        else:
                            md5hash = 'n/a'
                        print '        %s [hash: %s]' % (filename, md5hash)
                    else:
                        print '        %s ' % filename
                    i += 1
                else:
                    break

        else:
            # single file mode
            print "Filename: ", info['name']

        print


    def display_pieces(pieceDict):
        """prints SHA1 hash for pieces, limited by arg pieces"""
        global torrent_files
        global torrent_pieces 
      #  global pieceDict  

        # limit since a torrent file can have 1,000's of pieces
        max_pieces = args.pieces if args.pieces else 10

        print "Pieces:" 
        print "    Torrent contains %s pieces, %d shown."% (
              torrent_pieces, max_pieces)

        print "    piece : sha1"
        i = 0           
        while i < max_pieces and i < torrent_pieces:
            # print SHA1 hash in readable hex format
            print '    %5d : %s' % (i+1, binascii.hexlify(pieceDict[i]))
            i += 1


    def parse_pieces(root):
        """create dictionary [ piece-num, hash ] from info's pieces  

        Returns the pieces dictionary.  key is the piece number, value is the
        SHA1 hash value (20-bytes) 

        Keyword arguments:
        root -- a Bit Torrent Metafile root dictionary

        """

        global torrent_pieces 

        pieceDict = {}
        i = 0
        while i < torrent_pieces:
            pieceDict[i] = root['info']['pieces'][(20*i):(20*i)+20]
            i += 1   

        return pieceDict

    def parse_root_str(root_str):
        """create dictionary [ piece-num, hash ] from info's pieces  

        Returns the complete Bit Torrent Metafile Structure dictionary with 
        relevant Bit Torrent Metafile nodes and their values.

        Keyword arguments:
        root_str -- a UTF-8 encoded string with root-level nodes (e.g., info)

        """

        global torrent_files
        global torrent_pieces


        try:
            torrent_root = bencode.decode(root_str)
        except StandardError:
            print 'Error in torrent file, likely missing separators like ":"'

        if 'files' in torrent_root['info']:
            torrent_files = len(torrent_root['info']['files'])
        else:
            torrent_files = 1

        torrent_pieces = len(torrent_root['info']['pieces']) / 20
        torrent_piece = parse_pieces(torrent_root)

        return torrent_root, torrent_piece   

    def readfile(filename):
        """read  file and return file's data"""

        global torrent_files
        global torrent_pieces

        if os.path.exists(filename):  
            with open(filename, mode='rb') as f:
               filedata = f.read()
        else:
            print "Error: filename: '%s' does not exist." % filename
            raise IOError("Filename not found.")

        return filedata


    if __name__ == "__main__":

        parser = argparse.ArgumentParser(formatter_class=RawTextHelpFormatter,
            description=
                  "A basic parser for Bit Torrent files.  Visit "
                  "http://wiki.theory.org/BitTorrentSpecification for "
                  "the BitTorrent specification.",
            epilog=
                  "The keys for the Bit Torrent MetaInfo File Structure "
                  "are info, announce, announce-list, creation date, comment, "
                  "created by and encoding. \n"
                  "The Info Dictionary (info) is dependant on whether the torrent "
                  "file is a single or multiple file.  The keys common to both "
                  "are piece length, pieces and private.\nFor single files, the "
                  "additional keys are name, length and md5sum.For multiple files " 
                  "the keys are, name and files.  files is also a dictionary with " 
                  "keys length, md5sum and path.\n\n"
                  "Examples:\n"
                  "torrentparse.py --string 'l4:dir14:dir28:file.exte'\n"
                  "torrentparse.py --filename foo.torrent\n"
                  "torrentparse.py -f foo.torrent -f bar.torrent "
                  "--maxfiles 2 --filehash  --pieces 2  -v")           
        filegroup = parser.add_argument_group('Input File or String')
        filegroup.add_argument("-f", "--filename", 
                   help="name of torrent file to parse",
                   action='append')
        filegroup.add_argument("-fh", "--filehash", 
                   help="display file's MD5 hash",
                   action = "store_true") 
        filegroup.add_argument("-maxf", "--maxfiles",
                   help="display X filenames (default=20)",
                   metavar = 'X',
                   type=int, default=20) 

        piecegroup = parser.add_argument_group('Torrent Pieces')
        piecegroup.add_argument("-p", "--pieces", 
                   help = "display X piece's SHA1 hash (default=10)", 
                   metavar = 'X',
                   type = int)

        parser.add_argument("-s", "--string",
                   help="string for bencoded dictionary item")


        parser.add_argument("-v", "--verbose", 
                   help = "Display MetaInfo file to stdout",
                   action = "store_true")

        args = parser.parse_args()



        if args.string:
            print 
            text = bencode.decode(args.string)
            print text
        else:    
            for fn in args.filename:

                try:
                    filedata = readfile(fn)
                    torrent_root, torrent_piece = parse_root_str(filedata)
                except IOError: 
                    print "Please enter a valid filename"
                    raise

                if torrent_root:
                    display_root(fn, torrent_root)
                    display_torrent_file(torrent_root['info'])

                    if args.pieces:
                        display_pieces(torrent_piece)

                verbose = True if args.verbose else False
                if verbose:
                    print
                    print "Verbose Mode: \nPrinting root and info dictionaries"
                    # remove pieces as its long. display it afterwards
                    pieceless_root = torrent_root
                    del pieceless_root['info']['pieces']
                    pp = pprint.PrettyPrinter(indent=4)
                    pp.pprint(pieceless_root)
                    print

                    print "Print info's piece information: "
                    pp.pprint(torrent_piece)
                    print
                print "\n"
4

2 回答 2

4

以下片段:

i = 0
while i < torrent_pieces:
    pieceDict[i] = root['info']['pieces'][(20*i):(20*i)+20]
    i += 1

应替换为:

for i in range(torrent_pieces):
    pieceDict[i] = root['info']['pieces'][(20*i):(20*i)+20]

这可能是他们想看到的那种东西。一般来说,Python 代码不应该非常需要在for循环中进行显式的索引变量操作。

于 2012-11-10T03:22:20.313 回答
3

我注意到的第一件事是你有很多全局变量。这不好;对于一个问题,您的代码不再是线程安全的。(我现在看到您在问题中指出了这一点,但这是应该改变的。)

这看起来有点奇怪:

i = 0
for files in info['files']:
    if i < max_files:
        # ...
    else:
        break

相反,您可以这样做:

for file in info['files'][:max_files]:
    # ...

我还注意到您对文件的解析刚好足以输出所有漂亮打印的数据。您可能希望将其放入适当的结构中。例如,有Torrent、、PieceFile类。

于 2012-11-10T03:21:34.173 回答