python - How to improve the performance of a python cgi that reads a big file and returns it as a download?

Question

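I have this python cgi script that checks that it hasn't been accessed too many times from the same IP and, if everything is OK, reads a big file (11MB) from disk and returns it as a download.

It works, but the performance is terrible. The bottleneck seems to be reading this huge file over and over: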

def download_demo():
    """
    Returns the demo file
    """

    file = open(FILENAME, 'r')
    buff = file.read()

    print "Content-Type:application/x-download\nContent-Disposition:attachment;filename=%s\nContent-Length:%s\n\n%s" %    (os.path.split(FILENAME)[-1], len(buff), buff)

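How can I make this faster? I thought of using a ram disk to hold the file, but there must be a better solution. Would using mod_wsgi instead of a cgi script help? Would I be able to keep the big file in apache's memory space?

Any help is greatly appreciated.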

score 9 · Accepted Answer

Use mod_wsgi with something like:

def application(environ, start_response):
    status = '200 OK'

    response_headers = [('Content-type', 'text/plain')]
    start_response(status, response_headers)

    # Hand the open file to the server instead of reading it in Python.
    file = open('/usr/share/dict/words', 'rb')
    return environ['wsgi.file_wrapper'](file)

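In other words, use the wsgi.file_wrapper extension of the WSGI standard to allow Apache/mod_wsgi to send an optimised reply of the file contents using sendfile/mmap. That way your application never even needs to read the file into memory.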

score 2 · Answer

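Why are you printing everything in one print statement? Python has to generate several temporary strings to build the content headers, and because of that last %s, it has to hold the entire contents of the file in two different string variables. This should be better.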

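# sys.stdout.write (needs "import sys") avoids the extra newline that print appends after the headers
sys.stdout.write("Content-Type:application/x-download\nContent-Disposition:attachment;filename=%s\nContent-Length:%s\n\n" % (os.path.split(FILENAME)[-1], len(buff)))
sys.stdout.write(buff)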

You might also consider reading the file with the raw IO module, so that Python doesn't create temporary buffers you aren't using.
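One possible reading of "raw IO" is the unbuffered io.FileIO class from the standard io module. The sketch below, written for Python 3, reads into a single reusable bytearray so that no new string is allocated per block. The copy_unbuffered name and the 64KB block size are assumptions for illustration, not something the answer specifies.

```python
import io


def copy_unbuffered(path, out, block_size=64 * 1024):
    """Copy the file at `path` to the binary stream `out` using raw reads.

    Returns the number of bytes copied.
    """
    buf = bytearray(block_size)   # one buffer, reused for every read
    view = memoryview(buf)        # lets us slice without copying
    total = 0
    with io.FileIO(path, 'r') as raw:  # FileIO is always binary and unbuffered
        while True:
            n = raw.readinto(buf)
            if not n:             # 0 means end of file
                break
            out.write(view[:n])
            total += n
    return total
```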

score 1 · Answer

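Try reading and outputting (i.e. buffering) one block of, say, 16KB at a time. Python is probably doing something slow behind the scenes, and buffering manually may be faster.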
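A minimal sketch of that suggestion, written for Python 3: the headers go out first, then the file is copied in 16KB blocks. The send_download name is hypothetical, and the sketch assumes the file name is plain ASCII.

```python
import os


def send_download(path, out, block_size=16 * 1024):
    """Write CGI download headers and the file at `path` to the binary stream `out`."""
    header = ('Content-Type: application/x-download\n'
              'Content-Disposition: attachment; filename=%s\n'
              'Content-Length: %d\n\n'
              % (os.path.basename(path), os.path.getsize(path)))
    out.write(header.encode('ascii'))

    with open(path, 'rb') as f:
        while True:
            block = f.read(block_size)   # at most block_size bytes in memory
            if not block:
                break
            out.write(block)
    out.flush()
```

In a CGI script on Python 3 this would be called as send_download(FILENAME, sys.stdout.buffer), since sys.stdout itself only accepts text there.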

You shouldn't need something like a ramdisk: the OS disk cache should cache the file contents for you.

score 1 · Answer

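mod_wsgi or FastCGI would help in the sense that you don't need to reload the Python interpreter every time your script runs. However, they would do little to improve the performance of reading the file (if that really is your bottleneck). I'd suggest using something along the lines of memcached instead.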
