python - Diagnosing when I'm being limited by disk I/O

Question

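I'm running Python 2.7 on a Linux machine, and by far the slowest part of my script is loading a large json file from disk (an SSD) using the ujson library. When I check top during this loading process, my CPU usage is basically at 100%, which leads me to believe that I'm bottlenecked by parsing the json rather than by transferring the bytes from disk into memory. Is this a valid assumption, or will ujson burn empty loops or something while waiting for the disk? I'm interested in knowing because I'm not sure whether dedicating another core of my CPU to another script that does a lot of disk I/O will significantly slow down the first script.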

score 1 · Accepted Answer

Without seeing your code, I'll assume you are doing something like this:

import json

with open('data.json') as datafile:
    data = json.loads(datafile.read())

Instead, you can split reading the file and parsing it into separate steps:

with open('data.json') as datafile:
    raw_data = datafile.read()
    data = json.loads(raw_data)

If you add some timing calls, you can determine how long each step takes:

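# Timing decorator from https://www.andreas-jung.com/contents/a-python-decorator-for-measuring-the-execution-time-of-methods
import json
import time

def timeit(method):

    def timed(*args, **kw):
        ts = time.time()
        result = method(*args, **kw)
        te = time.time()

        # Print only the function name: args may hold the whole file contents.
        print('%r %2.2f sec' % (method.__name__, te - ts))
        return result

    return timed

# A decorator can only wrap a function definition, not an assignment,
# so each step gets its own function.
@timeit
def read_file(path):
    with open(path) as datafile:
        return datafile.read()

@timeit
def parse_json(raw_data):
    return json.loads(raw_data)

raw_data = read_file('data.json')
data = parse_json(raw_data)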
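
To address the original question more directly, one option is to compare wall-clock time with the CPU time the process actually consumed during each step. If a step's wall-clock time is much larger than its CPU time, the process spent most of that step waiting (for example, on the disk); if the two are close, the step is CPU-bound. The following is a minimal sketch under that assumption; the helper names `measure` and `diagnose` are illustrative and not part of the original answer, and the standard `json` module stands in for `ujson`.

```python
import json
import os
import time

def measure(func, *args):
    """Run func(*args); return (result, wall_seconds, cpu_seconds)."""
    wall_start = time.time()
    cpu_start = os.times()
    result = func(*args)
    cpu_end = os.times()
    wall_end = time.time()
    # Fields 0 and 1 of os.times() are this process's user and system CPU time.
    cpu = (cpu_end[0] - cpu_start[0]) + (cpu_end[1] - cpu_start[1])
    return result, wall_end - wall_start, cpu

def read_file(path):
    with open(path) as datafile:
        return datafile.read()

def diagnose(path):
    """Time the read and parse steps separately (hypothetical helper)."""
    timings = {}
    raw_data, wall, cpu = measure(read_file, path)
    timings['read'] = (wall, cpu)
    data, wall, cpu = measure(json.loads, raw_data)
    timings['parse'] = (wall, cpu)
    for step in ('read', 'parse'):
        wall, cpu = timings[step]
        print('%s: wall %.2f s, cpu %.2f s' % (step, wall, cpu))
    return data, timings

# Usage: data, timings = diagnose('data.json')
```

A read step whose wall time far exceeds its CPU time suggests the disk is the limit there, while a parse step with wall time roughly equal to CPU time suggests the parser is.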
