multithreading - Dividing a file using integer math

Question

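I'm reading a file from n servers, and I want each server to download 1/n of the file. I figured some quick integer math would do the trick, but it doesn't always seem to work: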

threads = n
thread_id = 0:n-1
filesize (in bytes) = x

starting position = thread_id*(filesize/threads)
bytes to read = (filesize/threads)

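With the wrong combination of numbers, for example a 26-byte file divided among 9 threads (I know that's ridiculous, it's just an example), it fails: some bytes never get read. There must be a better way. Any ideas?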
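To make the failure concrete, here is a small Python sketch of the question's formulas (the function name `naive_ranges` is hypothetical; it only computes `(start, length)` pairs and does not touch any file or server). Because `filesize/threads` truncates, 26 / 9 gives 2, so the nine threads together cover only 18 bytes:

```python
def naive_ranges(filesize, threads):
    """Hypothetical helper: (start, length) pairs using the question's formulas."""
    chunk = filesize // threads  # integer division truncates: 26 // 9 == 2
    return [(tid * chunk, chunk) for tid in range(threads)]

ranges = naive_ranges(26, 9)
print(ranges[-1])                        # (16, 2): the last thread stops at byte 17
print(sum(length for _, length in ranges))  # 18, so bytes 18..25 are never read
```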

score 1 · Accepted Answer

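It seems to me the only thing missing is that the last thread (thread n-1) has to read all the way to the end of the file to pick up the "modulus" bytes, i.e. the remainder of dividing filesize by threads. Basically: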

bytes_to_read = (thread_id == n - 1) ? filesize / threads + filesize % threads
                                     : filesize / threads
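A minimal Python sketch of this "last thread takes the remainder" rule, assuming each `(start, length)` pair is later handed to whatever download call is in use (the function name `last_takes_remainder` is hypothetical):

```python
def last_takes_remainder(filesize, threads):
    """Hypothetical helper: every thread reads filesize // threads bytes,
    and the last thread also reads the filesize % threads leftover bytes."""
    base, extra = divmod(filesize, threads)
    ranges = []
    for tid in range(threads):
        start = tid * base  # starting positions are unchanged
        length = base + extra if tid == threads - 1 else base
        ranges.append((start, length))
    return ranges

print(last_takes_remainder(26, 9)[-1])  # (16, 10)
```

This covers every byte, but the load can be lopsided: the last thread may get up to `threads - 1` extra bytes (here 10 bytes versus 2 for everyone else).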

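Alternatively, you could spread that extra work over the first filesize % threads threads by adding 1 byte to each one's bytes_to_read; of course, you would then have to adjust the starting positions accordingly.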
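One way to sketch the "spread the remainder" variant in Python (the name `spread_remainder` is hypothetical). Each of the first `extra` threads reads one extra byte, so a thread's start shifts by one byte for every earlier thread that got an extra byte, which is `min(tid, extra)`:

```python
def spread_remainder(filesize, threads):
    """Hypothetical helper: the first filesize % threads threads read one
    extra byte each; starting positions shift to account for those bytes."""
    base, extra = divmod(filesize, threads)
    ranges = []
    for tid in range(threads):
        start = tid * base + min(tid, extra)  # earlier threads' extra bytes
        length = base + 1 if tid < extra else base
        ranges.append((start, length))
    return ranges

print(spread_remainder(26, 9))
# [(0, 3), (3, 3), (6, 3), (9, 3), (12, 3), (15, 3), (18, 3), (21, 3), (24, 2)]
```

With this split, no two threads' chunk sizes differ by more than one byte.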

score 0 · Accepted Answer

You need to do the following:

starting position = thread_id * floor(filesize / threads)
bytes to read = floor(filesize / threads) if thread_id != threads-1
bytes to read = filesize - (threads-1)*floor(filesize / threads) if thread_id = threads - 1

score 0 · Accepted Answer

To read every byte exactly once, compute the start and end positions consistently, then subtract them to get the number of bytes:

start_position = thread_id * file_size / n
end_position = (thread_id + 1) * file_size / n
bytes_to_read = end_position - start_position

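Note that the position expressions are deliberately chosen so that end_position == file_size when thread_id == n-1. If you do something else, such as thread_id * (file_size/n), you need to treat the last thread as a special case, as @wuputah said.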
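The start/end approach translates almost directly into Python (the function name `proportional_ranges` is hypothetical). The key detail is multiplying before dividing, so each thread's end is exactly the next thread's start and the final end is `file_size`:

```python
def proportional_ranges(file_size, n):
    """Hypothetical helper: compute start and end the same way for every
    thread, then subtract to get the byte count."""
    ranges = []
    for thread_id in range(n):
        start = thread_id * file_size // n        # multiply first, then divide
        end = (thread_id + 1) * file_size // n    # equals file_size for the last thread
        ranges.append((start, end - start))
    return ranges

print([length for _, length in proportional_ranges(26, 9)])
# [2, 3, 3, 3, 3, 3, 3, 3, 3]
```

Python integers do not overflow, but in a language with fixed-width integers the product `thread_id * file_size` can overflow for large files, so it is worth computing it in a wide enough type (for example a 64-bit integer).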
