ruby - How do I read only the first x bytes of the body with Net::HTTP?

Question

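When reading the body of a web page, Ruby's Net::HTTP methods seem to be all or nothing. How can I read, for example, just the first 100 bytes of the body?

I am trying to read a short error message that a content server returns in the response body when the requested file is not available. I need to read just enough of the body to determine whether the file is there. The files are huge, so I don't want to fetch an entire file just to check whether it is available.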

score 13 · Accepted Answer

This is an old thread, but the question of how to read only a portion of a file via HTTP in Ruby is still a mostly unanswered one according to my research. Here's a solution I came up with by monkey-patching Net::HTTP a bit:

require 'net/http'

# provide access to the actual socket
class Net::HTTPResponse
  attr_reader :socket
end

uri = URI("http://www.example.com/path/to/file")
begin
  Net::HTTP.start(uri.host, uri.port) do |http|
    request = Net::HTTP::Get.new(uri.request_uri)
    # calling request with a block prevents body from being read
    http.request(request) do |response|
      # do whatever limited reading you want to do with the socket
      x = response.socket.read(100)
      # be sure to call finish before exiting the block
      http.finish
    end
  end
rescue IOError
  # ignore
end

The rescue catches the IOError that's raised when you call http.finish prematurely.

FYI, the socket within the HTTPResponse object isn't a true IO object (it's an internal class called BufferedIO), but it's pretty easy to monkey-patch that, too, to mimic the IO methods you need. For example, another library I was using (exifr) needed the readchar method, which was easy to add:

class Net::BufferedIO
  def readchar
    read(1)[0].ord
  end
end

score 12 · Accepted Answer

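Shouldn't you just use an HTTP HEAD request (Ruby's Net::HTTP::Head class) to see whether the resource exists, and only proceed if you get a 2xx or 3xx response? This assumes your server is configured to return a 4xx error code when the document is unavailable. I'd argue that this is the correct solution.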
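A minimal sketch of the HEAD check described above. The helper name `resource_available?` and the example URL are assumptions for illustration, and the approach only works if the server really answers 4xx for missing files:

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper (not part of Net::HTTP): true when a HEAD request
# for the URI is answered with a 2xx or 3xx status.
def resource_available?(uri)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    # HEAD returns the status line and headers only, never the body.
    response = http.head(uri.request_uri)
    response.is_a?(Net::HTTPSuccess) || response.is_a?(Net::HTTPRedirection)
  end
end

# Example call (placeholder URL):
#   resource_available?(URI('http://www.example.com/path/to/file'))
```

Because no body is transferred, this costs one round trip regardless of how large the file is.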

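Another approach is to request only the HTTP headers and look at the Content-Length value in the result: if your server is configured properly, you should be able to easily tell the difference in length between a short message and a long document. Yet another option: set the Range header field in the request, which the server answers with a Content-Range header (again assuming the server behaves correctly with respect to the HTTP spec).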
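The range idea could look roughly like the sketch below. Note that the request header is `Range`; `Content-Range` is what a cooperating server sends back. The helper name `fetch_first_bytes` is made up for this example, and a server that ignores ranges will answer 200 and send the whole body:

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: asks the server for only the first `count` bytes
# and returns the status code together with whatever body came back.
def fetch_first_bytes(uri, count)
  # Byte ranges are inclusive, so the first 100 bytes are "bytes=0-99".
  request = Net::HTTP::Get.new(uri.request_uri, 'Range' => "bytes=0-#{count - 1}")
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    response = http.request(request)
    # "206" means the range was honored; "200" means it was ignored.
    [response.code, response.body]
  end
end

# Example call (placeholder URL):
#   code, body = fetch_first_bytes(URI('http://www.example.com/path/to/file'), 100)
```

Checking for status 206 before trusting the result guards against servers that silently return the full document.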

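I don't think solving the problem on the client side after you have sent a GET request is the way to go: by then the network has already done the heavy lifting, and you won't really save any of the wasted resources.

Reference: HTTP header definitions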

score 3 · Accepted Answer

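I wanted to do this once, and the only thing I could think of was monkey-patching the Net::HTTPResponse#read_body and Net::HTTPResponse#read_body_0 methods to accept a length parameter. The former would simply pass the length on to read_body_0, which would then read only that many bytes.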

score 2 · Accepted Answer

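Are you sure the content server only returns a short error page?

Doesn't it also set the HTTPResponse status to something appropriate, such as 404? In that case you can rescue the HTTPClientError-derived exception (most likely HTTPNotFound) that is raised when you call Net::HTTPResponse#value.

If you get an error, your file doesn't exist; if you get a 200, the file has started downloading and you can close the connection.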
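One way to sketch this, assuming the server does return a non-2xx status for missing files. The helper name `file_exists?` is hypothetical. Rescuing the `Net::HTTPExceptions` module covers the exception classes that `value` raises, and returning from inside the block leaves the body unread while `Net::HTTP.start` closes the connection on the way out:

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: sends a GET but decides from the status alone,
# returning before the (possibly huge) body is read.
def file_exists?(uri)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    # With a block, the response is yielded before its body is read.
    http.request_get(uri.request_uri) do |response|
      begin
        response.value # raises unless the status is 2xx
      rescue Net::HTTPExceptions
        return false
      end
      # 2xx: the download has only just started; leave without reading it.
      return true
    end
  end
end

# Example call (placeholder URL):
#   file_exists?(URI('http://www.example.com/path/to/file'))
```

Note that `value` also raises for 3xx responses, so redirects count as "not found" in this sketch.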

score 2 · Accepted Answer

To read the body of an HTTP response in chunks, you need to use Net::HTTPResponse#read_body like this:

http.request_get('/large_resource') do |response|
  response.read_body do |segment|
    print segment
  end
end
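Building on the chunked read above, the following sketch stops once enough bytes have arrived. The helper name `read_body_prefix` is an assumption. It relies on `return` unwinding out of the nested blocks so that `Net::HTTP.start` closes the connection instead of draining the rest of the body. The first segment may be larger than the limit, so the result is truncated explicitly:

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: returns at most `limit` bytes of the response body
# and abandons the download once that many bytes have been received.
def read_body_prefix(uri, limit)
  buffer = String.new # binary buffer for the received segments
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request_get(uri.request_uri) do |response|
      response.read_body do |segment|
        buffer << segment
        # Leaving the method here skips the remaining body; the
        # connection is closed when Net::HTTP.start unwinds.
        return buffer.byteslice(0, limit) if buffer.bytesize >= limit
      end
    end
  end
  # The body was shorter than the limit: return what there was.
  buffer.byteslice(0, limit)
end

# Example call (placeholder URL):
#   read_body_prefix(URI('http://www.example.com/path/to/file'), 100)
```

Some extra data will still cross the network before the socket closes, but it is bounded by the socket buffers, not by the size of the file.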

score -4 · Accepted Answer

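You can't. But why would you need to? Surely, if the page just says the file isn't available, it isn't going to be a huge page (i.e. by definition, the file won't be there)?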
