I want to download and process a CSV file that sits on an SFTP server, line by line. Both download! and sftp.file.open buffer the whole file in memory, which I want to avoid.

Here is my source code:

sftp = Net::SFTP.start(@sftp_details['server_ip'], @sftp_details['server_username'], :password => decoded_pswd)
if sftp
  begin
    sftp.dir.foreach(@sftp_details['server_folder_path']) do |entry|
      print_memory_usage do
        print_time_spent do
          if entry.file? && entry.name.end_with?("csv")
            batch_size_cnt = 0
            entities = []
            sftp.file.open("#{@sftp_details['server_folder_path']}/#{entry.name}") do |file|
              # The first line is the CSV header; it is nil for an empty file.
              header = file.gets
              header = header.force_encoding(header.encoding).encode('UTF-8', invalid: :replace, undef: :replace, replace: '') if header
              csv_data = ''
              while line = file.gets
                batch_size_cnt += 1
                csv_data.concat(line.force_encoding(line.encoding).encode('UTF-8', invalid: :replace, undef: :replace, replace: ''))
                # Parse and process every 1000 lines, and once more at end of file.
                if batch_size_cnt == 1000 || file.eof?
                  CSV.parse(csv_data, headers: header) do |row|
                    row.delete(nil)
                    entities << row.to_hash
                  end
                  csv_data, batch_size_cnt = '', 0
                  entities.delete_if(&:blank?)
                  # DO PROCESSING PART
                  entities = []
                end
              end if header
            end
            sftp.rename("#{@sftp_details['server_folder_path']}/#{entry.name}", "#{@sftp_details['processed_file_path']}/#{entry.name}")
          end
        end
      end
    end
  end
end

Can someone please help? Thanks

1 Answer

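You need to add some kind of buffer so that you can read the file in chunks and then write them out together. I think it would be wise to separate parsing from downloading and concentrate on one thing at a time:

Your original line: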

   ...
   sftp.file.open("#{@sftp_details['server_folder_path']}/#{entry.name}") do |file|
   ...

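If you check the source of the download! method (do not forget the bang!), you will see that you can use 'stringio' as the target. This is a stub that you can easily adapt. Usually the default buffer (32 kB) is enough. You can change it if needed (see the example).

Replace it with (works only for single files):

StringIO usage: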

   ...
  io = StringIO.new   # needs: require 'stringio'
  sftp.download!("#{@sftp_details['server_folder_path']}/#{entry.name}", io, :read_size => 16000)

Or you can simply download the file to local disk:

  ...
  file = File.open("/your_local_path/#{entry.name}",'wb')
  sftp.download!("#{@sftp_details['server_folder_path']}/#{entry.name}", file, :read_size => 16000)
  ....
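
Once the file is on local disk, it can be parsed without loading it whole. The following is a minimal sketch, assuming the file has a header row. The helper name each_csv_batch and the default batch size of 1000 are illustrative choices and not part of Net::SFTP:

```ruby
require 'csv'

# Hypothetical helper: yields arrays of up to batch_size row hashes,
# reading the local CSV file one row at a time.
def each_csv_batch(path, batch_size = 1000)
  # Without a block, CSV.foreach returns an enumerator over the rows;
  # each_slice groups those rows into batches as they are read.
  CSV.foreach(path, headers: true).each_slice(batch_size) do |rows|
    yield rows.map(&:to_h)
  end
end
```

It could be called as each_csv_batch("/your_local_path/#{entry.name}") { |entities| ... } in place of the manual line counting, so that only one batch of rows is held in memory at a time.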

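From the documentation, you can use the :read_size option:

:read_size - the maximum number of bytes to read at a time from the source. Increasing this value might improve throughput. It defaults to 32,000 bytes.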
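
Note that a StringIO target still accumulates the whole file in memory. If the goal is to handle lines as the chunks arrive, the "buffer that reads chunks" idea can be sketched as a small write-only object that keeps only the current partial line. LineSink is a hypothetical class, not part of Net::SFTP, and the sketch assumes that download! only calls write (and close at the end) on an IO-like target, which should be verified against the installed version:

```ruby
# Hypothetical write-only, IO-like object: accepts chunks of any size
# and passes each complete line to the given block.
class LineSink
  def initialize(&on_line)
    @on_line = on_line
    @buffer = String.new # holds only the current, incomplete line
  end

  # Called once per downloaded chunk; emits every complete line it contains.
  def write(chunk)
    @buffer << chunk
    while (newline_at = @buffer.index("\n"))
      @on_line.call(@buffer.slice!(0..newline_at))
    end
    chunk.bytesize
  end

  # Flushes a final line that has no trailing newline.
  def close
    @on_line.call(@buffer.slice!(0..-1)) unless @buffer.empty?
  end
end
```

Under that assumption, passing LineSink.new { |line| ... } as the second argument to sftp.download! in place of io would hand each line to the block as soon as it is complete, with memory use bounded by roughly one chunk of :read_size bytes plus one line.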

Answered 2018-07-09T11:37:22.963