Given a comma-separated CSV file in the following format:

Day,User,Requests,Page Views,Browse Time,Total Bytes,Bytes Received,Bytes Sent
"Jul 25, 2012","abc123",3,0,0,13855,3287,10568
"Jul 25, 2012","abc230",1,0,0,1192,331,861
"Jul 25, 2012",,7,0,0,10990,2288,8702
"Jul 24, 2012","123456",3,0,0,3530,770,2760
"Jul 24, 2012","abc123",19,1,30,85879,67791,18088

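I want to put the whole data set (1000 users over 30 days = 30,000 records) into a hash such that key 1 may be duplicated and key 2 may be duplicated, but the combination of keys 1 and 2 is unique.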

Using the first data row above as an example:

report_hash = {"Jul 25, 2012" => {"abc123" => {"PageRequest" => 3, "PageViews" => 0, "BrowseTime" => 0, "TotalBytes" => 13855, "BytesReceived" => 3287, "BytesSent" => 10568}}}

require 'csv'

def hashing(file)
  #read the CSV file into an Array and drop the header row
  #(drop returns a new array, so its result must be kept)
  report_arr = CSV.read(file).drop(1)
  #Create an empty hash to save the data to
  report_hash = {}
  #for each row in the array,
  #if the first element in the array is not a key in the hash, make one
  report_arr.each{|row|
    if report_hash[row[0]].nil?
      report_hash[row[0]] = Hash.new
    #If the key exists, does the 2nd key exist?  if not, make one
    elsif report_hash[row[0]][row[1]].nil?
      report_hash[row[0]][row[1]] = Hash.new
    end
    #throw all the other data into the 2-key hash
    report_hash[row[0]][row[1]] = {"PageRequest" => row[2].to_i, "PageViews" => row[3].to_i, "BrowseTime" => row[4].to_i, "TotalBytes" => row[5].to_i, "BytesReceived" => row[6].to_i, "BytesSent" => row[7].to_i}
  }
  return report_hash
end

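I spent several hours learning about hashes and related topics to get this far, but it feels like there is a more efficient way to do this. Any suggestions on the proper or more efficient way to create a nested hash in which the first two keys are the first two elements of each array, so that together they form a "composite" unique key?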

1 Answer

You can use the array [day, user] as the hash key.

report_hash = {
  ["Jul 25, 2012","abc123"] =>
    {
      "PageRequest" => 3,
      "PageViews" => 0,
      "BrowseTime" => 0,
      "TotalBytes" => 13855,
      "BytesReceived" => 3287,
      "BytesSent" => 10568
    }
}
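
As a rough sketch of how such a hash could be built from the CSV in the question: the method name build_report and the inline sample are only illustrative, and the field names are the ones used in the question's hash. Freezing each key is an extra precaution, not something the approach requires.

```ruby
require 'csv'

# Hypothetical helper: builds a hash keyed by [day, user] from CSV text.
def build_report(csv_text)
  # Field names copied from the question's hash, in column order.
  fields = %w[PageRequest PageViews BrowseTime TotalBytes BytesReceived BytesSent]
  report = {}
  CSV.parse(csv_text, headers: true).each do |row|
    day, user, *counts = row.fields
    # Freeze the key array so it cannot be changed by accident later.
    report[[day, user].freeze] = fields.zip(counts.map(&:to_i)).to_h
  end
  report
end

sample = <<~CSV
  Day,User,Requests,Page Views,Browse Time,Total Bytes,Bytes Received,Bytes Sent
  "Jul 25, 2012","abc123",3,0,0,13855,3287,10568
  "Jul 25, 2012",,7,0,0,10990,2288,8702
CSV

report = build_report(sample)
puts report[["Jul 25, 2012", "abc123"]]["TotalBytes"]
```

Note that the row with an empty User field ends up under the key ["Jul 25, 2012", nil], since the CSV parser returns nil for an empty unquoted field.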

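You just have to make sure that the day and the user always appear in the same form. If your dates, for example, sometimes arrive in a different format, you must normalize them before using them to read from or write to the hash.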
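
One possible way to normalize, sketched here under the assumption that every date string is something Date.parse can understand, is to turn the day into a Date object before it goes into the key. The helper name report_key is hypothetical.

```ruby
require 'date'

# Hypothetical helper: converts the day string to a Date so that
# differently formatted inputs produce the same key.
# Date.parse guesses the format; Date.strptime is stricter if the
# format is known in advance.
def report_key(day, user)
  [Date.parse(day), user]
end

report_hash = {}
report_hash[report_key("Jul 25, 2012", "abc123")] = { "PageRequest" => 3 }

# A lookup with another spelling of the same date reaches the same entry.
entry = report_hash[report_key("2012-07-25", "abc123")]
puts entry.inspect
```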

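A similar approach is to convert day + user into a single string with some delimiter between them. You then have to be even more careful that the delimiter never appears in the day or the user.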
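
A minimal sketch of the string-key variant follows. The helper name string_key and the choice of "|" as delimiter are assumptions made for illustration; the check simply refuses any value that already contains the delimiter.

```ruby
# Hypothetical helper: joins day and user into one string key.
# "|" is an arbitrary delimiter chosen for this example.
def string_key(day, user, sep = "|")
  parts = [day, user].map(&:to_s)
  if parts.any? { |part| part.include?(sep) }
    raise ArgumentError, "key part contains the delimiter #{sep.inspect}"
  end
  parts.join(sep)
end

report_hash = {}
report_hash[string_key("Jul 25, 2012", "abc123")] = { "PageRequest" => 3 }
puts report_hash.keys.first
```

Without the check, two different pairs such as ("a|b", "c") and ("a", "b|c") would collapse into the same key.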

Edit:

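Also make sure you never modify a hash key. Using arrays as keys makes this an easy mistake to make. If you really need a modified key, you can modify a copy made with dup, like this: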

new_key = report_hash.keys.first.dup
new_key[1] = 'another_user'
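
To illustrate why this matters, here is a small sketch of what happens when a key array is changed in place. The hash keeps the entry filed under the hash value computed when the key was inserted, so a lookup with the new contents will generally miss until Hash#rehash is called. Freezing the key array is one way to turn the mistake into an immediate error.

```ruby
report_hash = { ["Jul 25, 2012", "abc123"] => { "PageRequest" => 3 } }

# Mutating the key in place: the entry is still filed under the old hash value.
key = report_hash.keys.first
key[1] = "another_user"
stale = report_hash[["Jul 25, 2012", "another_user"]] # generally not found here

# rehash rebuilds the index from the keys' current contents.
report_hash.rehash
fresh = report_hash[["Jul 25, 2012", "another_user"]]

# A frozen key cannot be mutated at all.
safe_hash = { ["Jul 25, 2012", "abc123"].freeze => { "PageRequest" => 3 } }
begin
  safe_hash.keys.first[1] = "another_user"
rescue RuntimeError => e # FrozenError is a RuntimeError subclass in Ruby 2.5+
  frozen_error = e
end
```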
answered 2012-07-31T20:52:54.840