So I have a CSV in which two of the columns hold dollar amounts formatted as strings. head -n 5 file.csv reveals the following:

Title,Distributor Long Name,Wk,Estimated Weekend Gross,Cume,Locs Reported,Avg/Loc,Booking Title #
"=""Zero Dark Thirty""","=""Sony""",4,"24,000,000","29,480,807",2937,"8,172","=""66273"""
"=""Haunted House, A""","=""Open Road""",1,"18,817,000","18,817,000",2160,"8,712","=""71209"""
"=""Gangster Squad""","=""Warner Bros.""",1,"16,710,000","16,710,000",3103,"5,385","=""66556"""
"=""Django Unchained""","=""The Weinstein Company""",3,"11,065,000","125,399,122",3012,"3,674","=""66122"""

This goes on for about 40 rows. You'll notice that two of the columns, "Estimated Weekend Gross" and "Cume", hold their values as strings.

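So my question is: is there a way to iterate over just those two columns, convert the string values to integers with row.to_s.gsub(',','').to_i, and then write those values back over the corresponding rows in the same CSV?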
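You can overwrite the original file in place, provided the whole file is read into memory before the same path is reopened for writing. With only about 40 rows, that costs very little. The following is a minimal sketch of that approach. It assumes the hypothetical SAMPLE string stands in for file.csv, and it works inside a temporary directory so it runs on its own:

```ruby
require 'csv'
require 'tmpdir'

# Hypothetical sample input: the header plus two rows from the question.
SAMPLE = <<~CSV_TEXT
  Title,Distributor Long Name,Wk,Estimated Weekend Gross,Cume,Locs Reported,Avg/Loc,Booking Title #
  "=""Zero Dark Thirty""","=""Sony""",4,"24,000,000","29,480,807",2937,"8,172","=""66273"""
  "=""Haunted House, A""","=""Open Road""",1,"18,817,000","18,817,000",2160,"8,712","=""71209"""
CSV_TEXT

MONEY_COLUMNS = ['Estimated Weekend Gross', 'Cume']

result = Dir.mktmpdir do |dir|
  path = File.join(dir, 'file.csv')
  File.write(path, SAMPLE)

  # Read everything first; CSV.read closes the file before we write to it.
  table = CSV.read(path, headers: true)
  table.each do |row|
    MONEY_COLUMNS.each { |col| row[col] = row[col].delete(',').to_i }
  end

  # Now it is safe to overwrite the same path; to_csv includes the header row.
  File.write(path, table.to_csv)
  File.read(path)
end

puts result
```

Fields that contain quotes or commas, such as ="Zero Dark Thirty" or 8,172, are re-quoted on output. The converted money columns are written as bare integers.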

I tried something like this, but I'm not getting a properly formatted CSV:

File.open('modified.csv', 'w') do |csv|
  CSV.foreach('original.csv') do |row|
    csv << row[0].to_s.gsub('=','').gsub(', The','')
    csv << row[3].to_s.gsub(',','').to_i
    csv << row[4].to_s.gsub(',','').to_i
  end
end

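I've also experimented with :headers => :integer in the block, but it won't let me convert the values from strings to integers. So what am I missing? Should I store the values and then write a new CSV, or is there a simpler way?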
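Even the built-in :converters => :integer option would not help here. It only converts fields that Integer() accepts, and "24,000,000" is not one of them, so the field stays a String. The :converters option also accepts a lambda, however. When the lambda takes two arguments, CSV passes it a CSV::FieldInfo whose header member names the column. The sketch below uses a sample trimmed to four columns, and MONEY_COLUMNS and money are hypothetical names:

```ruby
require 'csv'

# Hypothetical sample, trimmed to four of the question's columns.
data = <<~CSV_TEXT
  Title,Wk,Estimated Weekend Gross,Cume
  "=""Zero Dark Thirty""",4,"24,000,000","29,480,807"
  "=""Gangster Squad""",1,"16,710,000","16,710,000"
CSV_TEXT

MONEY_COLUMNS = ['Estimated Weekend Gross', 'Cume']

# Two-argument lambda: CSV passes the field and a CSV::FieldInfo (index, line, header).
# Base 10 is explicit so a value with a leading zero is not read as octal.
money = lambda do |field, info|
  MONEY_COLUMNS.include?(info.header) ? Integer(field.delete(','), 10) : field
end

# Converters run on data fields only; the header row is left untouched.
rows = CSV.parse(data, headers: true, converters: [money])
rows.each { |row| p row.to_h }
```

Integer() raises on a malformed amount instead of silently returning 0 the way to_i does. That makes bad input easier to spot.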

3 Answers

Aaron, just modify each row and write it to your new file, like this:

require 'csv'

File.open('modified.csv', 'w') do |csv|
  CSV.foreach('original.csv', :headers => true) do |row|
    row['Estimated Weekend Gross'] = row['Estimated Weekend Gross'].delete(',').to_i
    row['Cume'] = row['Cume'].delete(',').to_i
    csv << row
  end
end
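Writing a CSV::Row to a plain File with << works because IO#<< calls to_s on its argument. CSV::Row#to_s is an alias for to_csv, which renders the row as one quoted, newline-terminated CSV line. The following small check of that behavior uses an in-memory StringIO in place of the file, with a hypothetical two-column sample:

```ruby
require 'csv'
require 'stringio'

# Hypothetical sample: a header plus one data row, trimmed to two columns.
input = <<~CSV_TEXT
  Title,Cume
  "=""Django Unchained""","125,399,122"
CSV_TEXT

out = StringIO.new   # stands in for the File opened with File.open(..., 'w')
CSV.parse(input, headers: true) do |row|
  row['Cume'] = row['Cume'].delete(',').to_i
  out << row         # IO#<< calls row.to_s, which CSV::Row aliases to to_csv
end

puts out.string
```

Only the data row is written. The header line is not emitted, because CSV.parse with headers: true consumes it without yielding it.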

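Edit: if you want to keep the headers in modified.csv, you can do it as shown below. There must be a shorter way that doesn't open the file twice, though; if anyone has a better solution, please share it.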

headers = CSV.read('original.csv', :headers => true).headers  # CSV.read closes the file afterwards
CSV.open('modified.csv', 'w') do |csv|
  csv << headers
  CSV.foreach('original.csv', :headers => true) do |row|
    row['Estimated Weekend Gross'] = row['Estimated Weekend Gross'].delete(',').to_i
    row['Cume'] = row['Cume'].delete(',').to_i
    csv << row
  end
end
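On the question of opening the file only once: CSV.foreach and CSV.parse accept return_headers: true. With that option, the header line is yielded as a CSV::Row, for which header_row? is true, before the data rows, so it can be copied straight to the output. The sketch below works on in-memory strings. For files, the same block body should work with CSV.foreach('original.csv', headers: true, return_headers: true) inside CSV.open('modified.csv', 'w'):

```ruby
require 'csv'

# Hypothetical sample: the question's header plus one data row.
input = <<~CSV_TEXT
  Title,Distributor Long Name,Wk,Estimated Weekend Gross,Cume,Locs Reported,Avg/Loc,Booking Title #
  "=""Gangster Squad""","=""Warner Bros.""",1,"16,710,000","16,710,000",3103,"5,385","=""66556"""
CSV_TEXT

output = CSV.generate do |csv|
  CSV.parse(input, headers: true, return_headers: true) do |row|
    unless row.header_row?
      # Only data rows get converted; the header row is copied as-is.
      row['Estimated Weekend Gross'] = row['Estimated Weekend Gross'].delete(',').to_i
      row['Cume'] = row['Cume'].delete(',').to_i
    end
    csv << row   # CSV#<< accepts a CSV::Row and writes its fields
  end
end

puts output
```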
answered 2013-01-14T10:23:28.203

You can get this from the shell with sed and awk:

sed 's/,\("[^"]*"\)*/|\1/g' file.csv | awk -F"|" '{s="";for (i=1; i<=NF; i++){if (i==4 || i==5){gsub("\,","",$i);gsub("\"","",$i);s=s","$i;}else{if (i>1){s=s","$i;}else{s=s""$i;}}}print s;}' -

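I got this output. Note that the Distributor Long Name and Booking Title # fields come out empty (""). Also, because the comma inside "Haunted House, A" is treated as a field separator, that row's Cume is left unconverted: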

"=""Zero Dark Thirty""","",4,24000000,29480807,2937,"8,172",""
"=""Haunted House, A""","",1,18817000,"18,817,000",2160,"8,712",""
"=""Gangster Squad""","",1,16710000,16710000,3103,"5,385",""
"=""Django Unchained""","",3,11065000,125399122,3012,"3,674",""

I know this is hard to follow, so I'll explain it step by step:

  1. First, replace the commas that separate fields with a pipe, taking quoted fields into account:

    sed 's/,\("[^"]*"\)*/|\1/g' file.csv

You get a pipe ("|") between each field:

"=""Zero Dark Thirty"""|""|4|"24,000,000"|"29,480,807"|2937|"8,172"|""
"=""Haunted House| A"""|""|1|"18,817,000"|"18,817,000"|2160|"8,712"|""
"=""Gangster Squad"""|""|1|"16,710,000"|"16,710,000"|3103|"5,385"|""
"=""Django Unchained"""|""|3|"11,065,000"|"125,399,122"|3012|"3,674"|""
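
  2. Once you have this pipe-delimited output, use awk to apply the filter described above to fields 4 and 5. The awk command must run after sed, since it reads sed's output as its input: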

    awk -F"|" '{s="";for (i=1; i<=NF; i++){if (i==4 || i==5){gsub("\,","",$i);gsub("\"","",$i);s=s","$i;}else{if (i>1){s=s","$i;}else{s=s""$i;}}}print s;}' -

This removes the quotes and commas from those two fields, leaving them as plain integers, and joins the fields with commas again:

"=""Zero Dark Thirty""","",4,24000000,29480807,2937,"8,172",""
"=""Haunted House, A""","",1,18817000,"18,817,000",2160,"8,712",""
"=""Gangster Squad""","",1,16710000,16710000,3103,"5,385",""
"=""Django Unchained""","",3,11065000,125399122,3012,"3,674",""
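The sed step has to recognize quoted fields by pattern, and that is where "Haunted House, A" goes wrong in the step 1 output above: its embedded comma becomes a field separator. For comparison, the following Ruby sketch runs on that same line. A naive split miscounts the fields, while CSV.parse_line, which applies CSV quoting rules, returns the eight expected fields:

```ruby
require 'csv'

line = '"=""Haunted House, A""","=""Open Road""",1,"18,817,000","18,817,000",2160,"8,712","=""71209"""'

naive = line.split(',')          # ignores quoting entirely
parsed = CSV.parse_line(line)    # honours quoted fields and "" escapes

puts "naive split: #{naive.size} fields"
puts "CSV.parse_line: #{parsed.size} fields"

# Convert the two money columns (indices 3 and 4) on the correctly parsed fields.
gross, cume = parsed.values_at(3, 4).map { |v| Integer(v.delete(','), 10) }
puts gross, cume
```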
answered 2013-01-14T09:32:49.150

You could try this:

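require 'csv'

CSV.open('modified.csv', 'w') do |csv|
  header_written = false
  CSV.foreach('original.csv') do |row|
    # Copy the header row unchanged; converting it would turn the labels into 0
    unless header_written
      csv << row
      header_written = true
      next
    end
    modified_row = row.clone
    modified_row[0] = row[0].to_s.gsub('=','').gsub(', The','')
    modified_row[3] = row[3].to_s.gsub(',','').to_i
    modified_row[4] = row[4].to_s.gsub(',','').to_i
    csv << modified_row   # append the whole row array as one CSV line
  end
end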

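I changed the file opened for writing to use CSV. I also fixed the appends so that each one appends the whole row as an array instead of individual values.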
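The difference described above shows up clearly when writing to in-memory strings. IO#<< on a plain stream, here a StringIO standing in for the File, concatenates each value with no separator or newline, which is what the question's attempt did. CSV#<< instead takes the whole row array and emits one quoted line. The row values below are hypothetical, modeled on the question's first row after its gsub calls:

```ruby
require 'csv'
require 'stringio'

row = ['"Zero Dark Thirty"', 24000000, 29480807]

# Question's approach: each << writes a bare value, no commas, no newline.
plain = StringIO.new
row.each { |value| plain << value }

# Answer's approach: CSV#<< writes the whole array as one CSV line.
csv_text = CSV.generate { |csv| csv << row }

puts plain.string
puts csv_text
```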

answered 2013-01-14T10:13:31.307