ruby - How to group items by a specific column in a tab-delimited file

Question

I have the following records in a tab-delimited text file:

sku         title                  Product Type
19686940    This is test Title1    toys
19686941    This is test Title2    toys
19686942    This is test Title3    toys
20519300    This is test Title1    toys2
20519301    This is test Title2    toys2
20580987    This is test Title1    toys3
20580988    This is test Title2    toys3
20582176    This is test Title1    toys4

How can I group the items by Product Type and find all the unique words in title for each group?

Output format:

Product Type   Unique_words 
------------   ------------ 
toys           This is test Title1 Title2 Title3
toys2          This is test Title1 Title2
toys3          This is test Title1 Title2
toys4          This is test Title1

Update
So far I have written the code that reads the file and stores the records in an array:

class Product
    attr_reader :sku, :title, :productType
    def initialize(sku,title,productType)
      @sku = sku
      @title = title
      @productType = productType
    end

    def sku
      @sku
    end

    def title
      @title
    end

    def productType
      @productType
    end
end

class FileReader
  def ReadFile(m_FilePath)
    array = Array.new
    lines = IO.readlines(m_FilePath)

    lines.each_with_index do |line, i|
      current_row = line.split("\t")
      product = Product.new(current_row[0],current_row[1],current_row[2])

      array.push product
    end
  end
end

filereader_method = FileReader.new.method("ReadFile")
Reading =  filereader_method.to_proc

puts Reading.call("Input.txt")

score 0 · Accepted Answer

To get the grouping, you can use Enumerable#group_by:

Product = Struct.new(:sku, :title, :product_type)

def products_by_type(file_path)
  # File.foreach closes the file once iteration finishes; drop(1) skips the header row.
  File.foreach(file_path)
      .drop(1)
      .map{ |line| Product.new(*line.chomp.split("\t")) }
      .group_by{ |product| product.product_type }
end
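
As a minimal sketch of what group_by returns, the snippet below groups a few in-memory records instead of reading a file. The Item struct and the ITEMS and GROUPED constants are hypothetical names used only for this illustration; the rows are taken from the question's sample data.

```ruby
# Hypothetical in-memory records standing in for parsed lines of the file.
Item = Struct.new(:sku, :title, :product_type)

ITEMS = [
  Item.new("19686940", "This is test Title1", "toys"),
  Item.new("19686941", "This is test Title2", "toys"),
  Item.new("20519300", "This is test Title1", "toys2")
]

# group_by returns a Hash: each key is a value returned by the block,
# and each value is the array of elements that produced that key.
GROUPED = ITEMS.group_by { |item| item.product_type }

GROUPED.each do |type, items|
  puts "#{type}: #{items.map(&:sku).join(', ')}"
end
```

Because a Ruby Hash keeps insertion order, the groups come out in the order their product types first appear in the input.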

The beauty of Ruby is that you have plenty of options. You could also look at the CSV library and OpenStruct, since this is just a data object:

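require 'csv'
require 'ostruct'

def products_by_type(file_path)
  # The converters turn the "Product Type" header into the key :product_type.
  csv_opts = { col_sep: "\t",
               headers: true,
               header_converters: [:downcase, :symbol] }

  # Options must be passed as keywords (**) on Ruby 3; foreach also closes the file.
  CSV.foreach(file_path, **csv_opts)
     .map{ |row| OpenStruct.new row.to_hash }
     .group_by{ |product| product.product_type }
end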
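
The call to product.product_type above only works because the header converters rewrite the "Product Type" column name. A small sketch of that step, assuming a two-line sample string in place of the real file (HEADER_DEMO and parsed_headers are hypothetical names for this illustration):

```ruby
require 'csv'

# Hypothetical two-line sample: the header row plus one record from the question.
HEADER_DEMO = "sku\ttitle\tProduct Type\n19686940\tThis is test Title1\ttoys\n"

# Parses the text with the same options as the answer and returns the
# converted header names.
def parsed_headers(text)
  table = CSV.parse(text,
                    col_sep: "\t",
                    headers: true,
                    header_converters: [:downcase, :symbol])
  table.headers
end

p parsed_headers(HEADER_DEMO)
# prints [:sku, :title, :product_type]
```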

Or use the idiom of initializing an object from hash keys, which removes the #to_hash call on row above:

class Product
  attr_accessor :sku, :title, :product_type

  def initialize(data)
    # Call the setter named after each key, e.g. :title -> title=
    data.each{ |key, value| public_send("#{key}=", value) }
  end
end

def products_by_type(file_path)
  csv_opts = {
    # ... same options as in the previous example
  }

  CSV.foreach(file_path, **csv_opts)
     .map{ |row| Product.new row }
     .group_by{ |product| product.product_type }
end

Then, working from the resulting hash, format the output however you need:

# Takes the array of products in one group (no splat, since an array is passed in).
def unique_title_words(products)
  products.flat_map{ |product| product.title.scan(/\w+/) }
          .uniq
end

puts "Product Type\tUnique Words"
products_by_type("./file.txt").each do |type, products|
  puts "#{type}\t#{unique_title_words(products).join(' ')}"
end
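
For a version that runs without a file on disk, here is a rough end-to-end sketch that applies the same group-then-uniq idea to the question's sample rows held in a string. SAMPLE_TSV and unique_words_by_type are hypothetical names, and the sketch assumes every row has exactly three tab-separated fields; it uses plain string splitting rather than the CSV library.

```ruby
# Sample rows copied from the question; the constant name is hypothetical.
SAMPLE_TSV = <<~TSV
  sku\ttitle\tProduct Type
  19686940\tThis is test Title1\ttoys
  19686941\tThis is test Title2\ttoys
  19686942\tThis is test Title3\ttoys
  20519300\tThis is test Title1\ttoys2
  20519301\tThis is test Title2\ttoys2
  20580987\tThis is test Title1\ttoys3
  20580988\tThis is test Title2\ttoys3
  20582176\tThis is test Title1\ttoys4
TSV

# Returns a Hash mapping each product type to its unique title words,
# in order of first appearance.
def unique_words_by_type(tsv_text)
  tsv_text.each_line
          .drop(1)                                 # skip the header row
          .map { |line| line.chomp.split("\t") }   # each row becomes [sku, title, type]
          .group_by { |_sku, _title, type| type }
          .transform_values do |rows|
            rows.flat_map { |_sku, title, _type| title.split }.uniq
          end
end

puts "Product Type\tUnique_words"
unique_words_by_type(SAMPLE_TSV).each do |type, words|
  puts "#{type}\t#{words.join(' ')}"
end
```

Hash#transform_values requires Ruby 2.4 or later; on older versions the same result can be built with each_with_object.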
