0

I want to sort some tab-delimited data of the following form.

Marketing, Advertising, PR  Graduate, Trainees  Oil, Gas, Alternative Energy    
Marketing, Advertising, PR  Graduate, Trainees  Public Sector & Services    
Marketing, Advertising, PR  Graduate, Trainees  Recruitment Sales   
Marketing, Advertising, PR  Graduate, Trainees  Secretarial, PAs, Administration    
Marketing, Advertising, PR  Graduate, Trainees  Senior Appointments 
Marketing, Advertising, PR  Graduate, Trainees  Telecommunications  
Marketing, Advertising, PR  Graduate, Trainees  Transport, Logistics    
Other   Graduate, Trainees  Banking, Insurance, Finance 
Other   Graduate, Trainees  Customer Services   
Other   Graduate, Trainees  Education   
Other   Graduate, Trainees  Health, Nursing 
Other   Graduate, Trainees  Legal   
Other   Graduate, Trainees  Management Consultancy

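There is a mix of single-word and multi-word phrases. Words within a phrase are separated by commas. The phrases themselves are tab-delimited.

I need to compare it with another data set in which the text cells have helpfully been sorted alphabetically.

Obviously, this makes a direct comparison difficult (impossible).

Following ovastus's suggestion below, I have the following code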

open System;;
open System.IO;;
#load @"BigDataModule.fs";;
open BigDataModule;;

let sample = "TruncatedData.txt";;

let outputFile = "SortedOutput.csv";;


let sortWithinRow (row:string) =
    let columns = row.Split([|'\t'|])
    let sortedColumns = 
        Seq.append
            (columns |> Seq.take (columns.Length) |> Seq.sort)
            [ columns.[columns.Length - 1] ]            
    sortedColumns |> String.concat ",";;

sample |> readLines |>  Seq.map sortWithinRow |> saveTo (outputFile);;

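where readLines and saveTo are functions from my own BigDataModule for reading the file and saving the output.

When I run this script, unfortunately the sort does not produce the expected result, and the rows are still not sorted alphabetically.

If anyone can help me refine my script further, I would be very grateful.

I apologise for wasting time: I initially oversimplified the input format and so failed to pin down the problem.

Edit 1: To clarify, I have saved the data as a csv file and will be doing this in F#.

Edit 2: I have stripped out all the extraneous parts of the data set; I only need to sort within these rows. I have also given more detail on some of the code I have tried.

Edit 3:

This is the data frame I originally posted as input, which was oversimplified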

Alpha   Bravo   Tango   Delta   15.00
Bravo   Delta   Tango       20.30
Delta   Alpha   Tango   6.17   
Charlie Tango   Foxtrot Alpha   19.13

3 Answers

1

How about the following?

sample |> 
  Seq.map (fun x -> x.Split('\t')) |> 
  Seq.map (Seq.map (fun x -> x.Trim())) |> 
  Seq.map (Seq.filter (fun x -> not (String.IsNullOrEmpty(x)))) |>
  Seq.map Seq.sort |> 
  Seq.map (String.concat "\t") |> 
  String.concat "\n";;

I couldn't enter a literal tab (\t) in the pasted sample, so for the executable example I had to switch the field separator to spaces

open System

let sample2 = """Alpha  Bravo Tango Delta    15.00
Bravo  Delta Tango          20.30
Delta  Alpha Tango          6.17
Charlie Tango Foxtrot Alpha 19.13""".Split [|'\n'|]

sample2 |> 
  Seq.map (fun x -> x.Split([|"  "|], StringSplitOptions.None)) |> 
  Seq.map (Seq.map (fun x -> x.Trim())) |> 
  Seq.map (Seq.filter (fun x -> not (String.IsNullOrEmpty(x)))) |>
  Seq.map Seq.sort |> 
  Seq.map (String.concat "\t") |> 
  String.concat "\n";;
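
Tabs can also be written with the "\t" escape inside an ordinary string literal, which keeps the sample in the question's tab-delimited shape. The following is only a sketch under that assumption: the two rows are copied from the question, and sortPhrases is a hypothetical helper name, not part of any library.

```fsharp
open System

// Two rows copied from the question, with the separators written as "\t"
// and a trailing tab kept, as in the original data.
let rows =
    [ "Other\tGraduate, Trainees\tBanking, Insurance, Finance\t"
      "Marketing, Advertising, PR\tGraduate, Trainees\tTelecommunications\t" ]

// sortPhrases is a hypothetical name used for illustration only.
// It splits a row on tabs, drops the empty entry left by the trailing tab,
// trims each phrase, sorts the phrases, and joins them with tabs again.
let sortPhrases (row: string) =
    row.Split([| '\t' |], StringSplitOptions.RemoveEmptyEntries)
    |> Array.map (fun phrase -> phrase.Trim())
    |> Array.sort
    |> String.concat "\t"

let sorted = rows |> List.map sortPhrases
```

Joining with "\t" rather than "," is deliberate here: the phrases themselves contain commas, so a comma-joined row could no longer be split back into the same phrases.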
answered 2013-03-01T11:24:54.730
1

I'm not sure I have understood correctly what you want, but if you want to produce this output:

 Alpha Bravo Delta Tango 15.00
 Bravo Delta Tango 20.30
 Alpha Delta Tango 6.17
 Alpha Charlie Foxtrot Tango 19.13

you could do it like this:

open System

let sample = """Alpha  Bravo Tango Delta    15.00
Bravo  Delta Tango          20.30
Delta  Alpha Tango          6.17
Charlie Tango Foxtrot Alpha 19.13""".Split [|'\n'|]

let sortWithinRow (row:string) =
    let columns = row.Split([|' '|], StringSplitOptions.RemoveEmptyEntries)
    let sortedColumns = 
        Seq.append
            (columns |> Seq.take (columns.Length - 1) |> Seq.sort)
            [ columns.[columns.Length - 1] ]            
    sortedColumns |> String.concat " "

sample |> Seq.map sortWithinRow |> String.concat "\n"
answered 2013-02-28T13:52:42.710
0

Try using F# Data

[<Literal>]
let sample = """Text1,Text2,Text3,Text4,ValueField
Alpha,Bravo,Tango,Delta,15.00
Bravo,Delta,Tango,,20.30
Delta,Alpha,Tango,,6.17
Charlie,Tango,Foxtrot,Alpha,19.13"""

open FSharp.Data

let csv = CsvProvider<sample, Separator = ",">.Load("input.csv")

let sortedData = 
    csv.Data 
    |> Seq.sortBy (fun row -> row.Text1)
    |> Seq.map (fun row -> row.Columns |> String.concat ",")

System.IO.File.WriteAllLines("output.csv", sortedData)

If you want to sort by multiple fields, you can tuple them in the sort function:

|> Seq.sortBy (fun row -> row.Text1, row.Text3)
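
The tuple key works because F# compares tuples element by element, left to right, so the second field only breaks ties in the first. A minimal sketch of that behaviour, using plain tuples instead of the type provider's rows (the values below are illustrative, chosen so that two rows share the same first field; they are not the question's data):

```fsharp
// Illustrative (Text1, Text3, ValueField) triples; not taken from the question.
let rows =
    [ ("Bravo", "Tango", 20.30)
      ("Alpha", "Tango", 15.00)
      ("Alpha", "Delta", 6.17) ]

// Sort by the first field, then by the second where the first is equal.
let sorted = rows |> List.sortBy (fun (text1, text3, _) -> text1, text3)
```

Here the two "Alpha" rows end up first, ordered between themselves by the second field ("Delta" before "Tango"), followed by the "Bravo" row.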
answered 2013-02-28T13:31:50.553