I have a four-column CSV file that uses @ as the separator, for example:
0001 @ fish @ animal @ eats worms
Column 1 is the only column whose values are guaranteed to be unique.
I need to perform three sorting operations on columns 2, 3 and 4.
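First, column 2 is sorted alphanumerically. The important feature of this sort is that it must guarantee that any duplicate entries in column 2 end up next to each other, for example: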
@ a @ @
@ a @ @
@ a @ @
@ a @ @
@ a @ @
@ b @ @
@ b @ @
@ c @ @
@ c @ @
@ c @ @
@ c @ @
@ c @ @
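This first step can be sketched in Python as follows. It is only a minimal illustration: the helper names `parse_line` and `sort_by_column2` are hypothetical, and "alphanumeric" is assumed here to mean plain string comparison of the trimmed column 2 value.

```python
def parse_line(line):
    # Split on the "@" separator and trim the spaces around each field.
    return [field.strip() for field in line.split("@")]


def sort_by_column2(lines):
    # Assumption: "alphanumeric" means ordinary string comparison.
    # sorted() is stable, so rows with the same column 2 value keep their
    # input order and always end up next to each other.
    return sorted(lines, key=lambda line: parse_line(line)[1])


rows = [
    "0003 @ c @ animal @ x",
    "0001 @ a @ animal @ y",
    "0004 @ b @ animal @ z",
    "0002 @ a @ animal @ w",
]
for line in sort_by_column2(rows):
    print(line)
```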
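Next, within the first sort, the rows are divided into two categories. The first category is the rows that do not contain the words "arch.", "var.", "ver.", "anci" or "fam." anywhere in column 4. The second category (which sorts after the first) is the rows that do contain one of those words, for example: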
@ a @ @ Does not have one of those words.
@ a @ @ Does not have one of those words.
@ a @ @ Does not have one of those words.
@ a @ @ Does not have one of those words.
@ a @ @ This sentence contains arch.
@ b @ @ Does not have one of those words.
@ b @ @ Has the word ver.
@ c @ @ Does not have one of those words.
@ c @ @ Does not have one of those words.
@ c @ @ Does not have one of those words.
@ c @ @ This sentence contains var.
@ c @ @ This sentence contains fam.
@ c @ @ This sentence contains fam.
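One way to express this two-category split is a composite sort key, sketched below. The names `KEYWORDS`, `has_keyword` and `sort_key` are hypothetical, and the match is assumed to be a plain, case-sensitive substring test, so a keyword embedded in a longer word would also count.

```python
KEYWORDS = ("arch.", "var.", "ver.", "anci", "fam.")


def has_keyword(text):
    # Assumption: plain substring match anywhere in the column 4 text.
    return any(keyword in text for keyword in KEYWORDS)


def sort_key(line):
    fields = [field.strip() for field in line.split("@")]
    # False sorts before True, so keyword-free rows come first within
    # each column 2 value.
    return (fields[1], has_keyword(fields[3]))


rows = [
    "1 @ b @ x @ Has the word ver.",
    "2 @ a @ x @ This sentence contains arch.",
    "3 @ b @ x @ Does not have one of those words.",
    "4 @ a @ x @ Does not have one of those words.",
]
for line in sorted(rows, key=sort_key):
    print(line)
```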
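Finally, sorting only within the separate categories produced by the second sort, the rows are ordered from "has the most duplicate entries in column 3" to "has the fewest duplicate entries in column 3", for example: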
@ a @ fish @ Does not have one of those words.
@ a @ fish @ Does not have one of those words.
@ a @ fish @ Does not have one of those words.
@ a @ tiger @ Does not have one of those words.
@ a @ bear @ This sentence contains arch.
@ b @ fish @ Does not have one of those words.
@ b @ fish @ Has the word ver.
@ c @ bear @ Does not have one of those words.
@ c @ bear @ Does not have one of those words.
@ c @ fish @ Does not have one of those words.
@ c @ tiger @ This sentence contains var.
@ c @ tiger @ This sentence contains fam.
@ c @ bear @ This sentence contains fam.
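Taken together, the three levels could be sketched as a single stable sort with a composite key. This is one possible reading of the requirement, not a definitive solution: the names `parse`, `has_keyword` and `sort_rows` are hypothetical, duplicates in column 3 are assumed to be counted separately inside each (column 2, keyword category) group, and the column 3 value itself is used as an assumed tie-breaker so that equally frequent values stay adjacent.

```python
from collections import Counter

KEYWORDS = ("arch.", "var.", "ver.", "anci", "fam.")


def parse(line):
    # Split on "@" and trim the spaces around each field.
    return [field.strip() for field in line.split("@")]


def has_keyword(text):
    # Assumption: plain substring match anywhere in column 4.
    return any(keyword in text for keyword in KEYWORDS)


def sort_rows(lines):
    parsed = [(parse(line), line) for line in lines]
    # Count each column 3 value within its (column 2, keyword flag) group.
    counts = Counter((f[1], has_keyword(f[3]), f[2]) for f, _ in parsed)

    def key(item):
        f, _ = item
        flag = has_keyword(f[3])
        # The negated count puts the most frequent column 3 value first;
        # f[2] is a tie-breaker that keeps identical values together.
        return (f[1], flag, -counts[(f[1], flag, f[2])], f[2])

    return [line for _, line in sorted(parsed, key=key)]


sample = [
    "01 @ c @ tiger @ This sentence contains var.",
    "02 @ a @ tiger @ Does not have one of those words.",
    "03 @ a @ fish @ Does not have one of those words.",
    "04 @ a @ fish @ Does not have one of those words.",
]
for line in sort_rows(sample):
    print(line)
```

Reading the file with `open(path, encoding="utf-8")` and passing its stripped lines to `sort_rows` would apply the same ordering to real data.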
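How can I sort the file by column 2 alphanumerically, then by the presence of certain keywords in column 4, and then by the duplicates in column 3 from most common to least common?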