python - Sorting by multiple columns

Question

The output I get looks like this:

2013-08-05-Mon 10:17:00 type1   0.190476190476
2013-08-05-Mon 10:17:00 type1   0
2013-08-05-Mon 10:17:00 type2   0.1
2013-08-05-Mon 10:17:00 type2   -0.2

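To get this output, I run head -3 Tweets/FlumeData.txt | python sentimentMapper

To sort it, I run head -3 Tweets/FlumeData.txt | python sentimentMapper | sort -k3

This currently sorts the data by the third column, so all the type1 lines come first, followed by all the type2 lines. Ideally, I want to sort alphabetically and then numerically (in other words, all type1 values from lowest to highest, then all type2 values from lowest to highest).

I tried sort -k3 -k4n, but to no avail. How can I solve this?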

Edit: desired output:

2013-08-05-Mon 10:17:00 type1   0
2013-08-05-Mon 10:17:00 type1   0.190476190476
2013-08-05-Mon 10:17:00 type2   -0.2
2013-08-05-Mon 10:17:00 type2   0.1

score 1 · Accepted Answer

Try this:

LANG=C sort -k3,3 -k4,4n file
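As a quick check, the command can be run against the four sample lines from the question. This is only a minimal sketch: the printf input stands in for the mapper output, the variable names are made up for illustration, and LC_ALL=C is used instead of LANG=C so that the result does not depend on other locale variables that may already be set.

```shell
#!/bin/sh
# Sample lines copied from the question, in their original (unsorted) order.
input='2013-08-05-Mon 10:17:00 type1   0.190476190476
2013-08-05-Mon 10:17:00 type1   0
2013-08-05-Mon 10:17:00 type2   0.1
2013-08-05-Mon 10:17:00 type2   -0.2'

# Key 1: field 3 only, compared as text.
# Key 2: field 4 only, compared as a number (the trailing n).
sorted=$(printf '%s\n' "$input" | LC_ALL=C sort -k3,3 -k4,4n)

printf '%s\n' "$sorted"
```

The printed lines should match the desired output listed in the question: type1 from lowest to highest, then type2 from lowest to highest.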

From info coreutils 'sort invocation':

`-k POS1[,POS2]'
`--key=POS1[,POS2]'
     Specify a sort field that consists of the part of the line between
     POS1 and POS2 (or the end of the line, if POS2 is omitted),
     _inclusive_.

     Each POS has the form `F[.C][OPTS]', where F is the number of the
     field to use, and C is the number of the first character from the
     beginning of the field.  Fields and character positions are
     numbered starting with 1; a character position of zero in POS2
     indicates the field's last character.  If `.C' is omitted from
     POS1, it defaults to 1 (the beginning of the field); if omitted
     from POS2, it defaults to 0 (the end of the field).  OPTS are
     ordering options, allowing individual keys to be sorted according
     to different rules; see below for details.  Keys can span multiple
     fields.

     Example:  To sort on the second field, use `--key=2,2' (`-k 2,2').
     See below for more notes on keys and more examples.  See also the
     `--debug' option to help determine the part of the line being used
     in the sort.
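To illustrate why the end position matters, here is a small sketch comparing an open-ended key with a bounded one. The values 10, 9 and 5 are hypothetical and chosen so that text order and numeric order disagree; the variable names are also made up. With -k3 the first key runs from field 3 to the end of the line, so the number is compared as text and the second key is only consulted when two lines are identical from field 3 onward.

```shell
#!/bin/sh
# Hypothetical input in the same layout as the question's data.
input='2013-08-05-Mon 10:17:00 type1   10
2013-08-05-Mon 10:17:00 type1   9
2013-08-05-Mon 10:17:00 type2   5'

# Open-ended key: "type1   10" sorts before "type1   9" because '1' < '9' as text.
open_key=$(printf '%s\n' "$input" | LC_ALL=C sort -k3 -k4n)

# Bounded keys: field 3 ties for the type1 lines, so field 4 decides numerically.
bounded_key=$(printf '%s\n' "$input" | LC_ALL=C sort -k3,3 -k4,4n)

printf 'open-ended:\n%s\n\nbounded:\n%s\n' "$open_key" "$bounded_key"
```

With GNU sort, the --debug option mentioned in the excerpt can be added to either command to see which part of each line is used as the key.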

Regarding LANG=C:

   (1) If you use a non-POSIX locale (e.g., by setting `LC_ALL' to
`en_US'), then `sort' may produce output that is sorted differently
than you're accustomed to.  In that case, set the `LC_ALL' environment
variable to `C'.  Note that setting only `LC_COLLATE' has two problems.
First, it is ineffective if `LC_ALL' is also set.  Second, it has
undefined behavior if `LC_CTYPE' (or `LANG', if `LC_CTYPE' is unset) is
set to an incompatible value.  For example, you get undefined behavior
if `LC_CTYPE' is `ja_JP.PCK' but `LC_COLLATE' is `en_US.UTF-8'.
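The effect of the C locale can be seen with a small sketch. The four one-letter lines are hypothetical input, and the variable name is made up. In the C locale sort compares raw bytes, so uppercase ASCII letters come before lowercase ones; other locales may interleave them or ignore punctuation, which is the kind of difference the excerpt warns about.

```shell
#!/bin/sh
# LC_ALL takes precedence over LANG and LC_COLLATE, so setting it for this
# one command is enough to force byte-wise comparison.
c_sorted=$(printf '%s\n' b A a B | LC_ALL=C sort)

printf '%s\n' "$c_sorted"
```

Setting the variable only on the sort command line, as here, leaves the locale of the rest of the pipeline untouched.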

You may also want to look at this post: https://stackoverflow.com/a/5868546/465183

score 0 · Accepted Answer

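The -k3 option sorts by a field defined as "starting at the first whitespace character after the second field and ending at the end of the line", which is probably not what you want. What you probably want is something like this: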

sort -k3,3 -k4,4n file

Adding the LANG=C bit that sputnik mentioned may also be useful.
