
The output I get looks like this:

2013-08-05-Mon 10:17:00 type1   0.190476190476
2013-08-05-Mon 10:17:00 type1   0
2013-08-05-Mon 10:17:00 type2   0.1
2013-08-05-Mon 10:17:00 type2   -0.2

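To get this output, I run: head -3 Tweets/FlumeData.txt | python sentimentMapper

To sort it, I run: head -3 Tweets/FlumeData.txt | python sentimentMapper | sort -k3

This currently sorts the data by the third column, so all the type1 rows come first, followed by all the type2 rows. Ideally, I would like to sort alphabetically by type and then numerically by value (in other words, all type1 values from lowest to highest, then all type2 values from lowest to highest).

I tried sort -k3 -k4n, but to no avail. How can I fix this?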

Edit: the desired output:

2013-08-05-Mon 10:17:00 type1   0
2013-08-05-Mon 10:17:00 type1   0.190476190476
2013-08-05-Mon 10:17:00 type2   -0.2
2013-08-05-Mon 10:17:00 type2   0.1

2 Answers


Try this:

LANG=C sort -k3,3 -k4,4n file

From info coreutils 'sort invocation':

`-k POS1[,POS2]'
`--key=POS1[,POS2]'
     Specify a sort field that consists of the part of the line between
     POS1 and POS2 (or the end of the line, if POS2 is omitted),
     _inclusive_.

     Each POS has the form `F[.C][OPTS]', where F is the number of the
     field to use, and C is the number of the first character from the
     beginning of the field.  Fields and character positions are
     numbered starting with 1; a character position of zero in POS2
     indicates the field's last character.  If `.C' is omitted from
     POS1, it defaults to 1 (the beginning of the field); if omitted
     from POS2, it defaults to 0 (the end of the field).  OPTS are
     ordering options, allowing individual keys to be sorted according
     to different rules; see below for details.  Keys can span multiple
     fields.

     Example:  To sort on the second field, use `--key=2,2' (`-k 2,2').
     See below for more notes on keys and more examples.  See also the
     `--debug' option to help determine the part of the line being used
     in the sort.
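
As a minimal sketch of the bounded keys described above, the following applies -k3,3 (field 3 only, compared as text) and -k4,4n (field 4 only, compared numerically) to the four sample rows from the question. The rows are inlined in a variable here as an assumption, standing in for the output of the mapper pipeline, and LC_ALL=C is used instead of LANG=C so the result does not depend on the caller's environment.

```shell
# Sample rows from the question, inlined so the demo is self-contained.
rows='2013-08-05-Mon 10:17:00 type1   0.190476190476
2013-08-05-Mon 10:17:00 type1   0
2013-08-05-Mon 10:17:00 type2   0.1
2013-08-05-Mon 10:17:00 type2   -0.2'

# Key 1: field 3 only, compared as text.
# Key 2: field 4 only, compared numerically (the trailing n).
sorted_rows=$(printf '%s\n' "$rows" | LC_ALL=C sort -k3,3 -k4,4n)

printf '%s\n' "$sorted_rows"
```

With these rows the result should match the desired output in the question: the type1 rows in ascending numeric order, then the type2 rows in ascending numeric order.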

Regarding LANG=C:

   (1) If you use a non-POSIX locale (e.g., by setting `LC_ALL' to
`en_US'), then `sort' may produce output that is sorted differently
than you're accustomed to.  In that case, set the `LC_ALL' environment
variable to `C'.  Note that setting only `LC_COLLATE' has two problems.
First, it is ineffective if `LC_ALL' is also set.  Second, it has
undefined behavior if `LC_CTYPE' (or `LANG', if `LC_CTYPE' is unset) is
set to an incompatible value.  For example, you get undefined behavior
if `LC_CTYPE' is `ja_JP.PCK' but `LC_COLLATE' is `en_US.UTF-8'.
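
Because LC_ALL takes precedence over LANG and the individual LC_* variables, setting only LANG=C has no effect in an environment where LC_ALL is already set. A small sketch of forcing byte-order comparison with LC_ALL=C follows; the mixed-case words are hypothetical sample data, not taken from the question.

```shell
# Hypothetical mixed-case sample words (not from the question).
words='b
A
a
B'

# LC_ALL overrides LANG and the other LC_* variables, so this forces
# byte-order comparison regardless of the caller's locale settings.
c_sorted=$(printf '%s\n' "$words" | LC_ALL=C sort)

printf '%s\n' "$c_sorted"
```

In the C locale, uppercase ASCII letters sort before lowercase ones, so this prints A, B, a, b, one per line. Other locales may interleave the cases instead.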

See also this post: https://stackoverflow.com/a/5868546/465183

answered 2013-08-07T20:09:15.397

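The -k3 option sorts on a key defined as "starting at the first blank character after the second field and running to the end of the line", which is probably not what you want. What you probably want is something like this: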

sort -k3,3 -k4,4n file

Adding the LANG=C bit that sputnik mentioned may also be useful.
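
To illustrate why the open-ended key matters, here is a small sketch comparing -k3 -k4n with -k3,3 -k4,4n. The rows are hypothetical (not from the question) and were chosen so that text order and numeric order disagree; LC_ALL=C is set so the comparison is byte-wise regardless of the environment.

```shell
# Hypothetical rows (not from the question): "10" sorts before "9" as text,
# but after it numerically.
rows='a b type1 10
a b type1 9
a b type2 2'

# Open-ended key: -k3 runs from field 3 to the end of the line, so the value
# is compared as text within the first key and -k4n never gets to break a tie.
open_ended=$(printf '%s\n' "$rows" | LC_ALL=C sort -k3 -k4n)

# Bounded keys: -k3,3 stops at the end of field 3, so rows with the same type
# tie on the first key and -k4,4n orders them numerically.
bounded=$(printf '%s\n' "$rows" | LC_ALL=C sort -k3,3 -k4,4n)

printf 'open-ended:\n%s\n\nbounded:\n%s\n' "$open_ended" "$bounded"
```

With the open-ended key the type1 rows come out as 10 then 9; with the bounded keys they come out as 9 then 10.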

answered 2013-08-07T20:19:07.217