2

I'm trying to extract the year and print it in a separate new column, keeping that new column aligned.

Here is the input file:

0000000124  462910   8.8  Star Wars: Episode V - The Empire Strikes Back (1980) 
0000000124  698356   8.8  The Lord of the Rings: The Fellowship of the Ring (2001)
0000000233  393855   8.8  One Flew Over the Cuckoo's Nest (1975)
0000000124  733447   8.7  Inception (2010)
0000000233  411397   8.7  Goodfellas (1990)
0000000123  519051   8.7  Star Wars (1977)
0000000124  146841   8.7  Shichinin no samurai (1954)
0000000123  618195   8.7  Forrest Gump (1994)
0000000123  680520   8.7  The Matrix (1999)
0000000123  604519   8.7  The Lord of the Rings: The Two Towers (2002)
0000000233  309137   8.7  Cidade de Deus (2002)
0000000232  548307   8.6  Se7en (1995)
0000000232  459707   8.6  The Silence of the Lambs (1991)

How can I get the year into a separate column like this?

0000000124  462910   8.8  Star Wars: Episode V - The Empire Strikes Back                  1980
0000000124  698356   8.8  The Lord of the Rings: The Fellowship of the Ring               2001
0000000233  393855   8.8  One Flew Over the Cuckoo's Nest                                 1975
0000000124  733447   8.7  Inception                                                       2010
0000000233  411397   8.7  Goodfellas                                                      1990
0000000123  519051   8.7  Star Wars                                                       1977
0000000124  146841   8.7  Shichinin no samurai                                            1954
0000000123  618195   8.7  Forrest Gump                                                    1994
0000000123  680520   8.7  The Matrix                                                      1999
0000000123  604519   8.7  The Lord of the Rings: The Two Towers                           2002
0000000233  309137   8.7  Cidade de Deus                                                  2002
0000000232  548307   8.6  Se7en                                                           1995
0000000232  459707   8.6  The Silence of the Lambs                                        1991

4 Answers

5
sed 's/)\s*$//' file|column -s '(' -t

This works on the given input and gives you the expected output.

Tested here:

kent$  echo "0000000124  462910   8.8  Star Wars: Episode V - The Empire Strikes Back (1980) 
0000000124  698356   8.8  The Lord of the Rings: The Fellowship of the Ring (2001)
0000000233  393855   8.8  One Flew Over the Cuckoo's Nest (1975)
0000000124  733447   8.7  Inception (2010)
0000000233  411397   8.7  Goodfellas (1990)
0000000123  519051   8.7  Star Wars (1977)
0000000124  146841   8.7  Shichinin no samurai (1954)
0000000123  618195   8.7  Forrest Gump (1994)
0000000123  680520   8.7  The Matrix (1999)
0000000123  604519   8.7  The Lord of the Rings: The Two Towers (2002)
0000000233  309137   8.7  Cidade de Deus (2002)
0000000232  548307   8.6  Se7en (1995)
0000000232  459707   8.6  The Silence of the Lambs (1991)"|sed 's/)\s*$//'|column -s '(' -t
0000000124  462910   8.8  Star Wars: Episode V - The Empire Strikes Back      1980
0000000124  698356   8.8  The Lord of the Rings: The Fellowship of the Ring   2001
0000000233  393855   8.8  One Flew Over the Cuckoo's Nest                     1975
0000000124  733447   8.7  Inception                                           2010
0000000233  411397   8.7  Goodfellas                                          1990
0000000123  519051   8.7  Star Wars                                           1977
0000000124  146841   8.7  Shichinin no samurai                                1954
0000000123  618195   8.7  Forrest Gump                                        1994
0000000123  680520   8.7  The Matrix                                          1999
0000000123  604519   8.7  The Lord of the Rings: The Two Towers               2002
0000000233  309137   8.7  Cidade de Deus                                      2002
0000000232  548307   8.6  Se7en                                               1995
0000000232  459707   8.6  The Silence of the Lambs                            1991
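For comparison, here is a rough Python sketch of what the sed and column pipeline above does: drop the closing parenthesis, split off the year, and pad the titles to a common width. The function name align_years and the gap parameter are made up for illustration, and the sketch assumes every line ends in "(YYYY)". Unlike column -s '(', it splits only at the last "(", so a title that itself contains parentheses stays in one piece.

```python
def align_years(lines, gap=2):
    """Split each line at its last '(' and pad the titles to a common width.

    Hypothetical helper; assumes a non-empty list whose lines end in "(YYYY)".
    """
    rows = []
    for line in lines:
        # Drop trailing whitespace and the closing parenthesis, like the sed step.
        text = line.rstrip().rstrip(")")
        # Split at the last '(' so parentheses inside a title are left alone.
        title, _, year = text.rpartition("(")
        rows.append((title.rstrip(), year))
    # Pad every title to the longest one plus a gap, similar to column -t.
    width = max(len(title) for title, _ in rows) + gap
    return [title.ljust(width) + year for title, year in rows]


sample = [
    "0000000124  733447   8.7  Inception (2010)",
    "0000000233  411397   8.7  Goodfellas (1990) ",
]
for row in align_years(sample):
    print(row)
```

The column width is computed from the data, so nothing needs adjusting when a longer title shows up.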
answered 2013-07-26T08:38:09.887
4

Here is a solution using awk that works for your sample data:

$ awk -F\( '{printf("%-77s %d\n", $1, $2)}' movies.txt

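Adjust the format to your liking. Here the text before the year is padded to 77 characters and followed by one space, so the year starts at column 79. You can change this in the format specifier; for example, use %-99s if you want the year to start at column 101.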
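To make the column arithmetic concrete, here is a small Python sketch of the same fixed-width idea. The name format_fixed and its width parameter are assumptions for illustration, not part of the awk answer. It splits at the last "(" rather than the first, which is a deliberate difference from -F\( so that titles containing parentheses still work.

```python
def format_fixed(line, width=77):
    """Pad the text before the last '(' to `width` characters, then add the year.

    Hypothetical helper; assumes the line ends in "(YYYY)".
    """
    title, _, rest = line.rstrip().rpartition("(")
    year = rest.rstrip(")")
    # "<width" left-justifies like awk's %-77s. One literal space follows,
    # so the year begins at 1-based column width + 2.
    return f"{title:<{width}} {year}"


print(format_fixed("0000000232  548307   8.6  Se7en (1995)"))
```

As with printf, a title longer than the width is not truncated; it simply pushes the year to the right on that line.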

answered 2013-07-26T08:43:17.533
4

Here's a quick hack:

$ awk '{gsub(/[()]/,"",$NF);$NF="{"$NF}1' file | column -s'{' -t 
0000000124 462910 8.8 Star Wars: Episode V - The Empire Strikes Back      1980
0000000124 698356 8.8 The Lord of the Rings: The Fellowship of the Ring   2001
0000000233 393855 8.8 One Flew Over the Cuckoo's Nest                     1975
0000000124 733447 8.7 Inception                                           2010
0000000233 411397 8.7 Goodfellas                                          1990
0000000123 519051 8.7 Star Wars                                           1977
0000000124 146841 8.7 Shichinin no samurai                                1954
0000000123 618195 8.7 Forrest Gump                                        1994
0000000123 680520 8.7 The Matrix                                          1999
0000000123 604519 8.7 The Lord of the Rings: The Two Towers               2002
0000000233 309137 8.7 Cidade de Deus                                      2002
0000000232 548307 8.6 Se7en                                               1995
0000000232 459707 8.6 The Silence of the Lambs                            1991

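awk is used to strip the parentheses from the last field and to prepend a { character to it. The output is piped into column, which builds the table using { as the delimiter. I chose the { character because I think it is unlikely to appear anywhere else in the data; if that's not the case, pick a different character.

If I were you, I would also quote the movie titles: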
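If you are not sure whether { is safe, a quick check along these lines can pick a delimiter that really is absent from the data. This is only a sketch: pick_delimiter and its candidate list are hypothetical, and you would pass the result to column -s yourself.

```python
def pick_delimiter(lines, candidates="{|~^"):
    """Return the first candidate character that occurs in none of the lines.

    Hypothetical helper; the candidate list is an arbitrary illustration.
    """
    used = set("".join(lines))
    for ch in candidates:
        if ch not in used:
            return ch
    raise ValueError("every candidate delimiter occurs in the data")


print(pick_delimiter(["0000000232  548307   8.6  Se7en (1995)"]))
```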

$ awk '{gsub(/[()]/,"",$NF);$NF="{"$NF;$4=q$4;$(NF-1)=$(NF-1)q}1' q='"' file | ..
0000000124 462910 8.8 "Star Wars: Episode V - The Empire Strikes Back"      1980
0000000124 698356 8.8 "The Lord of the Rings: The Fellowship of the Ring"   2001
0000000233 393855 8.8 "One Flew Over the Cuckoo's Nest"                     1975
0000000124 733447 8.7 "Inception"                                           2010
0000000233 411397 8.7 "Goodfellas"                                          1990
0000000123 519051 8.7 "Star Wars"                                           1977
0000000124 146841 8.7 "Shichinin no samurai"                                1954
0000000123 618195 8.7 "Forrest Gump"                                        1994
0000000123 680520 8.7 "The Matrix"                                          1999
0000000123 604519 8.7 "The Lord of the Rings: The Two Towers"               2002
0000000233 309137 8.7 "Cidade de Deus"                                      2002
0000000232 548307 8.6 "Se7en"                                               1995
0000000232 459707 8.6 "The Silence of the Lambs"                            1991

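A better approach would be to use a language like python.

You can use the string function rfind() to calculate the padding. If you have python, you could use the following script: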

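import os
import sys

# Optional second argument: offset used to compute the padding (default 78).
try:
    n = int(sys.argv[2])
except IndexError:
    n = 78
try:
    if os.path.isfile(sys.argv[1]):
        with open(sys.argv[1], 'r') as f:
            for line in f:
                line = line.strip()
                # Pad relative to the last "(" so the years line up.
                pad = n - line.rfind("(")
                # line[:-7] drops " (YYYY)"; line[-5:-1] is the year itself.
                # A formatted string keeps this working on Python 2 and 3.
                print("%s %s %s" % (line[:-7], ' ' * pad, line[-5:-1]))
    else:
        print("Please provide a file.")
except IndexError:
    print("Please provide a file.")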

Save it to a file named table.py and run it like this:

$ python table.py file
0000000124  462910   8.8  Star Wars: Episode V - The Empire Strikes Back        1980
0000000124  698356   8.8  The Lord of the Rings: The Fellowship of the Ring     2001
0000000233  393855   8.8  One Flew Over the Cuckoo's Nest                       1975
0000000124  733447   8.7  Inception                                             2010
0000000233  411397   8.7  Goodfellas                                            1990
0000000123  519051   8.7  Star Wars                                             1977
0000000124  146841   8.7  Shichinin no samurai                                  1954
0000000123  618195   8.7  Forrest Gump                                          1994
0000000123  680520   8.7  The Matrix                                            1999
0000000123  604519   8.7  The Lord of the Rings: The Two Towers                 2002
0000000233  309137   8.7  Cidade de Deus                                        2002
0000000232  548307   8.6  Se7en                                                 1995
0000000232  459707   8.6  The Silence of the Lambs                              1991
0000000123  123456   9.9  The best file (of all time)                           2025

Note the added movie:

0000000123  123456   9.9  The best file (of all time) (2025)

If you need to move the year column further to the right, pass the value as the second argument, like so:

$ python table.py file 100 
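The slicing in the script relies on fixed offsets, so it assumes each line ends in exactly " (YYYY)". As an alternative sketch, a regular expression anchored at the end of the line can do the split and leave lines without a trailing year untouched. The names YEAR_AT_END, split_year and format_line are made up for this example, and this is a different technique from the rfind() approach above.

```python
import re

# A four-digit year in parentheses at the very end of the line.
YEAR_AT_END = re.compile(r"^(?P<head>.*?)\s*\((?P<year>\d{4})\)\s*$")


def split_year(line):
    """Return (text_before_year, year), or (line, None) if there is no trailing year."""
    match = YEAR_AT_END.match(line)
    if match is None:
        return line.rstrip(), None
    return match.group("head"), match.group("year")


def format_line(line, column=78):
    """Pad the text to `column` characters, so the year starts at 1-based column + 1."""
    head, year = split_year(line)
    if year is None:
        return head
    return head.ljust(column) + year


print(format_line("0000000123  123456   9.9  The best file (of all time) (2025)"))
```

Because the pattern is anchored at the end of the line, the "(of all time)" part stays with the title.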
answered 2013-07-26T08:29:38.593
0

Here is a Python 2.x solution:

$ python --version
Python 2.7.3
$ echo "0000000124  462910   8.8  Star Wars: Episode V - The Empire Strikes Back (1980)" | python -c "import sys;s=sys.stdin.readlines()[0]; print '%s\t%s' % (s[:-7], s[-6:-2])"
0000000124  462910   8.8  Star Wars: Episode V - The Empire Strikes Back    1980

If your strings are in tmpfile, then:

$ cat tmpfile | python -c "import sys;map(lambda i: sys.stdout.write('%s %s %s\n' % (i[:-8], ' '*(100-len(i)), i[-6:-2])), sys.stdin.readlines())"
0000000124  462910   8.8  Star Wars: Episode V - The Empire Strikes Back                      1980
0000000124  698356   8.8  The Lord of the Rings: The Fellowship of the Ring                   2001
0000000233  393855   8.8  One Flew Over the Cuckoo's Nest                                     1975
0000000124  733447   8.7  Inception                                                           2010
0000000233  411397   8.7  Goodfellas                                                          1990
0000000123  519051   8.7  Star Wars                                                           1977
0000000124  146841   8.7  Shichinin no samurai                                                1954
0000000123  618195   8.7  Forrest Gump                                                        1994
0000000123  680520   8.7  The Matrix                                                          1999
0000000123  604519   8.7  The Lord of the Rings: The Two Towers                               2002
0000000233  309137   8.7  Cidade de Deus                                                      2002
0000000232  548307   8.6  Se7en                                                               1995
0000000232  459707   8.6  The Silence of the Lambs                                            1991
answered 2013-07-26T08:29:45.253