2

I have a fun project to work on! I'm thinking about converting an srt file into a csv/xls file.

An srt file looks like this:

1
00:00:00,104 --> 00:00:02,669
Hi, I'm shell-scripting.

2
00:00:02,982 --> 00:00:04,965
I'm not sure if it would work,
but I'll try it!

3
00:00:05,085 --> 00:00:07,321
There must be a way to do it!

I'd like to output it to a csv file like this:

"1","00:00:00,104","00:00:02,669","Hi, I'm shell-scripting."   
"2","00:00:02,982","00:00:04,965","I'm not sure if it would work"
,,,"but I'll try it!"
"3","00:00:05,085","00:00:07,321","There must be a way to do it!"

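As you can see, each subtitle takes up two lines. My idea was to use grep to get the srt data into the xls file, and then use awk to format the xls file.

What do you think? How should I write it? I tried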

$grep filename.srt > filename.xls

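It seems that all the data, including the timecodes and the subtitle text, ends up in column A of the xls file... but I want the text to be in column B... How can awk help with the formatting?

Thanks in advance! :)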

4

4 Answers

4
$ cat tst.awk
BEGIN { RS=""; FS="\n"; OFS=","; q="\""; s=q OFS q }
{
    split($2,a,/ .* /)
    print q $1 s a[1] s a[2] s $3 q
    for (i=4;i<=NF;i++) {
        print "", "", "", q $i q
    }
}

$ awk -f tst.awk file
"1","00:00:00,104","00:00:02,669","Hi, I'm shell-scripting."
"2","00:00:02,982","00:00:04,965","I'm not sure if it would work,"
,,,"but I'll try it!"
"3","00:00:05,085","00:00:07,321","There must be a way to do it!"
answered 2015-08-21T13:27:36.883
1

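My other answer is half awk and half Perl, but, given that awk can't write Excel spreadsheets whereas Perl can, it seems daft to require you to master both awk and Perl when Perl is perfectly capable of doing it all on its own... so here it is in Perl: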

#!/usr/bin/perl
use strict;
use warnings;

use Excel::Writer::XLSX;
my $workbook  = Excel::Writer::XLSX->new('result.xlsx');
my $worksheet = $workbook->add_worksheet();
my $ExcelRow=0; 
local $/ = "";   # set paragraph mode, so we read till next blank line as one record

while(my $para=<>){
   $ExcelRow++;                               # move down a line in Excel worksheet
   chomp $para;                               # strip trailing newlines (paragraph mode)
   my @lines=split /\n/, $para;               # split paragraph into lines on linefeed character
   my $scene = $lines[0];                     # pick up scene number from first line of para
   my ($start,$end)=split / --> /,$lines[1];  # pick up start and end time from second line
   my $cell=sprintf("A%d",$ExcelRow);         # work out cell
   $worksheet->write($cell,$scene);           # write scene to spreadsheet column A
   $cell=sprintf("B%d",$ExcelRow);            # work out cell
   $worksheet->write($cell,$start);           # write start time to spreadsheet column B
   $cell=sprintf("C%d",$ExcelRow);            # work out cell
   $worksheet->write($cell,$end);             # write end time to spreadsheet column C
   $cell=sprintf("D%d",$ExcelRow);            # work out cell
   $worksheet->write($cell,$lines[2]);        # write description to spreadsheet column D
   for(my $i=3;$i<scalar @lines;$i++){        # output additional lines of description
      $ExcelRow++;
      $cell=sprintf("D%d",$ExcelRow);         # work out cell
      $worksheet->write($cell,$lines[$i]);
   }
}

$workbook->close;

Save the above in a file called srt2xls, then make it executable with:

chmod +x srt2xls

Then you can run it with

./srt2xls < SomeFile.srt

It will give you a spreadsheet called result.xlsx.

answered 2015-08-22T21:37:26.147
1

I think something like this should do nicely:

awk -v RS= -F'\n' '
   { 
      sub(" --> ","\x7c",$2)                 # change "-->" to "|"
      printf "%s|%s|%s\n",$1,$2,$3           # print scene, time start, time stop, description
      for(i=4;i<=NF;i++)printf "|||%s\n",$i  # print remaining lines of description
   }' file.srt

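The -v RS= sets the record separator to blank lines. The -F'\n' sets the field separator to newline.

The sub() replaces the " --> " with a pipe symbol (|).

The first three fields are then printed with pipe separators, and then there is a little loop to print the remaining lines of the description, inserting three pipe symbols to make them line up.

Output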
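To see what the blank-line record separator actually does to the input, here is a small sketch that prints how many fields (lines) each record has. It strips carriage returns first, on the assumption that the srt file may have Windows (CRLF) line endings; in that case the "blank" lines are not really empty and awk would not split on them. The function name srt_fields is made up for illustration.

```shell
#!/bin/sh
# Sketch only: srt_fields is a hypothetical helper name.
# Reads SRT on stdin and prints "<record number>: <lines in that record>".
srt_fields() {
    tr -d '\r' |                       # assumption: input may have CRLF endings
    awk -v RS= -F'\n' '{ print NR ": " NF }'
}
```

Run as srt_fields < file.srt. A record with more than three fields is one whose description spans several lines, which is what the loop starting at field 4 deals with.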

1|00:00:00,104|00:00:02,669|Hi, I'm shell-scripting.
2|00:00:02,982|00:00:04,965|I'm not sure if it would work,
|||but I'll try it!
3|00:00:05,085|00:00:07,321|There must be a way to do it!

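As I find Perl and Excel more fun, I have taken the above output, parsed it with Perl and written a real Excel XLSX file. Of course, there is no real need to use both awk and Perl, so ideally one would recast the awk and integrate it into the Perl, since the latter can write Excel files whereas the former cannot. Anyway, here is the Perl.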

#!/usr/bin/perl
use strict;
use warnings;

use Excel::Writer::XLSX;
my $DEBUG=0; 
my $workbook  = Excel::Writer::XLSX->new('result.xlsx');
my $worksheet = $workbook->add_worksheet();
my $row=0; 

while(my $line=<>){
   $row++;                                   # move down a line in Excel worksheet
   chomp $line;                              # strip trailing newline
   my @f=split /\|/, $line;                  # split fields of line into array @f[], on pipe symbols (|)
   for(my $j=0;$j<scalar @f;$j++){           # loop through all fields
     my $cell= chr(65+$j) . $row;            # calculate Excel cell, starting at A1 (65="A")
     $worksheet->write($cell,$f[$j]);        # write to spreadsheet
     printf "%s:%s ",$cell,$f[$j] if $DEBUG;
   }
   printf "\n" if $DEBUG;
}

$workbook->close;

The output is written to result.xlsx.

answered 2015-08-21T09:14:31.250
0

Since you want to convert the srt into csv, below is an awk command:

 awk '{gsub(" --> ","\x22,\x22");if(NF!=0){if(j<3)k=k"\x22"$0"\x22,";else{k="\x22"$0"\x22 ";l=1}j=j+1}else j=0;if(j==3){print k;k=""}if(l==1){print ",,,"k ;l=0;k=""}}' inputfile > output.csv

Detailed view of the awk command:

awk '{
       gsub(" --> ","\x22,\x22"); 
       if(NF!=0)
         {
           if(j<3)
              k=k"\x22"$0"\x22,";
           else
            {
              k="\x22"$0"\x22 ";
              l=1
            }
          j=j+1
         }
        else
          j=0;
        if(j==3)
          { 
            print k;
            k=""
          }
        if(l==1)
          {
            print ",,,"k;
            l=0;
            k=""
          }
    }' inputfile > output.csv

Take output.csv to a Windows machine, open it with Microsoft Excel, and save it with the .xls extension.
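If a subtitle line contains a double quote, the command above will produce a field that Excel may misread, since a quote inside a quoted CSV field is normally written doubled. Below is a minimal sketch that handles this. It assumes the records are separated by blank lines, so it uses awk's paragraph mode (RS="") instead of the line counter; srt_to_csv and csvq are names made up for this example.

```shell
#!/bin/sh
# Sketch only: srt_to_csv and csvq are hypothetical helper names.
# Reads SRT on stdin (or from file arguments) and writes CSV to stdout.
srt_to_csv() {
    awk '
    # Wrap a value in double quotes, doubling any embedded double quote.
    function csvq(s) { gsub(/"/, "\"\"", s); return "\"" s "\"" }
    BEGIN { RS = ""; FS = "\n"; OFS = "," }
    {
        split($2, t, / --> /)                 # t[1] = start time, t[2] = end time
        print csvq($1), csvq(t[1]), csvq(t[2]), csvq($3)
        for (i = 4; i <= NF; i++)             # continuation lines of the subtitle
            print "", "", "", csvq($i)
    }' "$@"
}
```

Usage would be srt_to_csv inputfile > output.csv, after which the file can be opened in Excel as described.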

answered 2015-08-21T06:24:58.443