
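I'm just getting started with sed and so far know only its basic commands. Can anyone point me toward a solution to this problem? I have a subtitle file like this: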

{0}{20}First subtitle
{30}{50}Second subtitle|New line is made this way.
{70}{100}Third.
{1010}{1033}Fourth etc.

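The numbers in braces mark the start and end of the interval during which the subtitle should be visible. Suppose a translator translates the subtitles this way (I'll refer to this text as (*)):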

{0}{20}First subtitle
Translation of the first subtitle.
{30}{50}Second subtitle|New line is made this way.
Translation of the second subtitle.|Second line of translation of the second subtitle.
{70}{100}Third.
Translation of third.
{1010}{1033}Fourth etc.
Translation of fourth etc.

I need to do 3 things. 1) Extract the translated subtitles:

{0}{20}Translation of the first subtitle.
{30}{50}Translation of the second subtitle.|Second line of translation of the second subtitle.
{70}{100}Translation of third.
{1010}{1033}Translation of fourth etc.

2) From the text containing both subtitles (marked (*)), extract only the original subtitles to get this:

{0}{20}First subtitle
{30}{50}Second subtitle|New line is made this way.
{70}{100}Third.
{1010}{1033}Fourth etc.

3) Take the outputs of 1) and 2) and rebuild the original text containing both subtitles (marked (*)):

{0}{20}First subtitle
Translation of the first subtitle.
{30}{50}Second subtitle|New line is made this way.
Translation of the second subtitle.|Second line of translation of the second subtitle.
{70}{100}Third.
Translation of third.
{1010}{1033}Fourth etc.
Translation of fourth etc.

Can anyone give me some advice on how to get started? Thanks a lot.

I should probably mention (though it should be clear) that I will call it like this:

cat input_file.txt | sed <"program" in sed>

2 Answers


One way of doing it:


Steps 1 and 2: separate out the translated subtitle and original subtitle from (*) (I call it sub_both in the script below)

sed -r '
/^((\{[0-9]+\}){2}).*/ {
    w sub_orig
    s//\1/
    N
}
s/\n//
w sub_tran
' sub_both

What it does is:

  1. Matches lines that begin with two sequences of digits, each enclosed in braces.
  2. Writes those lines to the file sub_orig.
  3. Replaces the line with the first captured subexpression (the two brace-enclosed numbers). The empty pattern in s//\1/ reuses the regex from the address.
  4. Appends the next line (the translated line) to the pattern space, which after step 3 holds only the two brace-enclosed numbers.
  5. Removes the newline in the pattern space, resulting in {digits}{digits}translated line...
  6. Writes the pattern space to the file sub_tran.
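
To try the steps above, here is a minimal self-contained sketch. It builds a two-entry sub_both from the question's sample data and runs the same script. Two things differ from the script above and are my own assumptions: -E is used as the more portable spelling of -r, and -n is added so sed does not also echo the merged lines to stdout. The empty-regex s//\1/ and the trailing-filename w commands are assumed to behave as in GNU sed.

```shell
# Build a small sample of the combined file (*) from the question's data.
cat > sub_both <<'EOF'
{0}{20}First subtitle
Translation of the first subtitle.
{30}{50}Second subtitle|New line is made this way.
Translation of the second subtitle.|Second line of translation of the second subtitle.
EOF

# Same script as above. -n (an addition here) suppresses the default print,
# so the only results are the two files written by the w commands.
sed -n -E '
/^((\{[0-9]+\}){2}).*/ {
    w sub_orig
    s//\1/
    N
}
s/\n//
w sub_tran
' sub_both

cat sub_orig   # timing prefix + original text
cat sub_tran   # timing prefix + translated text
```

Note that the script expects every original line to be followed by exactly one translation line; an unpaired line at the end of the file would not be handled.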

Step 3: Now that we have sub_orig and sub_tran, reconstruct (*) as sub_both_2

paste -d "\n" sub_orig <(sed -r 's/^((\{[0-9]+\}){2})//' sub_tran) >sub_both_2

sub_tran is preprocessed via sed to remove the 2 sequences of digits, and the 2 files are merged with newline as the separator.
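
As a worked example of this merge, the sketch below creates small sub_orig and sub_tran files by hand (sample data from the question) and interleaves them. It uses a pipe and paste's "-" operand in place of process substitution, which is my substitution so that it also runs in shells other than bash; the file names are the ones used above.

```shell
# Sample inputs, shaped like the output of steps 1 and 2.
cat > sub_orig <<'EOF'
{0}{20}First subtitle
{70}{100}Third.
EOF
cat > sub_tran <<'EOF'
{0}{20}Translation of the first subtitle.
{70}{100}Translation of third.
EOF

# Strip the {start}{end} prefix from each translated line, then interleave
# the result with the original lines; "-" makes paste read from the pipe.
sed -E 's/^(\{[0-9]+\}){2}//' sub_tran | paste -d '\n' sub_orig - > sub_both_2

cat sub_both_2
```

Both files must have the same number of lines in the same order, since paste pairs them purely by position.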

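P.S. <(command) is process substitution: the shell runs command and hands paste a file name (a named pipe or /dev/fd entry) from which the command's output can be read. It is a bash feature and is not available in plain POSIX sh.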

answered 2012-11-14T05:34:30.537

After saving the original subtitle file as file_1 and the translated subtitle file as file_2, run the following command:

sed -r 's/^[{][0-9]+[}][{][0-9]+[}]//' file_2 | paste -d"\n" file_1 - 
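
The command above covers only step 3. For completeness, here is a possible sketch of producing file_1 and file_2 from the combined file through a pipe, in the way the question says sed will be invoked. The input_file.txt contents are sample data from the question, and the two sed programs are my own assumption about one way to do it, not part of the original answer.

```shell
# Sample combined file (*), taken from the question.
cat > input_file.txt <<'EOF'
{0}{20}First subtitle
Translation of the first subtitle.
{70}{100}Third.
Translation of third.
EOF

# Step 2: keep only the lines that start with a {start}{end} prefix.
cat input_file.txt | sed -n -E '/^(\{[0-9]+\}){2}/p' > file_1

# Step 1: reduce each prefixed line to its prefix, pull in the following
# translation line with N, delete the newline between them, and print.
cat input_file.txt |
  sed -n -E '/^(\{[0-9]+\}){2}/{ s/^((\{[0-9]+\}){2}).*/\1/; N; s/\n//; p; }' > file_2

cat file_1
cat file_2
```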
answered 2012-11-11T18:36:34.697