regex - Working with subtitles in sed

Question

I'm just starting out with sed and so far know only its basic commands. Can anyone point me toward a solution to this problem? I start with this:

{0}{20}First subtitle
{30}{50}Second subtitle|New line is made this way.
{70}{100}Third.
{1010}{1033}Fourth etc.

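The numbers in braces mark the start and end of the interval during which the subtitle should be visible. Suppose a translator has translated the subtitles this way (I will refer to this text as (*)):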

{0}{20}First subtitle
Translation of the first subtitle.
{30}{50}Second subtitle|New line is made this way.
Translation of the second subtitle.|Second line of translation of the second subtitle.
{70}{100}Third.
Translation of third.
{1010}{1033}Fourth etc.
Translation of fourth etc.

I need to do 3 things: 1) Extract the translated subtitles:

{0}{20}Translation of the first subtitle.
{30}{50}Translation of the second subtitle.|Second line of translation of the second subtitle.
{70}{100}Translation of third.
{1010}{1033}Translation of fourth etc.

2) From the text containing both subtitles (marked (*)), extract only the original subtitles to get this:

{0}{20}First subtitle
{30}{50}Second subtitle|New line is made this way.
{70}{100}Third.
{1010}{1033}Fourth etc.

3) Take the outputs of 1) and 2) and rebuild the original text containing both subtitles (marked (*)):

{0}{20}First subtitle
Translation of the first subtitle.
{30}{50}Second subtitle|New line is made this way.
Translation of the second subtitle.|Second line of translation of the second subtitle.
{70}{100}Third.
Translation of third.
{1010}{1033}Fourth etc.
Translation of fourth etc.

Could someone give me some advice on how to start? Thanks a lot.

I should probably mention (it may be obvious) that I would call it like this:

cat input_file.txt | sed <"program" in sed>

score 0 · Accepted Answer

One way of doing it:

Steps 1 and 2: separate out the translated subtitle and original subtitle from (*) (I call it sub_both in the script below)

sed -r '
/^((\{[0-9]+\}){2}).*/ {
    w sub_orig
    s//\1/
    N
}
s/\n//
w sub_tran
' sub_both

What it does is:

1. Match lines beginning with two sequences of digits enclosed in braces.
2. Write those lines to the file sub_orig.
3. Replace the line with the 1st captured subexpression (which is the two sequences of digits).
4. Append the next line (which is the translated line) to the pattern space. As we recall, the pattern space after step 3 is just the two sequences of digits.
5. Remove the newline in the pattern space, resulting in {digits}{digits}translated line...
6. Write the pattern space to the file sub_tran.
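
To see the steps above in action, here is a minimal sketch that runs the same script on the first two entries from the question. Two things are substitutions made for illustration and not part of the script above: -E is used as the portable spelling of -r, and -n is added so that nothing is echoed to the terminal (the w commands still write their files).

```shell
# Sample input: the first two entries from the question, saved as sub_both.
cat > sub_both <<'EOF'
{0}{20}First subtitle
Translation of the first subtitle.
{30}{50}Second subtitle|New line is made this way.
Translation of the second subtitle.|Second line of translation of the second subtitle.
EOF

# -n suppresses the default printing; the w commands still write their files.
sed -n -E '
/^((\{[0-9]+\}){2}).*/ {
    w sub_orig
    s//\1/
    N
}
s/\n//
w sub_tran
' sub_both

cat sub_orig
# {0}{20}First subtitle
# {30}{50}Second subtitle|New line is made this way.
cat sub_tran
# {0}{20}Translation of the first subtitle.
# {30}{50}Translation of the second subtitle.|Second line of translation of the second subtitle.
```

Note that the file name after w runs to the end of the line, which is why the script is spread over several lines instead of being joined with semicolons.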

Step 3: Now that we have sub_orig and sub_tran, reconstruct (*) as sub_both_2

paste -d "\n" sub_orig <(sed -r 's/^((\{[0-9]+\}){2})//' sub_tran) >sub_both_2

sub_tran is preprocessed via sed to remove the 2 sequences of digits, and the 2 files are merged with newline as the separator.

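P.S. <(command) is process substitution (available in bash, ksh and zsh): the shell runs command and hands paste a file name (a named pipe or a /dev/fd entry) from which the command's output can be read.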
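
As a sanity check on step 3, the sketch below rebuilds the combined file and compares it with the input. It assumes sub_both, sub_orig and sub_tran hold a two-entry sample (created here so the block stands alone). It also swaps the process substitution for a pipe, with - as the second operand of paste, so that it runs in shells without <(...). -E is the portable spelling of -r.

```shell
# Hypothetical sample files, as steps 1 and 2 would leave them.
printf '%s\n' '{0}{20}First subtitle' 'Translation of the first subtitle.' \
              '{70}{100}Third.' 'Translation of third.' > sub_both
printf '%s\n' '{0}{20}First subtitle' '{70}{100}Third.' > sub_orig
printf '%s\n' '{0}{20}Translation of the first subtitle.' \
              '{70}{100}Translation of third.' > sub_tran

# Strip the leading {start}{end} pair from each translated line, then
# interleave: paste takes one line from sub_orig, one from stdin (-).
sed -E 's/^(\{[0-9]+\}){2}//' sub_tran | paste -d '\n' sub_orig - > sub_both_2

# cmp -s is silent and exits 0 only when the two files are identical.
if cmp -s sub_both sub_both_2; then
    echo "round trip OK"
else
    echo "round trip differs"
fi
```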

score 0 · Accepted Answer

After saving the translation of the subtitle file file_1 as the subtitle file file_2, run the following command:

sed -r 's/^[{][0-9]+[}][{][0-9]+[}]//' file_2 | paste -d"\n" file_1 -
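
A small worked run of the command above, assuming file_1 holds the original subtitles and file_2 the translated ones with the same timing prefixes. The sample contents are taken from the question, and -E is used as the portable spelling of -r:

```shell
# file_1: original subtitles; file_2: translations with identical timings.
cat > file_1 <<'EOF'
{70}{100}Third.
{1010}{1033}Fourth etc.
EOF
cat > file_2 <<'EOF'
{70}{100}Translation of third.
{1010}{1033}Translation of fourth etc.
EOF

# Remove the {start}{end} prefix from each translation, then let paste
# alternate lines from file_1 and from the pipe (-), joined by a newline.
sed -E 's/^[{][0-9]+[}][{][0-9]+[}]//' file_2 | paste -d '\n' file_1 -
# {70}{100}Third.
# Translation of third.
# {1010}{1033}Fourth etc.
# Translation of fourth etc.
```

Writing the braces as [{] and [}] sidesteps the question of whether a bare or escaped brace is literal in the regex dialect in use.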
