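I want to split a text file into several text files, starting a new file at each line that begins with a number followed by a period (1.*). For example, I want to split this text file into 2 files: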

 1. J Med Chem. 2013 May 23;56(10):4028-43. doi: 10.1021/jm400241j. Epub 2013 May 13.

Optimization of benzoxazole-based inhibitors of Cryptosporidium parvum inosine
5'-monophosphate dehydrogenase.

Gorla SK, Kavitha M, Zhang M, Chin JE, Liu X, Striepen B, Makowska-Grzyska M, Kim
Y, Joachimiak A, Hedstrom L, Cuny GD.

Department of Biology, Brandeis University , 415 South Street, Waltham,
Massachusetts 02454, USA.

Cryptosporidium parvum is an enteric protozoan parasite that has emerged as a
major cause of diarrhea, malnutrition, and gastroenteritis and poses a potential 
bioterrorism threat.

PMID: 23668331  [PubMed - indexed for MEDLINE]


 2.Biochem Pharmacol. 2013 May 1;85(9):1370-8. doi: 10.1016/j.bcp.2013.02.014. Epub 
2013 Feb 16.

Carbonyl reduction of triadimefon by human and rodent 11β-hydroxysteroid
dehydrogenase 1.

Meyer A, Vuorinen A, Zielinska AE, Da Cunha T, Strajhar P, Lavery GG, Schuster D,
Odermatt A.

Swiss Center for Applied Human Toxicology and Division of Molecular and Systems
Toxicology, Department of Pharmaceutical Sciences, University of Basel,
Klingelbergstrasse 50, 4056 Basel, Switzerland.

11β-Hydroxysteroid dehydrogenase 1 (11β-HSD1) catalyzes the conversion of
inactive 11-oxo glucocorticoids (endogenous cortisone, 11-dehydrocorticosterone
and synthetic prednisone) to their potent 11β-hydroxyl forms (cortisol,
corticosterone and prednisolone).

Copyright © 2013 Elsevier Inc. All rights reserved.

PMID: 23419873  [PubMed - indexed for MEDLINE]

I tried this:

awk 'NF{print > $2;close($2);}' file

and this:

split -l 2

but I'm confused about how to account for the blank lines. (I'm new to awk.)

2 Answers

I think what you're looking for is:

awk '/^[[:space:]]+[[:digit:]]+\./{ if (fname) close(fname); fname="out_"$1; sub(/\..*/,"",fname) } {print > fname}' file

Commented version, as requested by @zjhui:

awk '
/^[[:space:]]+[[:digit:]]+\./ {     # IF the line starts with spaces, then digits then a period THEN
    if (fname)                      #     IF the output file name variable is populated THEN
        close(fname)                #         close the file youve been writing to until now
                                    #     ENDIF
    fname="out_"$1                  #     set the output file name to the word "out_" followed by the first field of this line, e.g. "out_2.Biochem"
    sub(/\..*/,"",fname)            #     strip everything from the period on from the file name so it becomes e.g. "out_2"
}                                   # ENDIF
{                                   # IF true THEN
    print > fname                   #     print the current record to the filename stored in the variable fname, e.g. "out_2".
}                                   # ENDIF
' file
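
To try this without touching real data, here is a minimal sketch that writes a shortened two-record sample and runs the same one-liner on it. The file name sample.txt, the scratch directory, and the abbreviated record text are assumptions made for illustration only.

```shell
# Work in a scratch directory so the out_* files do not clutter anything.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# Abbreviated sample: each record starts with spaces, a number and a period.
cat > sample.txt <<'EOF'
 1. J Med Chem. 2013 May 23;56(10):4028-43.

PMID: 23668331  [PubMed - indexed for MEDLINE]


 2.Biochem Pharmacol. 2013 May 1;85(9):1370-8.

PMID: 23419873  [PubMed - indexed for MEDLINE]
EOF

# Same command as above, applied to the sample file.
awk '/^[[:space:]]+[[:digit:]]+\./{ if (fname) close(fname); fname="out_"$1; sub(/\..*/,"",fname) } {print > fname}' sample.txt

ls out_*          # lists out_1 and out_2
head -n 1 out_2   # shows the header line of the second record
```

The blank lines need no special handling: every line, blank or not, goes to whichever file was opened by the most recent numbered line. This does assume the input begins with a numbered line, since any earlier line would be printed while fname is still empty.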
answered 2013-08-22T05:04:55.433

This should work.

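awk -F'[.]' '/^ +[0-9]+\./ {
             if (file) close(file)   # close the previous output file
             n = $1                  # copy the field so $0 is not rebuilt
             gsub(/ /, "", n)        # strip the leading spaces, e.g. " 2" -> "2"
             file = "file_" n
           }
           {
             print > file
           }' Your_file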
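
A detail that matters for this layout: awk ties an action to a pattern only when the opening brace is on the same line as the pattern. The sketch below shows the difference on a made-up three-line input (the input text and the variable names are assumptions for illustration).

```shell
# Hypothetical three-line input, for illustration only.
input=$(printf ' 1. first\nbody\n 2. second\n')

# Brace on the pattern line: the action runs only for matching records.
same_line=$(printf '%s\n' "$input" | awk '/^ +[0-9]+\./ { n++ }
END { print n }')

# Brace on the next line: the bare pattern gets the default action (print the
# record), and the block becomes a separate rule that runs for every record.
next_line=$(printf '%s\n' "$input" | awk '/^ +[0-9]+\./
{ n++ }
END { print n }')

echo "$same_line"   # count of numbered lines
echo "$next_line"   # the numbered lines echoed back, then a count of all lines
```

With the brace on its own line, the file name would be recomputed for every input line and the numbered lines would also be echoed to standard output, so the brace belongs next to the pattern.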
answered 2013-08-22T13:38:06.767