
I am looking for awk code to join lines pasted from a PDF. The rule for joining is: if the last character of a line is not a period (.), a space should be appended to the line and the next line joined to it.

Sample input text (in a file):

In a perfect school, students would treat each other with affection and
respect. Differences would be tolerated, and even welcomed. Kids would
become more popular by being kind and supportive. Students would go out
of their way to make sure one another felt happy and comfortable. But most
schools are not perfect. Instead of being places of respect and tolerance,
they are places where the hateful act of bullying is widespread.

Students have to deal with all kinds of problems in schools. There are
the problems created by difficult classes, by too much homework, or by
personality conflicts with teachers. There are problems with scheduling
the classes you need and still getting some of the ones you want. There
are problems with bad cafeteria food, grouchy principals, or overcrowded
classrooms. But one of the most difficult problems of all has to do with a
terrible situation that exists in most schools: bullying.

Expected Output:

In a perfect school, students would treat each other with affection and respect. Differences would be tolerated, and even welcomed. Kids would become more popular by being kind and supportive. Students would go out of their way to make sure one another felt happy and comfortable. But most schools are not perfect. Instead of being places of respect and tolerance, they are places where the hateful act of bullying is widespread.

Students have to deal with all kinds of problems in schools. There are the problems created by difficult classes, by too much homework, or by personality conflicts with teachers. There are problems with scheduling the classes you need and still getting some of the ones you want. There are problems with bad cafeteria food, grouchy principals, or overcrowded classrooms. But one of the most difficult problems of all has to do with a terrible situation that exists in most schools: bullying.

(The expected output has each paragraph on a single line. Presumably, paragraphs are separated from each other by blank lines.)


2 Answers


This might be enough:

awk -v ORS= '!NF{$NF="\n"} NF{ $NF = $NF ($NF~/\.$/?"\n":" ")} 1' input
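For readers who want the rule from the question spelled out step by step, here is a longer sketch of the same idea. It is not the answerer's code: the function name join_lines is hypothetical, and stripping trailing whitespace before testing for the period is an added assumption, on the grounds that text pasted from a PDF often carries stray spaces or carriage returns.

```shell
# join_lines is a hypothetical helper name, not part of the original answer.
join_lines() {
  awk '
    # Assumption: trailing spaces, tabs, or carriage returns are paste debris.
    { sub(/[ \t\r]+$/, "") }

    # Blank line: flush any unfinished text, then keep the paragraph break.
    NF == 0 {
      if (buf != "") { print buf; buf = "" }
      print ""
      next
    }

    # Append the current line to the buffer, separated by one space.
    { buf = (buf == "" ? $0 : buf " " $0) }

    # A line ending in a period completes the joined line.
    /\.$/ { print buf; buf = "" }

    # Emit whatever is left if the input does not end with a period.
    END { if (buf != "") print buf }
  ' "$@"
}

# Example invocation on a tiny two-paragraph input.
printf 'first line\nends here.\n\nsecond\nparagraph.\n' | join_lines
```

Like the one-liner, this breaks the output after every line that ends in a period, so it yields one line per paragraph only when no line inside a paragraph happens to end with a period, as is the case in the sample input.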
Answered 2013-08-07T08:01:24.177

If the paragraphs in your input file really are separated by blank lines, then all you need is:

awk -v RS= -v ORS='\n\n' '{$1=$1}1' file
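As a brief illustration of why this works: setting RS to the empty string puts awk in paragraph mode, where each run of text between blank lines is one record and newlines act as field separators. Assigning $1 to itself makes awk rebuild the record with single spaces between fields. The wrapper name join_paragraphs below is hypothetical and only there to make the sketch easy to try.

```shell
# join_paragraphs is a hypothetical wrapper around the command above.
join_paragraphs() {
  # RS= enables paragraph mode; ORS prints a blank line after each paragraph;
  # $1 = $1 rebuilds the record so the embedded newlines become single spaces.
  awk -v RS= -v ORS='\n\n' '{ $1 = $1 } 1' "$@"
}

# Example invocation on a tiny two-paragraph input.
printf 'one\ntwo.\n\nthree\nfour.\n' | join_paragraphs
```

Note that this joins every line within a paragraph, including lines that end in a period, so it follows the paragraph structure rather than the literal period rule from the question. The output also ends with a blank line after the last paragraph.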
Answered 2013-08-07T12:02:21.413