
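I have text in paragraph format, and each article always has a date above it. The problem is that each article is followed by an unknown number of line breaks, and they are different kinds of Unicode newline characters. I need to remove every run of line breaks between paragraphs and replace it with exactly two newlines (\n\n).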

So from this:

05/12
The 1959 Mexico hurricane was a devastating tropical cyclone
that was one of the worst ever Pacific hurricanes. It 
impacted the Pacific coast of Mexico in October 1959. The
hurricane killed at least 1,000 people.




11/01
The 1959 Mexico hurricane was a devastating tropical cyclone
that was one of the worst ever Pacific hurricanes. It 
impacted the Pacific coast of Mexico in October 1959. The
hurricane killed at least 1,000 people.

To this:

05/12
The 1959 Mexico hurricane was a devastating tropical cyclone
that was one of the worst ever Pacific hurricanes. It 
impacted the Pacific coast of Mexico in October 1959. The
hurricane killed at least 1,000 people.

11/01
The 1959 Mexico hurricane was a devastating tropical cyclone
that was one of the worst ever Pacific hurricanes. It 
impacted the Pacific coast of Mexico in October 1959. The
hurricane killed at least 1,000 people.

I tried using preg_replace(), but it doesn't match every instance:

$text = preg_replace('/\r?\n+(?=\d{2}\/\d{2})/', "\n\n", $text);

1 Answer


I posted an answer to a similar question about a month ago.

To match anything that is considered a line break sequence, you can use \R.

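\R matches a generic newline; that is, anything Unicode considers a line break sequence. This includes every character matched by \v (vertical whitespace) as well as the multi-character sequence \x0D\x0A.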
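As a quick way to check that description, here is a minimal sketch that tests \R against each line break sequence individually. The `$breaks` list and its labels are my own illustration, not something from the question, and it assumes a PCRE build with the default (Unicode) \R behaviour and UTF-8 input.

```php
<?php
// Candidate line break sequences, written as UTF-8 byte strings.
// The labels are illustrative names only.
$breaks = [
    'LF'   => "\n",
    'CR'   => "\r",
    'CRLF' => "\r\n",
    'VT'   => "\x0B",
    'FF'   => "\x0C",
    'NEL'  => "\xC2\x85",     // U+0085 NEXT LINE
    'LS'   => "\xE2\x80\xA8", // U+2028 LINE SEPARATOR
    'PS'   => "\xE2\x80\xA9", // U+2029 PARAGRAPH SEPARATOR
];

$results = [];
foreach ($breaks as $name => $seq) {
    // \A and \z force a single \R to consume the entire sequence.
    $results[$name] = preg_match('~\A\R\z~u', $seq) === 1;
}

var_dump($results); // every entry should be true
```

Note that CRLF is consumed as one unit, so a single \R covers both bytes.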

Try this:

$text = preg_replace('~\R+(?=\d{2}/\d{2})~u', "\n\n", $text);
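To see the difference between the two patterns, here is a small sketch run on a made-up sample. The `$text` value is hypothetical: two short articles separated by a mix of CRLF, U+2028, a vertical tab, and LF. It assumes UTF-8 input and that the blank lines contain no spaces or tabs.

```php
<?php
// Hypothetical sample: two articles separated by a mix of line breaks
// (CRLF, U+2028 LINE SEPARATOR as UTF-8 bytes, vertical tab, LF).
$text = "05/12\nFirst article.\r\n\xE2\x80\xA8\x0B\n11/01\nSecond article.";

// Pattern from the question: only the final LF before the date is matched,
// so the other line breaks are left in place.
$old = preg_replace('/\r?\n+(?=\d{2}\/\d{2})/', "\n\n", $text);

// Pattern from this answer: the whole run of line breaks is matched.
$new = preg_replace('~\R+(?=\d{2}/\d{2})~u', "\n\n", $text);

var_dump($new === "05/12\nFirst article.\n\n11/01\nSecond article."); // bool(true)
var_dump($old === $new); // bool(false)
```

The single line break after each date is untouched because the lookahead requires a dd/dd date to follow. By the same logic, a body line that happens to start with two digits, a slash, and two digits would be treated as a date line.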

See the PCRE documentation for the different ways it implements this feature.

Answered 2013-10-31T00:46:15.553