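I have a problem.
I am trying to format a block of text using regular expressions. I will explain what I have so far and then describe the problem.
I have a text file containing some narrative text: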
VOLUME I
CHAPTER I
Lorem Ipsum is simply dummy text of the printing and typesetting industry.
Lorem Ipsum has been the industry's standard dummy text ever since the 1500s,
when an unknown printer took a galley of type and scrambled it to make a type
It was popularised in the 1960s with the release of Letraset sheets containing
Lorem Ipsum passages, and more recently with desktop publishing software like
Aldus PageMaker including versions of Lorem Ipsum.
VOLUME II
CHAPTER II
Lorem Ipsum is simply dummy text of the printing and typesetting industry.
It has survived not only five centuries, but also the leap into electronic
typesetting, remaining essentially unchanged.
It was popularised in the 1960s with the release of Letraset sheets
containing Lorem Ipsum passages, and more recently with desktop
publishing software like Aldus PageMaker including versions of Lorem Ipsum.
...
...
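It has multiple VOLUMEs and CHAPTERs, and it needs to be formatted with PHP so that the output looks the way it does in the text file, with proper spacing.
First, I call this formatting function, which trims the text and splits it into blocks at every run of two or more line-break characters.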
<?php
function formatting($AStr)
{
    return preg_split('/[\r\n]{2,}/', trim($AStr));
}
?>
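For illustration, this is roughly what that function returns on a small made-up sample. The `$sample` string is hypothetical, not the real file, and it assumes `\n` line endings with one blank line between blocks.

```php
<?php
// Same function as above, repeated so this sketch runs on its own.
function formatting($AStr)
{
    return preg_split('/[\r\n]{2,}/', trim($AStr));
}

// Hypothetical sample text, standing in for emma.txt.
$sample = "VOLUME I\n\nCHAPTER I\n\nFirst line of a paragraph,\nsecond line of it.\n\nNext paragraph.\n";

// Each element is one block; single line breaks inside a block are kept.
$blocks = formatting($sample);
print_r($blocks);
```

With this sample the array has four elements: the two headings and the two paragraphs, with the first paragraph still containing its internal line break.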
Then I include that file and try to do the formatting:
<!DOCTYPE html>
<html>
<head>
<title></title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<link rel="stylesheet" type="text/css" href="style.css" />
</head>
<body>
<h1>Jane Austen</h1>
<h2>Emma</h2>
<?php
require_once 'format.inc.php';

// Read the whole text file and split it into blocks.
$p = file_get_contents('emma.txt');
$p = formatting($p);

// Earlier attempt, left commented out.
/*
foreach ($p as $l) {
    $l = trim($l);
    preg_replace('/(VOLUME +[IVX]+)/', "jjj", $l);
    $volumePattern = '/(VOLUME +[IVX]+)/';
    $chapterPattern = '/(CHAPTER +[IVX]+)/';
    $l = str_replace("\r\n", ' ', $l);
    if (preg_match('/(VOLUME +[IVX]+)/', $l, $m)) {
        echo '<h3>' . $m[1] . '</h3>';
    }
    if (preg_match('/(CHAPTER +[IVX]+)/', $l, $m)) {
        echo '<h3>' . $m[1] . '</h3>';
    }
    preg_replace('/(VOLUME +[IVX]+)/', "jjj", $l);
    echo $l . "\n";
}*/

// Current attempt.
foreach ($p as $l) {
    //$l = trim($l);
    //$l = str_replace("[\r\n]", '\n', $l);

    // Attempt to print a blank line between paragraphs (the part that does not work).
    if (preg_match('/[\.\w]/', $l, $m)) {
        echo "\n";
    }

    // Print a VOLUME heading, then strip it from the block.
    if (preg_match('/(VOLUME +[IVX]+)/', $l, $m)) {
        echo '<h3>' . $m[1] . '</h3>';
    }
    $l = preg_replace('/(VOLUME +[IVX]+)/', '', $l);

    // Print a CHAPTER heading, then strip it from the block.
    if (preg_match('/(CHAPTER +[IVX]+)/', $l, $m)) {
        echo '<h3>' . $m[1] . '</h3>';
    }
    $l = preg_replace('/(CHAPTER +[IVX]+)/', '', $l);

    // Print whatever text remains in the block.
    echo $l . "\n";
}
?>
</body>
</html>
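The problem is that I cannot print the blank space (line break) between paragraphs. I have tried, but I cannot get it to work. I tried using these lines: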
if (preg_match('/[\.\w]/', $l, $m)) {
    echo "\n";
}
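One thing that may be relevant: browsers collapse whitespace in HTML, so an echoed "\n" shows up in the page source but not as visible spacing on the rendered page. A minimal sketch of one possible direction is to wrap each block in a `<p>` element instead. The helper name `renderBlock` and the sample `$blocks` array are hypothetical, and the sketch assumes each VOLUME or CHAPTER heading arrives as its own block, which may not hold for the real file.

```php
<?php
// Hypothetical helper: turns one block of text into an HTML fragment.
function renderBlock($block)
{
    $block = trim($block);

    // Assumption: a heading is a block consisting only of "VOLUME n" or "CHAPTER n".
    if (preg_match('/^(VOLUME|CHAPTER) +[IVX]+$/', $block)) {
        return '<h3>' . htmlspecialchars($block) . '</h3>';
    }

    // Otherwise treat the block as a paragraph and join its wrapped lines with spaces.
    $text = preg_replace('/\s*[\r\n]+\s*/', ' ', $block);
    return '<p>' . htmlspecialchars($text) . '</p>';
}

// Hypothetical blocks, shaped like the output of formatting().
$blocks = ["VOLUME I", "CHAPTER I", "First line,\nsecond line.", "Next paragraph."];

foreach ($blocks as $block) {
    // The "\n" here only keeps the generated HTML source readable.
    echo renderBlock($block) . "\n";
}
```

The spacing between paragraphs would then come from the default margins of `<p>`, or from rules in style.css, rather than from newline characters.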