
I have a problem at the moment.

I'm trying to format a block of text using regular expressions. I'll explain what I have so far, and then I'll explain my problem.

I have a text file containing some narrative text:

VOLUME I



CHAPTER I


Lorem Ipsum is simply dummy text of the printing and typesetting industry. 
Lorem Ipsum has been the industry's standard dummy text ever since the 1500s,
when an unknown printer took a galley of type and scrambled it to make a type 

It was popularised in the 1960s with the release of Letraset sheets containing 
Lorem Ipsum passages, and more recently with desktop publishing software like 
Aldus PageMaker including versions of Lorem Ipsum.


VOLUME II



CHAPTER II


Lorem Ipsum is simply dummy text of the printing and typesetting industry. 
It has survived not only five centuries, but also the leap into electronic 
typesetting, remaining essentially unchanged. 

It was popularised in the 1960s with the release of Letraset sheets 
containing Lorem Ipsum passages, and more recently with desktop 
publishing software like Aldus PageMaker including versions of Lorem Ipsum.

...
...

It has multiple VOLUMES and CHAPTERS, and it needs to be formatted with PHP so that it looks the way it does in the text file, with proper spacing.

First, I call this formatting function to take care of some whitespace and cleanup:

<?php    
function formatting($AStr)
{
    return preg_split('/[\r\n]{2,}/', trim($AStr));        
}    
?>
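A likely contributor to the symptom: if emma.txt uses Windows (`\r\n`) line endings, `[\r\n]{2,}` already matches the single `\r\n` at the end of every line. Each line then becomes its own element, and the blank lines that marked paragraph boundaries are lost. Below is a minimal sketch of a split that only breaks on blank lines. It assumes paragraphs are separated by at least one empty (or whitespace-only) line. `split_paragraphs` is a hypothetical name, not part of the original code:

```php
<?php
// Hypothetical helper: split text into paragraphs on blank lines only.
// \r?\n treats a Windows "\r\n" as ONE line break, so a single break inside
// a paragraph is never a split point. (Old Mac "\r"-only files are not handled.)
function split_paragraphs(string $text): array
{
    return preg_split('/\r?\n(?:[ \t]*\r?\n)+/', trim($text));
}

$crlf = "CHAPTER I\r\n\r\nLine one\r\nline two\r\n\r\nNext paragraph";

// The question's pattern also splits on the lone "\r\n" inside the paragraph.
echo count(preg_split('/[\r\n]{2,}/', trim($crlf))), "\n";  // 4
// Splitting on blank lines keeps "Line one\r\nline two" together.
echo count(split_paragraphs($crlf)), "\n";                   // 3
```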

Then I load the file and try to format it:

<!DOCTYPE html>
<html>
  <head>
    <title></title>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <link rel="stylesheet" type="text/css" href="style.css" />
  </head>
<body>

<h1>Jane Austen</h1>

<h2>Emma</h2>

<?php

require_once 'format.inc.php';

$p = file_get_contents('emma.txt');

$p = formatting($p);

/*
foreach ($p as $l) {
    $l = trim($l);
    preg_replace('/(VOLUME +[IVX]+)/', "jjj", $l);
    $volumePattern = '/(VOLUME +[IVX]+)/';
    $chaperPattern = '/(CHAPTER +[IVX]+)/';
    $l = str_replace("\r\n", ' ', $l);

    if (preg_match('/(VOLUME +[IVX]+)/', $l, $m)) {
        echo '<h3>' . $m[1] . '</h3>';
    }
    if (preg_match('/(CHAPTER +[IVX]+)/', $l, $m)) {
        echo '<h3>' . $m[1] . '</h3>';
    }
    preg_replace('/(VOLUME +[IVX]+)/', "jjj", $l);
    echo $l . "\n";
}*/

foreach ($p as $l) {
    //$l = trim($l);
    //$l = str_replace("[\r\n]", '\n', $l);
    if (preg_match('/[\.\w]/', $l, $m)) {
        echo "\n";
    }
    if (preg_match('/(VOLUME +[IVX]+)/', $l, $m)) {
        echo '<h3>' . $m[1] . '</h3>';
    }
    $l = preg_replace('/(VOLUME +[IVX]+)/', '', $l);
    if (preg_match('/(CHAPTER +[IVX]+)/', $l, $m)) {
        echo '<h3>' . $m[1] . '</h3>';
    }
    $l = preg_replace('/(CHAPTER +[IVX]+)/', '', $l);
    echo $l . "\n";
}


?>

</body>
</html>

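The problem is that I can't get the spacing (line breaks) to show between paragraphs. I've tried, but I can't make it work. I tried using these lines: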

if (preg_match('/[\.\w]/', $l, $m)) {
            echo "\n";
        }
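The `"\n"` does reach the page source, but HTML treats a line break like any other whitespace, so the browser collapses it into a single space and the paragraphs run together. One way to get visible spacing is to wrap each chunk in its own block element. A minimal sketch, assuming the chunks come from a blank-line split such as `formatting()` and that each heading sits alone in its chunk; `render_chunk` is a hypothetical helper:

```php
<?php
// Hypothetical helper: turn one chunk of the text file into an HTML element.
// Browsers collapse "\n" into a space, so the spacing between paragraphs
// has to come from block elements such as <h3> and <p>.
function render_chunk(string $chunk): string
{
    $chunk = trim($chunk);
    if (preg_match('/^(?:VOLUME|CHAPTER) +[IVX]+$/', $chunk)) {
        return '<h3>' . htmlspecialchars($chunk) . "</h3>\n";
    }
    // Collapse the hard-wrapped line breaks (and any whitespace runs) to one space.
    $text = preg_replace('/\s+/', ' ', $chunk);
    return '<p>' . htmlspecialchars($text) . "</p>\n";
}

echo render_chunk("CHAPTER II");
echo render_chunk("It was popularised in the 1960s\r\nwith the release of Letraset sheets");
```

With chunks from a blank-line split, the loop body reduces to `echo render_chunk($l);`, and the browser's default margins on `<p>` provide the space between paragraphs.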

2 Answers


This may be oversimplifying things, but couldn't you just do this?

<!DOCTYPE html>
<html>
  <head>
    <title></title>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <link rel="stylesheet" type="text/css" href="style.css" />
  </head>
<body>

<h1>AUTHOR NAME</h1>

<h2>TITLE</h2>

<?php

  $p = file_get_contents('emma.txt');
  echo preg_replace('/^\s*((?:VOLUME|CHAPTER)\s+[IVX]+)\s*$/im', '<h3>$1</h3>', $p); 

?>

</body>
</html>
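To see what the single `preg_replace` call does, here it is wrapped in a hypothetical function `headings_only` and run on a small LF-terminated sample. Only the heading lines are rewritten (the surrounding `\s*` also swallows a line break on each side); the body text passes through untouched, so in a browser its paragraphs still run together, which is what the EDIT addresses:

```php
<?php
// Hypothetical wrapper around the answer's one-liner.
// /m makes ^ and $ match at every line; /i makes the match case-insensitive.
function headings_only(string $text): string
{
    return preg_replace('/^\s*((?:VOLUME|CHAPTER)\s+[IVX]+)\s*$/im', '<h3>$1</h3>', $text);
}

echo headings_only("VOLUME I\n\nCHAPTER I\n\nLorem Ipsum is simply dummy text.");
// <h3>VOLUME I</h3><h3>CHAPTER I</h3>
// Lorem Ipsum is simply dummy text.
```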

EDIT

To also wrap the body paragraphs in <p></p> (assuming the paragraphs contain no line breaks), try this:

<!DOCTYPE html>
<html>
  <head>
    <title></title>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <link rel="stylesheet" type="text/css" href="style.css" />
  </head>
<body>

<h1>AUTHOR NAME</h1>

<h2>TITLE</h2>

<?php

  $p = file_get_contents('emma.txt');
  echo preg_replace_callback('/^\s*(?:(?P<header>(?:VOLUME|CHAPTER)\s+[IVX]+)|(?P<body>.+))\s*$/im', function($matches) {
    if (!empty($matches['body'])) {
      return '<p>'.htmlspecialchars($matches['body']).'</p>';
    } else {
      return '<h3>'.htmlspecialchars($matches['header']).'</h3>';
    }
  }, $p);

?>

</body>
</html>
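The stated assumption matters for the sample text, whose paragraphs are hard-wrapped: `.` stops at the end of a line, so each physical line becomes its own `<p>`. A quick check, with the answer's pattern and callback wrapped in a hypothetical function `wrap_lines` and LF line endings assumed:

```php
<?php
// The answer's pattern and callback, wrapped in a hypothetical function.
function wrap_lines(string $text): string
{
    return preg_replace_callback(
        '/^\s*(?:(?P<header>(?:VOLUME|CHAPTER)\s+[IVX]+)|(?P<body>.+))\s*$/im',
        function (array $matches): string {
            if (!empty($matches['body'])) {
                return '<p>' . htmlspecialchars($matches['body']) . '</p>';
            }
            return '<h3>' . htmlspecialchars($matches['header']) . '</h3>';
        },
        $text
    );
}

// One paragraph hard-wrapped over two lines, as in emma.txt.
$html = wrap_lines("CHAPTER II\n\nIt has survived not only five centuries,\nbut also the leap into electronic typesetting.");
echo substr_count($html, '<p>'), "\n";  // 2: one <p> per physical line
```

Joining each paragraph's lines first (for example, splitting on blank lines and collapsing the remaining breaks to spaces) would give one `<p>` per paragraph instead.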


Answered 2012-08-24T10:22:42.970

You have several errors. First, in the "formatting" function, the regex must be:

function formatting($AStr)
{
    return preg_split('/[\r\n]{2,}/', trim($AStr));        
}

Next, note that preg_replace does not take its subject variable by reference, so you have to assign the function's return value back to your line:

foreach ($p as $l) {
    $l = trim($l);
    $l = str_replace("\r\n", ' ', $l);
    if (preg_match('/(VOLUME +[IVX]+)/', $l, $m)) {
        echo '<h3>' . $m[1] . '</h3>';
    }
    $l = preg_replace('/(VOLUME +[IVX]+)/', '', $l);
    if (preg_match('/(CHAPTER +[IVX]+)/', $l, $m)) {
        echo '<h3>' . $m[1] . '</h3>';
    }
    $l = preg_replace('/(CHAPTER +[IVX]+)/', '', $l);
    echo $l . "\n";
}
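The point about `preg_replace` can be checked in isolation: the subject is passed by value and the function returns the modified string, so a call whose result is not assigned leaves the variable unchanged. A minimal sketch using the question's own VOLUME pattern:

```php
<?php
// preg_replace() takes its subject by value and returns a new string.
$l = "VOLUME I";

preg_replace('/(VOLUME +[IVX]+)/', '', $l);        // result discarded
var_dump($l);                                       // string(8) "VOLUME I"

$l = preg_replace('/(VOLUME +[IVX]+)/', '', $l);   // result assigned back
var_dump($l);                                       // string(0) ""
```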
Answered 2012-08-24T10:27:30.113