
I have a text file with wiki markup. Example:

[[April]]

April is the fourth month of the year. It has 30 days. The name April comes from that Latin word aperire which means "to open". This probably refers to growing plants in spring. April begins on the same day of week as July in all years and also January in leap years.

April's flower is the Sweet Pea. Its birthstone is the diamond. The meaning of the diamond is innocence.

== April in poetry ==

Poets use April to mean the end of winter. For example: April showers bring May flowers.

== Events in April ==

[[August]]

August is the eighth month of the year in the Gregorian calendar, coming between July and September. It has 31 days, the same number of days as the previous month, July, and is named after Roman Emperor Augustus Caesar.

== The Month ==

This month was first called Sextilis in Latin, because it was the sixth month in the old Roman calendar. The Roman calendar began in March about 735 BC with Romulus. October was the eighth month. August was the eighth month when January or February were added to the start of the year by King Numa Pompilius about 700 BC. Or, when those two months were moved from the end to the beginning of the year by the decemvirs about 450 BC (Roman writers disagree). In 153 BC January 1 was determined as the beginning of the year.

August is named for Augustus Caesar who became Roman consul in this month.  The month has 31 days because Julius Caesar added two days when he created the Julian calendar in 45 BC. August is after July and before September.

August, in either hemisphere, is the seasonal equivalent of February in the other. In the Northern hemisphere it is a summer month and it is a winter month in the Southern hemisphere. In a common year, no other month begins on the same day of the week as August, though in leap years, February starts on the same day as August. August always ends on the same day of the week as November.

August's flower is the Gladiolus with the birthstone being peridot. The astrological signs for August are Leo (July 24 - August 22) and Virgo (August 23 - September 23).

== August observances ==

=== Fixed observances and events ===

=== Moveable and Monthlong events ===

== Selection of Historical Events ==

== References ==

April and August are both wiki articles. I managed to extract the titles with the following:

```php
<?php
$fh = fopen("wiki2.txt", "r");
if ($fh) {
    while (($line = fgets($fh)) !== false) {
        preg_match_all('#\\[\\[(.*?)\\]\\]#', $line, $matches, PREG_SET_ORDER);
        foreach ($matches as $m) {
            echo $m[0] . "<br />";
        }
    }
    fclose($fh);
}
```
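The loop above echoes `$m[0]`, which is the whole match including the brackets (`[[April]]`). The capture group `(.*?)` holds the bare title. A minimal sketch that works on a string instead of the file and returns only the titles; `extractTitles` is a hypothetical name, not part of any library:

```php
<?php
// Return the text inside every [[...]] marker, without the brackets.
function extractTitles(string $text): array
{
    preg_match_all('/\[\[(.*?)\]\]/', $text, $matches);
    // Default PREG_PATTERN_ORDER: $matches[0] = full matches, $matches[1] = group-1 captures
    return $matches[1];
}

$sample = "[[April]]\n\nApril is the fourth month.\n\n[[August]]\n\nAugust is the eighth month.";
print_r(extractTitles($sample));
```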

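However, I'd also like to be able to extract the text of each article. Does anyone have an idea of what I could do (a regex or some other solution) to extract the article data?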

Thanks!


1 Answer


I think you're overthinking this (also, wiki markup is no better suited to regular expressions than HTML is).

Why not do something like this:

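```php
<?php
$fh = fopen("wiki2.txt", "r");
$HeaderNumber = 0;
// Entry 0 collects any text that appears before the first [[Title]] line
$Document = array(array('Title' => "Default", 'Lines' => array()));

if ($fh) {
    while (($line = fgets($fh)) !== false) {
        $line = rtrim($line, "\r\n"); // fgets() keeps the line ending
        // A line containing both [[ and ]] starts a new article
        if (strpos($line, '[[') !== false && strpos($line, ']]') !== false) {
            $HeaderNumber++;
            $Document[$HeaderNumber] = array(
                'Title' => trim(str_replace(array("[[", "]]"), "", $line)),
                'Lines' => array(),
            );
            continue;
        }
        $Document[$HeaderNumber]['Lines'][] = $line;
    }
    fclose($fh);
}

// Join each article's lines into a single Text field (separator argument first)
foreach ($Document as $i => $article) {
    $Document[$i]['Text'] = trim(implode("\n", $article['Lines']));
    unset($Document[$i]['Lines']);
}
```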

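This builds a numerically indexed array in which each entry has a `Title` and a `Text` field holding what the names suggest (entry 0, titled "Default", holds anything before the first title). You can then use the Text_Wiki module from the PEAR library to process the text further into HTML.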
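The `strpos()` check treats any line containing both `[[` and `]]` as a title, so an inline link such as `[[Augustus]]` inside a paragraph would wrongly start a new article. A minimal sketch of a stricter variant, assuming every title sits alone on its own line; `splitArticles` is a hypothetical helper that takes the whole file as a string (e.g. from `file_get_contents("wiki2.txt")`) and returns an array keyed by title:

```php
<?php
// Hypothetical helper: split wiki-marked text into [title => text].
// Only a line consisting solely of [[Title]] starts a new article, so inline
// links such as "[[Augustus]]" inside a paragraph stay in the body text.
function splitArticles(string $text): array
{
    $articles = array();
    $current = null;
    foreach (preg_split('/\R/', $text) as $line) {
        $trimmed = trim($line);
        // Starts with [[ and the first ]] is at the very end of the line
        $isTitle = strlen($trimmed) > 4
            && substr($trimmed, 0, 2) === '[['
            && strpos($trimmed, ']]') === strlen($trimmed) - 2;
        if ($isTitle) {
            $current = substr($trimmed, 2, -2);
            $articles[$current] = array();
            continue;
        }
        if ($current !== null) {
            $articles[$current][] = $line; // text before the first title is skipped
        }
    }
    // Join each article's lines into one block of text
    return array_map(function (array $lines): string {
        return trim(implode("\n", $lines));
    }, $articles);
}

$sample = "[[April]]\n\nApril is the fourth month.\n\n[[August]]\n\n"
    . "August is named after [[Augustus]] Caesar.";
$articles = splitArticles($sample);
echo $articles['August']; // August is named after [[Augustus]] Caesar.
```

Keying by title makes lookups convenient, but a title that appears twice would overwrite the earlier article; the numerically indexed array in the answer keeps both.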

answered 2012-10-17T03:22:33.183