
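Here is my problem: I have a large string (nearly 8000 characters), and I want two things:

  1. detect sentence boundaries such as "." and
  2. have sentences of no more than 600 characters

I know that in some cases it is impossible to have both. In that case, find a space and split the sentence there.

The solution that ridgerunner provided for condition 1 works like a charm (see the original link, http://goo.gl/PqI6d ), but it often outputs sentences longer than 600 characters. Any ideas? Thanks in advance!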


2 Answers


Thanks, nhahtdh. Please check whether I am missing something. Below is an excerpt of my string and the output I get using your suggestion.

<?php
    // Up to 600 non-period characters followed by a space or a period,
    // or a run of 600+ word characters optionally followed by one.
    $ptn = "/(?:[^.]{1,600}(?: |\.)|\w{600,}(?: |\.)?)/";
    // The inner double quotes around "lazy eye" must be escaped in a double-quoted string.
    $str = "Amblyopia occurs when the nerve pathway from one eye to the brain does not develop during childhood. This occurs because the abnormal eye sends a blurred image or the wrong image to the brain. This confuses the brain, and the brain may learn to ignore the image from the weaker eye. Strabismus is the most common cause of amblyopia. There is often a family history of this condition. The term \"lazy eye\" refers to amblyopia, which often occurs along with strabismus. However, amblyopia can occur without strabismus and people can have strabismus without amblyopia.First, any eye condition that is causing poor vision in the amblyopic eye (such as cataracts) needs to be corrected. Children with a refractive error (nearsightedness, farsightedness, or astigmatism) will need glasses. Next, a patch is placed on the normal eye. This forces the brain to recognize the image from the eye with amblyopia. Sometimes, drops are used to blur the vision of the normal eye instead of putting a patch on it. Children whose vision will not fully recover, and those with only good eye due to any disorder should wear glasses with protective polycarbonate lenses. Polycarbonate glasses are shatter- and scratch-resistant. Children who get treated before age 5 will usually recover almost completely normal vision, although they may continue to have problems with depth perception. Delaying treatment can result in permanent vision problems. After age 10, only a partial recovery of vision can be expected. Early recognition and treatment of the problem in children can help to prevent permanent visual loss. All children should have a complete eye examination at least once between ages 3 and 5. Special techniques are needed to measure visual acuity in a child who is too young to speak. Most eye care professionals can perform these techniques.";
    // Keep the return value so that print_r() has something to print.
    $result = preg_split($ptn, $str, -1, PREG_SPLIT_NO_EMPTY);
    print_r($result);
    ?>

RESULT: I need the sentences in strings of fewer than 600 characters

 Array
(
[0] => childhood.
[1] => brain.
[2] => eye.
[3] => amblyopia.
[4] => condition.
[5] => strabismus.
[6] => amblyopia.
[7] => corrected.
[8] => glasses.
[9] => eye.
[10] => amblyopia.
[11] => it.
[12] => lenses.
[13] => scratch-resistant.
[14] => perception.
[15] => problems.
[16] => expected.
[17] => loss.
[18] => 5.
[19] => speak.
[20] => techniques
)
answered 2012-07-09T15:14:19.293

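You would probably be better off matching the string instead of splitting it. Your matching regex could look something like this: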

(.{0,600}?\.)|(.{0,600}(?=\ ))

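In short, you first look for the shortest possible string that ends in a period. If there is none, you look for the longest possible string that is followed by a space. The next match then starts where the previous one left off.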
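
To illustrate the difference between matching and splitting in PHP: preg_split() treats the pattern as a delimiter and discards whatever it matches, while preg_match_all() returns the matches themselves, each one starting where the previous one ended. The pattern and the sample text below are illustrative only and are not taken from the posts above.

```php
<?php
// Illustrative pattern: a run of non-period characters ending in a period.
$pattern = '/[^.]+\./';
$text = 'One. Two. Three.';

// preg_split() uses the pattern as the delimiter, so the sentences
// themselves are thrown away and only the (empty) gaps remain.
$pieces = preg_split($pattern, $text, -1, PREG_SPLIT_NO_EMPTY);

// preg_match_all() keeps the matched text; $matches[0] holds the full matches.
preg_match_all($pattern, $text, $matches);
$sentences = array_map('trim', $matches[0]);

print_r($pieces);     // empty array
print_r($sentences);  // the three sentences
```

Here every character of the text belongs to some match, so splitting leaves nothing behind, whereas matching yields one entry per sentence.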

Note that this is a generic regex. Your PHP implementation may differ.
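
As a rough idea of how this could be adapted to PHP, here is a minimal sketch built on preg_match_all(). The function name split_sentences() and its $max parameter are made up for this example. Compared with the regex above, it makes three assumptions of its own: the period counts towards the limit, whitespace between chunks is dropped, and a third alternative makes a hard cut when a single word is longer than the limit. It also treats every "." as a sentence end and counts bytes rather than characters, so UTF-8 text would need the u modifier.

```php
<?php
// Hypothetical helper, not part of the original answer: cut $text into
// chunks of at most $max bytes, preferring to end each chunk at a period.
function split_sentences(string $text, int $max = 600): array
{
    if ($max < 1) {
        throw new InvalidArgumentException('$max must be at least 1');
    }
    // 1st alternative: shortest run that ends in a period (period included in the limit).
    // 2nd alternative: longest run that is followed by whitespace or the end of the text.
    // 3rd alternative: hard cut, used only when one word exceeds the limit.
    $pattern = sprintf(
        '/\s*(.{0,%d}?\.|.{1,%d}(?=\s|$)|.{1,%d})/s',
        $max - 1,
        $max,
        $max
    );
    preg_match_all($pattern, $text, $matches);

    // Group 1 holds each chunk; trim it and drop whitespace-only leftovers.
    $chunks = array_map('trim', $matches[1]);
    return array_values(array_filter($chunks, fn($chunk) => $chunk !== ''));
}

// Example with a small limit so that the fallback to a space is visible.
print_r(split_sentences('Short one. This sentence is definitely longer than twenty.', 20));
```

With the default limit of 600, ordinary sentences come back one per array element, and only sentences longer than the limit are broken at a space.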

answered 2012-07-09T05:43:14.737