2

I want to read some text files in a folder line by line. For example, 1.txt:

Fast and Effective Text Mining Using Linear-time Document Clustering
Bjornar Larsen WORD2 Chinatsu Aone
SRA International AK, Inc.
4300 Fair Lakes Cow-l Fairfax, VA 22033

{bjornar-larsen, WORD1

I want to remove every line that neither contains one of the words word, word2, word3 nor ends with a dot.

So, from the example, the result would be:

Bjornar Larsen WORD2 Chinatsu Aone
SRA International, Inc.
{bjornar-larsen, WORD1

I'm confused: how do I remove such a line? Is it possible? Or can we replace those lines with spaces?

Here is the code:

$url = glob($savePath.'*.txt');
foreach ($url as $file => $files) {
    $handle = fopen($files, "r") or die ('can not open file');
    $ori_content= file_get_contents($files);
    foreach(preg_split("/((\r?\n)|(\r\n?))/", $ori_content) as $buffer){
        $pos1 = stripos($buffer, $word1);
        $pos2 = stripos($buffer, $word2);
        $pos3 = stripos($buffer, $word3);
        $last = $str[strlen($buffer)-1];//read the las character
        if (true !== $pos1 OR true !== $pos2 OR true !==$pos3 && $last != '.'){
        //how to remove
        }
    }
}

Please help me, thank you very much :)


5 Answers

2

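You are using a !== true comparison to test the result of stripos. !== true means "not strictly identical to the boolean true". The return value of stripos is an integer position, unless the word is not present, in which case it is false. In other words, stripos never returns true, so each of those comparisons always holds and the condition does not test what you intended.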
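To make this concrete, here is a small sketch of what stripos returns for a hit and for a miss. The sample line and the search words are made up for illustration and are not taken from the question's files.

```php
<?php
// Hypothetical sample line, for illustration only.
$line = 'WORD2 appears at the start of this line';

$found   = stripos($line, 'word2');   // int 0: match at offset 0
$missing = stripos($line, 'absent');  // bool false: no match

var_dump($found !== true);    // bool(true), even though the word was found
var_dump($missing !== true);  // bool(true), the same result for a miss
var_dump($found !== false);   // bool(true): the reliable "found" test
var_dump($missing !== false); // bool(false)
```

Note that a match at offset 0 is loosely equal to false, which is why the strict operators === and !== are needed here.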

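Try updating it to use === false instead. Also, you use OR between the checks; your example suggests a line only needs to contain one of the words, so if you are checking that none of them was found, you need && between all the conditions: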

if (($pos1 === false) && ($pos2 === false) && ($pos3 === false) && ($last != '.'))

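As for "how to remove a line", you need to build a list of all the lines you want to keep. That means we actually want to flip the condition above to use !== false, with || between everything (because we want to keep every line that matches any of the rules).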

Try something like this:

$url = glob($savePath.'*.txt');
foreach ($url as $file => $files) {
    $ori_content = file_get_contents($files); // read the whole file into a string
    $linesToKeep = array(); // list of all lines that match our rules
    foreach(preg_split("/((\r?\n)|(\r\n?))/", $ori_content) as $buffer){
        $pos1 = stripos($buffer, $word1);
        $pos2 = stripos($buffer, $word2);
        $pos3 = stripos($buffer, $word3);
        $last = substr($buffer, -1); // last character of the current line

        if (($pos1 !== false) || ($pos2 !== false) || ($pos3 !== false) || ($last == '.')) {
            $linesToKeep[] = $buffer; // save this line
        }
    }
    // process list of lines for this file;
    // file_put_contents($files, join("\r\n", $linesToKeep)); // write back to file
    // $lines = join("\r\n", $linesToKeep); // convert to string to manipulate
}

Now the $linesToKeep array holds every line that matches your rule set. You can convert it back to a string with $lines = join("\r\n", $linesToKeep);, or iterate over it and process it as needed.
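The same keep-list idea can be sketched as a self-contained function that works on a string instead of on files. The helper name filterLines, the keyword list, and the sample text are assumptions made for this illustration; they are not part of the question.

```php
<?php
/**
 * Return only the lines that contain at least one of $words (case-insensitive)
 * or that end with a period. filterLines() is a hypothetical helper name.
 */
function filterLines(string $content, array $words): array
{
    $kept = [];
    foreach (preg_split('/\r\n|\r|\n/', $content) as $line) {
        $keep = substr($line, -1) === '.';          // rule: line ends with a period
        foreach ($words as $word) {
            if (stripos($line, $word) !== false) {  // rule: line contains a keyword
                $keep = true;
                break;
            }
        }
        if ($keep) {
            $kept[] = $line;
        }
    }
    return $kept;
}

// Made-up sample text, loosely modelled on the question's example.
$sample = "A title line\nBjornar Larsen WORD2 Chinatsu Aone\nSRA International, Inc.\n4300 Fair Lakes\n{bjornar-larsen, WORD1";
$result = filterLines($sample, ['word1', 'word2', 'word3']);
echo implode(PHP_EOL, $result), PHP_EOL;
```

To apply it to a file, one option is to pass the output of file_get_contents to the function and write the joined result back with file_put_contents.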

answered 2012-09-30T05:28:50.517
1

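Nice approach... but you can use an array to collect the lines you read and then put them back into the file. Up to that point you were doing fine.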

PS: there may be better ways to do this...

$url = glob($savePath.'*.txt');
foreach ($url as $file => $files) {
    $ori_content = file_get_contents($files); // read the whole file into a string

    # Declare a variable array to store the contents.
    $fileContents = array();

    foreach(preg_split("/((\r?\n)|(\r\n?))/", $ori_content) as $buffer){
        $pos1 = stripos($buffer, $word1);
        $pos2 = stripos($buffer, $word2);
        $pos3 = stripos($buffer, $word3);
        $last = substr($buffer, -1); // read the last character of the line
        if (($pos1 !== false) || ($pos2 !== false) || ($pos3 !== false) || ($last == '.')){
            $fileContents[] = $buffer;
        }
    }

    # Put the contents
    file_put_contents($files, implode(PHP_EOL, $fileContents)); // $files holds the path; $file is only the array index

}
answered 2012-09-30T05:28:35.540
1

Try

$url = glob($savePath.'*.txt');
foreach ($url as $file => $files) {
  $lines = file($files, FILE_IGNORE_NEW_LINES); // drop line endings so substr($line, -1) sees the last visible character
  foreach ($lines as $key=>$line) {
    if (!preg_match('/(word|word2|word3)/i', $line) && substr($line, -1) != '.') {
      unset($lines[$key]);
    }
  }
  $ori_content = implode("\n", $lines);
}
answered 2012-09-30T05:47:29.207
0

You need to create a secondary buffer.

$url = glob($savePath.'*.txt');
foreach ($url as $file => $files) {
    $ori_content = file_get_contents($files); // read the whole file into a string
    /* Create our second buffer */
    $buffer2 = "";
    foreach(preg_split("/((\r?\n)|(\r\n?))/", $ori_content) as $buffer){
        $pos1 = stripos($buffer, $word1);
        $pos2 = stripos($buffer, $word2);
        $pos3 = stripos($buffer, $word3);
        $last = substr($buffer, -1); // read the last character
        /* Keep the line if it contains any of the three words or ends with a period */
        if ($pos1 !== false || $pos2 !== false || $pos3 !== false || $last == '.') {
            $buffer2 .= $buffer . PHP_EOL;
        }
    }
}
echo $buffer2;
answered 2012-09-30T05:25:43.620
0

I would just use explode:

$ori_content = file_get_contents($files); // $files is the path of one text file

$lines = explode ( "\n" , $ori_content ); // double quotes so that \n is a real newline
$newParagraph = ''; // collects the lines we keep

foreach ( $lines AS $line )
{
 // stripos is case-insensitive, so WORD2 in the sample also matches
 if (stripos ( $line , 'word' ) !== false OR stripos ( $line , 'word2' ) !== false OR stripos ( $line , 'word3' ) !== false OR substr ( $line , -1 ) == '.')
  {
   $newParagraph .= $line . "\n"; // append the kept line
  }
}

echo $newParagraph;

Much simpler than what you are trying to do.

answered 2012-09-30T05:30:34.740