-4

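Is it possible to validate a text file before I load its data into a MySQL database?

I want to check that each line contains 5 columns of data. If it does, I proceed with the following query: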

LOAD DATA CONCURRENT INFILE 'c:/test/test.txt' 
INTO TABLE DUMP_TABLE FIELDS TERMINATED BY '\t' ENCLOSED BY '' LINES TERMINATED BY '\n' IGNORE 1 LINES;

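If it does not, I delete the whole line. I repeat this process for every line in the txt file.

The text file contains data in the following format: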

id  col2    col3    2012-07-27-19:27:06 col5

id  col2    col3    2012-07-25-09:58:50 col5

id  col2    col3    2012-07-23-10:14:13 col5
4

6 Answers

2

Edit: after reading your comment, here is code that does the same thing for tab-delimited data:

$max_cols = 5; // the question expects 5 columns per line
$handler = fopen("myfile.txt", "r");
$error = false;
// fgets returns false at end of file, which ends the loop
while (($linetocheck = fgets($handler)) !== false) {
   // Split on tabs; fgetcsv gives the same result as fgets + explode,
   // see http://es.php.net/manual/en/function.fgetcsv.php
   $cols = explode("\t", rtrim($linetocheck, "\r\n"));
   if (count($cols) > $max_cols) {
       $error = true;
       break;
   }
}
fclose($handler);
if (!$error) {
    //...do stuff
}

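The code below reads a file, say "myfile.txt", line by line and sets the variable $error to true if any line is longer than $max_cols characters. (Sorry if this is not what you were asking; your question was not entirely clear to me.)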

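// $max_cols must be set beforehand to the maximum allowed line length
$handler = fopen("myfile.txt", "r");
$error = false;
// fgets returns false at end of file, which ends the loop
while (($linetocheck = fgets($handler)) !== false) {
   // Measure the line without its trailing line break
   if (strlen(rtrim($linetocheck, "\r\n")) > $max_cols) {
       $error = true;
       break;
   }
}
fclose($handler);
if (!$error) {
    //...do stuff
}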
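
For the exact requirement in the question (every line must have 5 tab-separated columns), a minimal sketch built on fgetcsv might look like this. The function name findBadLines and its parameters are placeholders of mine, not something from the question, and it assumes the fields contain no double-quoted text that could hide a tab.

```php
<?php
// Hypothetical helper: returns the 1-based numbers of all lines whose
// tab-separated field count differs from $expectedCols.
function findBadLines(string $path, int $expectedCols = 5): array
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $badLines = [];
    $lineNo = 0;
    // fgetcsv reads one line and splits it on the delimiter in a single step.
    while (($cols = fgetcsv($handle, 0, "\t", '"', '\\')) !== false) {
        $lineNo++;
        // A blank line comes back as a one-element array, so it is flagged too.
        if (count($cols) !== $expectedCols) {
            $badLines[] = $lineNo;
        }
    }
    fclose($handle);

    return $badLines;
}
```

An empty result means every line has the expected number of columns, so the LOAD DATA query can go ahead; otherwise the returned line numbers tell you which rows to fix or drop.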
answered 2012-08-05T22:51:51.390
2

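I know this is an old thread, but I was looking for something similar myself and came across this topic, and none of the answers here helped me.

So I went ahead and came up with my own solution, which is tested and works (though it could be improved).

Suppose we have a CSV file named example.csv containing the following dummy data (deliberately, the last row, line 6, has one more value than the other rows):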

Name,Country,Age
John,Ireland,18
Ted,USA,22
Lisa,UK,23
Michael,USA,20
Louise,Ireland,22,11

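Now, to check that every row in the CSV file has the same number of values, the following block of code does the job and reports which line the error occurred on: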

    function validateCsvColumnLength($pathToCsvFile)
    {
        if (!file_exists($pathToCsvFile) || !is_readable($pathToCsvFile)) {
            throw new \Exception("File doesn't exist or is not readable.");
        }

        if (!$handle = fopen($pathToCsvFile, "r")) {
            throw new \Exception("Stream error");
        }

        $headerLength = null;
        $rowNumber    = 0;
        while (($data = fgetcsv($handle)) !== false) {
            $rowNumber++;
            if ($headerLength === null) {
                // the first row (header) defines the expected size
                $headerLength = count($data);
            } elseif (count($data) !== $headerLength) {
                // a row has more or less data than the header: report its 1-based line number
                fclose($handle);
                throw new \Exception("Error, data count from row {$rowNumber} does not match header size");
            }
        }
        fclose($handle);

        return true;
    }

To actually test it, just call var_dump() to see the result:

   var_dump(validateCsvColumnLength('example.csv'));
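
Since the function signals a bad row by throwing, it may be more convenient to wrap the call in try/catch than to let the exception end the script. The sketch below is only an illustration: checkCsvRows is a simplified stand-in I made up so the snippet runs on its own, and the temporary file holds a shortened copy of the example data.

```php
<?php
// Simplified stand-in for the validator above (hypothetical name):
// throws when a row's size differs from the header's.
function checkCsvRows(string $path): bool
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new Exception('Stream error');
    }
    $expected = null;
    $rowNumber = 0;
    while (($data = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        $rowNumber++;
        $expected = $expected ?? count($data); // first row sets the size
        if (count($data) !== $expected) {
            fclose($handle);
            throw new Exception("Row {$rowNumber} does not match header size");
        }
    }
    fclose($handle);
    return true;
}

// Write a shortened copy of the example data to a temporary file.
$path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($path, "Name,Country,Age\nJohn,Ireland,18\nLouise,Ireland,22,11\n");

try {
    checkCsvRows($path);
    $result = 'CSV is consistent';
} catch (Exception $e) {
    // The message names the offending row instead of aborting the script.
    $result = 'Validation failed: ' . $e->getMessage();
}
echo $result, PHP_EOL;
unlink($path);
```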
answered 2020-06-19T17:04:30.883
0

Yes, it is possible. I've done that exact thing. Use PHP's csv processing functions.

You will need these functions:

fopen() fgetcsv()

And possibly some others.

fgetcsv returns an array.

I'll give you a short example of how you can validate.

Here's the csv:

col1,col2,col3,col4
1,2,3,4
1,2,3,4,
1,2,3,4,5
1,2,3,4

I'll skip the fopen part and go straight to the validation step. Note that "\t" is the tab character.

$row_length = 0;
$i = 0;
while (($row = fgetcsv($handle, 0, "\t")) !== false) {
  if ($i == 0) {
    // the header row defines the expected size
    $row_length = sizeof($row);
  } else {
    if (sizeof($row) != $row_length) {
      echo "Error, line $i of the data does not match header size";
      break;
    }
  }
  $i++; // move on to the next row index
}

That would test each row to make sure it is the same as the 1st row's ($i = 0) length.

EDIT: And, in case you don't know how to search the internet, here is the page for fgetcsv: http://php.net/manual/en/function.fgetcsv.php

Here is the function prototype: array fgetcsv ( resource $handle [, int $length = 0 [, string $delimiter = ',' [, string $enclosure = '"' [, string $escape = '\' ]]]] )

As you can see, it has everything you would need for doing a quick scan in PHP before you send your data to LOAD DATA INFILE.
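
The question also asks to drop malformed lines rather than just detect them. One way to sketch that, under my own assumptions, is to copy only the well-formed rows into a second file and point LOAD DATA INFILE at that file. The name writeValidRows and its parameters are hypothetical. Note that this version uses fgets plus explode instead of fgetcsv, so kept lines are written back byte for byte.

```php
<?php
// Hypothetical helper: copies rows with exactly $expectedCols tab-separated
// fields from $source to $target and returns the number of rows dropped.
function writeValidRows(string $source, string $target, int $expectedCols = 5): int
{
    $in = fopen($source, 'r');
    $out = fopen($target, 'w');
    if ($in === false || $out === false) {
        throw new RuntimeException('Cannot open input or output file');
    }

    $dropped = 0;
    while (($line = fgets($in)) !== false) {
        // Count fields without the trailing line break.
        $fields = explode("\t", rtrim($line, "\r\n"));
        if (count($fields) === $expectedCols) {
            fwrite($out, $line); // keep the original line untouched
        } else {
            $dropped++;          // malformed row: skip it
        }
    }
    fclose($in);
    fclose($out);

    return $dropped;
}
```

Writing to a separate file leaves the original input intact, which makes it easy to inspect the rejected rows later.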

I have solved your exact problem in my own program. My program also automatically eliminates duplicate rows and other cool stuff.

answered 2012-08-05T22:56:26.123
0

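What do you mean by columns? If you just mean the number of characters in a line, simply split (explode) the file into lines and check whether each line's length equals 5.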

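If you mean columns separated by a delimiter, then you should count the occurrences of that delimiter in each line and check that the result corresponds to 5 columns. Use fgetcsv for that.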
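
Counting delimiters directly could be sketched as below. Keep in mind that a line with 5 columns contains 4 delimiters, not 5. The function name hasColumnCount and its defaults (5 columns, tab delimiter) are my own assumptions based on the question.

```php
<?php
// Hypothetical helper: a line with N columns contains N - 1 delimiters.
function hasColumnCount(string $line, int $expectedCols = 5, string $delimiter = "\t"): bool
{
    // Ignore the trailing line break before counting.
    $line = rtrim($line, "\r\n");
    return substr_count($line, $delimiter) === $expectedCols - 1;
}

// Example with a line shaped like the data in the question.
var_dump(hasColumnCount("id\tcol2\tcol3\t2012-07-27-19:27:06\tcol5\n"));
```

This simple count does not understand quoted fields, so a delimiter inside quotes would be counted as well; fgetcsv handles that case.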

answered 2012-08-05T22:46:34.140
0

You could try fgetcsv and see whether it is enough. If not, please describe in more detail what you mean by columns.

answered 2012-08-05T22:46:52.033
0

I assume you are talking about the length of each line in the file. If so, here is a possible solution.

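$file_handle = fopen("myfile", "r");
// fgets returns false at end of file, which ends the loop
while (($line = fgets($file_handle)) !== false) {
   // Compare the length without the trailing line break
   if (strlen(rtrim($line, "\r\n")) != 5) {
       fclose($file_handle);
       throw new Exception("Could not save file to database.");
   }
}
fclose($file_handle);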
answered 2012-08-05T22:48:42.067