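I'm reading in a CSV file, but some of the values are not escaped, so PHP parses them incorrectly. Here is an example of a bad line:

" 635"," ","AUBREY R. PHILLIPS (1920- ) - Pastel depicting cottages in a steep sided river valley, possibly North Wales, signed and dated 2000, framed, 66cm by 48cm. another of a rural landscape, titled verso "Harvest Time, Somerset" signed and dated '87, framed, 69cm by 49cm. (2) NB - Aubrey Phillips is a Worcestershire artist who studied at the Stourbridge School of Art.","40","60","WAT ","Paintings, prints and watercolours",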

You can see that "Harvest Time, Somerset" has quotes around it, which makes PHP think it is a new value.

When I run print_r() on each row, the broken row ends up looking like this:

Array
(
    [0] =>  635
    [1] =>  
    [2] => AUBREY R. PHILLIPS (1920- ) - Pastel depicting cottages in a steep sided river valley, possibly North Wales, signed and dated 2000, framed, 66cm by 48cm. another of a rural landscape, titled verso Harvest Time
    [3] => Somerset" signed and dated '87
    [4] => framed
    [5] => 69cm by 49cm. (2)  NB - Aubrey Phillips is a Worcestershire artist who studied at the Stourbridge School of Art."
    [6] => 40
    [7] => 60
    [8] => WAT
    [9] => Paintings, prints and watercolours
    [10] => 
)

This is obviously wrong, since the row now contains more array elements than the other, correct rows.

Here is the PHP I'm using:

$i = 1;
if (($file = fopen($this->request->data['file']['tmp_name'], "r")) !== FALSE) {
    while (($row = fgetcsv($file, 0, ',', '"')) !== FALSE) {
        if ($i == 1){
            $header = $row;
        }else{
            if (count($header) == count($row)){
                $lots[] = array_combine($header, $row);
            }else{
                $error_rows[] = $row;
            }

        }
        $i++;
    }
    fclose($file);
}

Rows with the wrong number of values are put into $error_rows, and the rest go into one big $lots array.

What can I do to work around this? Thanks.

5 Answers

1

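If you know that you will always get entries 0 and 1, and that the last 5 entries in the array are always correct, so that only the descriptive entry is "broken" by unescaped enclosure characters, then you can extract the first 2 and the last 5 using array_slice(), implode() the remainder back into a single string (restoring the lost quotes), and rebuild the array correctly.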

$testData = '" 635"," ","AUBREY R. PHILLIPS (1920- ) - Pastel depicting cottages in a steep sided river valley, possibly North Wales, signed and dated 2000, framed, 66cm by 48cm. another of a rural landscape, titled verso "Harvest Time, Somerset" signed and dated \'87, framed, 69cm by 49cm. (2) NB - Aubrey Phillips is a Worcestershire artist who studied at the Stourbridge School of Art.","40","60","WAT","Paintings, prints and watercolours",';

$result = str_getcsv($testData, ',', '"');

$hdr = array_slice($result,0,2);
$bdy = array_slice($result,2,-5);
$bdy = trim(implode('"',$bdy),'"');
$ftr = array_slice($result,-5);

$fixedResult = array_merge($hdr,array($bdy),$ftr);
var_dump($fixedResult);

The result is:

array
  0 => string ' 635' (length=4)
  1 => string ' ' (length=1)
  2 => string 'AUBREY R. PHILLIPS (1920- ) - Pastel depicting cottages in a steep sided river valley, possibly North Wales, signed and dated 2000, framed, 66cm by 48cm. another of a rural landscape, titled verso Harvest Time" Somerset" signed and dated '87" framed" 69cm by 49cm. (2) NB - Aubrey Phillips is a Worcestershire artist who studied at the Stourbridge School of Art.' (length=362)
  3 => string '40' (length=2)
  4 => string '60' (length=2)
  5 => string 'WAT' (length=3)
  6 => string 'Paintings, prints and watercolours' (length=34)
  7 => string '' (length=0)

Not perfect, but possibly good enough.

The alternative is to get whoever generates the CSV to escape their enclosures properly.
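
The slicing steps above could be wrapped in a helper and called from the question's else branch. This is only a minimal sketch, assuming the first 2 and last 5 fields are always intact. repairRow() and its parameters are hypothetical names, the sample row is made up, and unlike the snippet above it rejoins the fragments with a comma (they were split at commas) rather than with the quote character:

```php
<?php
// Hypothetical helper (not from the original answer): collapse the surplus
// middle fields of an over-split row back into a single description field.
function repairRow(array $row, int $expected, int $head = 2, int $tail = 5): ?array
{
    $count = count($row);
    if ($count === $expected) {
        return $row;                 // already well-formed
    }
    if ($count < $expected || $expected !== $head + 1 + $tail) {
        return null;                 // cannot be repaired by merging
    }
    $hdr = array_slice($row, 0, $head);
    $bdy = array_slice($row, $head, $count - $head - $tail);
    $ftr = array_slice($row, -$tail);
    // The fragments were split at commas, so glue them back with a comma.
    // Stray quote characters left behind by the parser are kept as-is.
    return array_merge($hdr, array(implode(',', $bdy)), $ftr);
}

// Made-up, shortened version of the broken row from the question.
$broken = array(' 635', ' ', 'titled verso Harvest Time', ' Somerset" signed',
                ' framed."', '40', '60', 'WAT', 'Paintings', '');
$fixed = repairRow($broken, 8);
echo count($fixed), "\n";
echo $fixed[2], "\n";
```

In the question's loop, this would be tried before logging a row as an error: call repairRow($row, count($header)) and only append to $error_rows when it returns null.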

answered 2012-03-16T11:50:55.560
1

If you can escape the " characters inside the text, like this: \"

then you can specify the escape character when calling fgetcsv:

fgetcsv($file, 0, ',', '"', '\\');
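
A minimal sketch of that idea using str_getcsv(), assuming the producer can be made to write \" for quotes inside a field. The sample line is made up for illustration. As far as I know, PHP leaves the backslash in the parsed value, so the sketch strips it afterwards:

```php
<?php
// Made-up sample line: inner quotes are backslash-escaped by the producer.
$line = '"635","titled verso \"Harvest Time, Somerset\" signed","40"';

// Pass the escape character explicitly; note the doubled backslash,
// because '\' on its own would not be a valid PHP string literal.
$row = str_getcsv($line, ',', '"', '\\');

// Remove any backslash that is still sitting in front of a quote.
$row = array_map(function ($v) {
    return str_replace('\\"', '"', $v);
}, $row);

print_r($row);
```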
answered 2012-03-16T12:22:52.067
0

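This is a long shot, so don't take me too seriously.

I see a pattern in the text: every ',' that you want to ignore is followed by a space. Search and replace ', ' with 'FUU' or something else unique.

Now parse the CSV file. It will probably come out in the correct format. You then only need to replace 'FUU' back to ', '.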

:)
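
The placeholder trick above might look like this. It is a rough sketch, assuming ', ' never occurs between two fields and the placeholder never occurs in the data. PLACEHOLDER and the sample line are made up for illustration:

```php
<?php
const PLACEHOLDER = '@@COMMA@@';   // hypothetical marker; pick anything not in the data

// Made-up sample line with an unescaped quoted phrase inside a field.
$line = '"635","titled verso "Harvest Time, Somerset" signed, framed","40"';

// 1. Hide every comma that is followed by a space.
$masked = str_replace(', ', PLACEHOLDER, $line);

// 2. Parse; only the real field separators are still commas.
$row = str_getcsv($masked, ',', '"', '\\');

// 3. Put the hidden commas back in every field.
$row = str_replace(PLACEHOLDER, ', ', $row);

print_r($row);
```

This should bring the field count back in line, but the stray inner quotes may still not be restored exactly, so treat it as a fix for column alignment only.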

answered 2012-03-16T11:48:35.780
0

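You are probably reading the contents of the CSV file as an array of lines and then splitting each line on commas. This fails because some of the fields also contain commas. One trick that may help is to look for ",", which indicates a field separator and is unlikely (but unfortunately not impossible) to appear inside a field.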

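<?php
  $csv = file_get_contents("yourfile.csv");
  // split() was removed in PHP 7; explode() does the same for a literal separator
  $lines = explode("\r\n", $csv);
  echo "<pre>";
  foreach($lines as $line)
  {
    // mark every "," field boundary, then split on the marker
    // (the fields keep their surrounding quotes)
    $line = str_replace("\",\"", "\"@@@\"", $line);
    $fields = explode("@@@", $line);
    print_r($fields);
  }
  echo "</pre>";
?>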
answered 2012-03-16T11:51:28.220
0
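// $csv holds one raw line; split it into space-separated tokens
$csv = explode(' ', $csv);
foreach ($csv as $k => $v) {
    // a single word wrapped in straight quotes: swap in curly quotes
    // (the length check skips empty tokens and a lone ")
    if (strlen($v) > 2 && $v[0] == '"' && substr($v, -1) == '"') {
        $csv[$k] = html_entity_decode('&ldquo;' . substr($v, 1, -1) . '&rdquo;', ENT_QUOTES, 'UTF-8');
    }
}
$csv = implode(' ', $csv);
$csv = str_getcsv($csv);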
answered 2014-12-11T20:03:08.393