
I have the following CSV file:

"Id","Title","Body","Tags"
"101","this title","
\"">.</>"";
","c# asp.net excel table"

which I want to convert into an array as follows:

Array
(
    [0] => Array
        (
            [0] => Id
            [1] => Title
            [2] => Body
            [3] => Tags
        )

    [1] => Array
        (
            [0] => 101
            [1] => this title
            [2] => \"">.</>"";
            [3] => c# asp.net excel table
        )
)

My code is:

while (($data = fgetcsv($handle, 0, ",")) !== FALSE) {
    $num = count($data);

    for ($c=0; $c < $num; $c++) {
        $data[$c] = strip_tags($data[$c]);
    }

    $result[$row] = $data;
    $row++;
}
fclose($handle);
return $result;

My problem is I am getting the following array:

Array
(
    [0] => Array
        (
            [0] => Id
            [1] => Title
            [2] => Body
            [3] => Tags
        )

    [1] => Array
        (
            [0] => 101
            [1] => this title
            [2] => 
\">.</>"";
        )

    [2] => Array
        (
            [0] => ,c# asp.net excel table"
        )

)

In general, how do I avoid detecting too many records when the fields may contain code? (It's a StackOverflow data dump, so some text fields hold all kinds of programming code.)


2 Answers


Try opening the file with CSVed to make sure it is correctly formatted as CSV.
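The same sanity check can also be done in code. The sketch below assumes the first parsed row is the header and flags every row whose field count differs from it; `findMalformedRows` is a hypothetical helper name, not part of PHP.

```php
<?php
// Hypothetical helper: returns the 0-based indexes of parsed rows whose
// field count differs from the header row (assumed to be row 0).
function findMalformedRows(array $rows): array
{
    if ($rows === []) {
        return [];
    }
    $expected = count($rows[0]);
    $bad = [];
    foreach ($rows as $i => $row) {
        if (count($row) !== $expected) {
            $bad[] = $i;
        }
    }
    return $bad;
}

// The broken result from the question: rows 1 and 2 are both too short.
$parsed = [
    ['Id', 'Title', 'Body', 'Tags'],
    ['101', 'this title', "\n\\\">.</>\"\";"],
    [',c# asp.net excel table"'],
];
print_r(findMalformedRows($parsed));
```

A non-empty result suggests that either the file or the parser settings need attention before the data is used.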

If the CSV is broken, you can apply some quick fixes to the parsed result. For example:

while (($data = fgetcsv($handle, 0, ",")) !== FALSE) {
    $num = count($data);

    for ($c=0; $c < $num; $c++) {
        $data[$c] = strip_tags($data[$c]);
    }

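    $result[$row] = $data;
    $row++;
}
fclose($handle);

// Quick fix for the sample above: the record was split across rows 1 and 2,
// so append the stray fragment to the last field of row 1 and drop row 2.
if (count($result) == 3) {
    $result[1][2] .= $result[2][0];
    unset($result[2]);
}
return $result;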
answered 2013-10-01T15:33:48.960

This string is not escaped correctly:

"
\"">.</>"";
"

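Every quote character must be preceded by a backslash (or by whatever other escape character you pass in the corresponding parameter). Also, you don't need to pass 0 and the comma to fgetcsv; they are already the defaults: http://php.net/fgetcsv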
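If regenerating the file with backslash-escaped quotes is not an option, another route worth trying is to turn off the escape character so that only doubled quotes ("") count as embedded quotes. The sketch below assumes PHP 7.4 or later (where fgetcsv accepts an empty $escape argument) and assumes the dump was written with doubled quotes and literal backslashes; the php://memory stream just stands in for the real file.

```php
<?php
// Sample data from the question, written to an in-memory stream.
$csv = <<<'CSV'
"Id","Title","Body","Tags"
"101","this title","
\"">.</>"";
","c# asp.net excel table"
CSV;

$handle = fopen('php://memory', 'w+');
fwrite($handle, $csv);
rewind($handle);

$result = [];
// An empty $escape (PHP 7.4+) disables backslash handling, so a backslash
// is treated as an ordinary character and "" is the only embedded quote.
while (($data = fgetcsv($handle, 0, ',', '"', '')) !== false) {
    $result[] = $data;
}
fclose($handle);

print_r($result);
```

With these settings the sample should parse as one header row and one data row of four fields each, with the multi-line Body kept in a single field.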

answered 2013-10-01T14:14:24.627