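I am given a data feed in JSON format, and that is the only format available. In PHP I use json_decode to decode the JSON, but it was breaking, and I found that in some places the JSON is generated with double quotes inside a person's nickname. I verified this using http://jsonformatter.curiousconcept.com

I have no control over how the data is created, but I have to deal with this broken format when it occurs. The parsed data will be put into a MySQL table.

For example: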

"contact1": "David "Dave" Letterman",

json_decode returns NULL. If I save the file manually and change the quotes around the nickname Dave to single quotes, everything works fine.

$json_string = file_get_contents($json_download);
$json_array = json_decode($json_string, true);

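How do I fix the broken JSON in json_string before json_decode processes it? How should the file be pre-processed: backslash the double quotes around the nickname, or change them to single quotes? Is it even a good idea to store double quotes like this in MySQL?

I don't know when this will occur in each data feed, so I don't want to check only contact1 for inner double quotes. Is there a way in PHP to take a line such as the example above and backslash every double quote after the colon except the outer ones? Thanks!

This is the corrected code as provided by tftd: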

<?php
// This:
// "contact1": "David "Dave" Letterman",
// Needs to look like this to be decoded by JSON:
// "contact1": "David \"Dave\" Letterman",

$data ='"contact1": "David "Dave" Letterman",';
function replace($match){
    $key = trim($match[1]);
    $val = trim($match[2]);

    if($val[0] == '"')
        $val = '"'.addslashes(substr($val, 1, -1)).'"';
    else if($val[0] == "'")
        $val = "'".addslashes(substr($val, 1, -1))."'";

    return $key.": ".$val;
}
$preg = preg_replace_callback("#([^{:]*):([^,}]*)#i",'replace',$data);
var_dump($preg);
$json_array = json_decode($preg);
var_dump($json_array);
echo $json_array . "\n";
echo $preg . "\n";
?>

Here is the output:

string(39) ""contact1": "David \"Dave\" Letterman","
NULL

"contact1": "David \"Dave\" Letterman",
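A note on the NULL above: the repaired string is still only a fragment, not a complete JSON document. It has no enclosing braces and ends with a trailing comma, so json_decode rejects it even though the quotes are now escaped. A minimal sketch of that check, assuming the fragment holds plain key/value pairs as in the example (the variable names are only illustrative):

```php
<?php
// The escaped line from the output above, still a bare fragment.
$fragment = '"contact1": "David \"Dave\" Letterman",';

// Not a complete JSON document, so decoding fails.
var_dump(json_decode($fragment, true)); // NULL
echo json_last_error_msg(), "\n";

// Assumption: the fragment contains key/value pairs only, so dropping the
// trailing comma and wrapping it in braces yields a JSON object.
$document = '{' . rtrim($fragment, ", \t\r\n") . '}';
$decoded  = json_decode($document, true);

echo $decoded['contact1'], "\n"; // David "Dave" Letterman
```

With a real feed the braces are presumably already there, so only the quote escaping would be needed.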

5 Answers

I have my own jsonFixer() function. It works in two steps: garbage removal (to even out inconsistent formatting) and reformatting.

<?php
  function jsonFixer($json){
    $patterns     = [];
    /** garbage removal */
    $patterns[0]  = "/([\s:,\{}\[\]])\s*'([^:,\{}\[\]]*)'\s*([\s:,\{}\[\]])/"; //Find any character except colons, commas, curly and square brackets surrounded or not by spaces preceded and followed by spaces, colons, commas, curly or square brackets...
    $patterns[1]  = '/([^\s:,\{}\[\]]*)\{([^\s:,\{}\[\]]*)/'; //Find any left curly brackets surrounded or not by one or more of any character except spaces, colons, commas, curly and square brackets...
    $patterns[2]  =  "/([^\s:,\{}\[\]]+)}/"; //Find any right curly brackets preceded by one or more of any character except spaces, colons, commas, curly and square brackets...
    $patterns[3]  = "/(}),\s*/"; //JSON.parse() doesn't allow trailing commas
    /** reformatting */
    $patterns[4]  = '/([^\s:,\{}\[\]]+\s*)*[^\s:,\{}\[\]]+/'; //Find or not one or more of any character except spaces, colons, commas, curly and square brackets followed by one or more of any character except spaces, colons, commas, curly and square brackets...
    $patterns[5]  = '/["\']+([^"\':,\{}\[\]]*)["\']+/'; //Find one or more of quotation marks or/and apostrophes surrounding any character except colons, commas, curly and square brackets...
    $patterns[6]  = '/(")([^\s:,\{}\[\]]+)(")(\s+([^\s:,\{}\[\]]+))/'; //Find or not one or more of any character except spaces, colons, commas, curly and square brackets surrounded by quotation marks followed by one or more spaces and  one or more of any character except spaces, colons, commas, curly and square brackets...
    $patterns[7]  = "/(')([^\s:,\{}\[\]]+)(')(\s+([^\s:,\{}\[\]]+))/"; //Find or not one or more of any character except spaces, colons, commas, curly and square brackets surrounded by apostrophes followed by one or more spaces and  one or more of any character except spaces, colons, commas, curly and square brackets...
    $patterns[8]  = '/(})(")/'; //Find any right curly brackets followed by quotation marks...
    $patterns[9]  = '/,\s+(})/'; //Find any comma followed by one or more spaces and a right curly bracket...
    $patterns[10] = '/\s+/'; //Find one or more spaces...
    $patterns[11] = '/^\s+/'; //Find one or more spaces at start of string...

    $replacements     = [];
    /** garbage removal */
    $replacements[0]  = '$1 "$2" $3'; //...and put quotation marks surrounded by spaces between them;
    $replacements[1]  = '$1 { $2'; //...and put spaces between them;
    $replacements[2]  = '$1 }'; //...and put a space between them;
    $replacements[3]  = '$1'; //...so, remove trailing commas of any right curly brackets;
    /** reformatting */
    $replacements[4]  = '"$0"'; //...and put quotation marks surrounding them;
    $replacements[5]  = '"$1"'; //...and replace by single quotation marks;
    $replacements[6]  = '\\$1$2\\$3$4'; //...and add back slashes to its quotation marks;
    $replacements[7]  = '\\$1$2\\$3$4'; //...and add back slashes to its apostrophes;
    $replacements[8]  = '$1, $2'; //...and put a comma followed by a space character between them;
    $replacements[9]  = ' $1'; //...and replace by a space followed by a right curly bracket;
    $replacements[10] = ' '; //...and replace by one space;
    $replacements[11] = ''; //...and remove it.

    $result = preg_replace($patterns, $replacements, $json);

    return $result;
  }
?>

Usage example:

<?php
  // Received badly formatted json:
  // {"contact1": "David "Dave" Letterman", price : 30.00, 'details' : "Greatest 'Hits' Album"}
  $json_string = '{"contact1": "David "Dave" Letterman", price : 30.00, \'details\' : "Greatest \'Hits\' Album"}';
  echo jsonFixer($json_string);
?>

will result in:

{"contact1": "David \"Dave\" Letterman", "price" : "30.00", "details" : "Greatest \'Hits\' Album"}

Note: this has not been tested with every possible malformed JSON string, but I use it on complex multi-level JSON strings and it has worked well so far.
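Because a regex-based fixer can also alter input that was fine to begin with, it may be safer to run it only when the feed fails to decode, and to check the result again afterwards. A rough sketch, where decodeWithFallback is a hypothetical helper (not part of PHP or of the function above) and the fixer is passed in as a callable:

```php
<?php
// Hypothetical helper: decode as-is, and only fall back to a repair
// function when json_decode reports an error.
function decodeWithFallback(string $json, callable $fixer)
{
    $decoded = json_decode($json, true);
    if (json_last_error() === JSON_ERROR_NONE) {
        return $decoded; // feed was valid, leave it untouched
    }

    $decoded = json_decode($fixer($json), true);
    if (json_last_error() !== JSON_ERROR_NONE) {
        throw new RuntimeException(
            'Feed is still invalid after repair: ' . json_last_error_msg()
        );
    }
    return $decoded;
}

// Stand-in fixer for the demo only: turns apostrophes into double quotes.
$demoFixer = function (string $json): string {
    return str_replace("'", '"', $json);
};

var_dump(decodeWithFallback("{'price': 30}", $demoFixer));
```

In practice the callable would be the repair function itself, for example decodeWithFallback($json_string, 'jsonFixer').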

answered 2016-02-17T04:20:31.223

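As others have already pointed out, the best option is to tell your client that there is a problem with the JSON formatting. Ask them to send a bug report to the original developer/company so that it gets fixed. If they cannot fix it, then offer your solution: the strings simply need to be escaped before output, which json_encode does on its own.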
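For the producer side, a minimal sketch of what a correct feed would look like. json_encode escapes embedded double quotes itself, so no manual escaping should be needed; the $record array is a made-up stand-in for whatever the producer actually holds:

```php
<?php
// Hypothetical record as the feed producer might hold it before output.
$record = ['contact1' => 'David "Dave" Letterman'];

$json = json_encode($record);
echo $json, "\n"; // {"contact1":"David \"Dave\" Letterman"}

// The round trip gives back the original value, quotes included.
var_dump(json_decode($json, true) === $record); // bool(true)
```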

If for some reason you end up having to fix the formatting yourself, here is an approach that might work for you:

$data = '"contact1": "David "Dave" Letterman", "contact2": "Peter "Robert" Smith",{\'test\': \'working "something"\'}';
function replace($match){
    $key = trim($match[1]);
    $val = trim($match[2]);

    if($val[0] == '"')
        $val = '"'.addslashes(substr($val, 1, -1)).'"';
    else if($val[0] == "'")
        $val = "'".addslashes(substr($val, 1, -1))."'";

    return $key.": ".$val;
}
$preg = preg_replace_callback("#([^{:]*):([^,}]*)#i",'replace',$data);
var_dump($preg);
// string '"contact1": "David \"Dave\" Letterman", "contact2": "Peter \"Robert\" Smith",{'test': 'working \"something\"'}' (length=110)

Keep in mind that this may break if someone messes up the JSON format again.
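A variation on the same idea, offered only as a sketch: instead of addslashes (which also backslashes apostrophes, something JSON does not accept), escape just the unescaped double quotes inside each value. It assumes a string value starts after a colon and ends at the first unescaped double quote that is followed by a comma, a closing brace or bracket, or the end of input, so a nickname followed directly by a comma would still trip it up. The function name escapeInnerQuotes is made up for this example:

```php
<?php
// Hypothetical helper: escape stray double quotes inside string values.
function escapeInnerQuotes(string $json): string
{
    // Group 1: colon and opening quote. Group 2: the raw value (lazy).
    // Group 3: closing quote, which must be followed by , } ] or end of input.
    $pattern = '/(:\s*")(.*?)((?<!\\\\)"\s*(?=[,}\]]|$))/s';

    return preg_replace_callback($pattern, function (array $m): string {
        // Escape only the quotes that are not already preceded by a backslash.
        $value = preg_replace('/(?<!\\\\)"/', '\\\\"', $m[2]);
        return $m[1] . $value . $m[3];
    }, $json);
}

$broken = '{"contact1": "David "Dave" Letterman", "contact2": "Peter "Robert" Smith"}';
$fixed  = escapeInnerQuotes($broken);

echo $fixed, "\n";
// {"contact1": "David \"Dave\" Letterman", "contact2": "Peter \"Robert\" Smith"}
var_dump(json_decode($fixed, true));
```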

answered 2012-11-05T18:06:44.333

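Tell them to escape their strings before output. You could even offer to fix it for them or provide a code solution.

Otherwise, you could use preg_replace with a regular expression.

See the related question: replace specified double quotes in text with preg_replace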

answered 2012-11-05T17:12:38.890

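As others have said, you can do a search and replace, but the hard part is coming up with your fuzzy matching rules, because in order to parse the feed you have to assume some things. You will probably need to assume:

1a) keys do not contain colons
1b) or keys have their quotes properly escaped

2a) values do not contain commas
2b) or values have their quotes properly escaped.

Even then you may run into cases where the parsing gets confused, and it gets worse if they put comments in the JSON. (Not standard-compliant, but common.)

Now, depending on the data, you could use line breaks to decide when you are looking at a new key, but again, that is not reliable and you start making big assumptions.

So, long story short, you either have to make some assumptions that can turn out wrong at any time, or you need to get them to fix the data.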
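To make the line-break idea concrete, here is a sketch under exactly those big assumptions: every key/value pair sits on its own line, keys contain no double quotes, and a string value runs from the first quote after the colon to the last quote on the line. The function name fixLineQuotes is invented for the example:

```php
<?php
// Hypothetical helper: repair stray quotes one line at a time.
function fixLineQuotes(string $json): string
{
    $lines = preg_split('/\R/', $json);

    foreach ($lines as $i => $line) {
        // Group 1: key, colon and opening quote. Group 2: raw value (greedy,
        // so it runs to the last quote on the line). Group 3: closing quote
        // plus an optional trailing comma.
        if (preg_match('/^(\s*"[^"]*"\s*:\s*")(.*)("\s*,?\s*)$/', $line, $m)) {
            // Escape only the quotes that are not already escaped.
            $value     = preg_replace('/(?<!\\\\)"/', '\\\\"', $m[2]);
            $lines[$i] = $m[1] . $value . $m[3];
        }
    }

    return implode("\n", $lines);
}

$feed = "{\n  \"contact1\": \"David \"Dave\" Letterman\",\n  \"price\": 30.00\n}";

var_dump(json_decode(fixLineQuotes($feed), true));
```

Lines that do not look like a quoted key followed by a quoted value are left alone, so numbers, arrays and nested objects pass through unchanged. A feed delivered on a single line would defeat this approach entirely.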

answered 2012-11-05T17:17:07.177

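Regular expressions are not reliable once values contain commas, square brackets, or embedded JSON strings; that is where the worries and nightmares start. PHP's json_decode also fails when keys are not quoted. I suggest using PEAR's Services_JSON, which gives the safest results once its code is patched for the class names. With that, the game with invalid JSON is over: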

<?php include("Services_JSON-1.0.3b/JSON.php"); 
//Patched version https://github.com/pear/Services_JSON/edit/trunk/JSON.php

$json = <<< JSONCODEFROMJS
   { 
      sos:presents, 
      james:'bond', 
      "agent":[0,0,7], 
      secret:"{mission:'impossible',permit: \"tokill\"}",
      go:true 
    }
JSONCODEFROMJS;

function json_fix($json) {
   $sjson = new Services_JSON(SERVICES_JSON_IN_ARR|SERVICES_JSON_USE_TO_JSON| SERVICES_JSON_LOOSE_TYPE);
   $json_array=$sjson->decode($json);
   return json_encode($json_array);
}

$json_array = json_decode(json_fix($json),true);

if(json_last_error() == JSON_ERROR_NONE) {

   $json=json_encode($json_array,JSON_PRETTY_PRINT);
   echo "<pre>";
   echo(htmlentities($json));
   echo "</pre>";
} else {
   die(json_last_error_msg());
}
?>
answered 2020-08-19T17:27:16.553