regex - R：替换双转义文本

Question

我正在使用 Amazon Elastic Map Reduce 命令行工具将多个系统调用粘合在一起。这些命令返回已经（部分？）转义的 JSON 文本。然后，当系统调用将其转换为 R 文本对象 (intern=T) 时，它似乎又被转义了。我需要清理它，以便它使用 rjson 包进行解析。

我以这种方式进行系统调用：

system("~/EMR/elastic-mapreduce --describe --jobflow j-2H9P770Z4B8GG", intern=T)

返回：

 [1] "{"                                                                                             
 [2] "  \"JobFlows\": ["                                                                             
 [3] "    {"                                                                                         
 [4] "      \"LogUri\": \"s3n:\\/\\/emrlogs\\/\","                                                   
 [5] "      \"Name\": \"emrFromR\","                                                                 
 [6] "      \"BootstrapActions\": [" 
...

但命令行中的相同命令返回：

{
  "JobFlows": [
    {
      "LogUri": "s3n:\/\/emrlogs\/",
      "Name": "emrFromR",
      "BootstrapActions": [
        {
          "BootstrapActionConfig": {
...

如果我尝试通过 rjson 运行系统调用的结果，我会得到这个错误：

Error: '\/' is an unrecognized escape in character string starting "s3n:\/"

我相信这是因为 s3n 行中的双重转义。我正在努力将这个文本按摩成可以解析的东西。

它可能就像用“\”替换“\\”一样简单，但由于我有点挣扎于正则表达式和转义，我无法正确完成。

那么如何获取字符串向量并将任何出现的“\\”替换为“\”？（即使要输入这个问题，我也必须使用三个反斜杠来表示两个）与此特定用例相关的任何其他提示？

这是我更详细的代码：

> library(rjson)
> emrJson <- paste(system("~/EMR/elastic-mapreduce --describe --jobflow j-2H9P770Z4B8GG", intern=T))
> 
>     parser <- newJSONParser()
>     for (i in 1:length(emrJson)){
+       parser$addData(emrJson[i])
+     }
> 
> parser$getObject()
Error: '\/' is an unrecognized escape in character string starting "s3n:\/"

如果您渴望重新创建 emrJson 对象，这里是 dput() 输出：

> dput(emrJson)
c("{", "  \"JobFlows\": [", "    {", "      \"LogUri\": \"s3n:\\/\\/emrlogs\\/\",", 
"      \"Name\": \"emrFromR\",", "      \"BootstrapActions\": [", 
"        {", "          \"BootstrapActionConfig\": {", "            \"Name\": \"Bootstrap 0\",", 
"            \"ScriptBootstrapAction\": {", "              \"Path\": \"s3:\\/\\/rtmpfwblrx\\/bootstrap.sh\",", 
"              \"Args\": []", "            }", "          }", 
"        }", "      ],", "      \"ExecutionStatusDetail\": {", 
"        \"EndDateTime\": 1278124414.0,", "        \"CreationDateTime\": 1278123795.0,", 
"        \"LastStateChangeReason\": \"Steps completed\",", "        \"State\": \"COMPLETED\",", 
"        \"StartDateTime\": 1278124000.0,", "        \"ReadyDateTime\": 1278124237.0", 
"      },", "      \"Steps\": [", "        {", "          \"StepConfig\": {", 
"            \"ActionOnFailure\": \"CANCEL_AND_WAIT\",", "            \"Name\": \"Example Streaming Step\",", 
"            \"HadoopJarStep\": {", "              \"MainClass\": null,", 
"              \"Jar\": \"\\/home\\/hadoop\\/contrib\\/streaming\\/hadoop-0.18-streaming.jar\",", 
"              \"Args\": [", "                \"-input\",", "                \"s3n:\\/\\/rtmpfwblrx\\/stream.txt\",", 
"                \"-output\",", "                \"s3n:\\/\\/rtmpfwblrxout\\/\",", 
"                \"-mapper\",", "                \"s3n:\\/\\/rtmpfwblrx\\/mapper.R\",", 
"                \"-reducer\",", "                \"cat\",", 
"                \"-cacheFile\",", "                \"s3n:\\/\\/rtmpfwblrx\\/emrData.RData#emrData.RData\"", 
"              ],", "              \"Properties\": []", "            }", 
"          },", "          \"ExecutionStatusDetail\": {", "            \"EndDateTime\": 1278124322.0,", 
"            \"CreationDateTime\": 1278123795.0,", "            \"LastStateChangeReason\": null,", 
"            \"State\": \"COMPLETED\",", "            \"StartDateTime\": 1278124232.0", 
"          }", "        }", "      ],", "      \"JobFlowId\": \"j-2H9P770Z4B8GG\",", 
"      \"Instances\": {", "        \"Ec2KeyName\": \"JL 09282009\",", 
"        \"InstanceCount\": 2,", "        \"Placement\": {", 
"          \"AvailabilityZone\": \"us-east-1d\"", "        },", 
"        \"KeepJobFlowAliveWhenNoSteps\": false,", "        \"SlaveInstanceType\": \"m1.small\",", 
"        \"MasterInstanceType\": \"m1.small\",", "        \"MasterPublicDnsName\": \"ec2-174-129-70-89.compute-1.amazonaws.com\",", 
"        \"MasterInstanceId\": \"i-2147b84b\",", "        \"InstanceGroups\": null,", 
"        \"HadoopVersion\": \"0.18\"", "      }", "    }", "  ]", 
"}")

score 2 · Accepted Answer

一般规则似乎是使用您认为需要的反斜杠数量的两倍（现在找不到源）。

emrJson <- gsub("\\\\", "\\", emrJson)
parser <- newJSONParser()
for (i in 1:length(emrJson)){
    parser$addData(emrJson[i])
}
parser$getObject()

在这里使用您的 dput 输出。

score 0 · Accepted Answer

我不确定它是双重转义的。请记住，您需要使用“cat”来查看字符串是什么，而不是字符串的表示形式。

regex - R：替换双转义文本

2 回答 2

Related

Reference