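I'm using the Amazon Elastic Map Reduce command-line tools to glue several system calls together. These commands return JSON text that is already (partially?) escaped. Then, when system() turns it into an R character object (intern=T), it seems to get escaped again. I need to clean it up so that it parses with the rjson package.
I make the system call like this: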
system("~/EMR/elastic-mapreduce --describe --jobflow j-2H9P770Z4B8GG", intern=T)
which returns:
[1] "{"
[2] " \"JobFlows\": ["
[3] " {"
[4] " \"LogUri\": \"s3n:\\/\\/emrlogs\\/\","
[5] " \"Name\": \"emrFromR\","
[6] " \"BootstrapActions\": ["
...
But the same command run from the command line returns:
{
"JobFlows": [
{
"LogUri": "s3n:\/\/emrlogs\/",
"Name": "emrFromR",
"BootstrapActions": [
{
"BootstrapActionConfig": {
...
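The two listings differ in a way that print() alone could explain. When R prints a character vector it displays each backslash as \\, while cat() writes the raw characters. Below is a quick base-R check on the LogUri value, typed by hand as an R literal so that each \\ in the source is one backslash. It counts what the string actually holds. This is a sketch for inspection only and assumes the value shown in [4] above is representative.

```r
# The LogUri value as print() displayed it, typed as an R string literal.
# Inside an R literal "\\" is ONE backslash character.
x <- "s3n:\\/\\/emrlogs\\/"

print(x)       # shows each backslash escaped as \\
cat(x, "\n")   # writes the raw characters: s3n:\/\/emrlogs\/

nchar(x)       # 17 characters in total

# Count the backslash characters actually stored in the string.
n_backslash <- sum(strsplit(x, "")[[1]] == "\\")
n_backslash    # 3, one before each "/"
```

If the check reports one backslash per slash, the stored text matches what the shell printed. In that case the trouble is the \/ escape itself, not a second layer of escaping.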
If I try to run the result of the system call through rjson, I get this error:
Error: '\/' is an unrecognized escape in character string starting "s3n:\/"
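I believe this is because of the double escaping in the s3n line. I'm struggling to massage this text into something that will parse.
It may be as simple as replacing "\\" with "\", but since I struggle a bit with regular expressions and escaping, I can't get it right.
So how do I take a vector of strings and replace any occurrence of "\\" with "\"? (Even to type this question I had to use three backslashes to represent two.) Any other tips related to this particular use case?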
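In JSON, \/ is simply an escaped /. If each printed \\ is really a single stored backslash, the replacement that is likely needed is therefore "drop the backslash before each /" rather than "halve the backslashes". The sketch below shows two equivalent base-R gsub() forms: a literal (fixed = TRUE) one and a regex one. It assumes the data contain no literal backslashes. A JSON sequence such as \\/ (an escaped backslash followed by a slash) would be altered by this blanket replacement.

```r
x <- "s3n:\\/\\/emrlogs\\/"   # stored characters: s3n:\/\/emrlogs\/

# fixed = TRUE: the pattern is the literal two characters \/ (no regex rules).
clean_fixed <- gsub("\\/", "/", x, fixed = TRUE)

# Regex form: a literal backslash in a regex is \\, which in an R string is "\\\\".
clean_regex <- gsub("\\\\/", "/", x)

clean_fixed                           # "s3n://emrlogs/"
identical(clean_fixed, clean_regex)   # TRUE
```

The fixed = TRUE form avoids one layer of escaping, which is usually easier to get right.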
Here is my code in more detail:
> library(rjson)
> emrJson <- paste(system("~/EMR/elastic-mapreduce --describe --jobflow j-2H9P770Z4B8GG", intern=T))
>
> parser <- newJSONParser()
> for (i in 1:length(emrJson)){
+ parser$addData(emrJson[i])
+ }
>
> parser$getObject()
Error: '\/' is an unrecognized escape in character string starting "s3n:\/"
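A side note on the transcript above: paste() called with a single argument and no collapse argument returns its input unchanged. As a result, emrJson is still one string per output line, and the loop feeds the parser line by line. The small base-R check below uses a made-up three-line vector standing in for the real system() output.

```r
# Hypothetical stand-in for system(..., intern = TRUE): one element per output line.
out <- c("{", " \"Name\": \"emrFromR\"", "}")

# A single argument and no collapse: the vector comes back unchanged.
same <- paste(out)
identical(same, out)   # TRUE

# collapse joins all elements into one string.
one <- paste(out, collapse = "\n")
length(one)            # 1
```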
In case you want to recreate the emrJson object, here is the dput() output:
> dput(emrJson)
c("{", " \"JobFlows\": [", " {", " \"LogUri\": \"s3n:\\/\\/emrlogs\\/\",",
" \"Name\": \"emrFromR\",", " \"BootstrapActions\": [",
" {", " \"BootstrapActionConfig\": {", " \"Name\": \"Bootstrap 0\",",
" \"ScriptBootstrapAction\": {", " \"Path\": \"s3:\\/\\/rtmpfwblrx\\/bootstrap.sh\",",
" \"Args\": []", " }", " }",
" }", " ],", " \"ExecutionStatusDetail\": {",
" \"EndDateTime\": 1278124414.0,", " \"CreationDateTime\": 1278123795.0,",
" \"LastStateChangeReason\": \"Steps completed\",", " \"State\": \"COMPLETED\",",
" \"StartDateTime\": 1278124000.0,", " \"ReadyDateTime\": 1278124237.0",
" },", " \"Steps\": [", " {", " \"StepConfig\": {",
" \"ActionOnFailure\": \"CANCEL_AND_WAIT\",", " \"Name\": \"Example Streaming Step\",",
" \"HadoopJarStep\": {", " \"MainClass\": null,",
" \"Jar\": \"\\/home\\/hadoop\\/contrib\\/streaming\\/hadoop-0.18-streaming.jar\",",
" \"Args\": [", " \"-input\",", " \"s3n:\\/\\/rtmpfwblrx\\/stream.txt\",",
" \"-output\",", " \"s3n:\\/\\/rtmpfwblrxout\\/\",",
" \"-mapper\",", " \"s3n:\\/\\/rtmpfwblrx\\/mapper.R\",",
" \"-reducer\",", " \"cat\",",
" \"-cacheFile\",", " \"s3n:\\/\\/rtmpfwblrx\\/emrData.RData#emrData.RData\"",
" ],", " \"Properties\": []", " }",
" },", " \"ExecutionStatusDetail\": {", " \"EndDateTime\": 1278124322.0,",
" \"CreationDateTime\": 1278123795.0,", " \"LastStateChangeReason\": null,",
" \"State\": \"COMPLETED\",", " \"StartDateTime\": 1278124232.0",
" }", " }", " ],", " \"JobFlowId\": \"j-2H9P770Z4B8GG\",",
" \"Instances\": {", " \"Ec2KeyName\": \"JL 09282009\",",
" \"InstanceCount\": 2,", " \"Placement\": {",
" \"AvailabilityZone\": \"us-east-1d\"", " },",
" \"KeepJobFlowAliveWhenNoSteps\": false,", " \"SlaveInstanceType\": \"m1.small\",",
" \"MasterInstanceType\": \"m1.small\",", " \"MasterPublicDnsName\": \"ec2-174-129-70-89.compute-1.amazonaws.com\",",
" \"MasterInstanceId\": \"i-2147b84b\",", " \"InstanceGroups\": null,",
" \"HadoopVersion\": \"0.18\"", " }", " }", " ]",
"}")