The code below should:
- iterate over a sequence of strings,
- parse each one as JSON,
- filter out fields whose names could not be used as an identifier in most languages,
- lowercase the remaining names, and
- serialize the result back to a string.
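For example, here is a hypothetical input line (the field names are made up for illustration) and the output I expect for it:

    input:  {"Name": "x", "1bad": "y", "Good_1": "z"}
    output: {"name":"x","good_1":"z"}

"1bad" is dropped because it starts with a digit; the surviving names are lowercased.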
The code behaves as expected on small tests, but on an 8.6M-item sequence of live data the output sequence is significantly longer than the input sequence:
import org.json4s._
import org.json4s.jackson.JsonMethods._
import org.apache.spark._

val txt = sc.textFile("s3n://...")
val patt = """^[a-zA-Z]\w*$""".r.findFirstIn _
val json = (for {
  line <- txt
  JObject(children) <- parse(line)
  children2 = (for {
    JField(name, value) <- children
    // filter fields with invalid names
    // patt(name) returns Option[String]
    _ <- patt(name)
  } yield JField(name.toLowerCase, value))
} yield compact(render(JObject(children2))))
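For reference, patt accepts exactly the names that look like simple identifiers (an ASCII letter followed by word characters); a few illustrative calls:

    patt("Name")    // Some("Name")
    patt("1bad")    // None  (starts with a digit)
    patt("foo-bar") // None  ('-' is not a word character)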
I have checked that it actually increases the number of unique items, so it is not simply duplicating inputs. Given my understanding of Scala for-comprehensions and json4s, I do not see how this is possible. The large live collection is a Spark RDD, while my small tests ran on an ordinary Scala Seq, but that should not make any difference.
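The size checks looked roughly like this (a sketch of what I ran; the counts shown are the observations described above, not new measurements):

    txt.count()              // ~8.6M input lines
    json.count()             // significantly larger
    json.distinct().count()  // larger than txt.distinct().count(), so not mere duplicates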
How can json have more elements than txt in the above code?