apache-spark - Spark Context on Bluemix adds null to JSON payload

Question

I am streaming messages from Message Hub to a Spark instance in Bluemix. I am using a Java client to send simple JSON messages to Message Hub.

JSON message -

{"country":"Netherlands","dma_code":"0","timezone":"Europe\/Amsterdam","area_code":"0","ip":"46.19.37.108","asn":"AS196752","continent_code":"EU","isp":"Tilaa V.O.F.","longitude":5.75,"latitude":52.5,"country_code":"NL","country_code3":"NLD"}

When I start streaming in Spark, the message I receive has an extra null at the beginning.

(null,{"country":"Netherlands","dma_code":"0","timezone":"Europe\/Amsterdam","area_code":"0","ip":"46.19.37.108","asn":"AS196752","continent_code":"EU","isp":"Tilaa V.O.F.","longitude":5.75,"latitude":52.5,"country_code":"NL","country_code3":"NLD"})

Please let me know why the Spark context puts this null in front. How can I remove it?

KafkaSender code -

  KafkaProducer<String, String> kafkaProducer;
  kafkaProducer = new KafkaProducer<String, String>(props);
  ProducerRecord<String, String> producerRecord = new ProducerRecord<String, String>(topic,message);

  RecordMetadata recordMetadata = kafkaProducer.send(producerRecord).get();
  // the returned RecordMetadata can be used to check the topic, partition and offset
  System.out.println("topic where message is published : " + recordMetadata.topic());
  System.out.println("partition where message is published : " + recordMetadata.partition());
  System.out.println("message offset # : " + recordMetadata.offset());
  kafkaProducer.close();

Thanks, Raj

score 0 · Accepted Answer

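Your key is null - each record is a (key, value) pair, so the first element is your key and the second is, of course, your value.

I suggest you post the code that publishes the messages to Kafka/MessageHub to get a better answer.

To solve your problem - if your goal is just to print it out, you can do something like the following, which prints the data to standard output and ignores the null key.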
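To illustrate why the null shows up, here is a minimal sketch in plain Java that models a record as a key/value pair and prints it the way a tuple is printed. It does not use the Kafka or Spark APIs; the class name `KeyValueDemo`, the helper `render`, the shortened JSON and the choice of the IP address as a key are all assumptions made for the example. On the producer side, a key can be supplied with the three-argument `ProducerRecord(topic, key, value)` constructor, whereas the two-argument form used in the question leaves the key null.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;

// Hypothetical demo class: models a (key, value) record without Kafka or Spark.
class KeyValueDemo {
    // Formats a pair as "(key,value)", the shape seen in the question's output.
    static String render(Map.Entry<String, String> record) {
        return "(" + record.getKey() + "," + record.getValue() + ")";
    }

    public static void main(String[] args) {
        String json = "{\"country\":\"Netherlands\"}";
        // No key supplied, analogous to new ProducerRecord<>(topic, message).
        System.out.println(render(new SimpleImmutableEntry<>(null, json)));
        // Key supplied, analogous to new ProducerRecord<>(topic, key, message).
        System.out.println(render(new SimpleImmutableEntry<>("46.19.37.108", json)));
    }
}
```

The first line printed starts with `(null,` because the key slot is empty, not because anything was added to the JSON payload itself.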

stream.foreachRDD(recordRDD => {
  recordRDD.foreach(record => println(record._2))
})
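The same idea, keeping only the second element of each pair, can be sketched in plain Java as follows. This is only a model of the projection and not Spark code; the class name `ValuesOnly`, the helper `values` and the sample records are hypothetical.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical demo class: drops the keys from a batch of (key, value) records.
class ValuesOnly {
    // Keeps only the value of each record, like record._2 in the Scala snippet.
    static List<String> values(List<Map.Entry<String, String>> records) {
        return records.stream().map(Map.Entry::getValue).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> batch = new ArrayList<>();
        // Records published without a key arrive with a null key.
        batch.add(new SimpleImmutableEntry<>(null, "{\"country\":\"Netherlands\"}"));
        batch.add(new SimpleImmutableEntry<>(null, "{\"country\":\"Germany\"}"));
        // Only the JSON values are printed; the null keys are ignored.
        values(batch).forEach(System.out::println);
    }
}
```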
