I have an RDD and I want to iterate over it. I do it like this:

pointsMap.foreach({ p =>
  val pointsWithCoordinatesWithDistance = pointsMap.leftOuterJoin(xCoordinatesWithDistance)
  pointsWithCoordinatesWithDistance.foreach(println)
  println("---")
})

However, a NullPointerException is thrown:

java.lang.NullPointerException
    at org.apache.spark.rdd.RDD.<init>(RDD.scala:125)
    at org.apache.spark.rdd.CoGroupedRDD.<init>(CoGroupedRDD.scala:69)
    at org.apache.spark.rdd.PairRDDFunctions.cogroup(PairRDDFunctions.scala:651)
    at org.apache.spark.rdd.PairRDDFunctions.leftOuterJoin(PairRDDFunctions.scala:483)
    at org.apache.spark.rdd.PairRDDFunctions.leftOuterJoin(PairRDDFunctions.scala:555)
...

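Both pointsMap and xCoordinatesWithDistance are initialized before the foreach and contain elements. The leftOuterJoin also works when it is called outside the foreach loop. For the full version of my code, see https://github.com/timasjov/spark-learning/blob/master/src/DBSCAN.scala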

1 Answer

Do not use an RDD inside the function you pass to an RDD operator. When you want to work with several RDDs at once, use the appropriate RDD operator, such as join.
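As a minimal sketch of that advice: RDD operations such as leftOuterJoin can only be invoked from the driver, so the join is moved out of the foreach closure and called once at the top level. The object name JoinOnDriver, the helper joinPoints, the element types, and the sample data are assumptions made for illustration; the original post does not show the types stored in pointsMap or xCoordinatesWithDistance.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
// On Spark versions before 1.3 the pair-RDD implicits are likely also needed:
// import org.apache.spark.SparkContext._

object JoinOnDriver {
  // Hypothetical element types (Int keys, String and Double values);
  // substitute the real types from your own code.
  def joinPoints(
      pointsMap: RDD[(Int, String)],
      xCoordinatesWithDistance: RDD[(Int, Double)]
  ): RDD[(Int, (String, Option[Double]))] =
    // Called on the driver, not inside another RDD's closure.
    pointsMap.leftOuterJoin(xCoordinatesWithDistance)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("join-on-driver").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Made-up sample data standing in for the RDDs from the question.
      val pointsMap = sc.parallelize(Seq(1 -> "a", 2 -> "b"))
      val xCoordinatesWithDistance = sc.parallelize(Seq(1 -> 0.5))

      val joined = joinPoints(pointsMap, xCoordinatesWithDistance)

      // collect() brings the rows back to the driver before printing,
      // so the output appears in the driver's console.
      joined.collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

If per-element work is still needed afterwards, it can be applied to the joined RDD with map or foreach, as long as the closure itself does not reference another RDD.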

Answered 2014-10-27T07:16:41.543