scala - Datasets of arrays in Spark (1.6.1)

Question

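I have been reworking the project I'm working on to use the Dataset API, and I have run into some encoder errors. From what I have read, I should be able to store arrays of primitive values in a Dataset. However, the following class gives me encoder errors: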

case class InvertedIndex(partition: Int, docs: Array[Int], indices: Array[Long], weights: Array[Double])

val inv: RDD[InvertedIndex] = ???  // placeholder for an existing RDD[InvertedIndex]
val invertedIndexDataset = sqlContext.createDataset(inv)
invertedIndexDataset.groupBy(x => x.partition).mapGroups {
    //...
}

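Could someone help me understand what the problem is here? Can Datasets currently not handle arrays of primitives, or do I need to do something extra to make them work?

Thanks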

Edit 1:

Here is the full error I get:

Error:(223, 84) Unable to find encoder for type stored in a Dataset.  Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._  Support for serializing other types will be added in future releases.
    val similarities = invertedIndexDataset.groupByKey(x => x.partition).mapGroups {

score 0 · Accepted Answer

The following works as expected in Spark 2.0:

import spark.implicits._

spark.createDataset( Array(1,2) :: Array(1) :: Array(2) :: Nil )
res0: org.apache.spark.sql.Dataset[Array[Int]] = [value: array<int>]
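
To tie this back to the case class in the question, here is a minimal sketch that assumes Spark 2.x and a local SparkSession. The sample rows, the object name InvertedIndexExample, and the helper totalWeights are hypothetical and only for illustration. The case class is declared at the top level, outside any method, and spark.implicits._ is imported before the Dataset is created, as the error message suggests.

```scala
import org.apache.spark.sql.SparkSession

// Same shape as the case class in the question; declared at top level.
case class InvertedIndex(partition: Int, docs: Array[Int], indices: Array[Long], weights: Array[Double])

object InvertedIndexExample {
  // Hypothetical sample data, not taken from the question.
  val sample: Seq[InvertedIndex] = Seq(
    InvertedIndex(0, Array(1, 2), Array(10L, 20L), Array(0.5, 1.5)),
    InvertedIndex(0, Array(3), Array(30L), Array(2.0)),
    InvertedIndex(1, Array(4), Array(40L), Array(1.0))
  )

  // Hypothetical helper: groups rows by partition and sums every weight in each group.
  def totalWeights(spark: SparkSession, rows: Seq[InvertedIndex]): Map[Int, Double] = {
    import spark.implicits._ // brings encoders for case classes, Int and tuples into scope

    val ds = spark.createDataset(rows)
    ds.groupByKey(_.partition)
      .mapGroups { (partition, group) => (partition, group.map(_.weights.sum).sum) }
      .collect()
      .toMap
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough for this sketch.
    val spark = SparkSession.builder().master("local[*]").appName("inverted-index-sketch").getOrCreate()
    try println(totalWeights(spark, sample))
    finally spark.stop()
  }
}
```

The sketch uses groupByKey, the Spark 2.x API that also appears in the error message in the question, rather than the groupBy call from the original snippet.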
