4

I can collect a column like this using the RDD API.

df.map(r => r.getAs[String]("column")).collect

However, since I start with a Dataset, I would rather not switch API levels. A simple df.select("column").collect returns an Array[Row], on which the .flatten operator no longer works. How can I collect directly to an Array[T], e.g. Array[String]?


1 Answer

13

With Datasets (Spark version >= 2.0.0), you only need to convert the DataFrame to a typed Dataset and then collect it.

df.select("column").as[String].collect()

This gives you an Array[String].
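
For completeness, here is a minimal self-contained sketch of the same idea. Note that .as[String] needs an implicit Encoder[String] in scope, which normally comes from import spark.implicits._. The object name CollectColumnExample, the helper collectStrings, the sample data, and the local master setting are all assumptions made for illustration and are not part of the original question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object CollectColumnExample {

  // Hypothetical helper: collects a single string-typed column as Array[String].
  def collectStrings(spark: SparkSession, df: DataFrame, columnName: String): Array[String] = {
    // Brings the implicit Encoder[String] into scope; .as[String] will not compile without it.
    import spark.implicits._
    df.select(columnName).as[String].collect()
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, only for trying the snippet out.
    val spark = SparkSession.builder()
      .appName("collect-column-example")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Illustrative data; the column name "column" mirrors the question.
    val df = Seq(("a", 1), ("b", 2), ("c", 3)).toDF("column", "other")

    val values: Array[String] = collectStrings(spark, df, "column")
    println(values.mkString(", "))

    spark.stop()
  }
}
```

The selected column has to be of string type already for .as[String] to work; for other types, swap in the matching type parameter (for example .as[Int]) or cast the column first.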

Answered 2016-11-22T21:09:12.507