
With DataFrames, one can simply rename a column with df.withColumnRenamed("oldName", "newName"). With Datasets, since every field is typed and named, this doesn't seem possible. The only workaround I can think of is to use map on the Dataset:

case class Orig(a: Int, b: Int)
case class OrigRenamed(a: Int, bNewName: Int)

val origDS = Seq(Orig(1,2), Orig(3,4)).toDS
origDS.show
+---+---+
|  a|  b|
+---+---+
|  1|  2|
|  3|  4|
+---+---+

// To rename with map
val origRenamedDS = origDS.map{ case Orig(x,y) => OrigRenamed(x,y) }
origRenamedDS.show
+---+--------+
|  a|bNewName|
+---+--------+
|  1|       2|
|  3|       4|
+---+--------+

This seems like a very roundabout and inefficient way to rename a single column. Is there a better way?


2 Answers


A slightly more concise solution is this:

origDS.toDF("a", "bNewName").as[OrigRenamed]

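In practice, though, renaming doesn't really make sense for a statically typed Dataset. Although a Dataset uses the same columnar representation as a DataFrame (Dataset[Row]), the semantics here are quite different.

A column name corresponds to a specific field of the stored object, so it is not something that can be renamed dynamically. In other words, Datasets are not statically typed DataFrames but collections of objects.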
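To make the point about types concrete, here is a minimal sketch rather than a definitive recipe. It assumes a local SparkSession; the object name RenameSketch and its run helper are hypothetical and exist only for this illustration. Calling withColumnRenamed on a Dataset does work, but it returns an untyped DataFrame, so a second case class and as[...] are still needed to get a typed Dataset back:

```scala
import org.apache.spark.sql.SparkSession

case class Orig(a: Int, b: Int)
case class OrigRenamed(a: Int, bNewName: Int)

// Hypothetical wrapper object, used only for this illustration.
object RenameSketch {
  def run(): Array[OrigRenamed] = {
    // Assumption: a local session is acceptable for the demo.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("rename-sketch")
      .getOrCreate()
    import spark.implicits._

    val origDS = Seq(Orig(1, 2), Orig(3, 4)).toDS()

    // withColumnRenamed drops the static type and yields a DataFrame;
    // as[OrigRenamed] re-attaches a type whose field names match the new columns.
    val renamedDS = origDS.withColumnRenamed("b", "bNewName").as[OrigRenamed]

    renamedDS.collect()
  }

  def main(args: Array[String]): Unit = {
    run().foreach(println)
  }
}
```

Unlike toDF("a", "bNewName"), this names only the column being changed, which may be easier to read when the case class has many fields.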

answered 2016-08-14T14:25:03.543

You can make it slightly more concise while keeping the same semantics:

origDS.map(o => OrigRenamed(o.a, o.b)).show()
answered 2021-09-03T09:56:56.830