With DataFrames, one can simply rename columns by using df.withColumnRenamed("oldName", "newName"). In Datasets, since every field is typed and named, this doesn't seem possible. The only workaround I can think of is to use map on the Dataset:
case class Orig(a: Int, b: Int)
case class OrigRenamed(a: Int, bNewName: Int)
val origDS = Seq(Orig(1,2), Orig(3,4)).toDS
origDS.show
+---+---+
|  a|  b|
+---+---+
|  1|  2|
|  3|  4|
+---+---+
// To rename with map
val origRenamedDS = origDS.map{ case Orig(x,y) => OrigRenamed(x,y) }
origRenamedDS.show
+---+--------+
|  a|bNewName|
+---+--------+
|  1|       2|
|  3|       4|
+---+--------+
This seems a very roundabout and inefficient way just to rename a column. Is there a better way?
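For comparison, here is a minimal sketch of one candidate alternative, not a confirmed best practice. It relies on withColumnRenamed also being defined on Dataset (where it returns an untyped DataFrame) and on .as[OrigRenamed] to re-attach a type by matching column names to case class fields. The object name RenameSketch, the local master setting, and the app name are assumptions made only so the example is self-contained; whether this is actually cheaper than map would need to be checked against the query plan.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Same case classes as in the question, repeated so the sketch stands alone.
case class Orig(a: Int, b: Int)
case class OrigRenamed(a: Int, bNewName: Int)

// Hypothetical wrapper object, used only to make the sketch runnable.
object RenameSketch {
  def main(args: Array[String]): Unit = {
    // Local session for illustration; master and app name are assumptions.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("rename-sketch")
      .getOrCreate()
    import spark.implicits._ // brings the case class encoders and toDS into scope

    val origDS: Dataset[Orig] = Seq(Orig(1, 2), Orig(3, 4)).toDS()

    // Rename on the Dataset (result is a DataFrame), then convert back to a
    // typed Dataset whose field names match the renamed columns.
    val origRenamedDS: Dataset[OrigRenamed] =
      origDS.withColumnRenamed("b", "bNewName").as[OrigRenamed]

    origRenamedDS.show()
    spark.stop()
  }
}
```

The target case class is still needed here, since the typed Dataset has to know the new field name; only the per-row map step is replaced by a column-level rename.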