I am trying to learn Spark Datasets (Spark 2.0.1). The left outer join below throws a NullPointerException.

case class Employee(name: String, age: Int, departmentId: Int, salary: Double)
case class Department(id: Int, depname: String)
case class Record(name: String, age: Int, salary: Double, departmentId: Int, departmentName: String)
val employeeDataSet = sc.parallelize(Seq(Employee("Jax", 22, 5, 100000.0),Employee("Max", 22, 1, 100000.0))).toDS()
val departmentDataSet = sc.parallelize(Seq(Department(1, "Engineering"), Department(2, "Marketing"))).toDS()

val averageSalaryDataset = employeeDataSet.joinWith(departmentDataSet, $"departmentId" === $"id", "left_outer")
                               .map(record => Record(record._1.name, record._1.age, record._1.salary, record._1.departmentId , record._2.depname))

averageSalaryDataset.show()

16/12/14 16:48:26 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 12)
java.lang.NullPointerException

This happens because, with a left outer join, record._2 is null for employees that have no matching department, so record._2.depname fails.

How should I handle this? Thanks.

2 Answers

I solved this by using:

val averageSalaryDataset1 = employeeDataSet
  .joinWith(departmentDataSet, $"departmentId" === $"id", "left_outer")
  .selectExpr(
    "nvl(_1.name, ' ') as name",
    "nvl(_1.age, 0) as age",
    "nvl(_1.salary, 0.0D) as salary",
    "nvl(_1.departmentId, 0) as departmentId",
    "nvl(_2.depname, ' ') as departmentName")
  .as[Record]
averageSalaryDataset1.show()
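
For comparison, the same default-filling idea can be sketched with an untyped join and `coalesce` from `org.apache.spark.sql.functions` instead of `nvl` inside `selectExpr`. This is only a sketch under assumptions: the object name `CoalesceJoinExample`, the helper `build`, and the local `SparkSession` are hypothetical and not part of the original post, and the blank default `" "` simply mirrors the `nvl` call above.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.sql.functions.{coalesce, lit}

case class Employee(name: String, age: Int, departmentId: Int, salary: Double)
case class Department(id: Int, depname: String)
case class Record(name: String, age: Int, salary: Double, departmentId: Int, departmentName: String)

// Hypothetical wrapper object, used only to make the sketch self-contained.
object CoalesceJoinExample {
  def build(spark: SparkSession): Dataset[Record] = {
    import spark.implicits._

    val employees = Seq(Employee("Jax", 22, 5, 100000.0), Employee("Max", 22, 1, 100000.0)).toDS()
    val departments = Seq(Department(1, "Engineering"), Department(2, "Marketing")).toDS()

    employees
      // Untyped left outer join: unmatched rows get null in the department columns.
      .join(departments, $"departmentId" === $"id", "left_outer")
      .select(
        $"name", $"age", $"salary", $"departmentId",
        // Replace a missing department name with a blank, as nvl does above.
        coalesce($"depname", lit(" ")).as("departmentName"))
      .as[Record]
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session for experimentation.
    val spark = SparkSession.builder().master("local[*]").appName("coalesce-join").getOrCreate()
    build(spark).show()
    spark.stop()
  }
}
```

Because the defaults are applied to columns before `.as[Record]`, no `Department` object is ever dereferenced, so the unmatched employee ("Jax", department 5) should not trigger the exception.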
answered 2016-12-14T18:39:49.270

You can handle the null with an if..else condition.

val averageSalaryDataset = employeeDataSet
  .joinWith(departmentDataSet, $"departmentId" === $"id", "left_outer")
  .map(record => Record(record._1.name, record._1.age, record._1.salary, record._1.departmentId,
    if (record._2 == null) null else record._2.depname))

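After joinWith, each row of the resulting Dataset is a pair (Employee, Department) rather than a flat set of columns. In the map operation we read fields from both sides of that pair, but for a row with no matching department record._2 is null, so calling record._2.depname throws the exception.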

val averageSalaryDataset = employeeDataSet.joinWith(departmentDataSet, $"departmentId" === $"id", "left_outer")

This is the Dataset after the left join.
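
The null check can also be written with `Option`, which some find easier to read than an explicit if..else. The following is a minimal sketch, not the original poster's code: the object name `OptionJoinExample`, the helper `build`, the local `SparkSession`, and the placeholder `"N/A"` are all assumptions made for illustration.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

case class Employee(name: String, age: Int, departmentId: Int, salary: Double)
case class Department(id: Int, depname: String)
case class Record(name: String, age: Int, salary: Double, departmentId: Int, departmentName: String)

// Hypothetical wrapper object, used only to make the sketch self-contained.
object OptionJoinExample {
  def build(spark: SparkSession): Dataset[Record] = {
    import spark.implicits._

    val employees = Seq(Employee("Jax", 22, 5, 100000.0), Employee("Max", 22, 1, 100000.0)).toDS()
    val departments = Seq(Department(1, "Engineering"), Department(2, "Marketing")).toDS()

    employees
      .joinWith(departments, $"departmentId" === $"id", "left_outer")
      .map { case (emp, dept) =>
        // dept is null when the employee has no matching department;
        // Option(null) is None, so the placeholder is used instead of dereferencing null.
        val depName = Option(dept).map(_.depname).getOrElse("N/A") // "N/A" is an assumed placeholder
        Record(emp.name, emp.age, emp.salary, emp.departmentId, depName)
      }
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session for experimentation.
    val spark = SparkSession.builder().master("local[*]").appName("option-join").getOrCreate()
    build(spark).show()
    spark.stop()
  }
}
```

Replacing `getOrElse("N/A")` with `orNull` would reproduce the if..else version exactly, leaving the department name null for unmatched employees.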

answered 2017-11-29T09:08:01.467