
I have a problem. I have a Spark RDD that I have to store inside an HBase table. We use the Apache Phoenix layer to talk to the database. There is a column of the table that is defined as an UNSIGNED_SMALLINT ARRAY:

CREATE TABLE EXAMPLE (..., Col10 UNSIGNED_SMALLINT ARRAY, ...);

As stated in the Phoenix documentation, which you can find here, the ARRAY data type is backed by java.sql.Array.

I'm using the phoenix-spark plugin to save the data of the RDD into the table. The problem is that I don't know how to create an instance of java.sql.Array, since I don't have any kind of Connection object. An extract of the code follows (the code is in Scala):

// Map RDD into an RDD of sequences or tuples
rdd.map {
  value =>
    (/* ... */
     value.getArray(),   // Array of Int to convert into a java.sql.Array
     /* ... */
    )
}.saveToPhoenix("EXAMPLE", Seq(/* ... */, "Col10", /* ... */), conf, zkUrl)

What is the correct way to proceed? Is there a way to do what I need?


1 Answer


The Phoenix folks have answered the question above by email. I'm reporting the answer here to leave the wisdom for whoever comes next.

To save arrays you can use plain old Scala array types. You can look at the test for an example: https://github.com/apache/phoenix/blob/master/phoenix-spark/src/it/scala/org/apache/phoenix/spark/PhoenixSparkIT.scala#L408-L427

Note that saving arrays is only supported as of Phoenix 4.5.0, although the patch is very small if you need to apply it yourself: https://issues.apache.org/jira/browse/PHOENIX-1968
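Based on the linked integration test, a minimal sketch of this approach might look like the following. The table name, column names, ZooKeeper quorum address, and sample values are placeholders, and running it of course requires a live Spark + Phoenix/HBase setup:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.phoenix.spark._   // adds saveToPhoenix to RDDs of tuples

// Hypothetical schema:
//   CREATE TABLE EXAMPLE (ID BIGINT NOT NULL PRIMARY KEY, COL10 UNSIGNED_SMALLINT ARRAY);
val sc = new SparkContext(new SparkConf().setAppName("phoenix-array-save"))

// A plain Scala Array in the tuple maps to the Phoenix ARRAY column --
// no java.sql.Array (and hence no Connection) is needed.
val dataSet = List(
  (1L, Array(1, 2, 3)),
  (2L, Array(4, 5))
)

sc.parallelize(dataSet)
  .saveToPhoenix(
    "EXAMPLE",
    Seq("ID", "COL10"),
    zkUrl = Some("localhost:2181")   // placeholder ZooKeeper quorum
  )
```

The key point is that the phoenix-spark plugin performs the conversion from the Scala array to the Phoenix ARRAY representation itself, so the RDD can simply carry native Scala collections.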

Nice answer. Thanks to the Phoenix guys.

Answered 2015-07-30T15:29:38.680