What is the syntax to reverse the ordering for the takeOrdered() method of an RDD in Spark?
For bonus points, what is the syntax for custom-ordering for an RDD in Spark?
Reverse order
val seq = Seq(3,9,2,3,5,4)
val rdd = sc.parallelize(seq,2)
rdd.takeOrdered(2)(Ordering[Int].reverse)
The result will be Array(9, 5)
Custom order
We will sort people by age.
case class Person(name:String, age:Int)
val people = Array(Person("bob", 30), Person("ann", 32), Person("carl", 19))
val rdd = sc.parallelize(people,2)
rdd.takeOrdered(1)(Ordering[Int].reverse.on(x=>x.age))
The result will be Array(Person(ann,32))
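If it helps, the same kind of custom ordering can also be built with `Ordering.by`, which some find easier to read than `.on`. The sketch below uses plain Scala collections so it runs without a cluster. The object name `OrderingSketch`, the two ordering names, and the extra person "al" are made up for illustration and are not from the answer above.

```scala
// Plain Scala, no Spark needed. An Ordering built this way can be passed
// as the second argument list of takeOrdered, e.g. rdd.takeOrdered(2)(byAgeDescThenName).
case class Person(name: String, age: Int)

object OrderingSketch {
  // Oldest first: order by age, then reverse the result.
  val byAgeDesc: Ordering[Person] = Ordering.by[Person, Int](_.age).reverse

  // Oldest first, ties broken by name ascending, using the built-in tuple ordering.
  val byAgeDescThenName: Ordering[Person] =
    Ordering.by[Person, (Int, String)](p => (-p.age, p.name))

  def main(args: Array[String]): Unit = {
    val people = Seq(Person("bob", 30), Person("ann", 32), Person("carl", 19), Person("al", 32))
    println(people.sorted(byAgeDesc).take(1))
    println(people.sorted(byAgeDescThenName).take(2))
  }
}
```

Negating the age inside the tuple is one way to mix a descending key with an ascending one. It assumes the numeric key can be negated safely, which holds for ages.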
val rdd1 = sc.parallelize(List(("Hadoop PIG Hive"), ("Hive PIG PIG Hadoop"), ("Hadoop Hadoop Hadoop")))
val rdd2 = rdd1.flatMap(x => x.split(" ")).map(x => (x,1))
val rdd3 = rdd2.reduceByKey((x,y) => (x+y))
//Reverse Order (Descending Order)
rdd3.takeOrdered(3)(Ordering[Int].reverse.on(x=>x._2))
Output:
res0: Array[(String, Int)] = Array((Hadoop,5), (PIG,3), (Hive,2))
//Ascending Order
rdd3.takeOrdered(3)(Ordering[Int].on(x=>x._2))
Output:
res1: Array[(String, Int)] = Array((Hive,2), (PIG,3), (Hadoop,5))
For word-count problems with (K, V) pairs in PySpark, if you want the last 10 entries of the ordered list (the 10 highest counts):
# Reuse the existing context (sc); calling SparkContext() again fails while one is already active.
sc.parallelize(wordCounts.takeOrdered(10, key=lambda pair: -pair[1]))