我试图从这个数据集中提取一些关联规则:
49
70
27,66
6
27
66,8,64
32
82
66
71
44
1
33
17
31,83
50,29
22
72
8
8,16
56
83,61
85,63,37
50,57
2
50
96,6
73
57
12
62
96
3
47,50,73
35
85,45
25,96,22,17
85
24
17,57
34,4
60,96,45
25
85,66,73
30
14
73,85
64
48
5
37
13,55
37,17
我有这个代码:
val transactions = sc.textFile("/user/cloudera/dataset1")
import org.apache.spark.mllib.fpm.AssociationRules
import org.apache.spark.mllib.fpm.FPGrowth.FreqItemset
val freqItemsets = transactions.flatMap(xs =>
(xs.combinations(1) ++ xs.combinations(2) ++ xs.combinations(3) ++ xs.combinations(4) ++ xs.combinations(5)).map(x => (x.toList, 1L))
).reduceByKey(_ + _).map{case (xs, cnt) => new FreqItemset(xs.toArray, cnt)}
val ar = new AssociationRules().setMinConfidence(0.4)
val results = ar.run(freqItemsets)
results.collect().foreach { rule =>
println("[" + rule.antecedent.mkString(",")
+ "=>"
+ rule.consequent.mkString(",") + "]," + rule.confidence)
}
但是我的输出中出现了一些意想不到的行:
[2,9=>5],0.5
[8,5,,,3=>6],1.0
[8,5,,,3=>7],0.5
[8,5,,,3=>7],0.5
[,,,=>6],0.5
[,,,=>7],0.5
[,,,=>5],0.5
[,,,=>3],0.5
[4,3=>7],1.0
[4,3=>,,,],1.0
[4,3=>,,,],1.0
[4,3=>5],1.0
[4,3=>7,7],1.0
[4,3=>7,7],1.0
[4,3=>0],1.0
为什么我得到这样的输出:
[,,,=>3],0.5
我不明白这个问题......有人知道如何解决这个问题吗?
非常感谢!