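I have just started learning Hadoop and am running a Hadoop MapReduce program with a custom partitioner and a custom comparator. The problem I am facing is that primary and secondary sorting is not happening on the composite key. In addition, one part of a composite key is being overwritten with the corresponding part of another composite key.
For example, I create the following keys in the mapper: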
key1 -> tagA,1
key2 -> tagA,1
key3 -> tagA,1
key4 -> tagA,1
key5 -> tagA,2
key6 -> tagA,2
key7 -> tagB,1
key8 -> tagB,1
key9 -> tagB,1
key10 -> tagB,1
key11 -> tagB,2
key12 -> tagB,2
The partitioner and the grouping comparator are as follows:
// Partitioner
public static class TaggedJoiningPartitioner implements Partitioner<Text, Text> {
    @Override
    public int getPartition(Text key, Text value, int numPartitions) {
        // The composite key has the form "tag,number"; partition on the tag only.
        String line = key.toString();
        String[] tokens = line.split(",");
        // Mask the sign bit so the result is never negative.
        return (tokens[0].hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    @Override
    public void configure(JobConf arg0) {
        // Intentionally left empty: no configuration is needed.
    }
}
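To make the partitioning rule concrete, here is a minimal stand-alone sketch that repeats the arithmetic from `getPartition` on plain `String` keys, so it runs without Hadoop on the classpath. The class name `PartitionSketch` and the helper `partitionFor` are hypothetical and are not part of the original job.

```java
// Hypothetical sketch: mirrors the formula in TaggedJoiningPartitioner
// using plain Strings instead of Hadoop's Text type.
class PartitionSketch {
    // Hash only the tag (the text before the comma) and map it to a partition.
    static int partitionFor(String compositeKey, int numPartitions) {
        String tag = compositeKey.split(",")[0];
        // Clearing the sign bit keeps the result in [0, numPartitions).
        return (tag.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        String[] keys = {"tagA,1", "tagA,2", "tagB,1", "tagB,2"};
        for (String key : keys) {
            System.out.println(key + " -> partition " + partitionFor(key, 2));
        }
    }
}
```

Because only the tag is hashed, every key with the same tag lands in the same partition, regardless of the number after the comma.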
// Grouping comparator
public static class TaggedJoiningGroupingComparator extends WritableComparator {
    public TaggedJoiningGroupingComparator() {
        // true = create key instances so compare() receives deserialized objects.
        super(Text.class, true);
    }

    @Override
    public int compare(WritableComparable a, WritableComparable b) {
        // Compare on the tag only, ignoring the number after the comma.
        String[] taggedKey1 = ((Text) a).toString().split(",");
        String[] taggedKey2 = ((Text) b).toString().split(",");
        return taggedKey1[0].compareTo(taggedKey2[0]);
    }
}
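The comparator above looks only at the tag, so it treats `tagA,1` and `tagA,2` as equal. The following stand-alone sketch contrasts that tag-only comparison with a full composite-key ordering (tag first, then the number), again on plain `String` keys. The class `CompositeKeyOrderSketch`, both comparator constants, and the assumption that the second field is always an integer are illustrative only. This shows the ordering I expect and is not meant as a fix for the job itself.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical sketch: grouping order versus full sort order on "tag,number" keys.
class CompositeKeyOrderSketch {
    // Tag-only comparison, equivalent in spirit to TaggedJoiningGroupingComparator.
    static final Comparator<String> BY_TAG =
            Comparator.comparing((String k) -> k.split(",")[0]);

    // Assumed full ordering: tag first, then the numeric part after the comma.
    static final Comparator<String> BY_TAG_THEN_NUMBER =
            BY_TAG.thenComparingInt(k -> Integer.parseInt(k.split(",")[1]));

    // Returns a sorted copy of the keys using the full composite ordering.
    static String[] sorted(String[] keys) {
        String[] copy = keys.clone();
        Arrays.sort(copy, BY_TAG_THEN_NUMBER);
        return copy;
    }

    public static void main(String[] args) {
        String[] keys = {"tagB,2", "tagA,2", "tagB,1", "tagA,1"};
        System.out.println(Arrays.toString(sorted(keys)));
        // A result of 0 means the two keys fall into the same group.
        System.out.println(BY_TAG.compare("tagA,1", "tagA,2"));
    }
}
```

With the tag-only comparator, the two `tagA` keys compare as equal and so fall into one group. The full ordering still places `tagA,1` before `tagA,2`.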
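In the reducer, the keys are grouped correctly by tag, but they are not sorted correctly. The order and contents of the keys as seen in the reducers are as follows: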
//REDUCER 1
key1 -> tagA,1
key2 -> tagA,1
key3 -> tagA,1
key5 -> tagA,1 // was tagA,2 in the mapper; the 2 became 1 here
key6 -> tagA,1 // was tagA,2 in the mapper; the 2 became 1 here
key4 -> tagA,1
//REDUCER 2
key7 -> tagB,1
key11 -> tagB,1 // was tagB,2 in the mapper; the 2 became 1 here
key12 -> tagB,1 // was tagB,2 in the mapper; the 2 became 1 here
key8 -> tagB,1
key9 -> tagB,1
key10 -> tagB,1
I have been trying to solve this for a long time without success. Any help would be appreciated.