cascading - Cascading (Buffer) implementation

Question

I need to create a Buffer in Cascading on Hadoop.

Suppose I have these fields:

member_id,amountpaid,diagnosis_id,diagnosis_description,superGrouper_id,superGrouper_description,grouperId,grouperDescription

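I need to:

Group the fields by member_id and superGrouper_id
Send this information to a Buffer using an Every pipe
Produce this Buffer output: member_id, the highest-paid superGrouper, the highest-paid grouperId, the highest-paid diagnosis_id, and their descriptions...

Please help me create the Buffer. Thanks in advance.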

score 1 · Accepted Answer

You don't need a custom Buffer. Use Cascading's built-in Max aggregator (see the Cascading documentation).

Then you only need to run Max after the GroupBy.

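// Group tuples by member and super grouper.
pipe = new GroupBy(pipe, new Fields("member_id", "superGrouper_id"));
// For each group, emit the largest amountpaid as max_paid.
pipe = new Every(pipe, new Fields("amountpaid"), new Max(new Fields("max_paid")));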
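
To make the result of GroupBy followed by Max concrete, here is a minimal sketch in plain Java. It does not use the Cascading API; the class `MaxPerGroupSketch`, the `Claim` type, and the sample values are all hypothetical and exist only to show what the two pipes above are expected to compute: one maximum `amountpaid` per (member_id, superGrouper_id) pair.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only (not Cascading API): what GroupBy + Max computes.
class MaxPerGroupSketch {
    // Hypothetical tuple, reduced to the three fields this step uses.
    static final class Claim {
        final String memberId;
        final String superGrouperId;
        final double amountPaid;

        Claim(String memberId, String superGrouperId, double amountPaid) {
            this.memberId = memberId;
            this.superGrouperId = superGrouperId;
            this.amountPaid = amountPaid;
        }
    }

    // Returns the largest amountPaid for each (memberId, superGrouperId) group.
    static Map<List<String>, Double> maxPaid(List<Claim> claims) {
        Map<List<String>, Double> result = new LinkedHashMap<>();
        for (Claim c : claims) {
            List<String> key = Arrays.asList(c.memberId, c.superGrouperId);
            // Keep the larger of the stored value and the new one.
            result.merge(key, c.amountPaid, Double::max);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Claim> claims = Arrays.asList(
            new Claim("m1", "sg1", 100.0),
            new Claim("m1", "sg1", 250.0),
            new Claim("m1", "sg2", 40.0),
            new Claim("m2", "sg1", 75.0));
        maxPaid(claims).forEach((key, max) -> System.out.println(key + " -> " + max));
    }
}
```

Note that an aggregate like this yields only the maximum amount per group. It does not carry along the description fields the question asks for, so those would have to be handled separately.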

score 0 · Accepted Answer

You can do the following:

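// Group by the first Fields; the second Fields sorts tuples within each group.
pipe = new GroupBy(pipe, new Fields("member_id", "superGrouper_id"), new Fields("superGrouper", "grouperId", ""));
// n is the number of tuples to keep from the start of each group.
pipe = new Every(pipe, new FirstNBuffer(n));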

Apologies if I've got this wrong. Your question isn't very clear.
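
The idea behind a sorted GroupBy followed by FirstNBuffer can be sketched in plain Java: order the tuples inside each group, then keep only the first n. This is an illustration, not Cascading code. The class `TopNPerGroupSketch`, the `Claim` type, and the sample data are hypothetical, and two choices here are assumptions made to match the question's goal rather than the answer's exact fields: grouping on member_id alone and sorting by amount paid in descending order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only (not Cascading API): sort within a group, keep the first n.
class TopNPerGroupSketch {
    // Hypothetical tuple, reduced to the fields this step uses.
    static final class Claim {
        final String memberId;
        final String superGrouperId;
        final double amountPaid;

        Claim(String memberId, String superGrouperId, double amountPaid) {
            this.memberId = memberId;
            this.superGrouperId = superGrouperId;
            this.amountPaid = amountPaid;
        }
    }

    // Returns, per memberId, the n claims with the highest amountPaid, highest first.
    static Map<String, List<Claim>> topN(List<Claim> claims, int n) {
        // Step 1: group by memberId (assumed grouping key).
        Map<String, List<Claim>> groups = new LinkedHashMap<>();
        for (Claim c : claims) {
            groups.computeIfAbsent(c.memberId, k -> new ArrayList<>()).add(c);
        }
        // Step 2: sort each group by amountPaid descending and truncate to n entries.
        for (Map.Entry<String, List<Claim>> e : groups.entrySet()) {
            List<Claim> sorted = new ArrayList<>(e.getValue());
            sorted.sort(Comparator.comparingDouble((Claim c) -> c.amountPaid).reversed());
            e.setValue(new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size()))));
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Claim> claims = Arrays.asList(
            new Claim("m1", "sg1", 100.0),
            new Claim("m1", "sg2", 250.0),
            new Claim("m1", "sg3", 40.0),
            new Claim("m2", "sg1", 75.0));
        topN(claims, 1).forEach((member, top) ->
            System.out.println(member + " -> " + top.get(0).superGrouperId));
    }
}
```

Because whole tuples are kept rather than a single aggregate, this approach retains the other fields of the top-paid row, such as the descriptions.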
