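In Pig, I have two bags. Bag A is about 200 GB and bag B is about 600 GB. They have the same schema. How can I remove from bag A all the tuples that are also contained in bag B? I looked at Pig's DIFF UDF, but holding both bags in memory at the same time does not seem practical.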
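Here is one solution. It cogroups both relations on the whole tuple, keeps only the groups that have tuples from A and none from B, and then flattens those groups back into plain tuples: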
C = COGROUP A BY *, B BY *;
C_FILT = FILTER C BY NOT IsEmpty(A) AND IsEmpty(B);
OUT = FOREACH C_FILT GENERATE FLATTEN(A);
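
To make the semantics of the COGROUP approach easier to check on a small input, here is a minimal Python sketch of the same logic. It is only an in-memory illustration, not Pig code and not how Pig executes the job. The function name `cogroup_anti_join` and the toy data are assumptions made up for this example.

```python
from collections import defaultdict


def cogroup_anti_join(a_rows, b_rows):
    """Return the rows of a_rows whose whole tuple never occurs in b_rows.

    Hypothetical helper that mirrors COGROUP A BY *, B BY * followed by
    FILTER ... NOT IsEmpty(A) AND IsEmpty(B) and FLATTEN(A).
    """
    # Key: the whole tuple. Value: (bag of A rows, bag of B rows).
    groups = defaultdict(lambda: ([], []))
    for row in a_rows:
        groups[row][0].append(row)
    for row in b_rows:
        groups[row][1].append(row)

    out = []
    for a_bag, b_bag in groups.values():
        # Keep groups that have rows from A and none from B.
        if a_bag and not b_bag:
            out.extend(a_bag)  # FLATTEN(A): emit every row in the A bag
    return out


# Toy data, invented for illustration only.
A = [(1, "x"), (2, "y"), (2, "y"), (3, "z")]
B = [(2, "y"), (4, "w")]

result = cogroup_anti_join(A, B)
print(sorted(result))
```

In this sketch, every copy of a tuple in A is dropped once that tuple appears in B at least once, and duplicates in A that do not appear in B are kept.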