apache-spark - AWS EMR Spark is creating files on the worker nodes

Translated from: https://stackoverflow.com/questions/62526940 2020-06-23T03:38:14.093


I'm using Spark on EMR to process data. Basically, I read data from AWS S3, apply transformations, and then load/write the transformed data into an Oracle table.

Recently we found that HDFS (/mnt/hdfs) utilization is too high.

I am not writing any data to HDFS (/mnt/hdfs), but Spark is creating blocks and writing data there. We are doing all operations in memory.

Why is Spark still writing data to the data nodes?

Is there any specific operation that writes data to the data nodes (HDFS)?
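One way to start narrowing this down is to list which of the application's Spark settings point at HDFS explicitly. The sketch below is a hypothetical helper (`hdfs_backed_settings` is not part of Spark); it assumes you pass it the (key, value) pairs returned by `spark.sparkContext.getConf().getAll()` in PySpark, and the sample values are made up for illustration. It only catches explicit `hdfs://` URIs: a path with no scheme can still land on HDFS if HDFS is the cluster's default filesystem.

```python
from typing import Iterable, List, Tuple


def hdfs_backed_settings(conf: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return the (key, value) pairs whose value references an hdfs:// URI.

    `conf` is assumed to look like spark.sparkContext.getConf().getAll():
    a sequence of (key, value) string pairs. Paths without a scheme are
    NOT flagged, even though they may resolve to HDFS when it is the
    default filesystem.
    """
    hits = []
    for key, value in conf:
        # A value may hold several comma-separated paths; check each one.
        parts = (part.strip().lower() for part in value.split(","))
        if any(part.startswith("hdfs://") for part in parts):
            hits.append((key, value))
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical sample values, not taken from the question's cluster.
    sample = [
        ("spark.eventLog.dir", "hdfs:///var/log/spark/apps"),
        ("spark.sql.warehouse.dir", "file:/tmp/spark-warehouse"),
        ("spark.executor.memory", "4g"),
    ]
    for key, value in hdfs_backed_settings(sample):
        print(f"{key} = {value}")
```

In a live PySpark session you would call `hdfs_backed_settings(spark.sparkContext.getConf().getAll())`. An empty result does not rule out HDFS writes, since it only inspects Spark-level settings.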

These are the HDFS directories that were created:

15.4G /mnt/hdfs/current/BP-6706123673-10.xx.xx.xxx-1588026945812/current/finalized/subdir1

129G /mnt/hdfs/current/BP-6706123673-10.xx.xx.xxx-1588026945812/current/finalized

129G /mnt/hdfs/current/BP-6706123673-10.xx.xx.xxx-1588026945812/current

129G /mnt/hdfs/current/BP-6706123673-10.xx.xx.xxx-1588026945812

129G /mnt/hdfs/current

129G /mnt/hdfs
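To see which directory in a listing like the one above actually holds the space, a small hypothetical helper can convert `du -h` sizes to bytes and rank the entries. It assumes `du -h` style output with binary K/M/G/T suffixes; the function names are illustrative only.

```python
import re
from typing import List, Tuple

# Binary multipliers, as used by `du -h` (K = 1024 bytes, and so on).
_UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_du_line(line: str) -> Tuple[int, str]:
    """Parse one `du -h` line such as '129G /mnt/hdfs' into (bytes, path)."""
    match = re.match(r"^\s*([\d.]+)([KMGT]?)\s+(\S.*)$", line)
    if match is None:
        raise ValueError(f"not a du line: {line!r}")
    number, unit, path = match.groups()
    return int(float(number) * _UNITS[unit]), path.strip()


def largest_first(lines: List[str]) -> List[Tuple[int, str]]:
    """Sort du entries by size, largest first; on ties, deeper paths first."""
    entries = [parse_du_line(line) for line in lines if line.strip()]
    return sorted(entries, key=lambda e: (-e[0], -e[1].count("/")))
```

Fed the lines above, the deepest 129G entry (`.../current/finalized`) sorts first, and `subdir1` accounts for only about 15.4G of it.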
