I am trying to write a Dask DataFrame to Parquet on HDFS using the pyarrow engine via the to_parquet API.
The write fails with the following exception:
    dask_df.to_parquet(parquet_path, engine=engine)
      File "/ebs/d1/agent/miniconda3/envs/dask-distributed/lib/python3.6/site-packages/dask/dataframe/core.py", line 985, in to_parquet
        return to_parquet(self, path, *args, **kwargs)
      File "/ebs/d1/agent/miniconda3/envs/dask-distributed/lib/python3.6/site-packages/dask/dataframe/io/parquet.py", line 618, in to_parquet
        out.compute()
      File "/ebs/d1/agent/miniconda3/envs/dask-distributed/lib/python3.6/site-packages/dask/base.py", line 135, in compute
        (result,) = compute(self, traverse=False, **kwargs)
      File "/ebs/d1/agent/miniconda3/envs/dask-distributed/lib/python3.6/site-packages/dask/base.py", line 333, in compute
        results = get(dsk, keys, **kwargs)
      File "/ebs/d1/agent/miniconda3/envs/dask-distributed/lib/python3.6/site-packages/distributed/client.py", line 1999, in get
        results = self.gather(packed, asynchronous=asynchronous)
      File "/ebs/d1/agent/miniconda3/envs/dask-distributed/lib/python3.6/site-packages/distributed/client.py", line 1437, in gather
        asynchronous=asynchronous)
      File "/ebs/d1/agent/miniconda3/envs/dask-distributed/lib/python3.6/site-packages/distributed/client.py", line 592, in sync
        return sync(self.loop, func, *args, **kwargs)
      File "/ebs/d1/agent/miniconda3/envs/dask-distributed/lib/python3.6/site-packages/distributed/utils.py", line 254, in sync
        six.reraise(*error[0])
      File "/ebs/d1/agent/miniconda3/envs/dask-distributed/lib/python3.6/site-packages/six.py", line 693, in reraise
        raise value
      File "/ebs/d1/agent/miniconda3/envs/dask-distributed/lib/python3.6/site-packages/distributed/utils.py", line 238, in f
        result[0] = yield make_coro()
      File "/ebs/d1/agent/miniconda3/envs/dask-distributed/lib/python3.6/site-packages/tornado/gen.py", line 1055, in run
        value = future.result()
      File "/ebs/d1/agent/miniconda3/envs/dask-distributed/lib/python3.6/site-packages/tornado/concurrent.py", line 238, in result
        raise_exc_info(self._exc_info)
      File "<string>", line 4, in raise_exc_info
      File "/ebs/d1/agent/miniconda3/envs/dask-distributed/lib/python3.6/site-packages/tornado/gen.py", line 1063, in run
        yielded = self.gen.throw(*exc_info)
      File "/ebs/d1/agent/miniconda3/envs/dask-distributed/lib/python3.6/site-packages/distributed/client.py", line 1315, in _gather
        traceback)
      File "/ebs/d1/agent/miniconda3/envs/dask-distributed/lib/python3.6/site-packages/six.py", line 692, in reraise
        raise value.with_traceback(tb)
      File "/ebs/d1/agent/miniconda3/envs/dask-distributed/lib/python3.6/site-packages/dask/dataframe/io/parquet.py", line 410, in _write_partition_pyarrow
        import pyarrow as pa
      File "/ebs/d1/agent/miniconda3/envs/dask-distributed/lib/python3.6/site-packages/pyarrow/__init__.py", line 113, in <module>
        import pyarrow.hdfs as hdfs
    AttributeError: module 'pyarrow' has no attribute 'hdfs'
pyarrow version: 0.8.0, distributed version: 1.20.2.
However, when I import the package in a Python console, it works without any error:
    import pyarrow.hdfs as hdfs
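For context on the class of error I am seeing: in Python, a package's submodule only becomes an attribute of the package object once that submodule has actually been imported, so "module 'X' has no attribute 'Y'" can mean the worker processes are importing a different (or partially initialized) copy of the package than my console does. The sketch below demonstrates the mechanism with the stdlib xml package as a stand-in (not pyarrow itself; this is only an illustration of why the attribute can be missing):

```python
import sys

# Drop any previously imported copies so the demonstration starts clean.
for mod in ("xml", "xml.dom"):
    sys.modules.pop(mod, None)

import xml

try:
    xml.dom  # submodule never imported, so the attribute does not exist yet
except AttributeError as e:
    print(e)  # module 'xml' has no attribute 'dom'

import xml.dom  # an explicit import binds the submodule as an attribute
print(hasattr(xml, "dom"))  # True
```

This is why the same `import pyarrow.hdfs as hdfs` can succeed in my console yet fail inside `_write_partition_pyarrow` on the workers: the workers may be resolving a different installation of pyarrow.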