
I have a Dask DataFrame created from Parquet files on HDFS. When I try to set the index with set_index, it fails with the following error:

  File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/dask/dataframe/shuffle.py", line 64, in set_index
    divisions, sizes, mins, maxes = base.compute(divisions, sizes, mins, maxes)
  File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/dask/base.py", line 206, in compute
    results = get(dsk, keys, **kwargs)
  File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/distributed/client.py", line 1949, in get
    results = self.gather(packed, asynchronous=asynchronous)
  File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/distributed/client.py", line 1391, in gather
    asynchronous=asynchronous)
  File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/distributed/client.py", line 561, in sync
    return sync(self.loop, func, *args, **kwargs)
  File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/distributed/utils.py", line 241, in sync
    six.reraise(*error[0])
  File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/six.py", line 693, in reraise
    raise value
  File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/distributed/utils.py", line 229, in f
    result[0] = yield make_coro()
  File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/tornado/gen.py", line 1055, in run
    value = future.result()
  File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/tornado/concurrent.py", line 238, in result
    raise_exc_info(self._exc_info)
  File "", line 4, in raise_exc_info
  File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/tornado/gen.py", line 1063, in run
    yielded = self.gen.throw(*exc_info)
  File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/distributed/client.py", line 1269, in _gather
    traceback)
  File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/six.py", line 692, in reraise
    raise value.with_traceback(tb)
  File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/dask/dataframe/io/parquet.py", line 144, in _read_parquet_row_group
    open=open, assign=views, scheme=scheme)
TypeError: read_row_group_file() got an unexpected keyword argument 'scheme'

Can someone explain what causes this error and how to fix it?


Solution

Upgrade fastparquet to version 0.1.3 or later.
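Before and after upgrading, it can help to confirm which fastparquet version the environment actually imports. The sketch below compares the installed version against 0.1.3. It assumes the package exposes fastparquet.__version__ and uses a deliberately simplified version parser (leading numeric components only), not full PEP 440 rules. The upgrade itself would typically be pip install --upgrade "fastparquet>=0.1.3", or the conda equivalent in a conda environment like the one in the traceback.

```python
import re

# Assumption from the answer: fastparquet releases before 0.1.3 do not
# accept the `scheme` keyword that Dask 0.15.4 passes.
MIN_FASTPARQUET = (0, 1, 3)


def parse_version(version):
    """Turn '0.1.3.post1' into (0, 1, 3).

    Simplified parser: keeps only the leading dotted run of integers,
    so pre-release suffixes like 'rc1' are ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % version)
    return tuple(int(part) for part in match.group(0).split("."))


def fastparquet_is_compatible(version):
    """True if `version` is at least MIN_FASTPARQUET (tuple comparison)."""
    return parse_version(version) >= MIN_FASTPARQUET


if __name__ == "__main__":
    try:
        import fastparquet
    except ImportError:
        print("fastparquet is not installed")
    else:
        installed = fastparquet.__version__
        if fastparquet_is_compatible(installed):
            print("fastparquet", installed, "is new enough")
        else:
            print("fastparquet", installed, "is too old; upgrade to >= 0.1.3")
```

Tuple comparison handles short versions naturally: (0, 1) sorts before (0, 1, 3), so "0.1" is reported as too old.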

Details

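Dask 0.15.4, the version used in your example, includes this commit, which adds a scheme keyword argument to its call to read_row_group_file(). fastparquet versions earlier than 0.1.3 do not accept that argument, so the call raises the TypeError shown above.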
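The failure is an ordinary keyword-argument mismatch between caller and callee: the newer caller passes scheme=..., and the older callee has no such parameter. The sketch below models this with two hypothetical functions, reader_before_0_1_3 and reader_0_1_3_or_later, whose simplified signatures are NOT fastparquet's real ones, plus an inspect.signature check. It also assumes read_row_group_file can be imported from fastparquet.core; if that import fails, the real-library check is skipped.

```python
import inspect


# Hypothetical stand-ins, NOT fastparquet's real signatures. They only model
# "no `scheme` parameter" versus "has a `scheme` parameter".
def reader_before_0_1_3(path, assign=None):
    return path


def reader_0_1_3_or_later(path, assign=None, scheme=None):
    return path


def dask_style_call(reader, path):
    """Mimic a caller that always passes `scheme` by keyword (value is illustrative)."""
    return reader(path, assign=None, scheme="hive")


def call_fails_on_scheme(reader):
    """True if calling `reader` the new way raises the 'unexpected keyword' TypeError."""
    try:
        dask_style_call(reader, "part.0.parquet")
    except TypeError as exc:
        return "scheme" in str(exc)
    return False


def accepts_keyword(func, name):
    """True if `func` can be called with keyword argument `name`."""
    for param in inspect.signature(func).parameters.values():
        if param.kind is inspect.Parameter.VAR_KEYWORD:
            return True  # **kwargs swallows any keyword
        if param.name == name and param.kind in (
            inspect.Parameter.POSITIONAL_OR_KEYWORD,
            inspect.Parameter.KEYWORD_ONLY,
        ):
            return True
    return False


if __name__ == "__main__":
    print("old reader fails:", call_fails_on_scheme(reader_before_0_1_3))
    print("new reader fails:", call_fails_on_scheme(reader_0_1_3_or_later))
    try:
        # Assumed import location of the real function.
        from fastparquet.core import read_row_group_file
    except ImportError:
        print("fastparquet.core.read_row_group_file not importable; skipping")
    else:
        print("installed read_row_group_file accepts scheme:",
              accepts_keyword(read_row_group_file, "scheme"))
```

The signature check is only a diagnostic: it reports the mismatch but does not work around it, so the fix remains upgrading fastparquet.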

answered 2017-10-20T15:01:48.930