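I am trying to wrap a progress indicator around my entire script. However, set_index(..., compute=False) still runs tasks on the scheduler, which I can observe in the web interface.
How can I report progress for the set_index step?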

import dask.dataframe as dd
from dask.distributed import Client, progress

if __name__ == '__main__':
    with Client() as client:
        df = dd.read_csv('big.csv')
        # I can see on the web interface that something is happening.
        # This blocks 20-30s on this particular CSV.
        df = df.set_index('id', compute=False)
        # Progress reporting works from here
        out = client.compute(
            df
        )
        progress(out)
        # out.result()
        # ...