sql - BigQuery macros / repeated query sections

Question

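We use BigQuery to compute many of our daily metrics, but we are also always interested in longer-term averages (7-day, 14-day, 28-day, QTD, YTD).

We always compute them like this (ds is the date):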

SELECT
AVG(metric_1d) OVER (
  ORDER BY ds 
  ROWS BETWEEN 6 PRECEDING AND CURRENT ROW 
) AS metric_7d,
AVG(metric_1d) OVER (
  ORDER BY ds 
  ROWS BETWEEN 13 PRECEDING AND CURRENT ROW 
) AS metric_14d,
AVG(metric_1d) OVER (
  ORDER BY ds 
  ROWS BETWEEN 27 PRECEDING AND CURRENT ROW 
) AS metric_28d,
AVG(metric_1d) OVER (
  PARTITION BY CONCAT(EXTRACT(YEAR FROM ds), DIV(EXTRACT(MONTH FROM ds)-1, 3))
  ORDER BY ds
  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS metric_qtd,
AVG(metric_1d) OVER (
  PARTITION BY EXTRACT(YEAR FROM ds)
  ORDER BY ds
  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS metric_ytd,
ds
FROM (
  SELECT
    ... AS metric_1d
    ...

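What I don't like is repeating essentially the same code in every metric query (sometimes several times over, when a query computes multiple metrics). Is there a recommended way to simplify this, perhaps with some kind of macro or UDF?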

score 1 · Accepted Answer

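I don't see how macros (which would mean resorting to scripting and would complicate the code even further) or UDFs would help here. Instead, I'd recommend the WINDOW clause. It addresses both concerns: it makes the code more readable, and it removes the redundancy when several aggregate/analytic calculations use the same window.

So I would rewrite your code as follows: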

select ds, 
  avg(metric_1d) over last_7d as metric_7d,
  avg(metric_1d) over last_14d as metric_14d,
  avg(metric_1d) over last_28d as metric_28d,
  avg(metric_1d) over qtd as metric_qtd,
  avg(metric_1d) over ytd as metric_ytd,
from your_table
window 
  last_7d  as (order by ds rows between  6 preceding and current row),
  last_14d as (order by ds rows between 13 preceding and current row),
  last_28d as (order by ds rows between 27 preceding and current row),
  qtd as (
    partition by concat(extract(year from ds), div(extract(month from ds)-1, 3))
    order by ds rows between unbounded preceding and current row
  ),
  ytd as (partition by extract(year from ds)
    order by ds rows between unbounded preceding and current row
  )
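A possible further simplification, sketched under the assumption that ds is a DATE column: the quarter and year partition keys can be written with DATE_TRUNC instead of the EXTRACT/CONCAT expression, and a named window may build on another named window, so the shared ORDER BY ds is written only once. The daily CTE and its values are made up (generated with GENERATE_DATE_ARRAY) purely so the query runs on its own; replace it with your own table.

```sql
-- Hypothetical sample data so the query is self-contained:
-- one row per day, metric_1d = day of the year.
with daily as (
  select ds, extract(dayofyear from ds) as metric_1d
  from unnest(generate_date_array(date '2023-12-20', date '2024-01-10')) as ds
)
select ds,
  avg(metric_1d) over last_7d  as metric_7d,
  avg(metric_1d) over last_28d as metric_28d,
  avg(metric_1d) over qtd      as metric_qtd,
  avg(metric_1d) over ytd      as metric_ytd
from daily
window
  -- shared ordering, defined once
  by_day   as (order by ds),
  -- trailing windows refine by_day by adding only a frame
  last_7d  as (by_day rows between  6 preceding and current row),
  last_28d as (by_day rows between 27 preceding and current row),
  -- DATE_TRUNC returns the first day of the quarter/year, used as the partition key;
  -- these cannot reuse by_day, because a window that references a named window
  -- may not add its own PARTITION BY
  qtd      as (partition by date_trunc(ds, quarter) order by ds
               rows between unbounded preceding and current row),
  ytd      as (partition by date_trunc(ds, year) order by ds
               rows between unbounded preceding and current row)
order by ds
```

With this sample data, metric_qtd and metric_ytd restart on 2024-01-01, while metric_7d and metric_28d keep averaging across the year boundary.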

If you add more calculations, e.g. SUM or COUNT, it is as simple as the following:

select ds, 
  avg(metric_1d) over last_7d as metric_7d,
  sum(metric_1d) over last_7d as metric2_7d,
  count(metric_1d) over last_7d as metric3_7d,
  avg(metric_1d) over last_14d as metric_14d,
  sum(metric_1d) over last_14d as metric2_14d,
  count(metric_1d) over last_14d as metric3_14d,
  avg(metric_1d) over last_28d as metric_28d,
  sum(metric_1d) over last_28d as metric2_28d,
  count(metric_1d) over last_28d as metric3_28d,
  avg(metric_1d) over qtd as metric_qtd,
  sum(metric_1d) over qtd as metric2_qtd,
  count(metric_1d) over qtd as metric3_qtd,
  avg(metric_1d) over ytd as metric_ytd,
  sum(metric_1d) over ytd as metric2_ytd,
  count(metric_1d) over ytd as metric3_ytd,
from your_table
window 
  last_7d  as (order by ds rows between  6 preceding and current row),
  last_14d as (order by ds rows between 13 preceding and current row),
  last_28d as (order by ds rows between 27 preceding and current row),
  qtd as (
    partition by concat(extract(year from ds), div(extract(month from ds)-1, 3))
    order by ds rows between unbounded preceding and current row
  ),
  ytd as (partition by extract(year from ds)
    order by ds rows between unbounded preceding and current row
  )
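One caveat that applies to both the question's query and the rewrite: ROWS BETWEEN 6 PRECEDING AND CURRENT ROW counts rows, not days, so metric_7d covers 7 calendar days only if every date has exactly one row. If dates can be missing, a RANGE frame ordered by UNIX_DATE(ds) (days since 1970-01-01) bounds the window by date instead. A sketch, again with hypothetical data in which two dates are deliberately skipped:

```sql
-- Hypothetical sample data with gaps: 2024-01-04 and 2024-01-05 are missing.
with daily as (
  select ds, extract(day from ds) as metric_1d
  from unnest(generate_date_array(date '2024-01-01', date '2024-01-12')) as ds
  where ds not in (date '2024-01-04', date '2024-01-05')
)
select ds,
  -- last 7 rows, which can reach back more than 7 days across the gap
  avg(metric_1d) over last_7_rows as metric_7rows,
  -- last 7 calendar days: rows whose date is at most 6 days before ds
  avg(metric_1d) over last_7_days as metric_7d
from daily
window
  last_7_rows as (order by ds rows between 6 preceding and current row),
  last_7_days as (order by unix_date(ds) range between 6 preceding and current row)
order by ds
```

RANGE requires a single numeric ORDER BY expression, which is why the window orders by UNIX_DATE(ds) rather than by ds directly.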
