
We use BigQuery to compute many of our daily metrics, but we are also always interested in longer-term averages (7-day, 14-day, 28-day, QTD, YTD).

This is always done like so (ds: date):

SELECT
AVG(metric_1d) OVER (
  ORDER BY ds 
  ROWS BETWEEN 6 PRECEDING AND CURRENT ROW 
) AS metric_7d,
AVG(metric_1d) OVER (
  ORDER BY ds 
  ROWS BETWEEN 13 PRECEDING AND CURRENT ROW 
) AS metric_14d,
AVG(metric_1d) OVER (
  ORDER BY ds 
  ROWS BETWEEN 27 PRECEDING AND CURRENT ROW 
) AS metric_28d,
AVG(metric_1d) OVER (
  PARTITION BY CONCAT(EXTRACT(YEAR FROM ds), DIV(EXTRACT(MONTH FROM ds)-1, 3))
  ORDER BY ds
  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS metric_qtd,
AVG(metric_1d) OVER (
  PARTITION BY EXTRACT(YEAR FROM ds)
  ORDER BY ds
  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS metric_ytd,
ds
FROM (
  SELECT
    ... AS metric_1d
    ...

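What I dislike is that essentially the same code is repeated in every metrics query (sometimes several times, when more than one metric is computed). Is there a recommended way to simplify this, perhaps with some kind of macro or UDF?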


1 Answer


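I don't see a macro (which, short of using scripting, would only complicate the code further) or a UDF being of any help here. Instead, I can recommend using the WINDOW clause. It addresses both concerns: it improves the readability of the code, and it eliminates redundancy when several measures or analytic calculations are computed over the same window.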

So, I would rewrite your code as follows:

select ds, 
  avg(metric_1d) over last_7d as metric_7d,
  avg(metric_1d) over last_14d as metric_14d,
  avg(metric_1d) over last_28d as metric_28d,
  avg(metric_1d) over qtd as metric_qtd,
  avg(metric_1d) over ytd as metric_ytd,
from your_table
window 
  last_7d  as (order by ds rows between  6 preceding and current row),
  last_14d as (order by ds rows between 13 preceding and current row),
  last_28d as (order by ds rows between 27 preceding and current row),
  qtd as (
    partition by concat(extract(year from ds), div(extract(month from ds)-1, 3))
    order by ds rows between unbounded preceding and current row
  ),
  ytd as (partition by extract(year from ds)
    order by ds rows between unbounded preceding and current row
  )         
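
The `order by ds` part is still repeated in each rolling window of the rewrite above. As a further sketch (not something you have to do), BigQuery lets a named window build on one defined earlier in the same WINDOW clause, so the ordering can be declared once. The `by_day` name and the generated sample rows standing in for `your_table` are assumptions made only so the query runs on its own.

```sql
-- Hypothetical sample data: one row per day in January 2022,
-- with metric_1d set to the day of the month.
with your_table as (
  select ds, extract(day from ds) as metric_1d
  from unnest(generate_date_array(date '2022-01-01', date '2022-01-31')) as ds
)
select
  ds,
  avg(metric_1d) over last_7d  as metric_7d,
  avg(metric_1d) over last_14d as metric_14d,
  avg(metric_1d) over last_28d as metric_28d,
  avg(metric_1d) over ytd      as metric_ytd
from your_table
window
  -- base window (assumed name): only the ordering, no frame
  by_day   as (order by ds),
  -- each rolling window extends the base window with its own frame
  last_7d  as (by_day rows between  6 preceding and current row),
  last_14d as (by_day rows between 13 preceding and current row),
  last_28d as (by_day rows between 27 preceding and current row),
  -- partitioned windows cannot extend by_day, so they stay fully spelled out
  ytd as (
    partition by extract(year from ds)
    order by ds rows between unbounded preceding and current row
  )
order by ds
```

A window that extends `by_day` can add a frame but cannot add its own `partition by`, which is why `ytd` (and likewise `qtd`) keeps its full definition.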

If you want to add more metrics, such as sum or count, it is as simple as the following:

select ds, 
  avg(metric_1d) over last_7d as metric_7d,
  sum(metric_1d) over last_7d as metric2_7d,
  count(metric_1d) over last_7d as metric3_7d,
  avg(metric_1d) over last_14d as metric_14d,
  sum(metric_1d) over last_14d as metric2_14d,
  count(metric_1d) over last_14d as metric3_14d,
  avg(metric_1d) over last_28d as metric_28d,
  sum(metric_1d) over last_28d as metric2_28d,
  count(metric_1d) over last_28d as metric3_28d,
  avg(metric_1d) over qtd as metric_qtd,
  sum(metric_1d) over qtd as metric2_qtd,
  count(metric_1d) over qtd as metric3_qtd,
  avg(metric_1d) over ytd as metric_ytd,
  sum(metric_1d) over ytd as metric2_ytd,
  count(metric_1d) over ytd as metric3_ytd,
from your_table
window 
  last_7d  as (order by ds rows between  6 preceding and current row),
  last_14d as (order by ds rows between 13 preceding and current row),
  last_28d as (order by ds rows between 27 preceding and current row),
  qtd as (
    partition by concat(extract(year from ds), div(extract(month from ds)-1, 3))
    order by ds rows between unbounded preceding and current row
  ),
  ytd as (partition by extract(year from ds)
    order by ds rows between unbounded preceding and current row
  )
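
One caveat, in case it applies to your data: `rows between 6 preceding and current row` counts rows, not days, so it is a 7-day window only when the table has exactly one row per `ds` with no missing dates. If that is not guaranteed, a day-based frame can be sketched with `range` over `unix_date(ds)`. The sample rows and the window names `last_7_rows` and `last_7_days` are assumptions for illustration, and the sketch assumes `ds` is of type DATE.

```sql
-- Hypothetical sample data with a gap between 2022-01-02 and 2022-01-10.
with your_table as (
  select ds, 1.0 as metric_1d
  from unnest([date '2022-01-01', date '2022-01-02', date '2022-01-10']) as ds
)
select
  ds,
  -- counts up to 7 rows, regardless of how far apart their dates are
  count(metric_1d) over last_7_rows as n_in_row_window,
  -- counts only rows whose date falls within the last 7 calendar days
  count(metric_1d) over last_7_days as n_in_day_window
from your_table
window
  last_7_rows as (order by ds rows between 6 preceding and current row),
  last_7_days as (order by unix_date(ds) range between 6 preceding and current row)
order by ds
```

On the 2022-01-10 row the row-based window still covers all three rows, while the day-based window covers only that row. Note that `avg` over the day-based window still divides by the number of rows present, not by 7.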
Answered 2022-02-24T06:32:05.430