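I want to calculate the average delay between transactions for different splits. I already have a solution, but I now need to calculate the delay using a different method.
The dataset looks like this: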
customer_id transaction_date type sign period
A 01/01/15 A C 30 days
A 05/01/15 A C 30 days
A 10/01/15 B D 30 days
A 25/01/15 B D 30 days
transaction_data = structure(list(customer_id = c("A", "A", "A", "A"),
transaction_date = c("01/01/15",
"05/01/15", "10/01/15", "25/01/15"), type = c("A", "A", "B",
"B"), sign = c("C", "C", "D", "D"), period = c("30 days", "30 days",
"30 days", "30 days")), .Names = c("customer_id", "transaction_date",
"type", "sign", "period"), row.names = c(NA, -4L), class = "data.frame")
Old approach (solved)
What I did before was to first calculate the delay between subsequent transactions, like this:
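# Delay between subsequent transactions
library(data.table)
library(dplyr)  # provides mutate(), used below
# Note: the dput above stores transaction_date as character; this step assumes it
# has already been converted to a date-time class (the seconds-to-days step suggests POSIXct)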
setDT(transaction_data)[,delay_in_transactions_days:= c(0, diff.Date(transaction_date)), .(customer_id)]
# Convert seconds to days
transaction_data <- mutate(transaction_data, delay_in_days = delay_in_transactions_days/86400)
# Convert to integer
transaction_data$delay_in_days <- as.integer(transaction_data$delay_in_days)
Then I used dcast to calculate, for each split, the mean of the per-transaction delays:
dcast(setDT(transaction_data), customer_id ~ paste0("avg_delay_",period), value.var = "delay_in_days", mean)
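New approach (the problem)
The new method I want to use to calculate the delay is the following equation:
For each customer: (latest transaction - first transaction) / (number of transactions - 1)
The problem, of course, is that the delay cannot simply be calculated per period, because that would give the delay across all transactions. Instead, it needs to be calculated per period for a specific type, sign, or combination of splits.
Any ideas on how to solve this?
Expected output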
customer_id av.delay_30days av.delay_30_days_TYPE_A av.delay_30_days_TYPE_B
A 8 4 15
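For illustration, here is a minimal sketch of how the equation above could be applied per split with data.table. It is one possible reading of the requirement, not a confirmed solution. It assumes the dates are dd/mm/yy strings as in the sample data; the helper mean_gap_days and the avg_delay_* column names are hypothetical and differ slightly from the headers in the expected output.

```r
library(data.table)

# Sample data from the question (dates as dd/mm/yy strings)
transaction_data <- data.table(
  customer_id      = c("A", "A", "A", "A"),
  transaction_date = c("01/01/15", "05/01/15", "10/01/15", "25/01/15"),
  type             = c("A", "A", "B", "B"),
  sign             = c("C", "C", "D", "D"),
  period           = c("30 days", "30 days", "30 days", "30 days")
)

# Assumption: parse the strings as Date so differences come out in whole days
transaction_data[, transaction_date := as.Date(transaction_date, format = "%d/%m/%y")]

# Hypothetical helper: (latest - first) / (n - 1), NA when a group has one transaction
mean_gap_days <- function(d) {
  if (length(d) < 2L) return(NA_real_)
  as.numeric(max(d) - min(d), units = "days") / (length(d) - 1L)
}

# Delay per customer and period, across all transactions
overall <- transaction_data[, .(avg_delay = mean_gap_days(transaction_date)),
                            by = .(customer_id, period)]
overall[, split_name := paste0("avg_delay_", gsub(" ", "_", period))]

# Delay per customer, period and type
by_type <- transaction_data[, .(avg_delay = mean_gap_days(transaction_date)),
                            by = .(customer_id, period, type)]
by_type[, split_name := paste0("avg_delay_", gsub(" ", "_", period), "_TYPE_", type)]

# Stack both levels and reshape to one row per customer
long   <- rbind(overall[, .(customer_id, split_name, avg_delay)],
                by_type[, .(customer_id, split_name, avg_delay)])
result <- dcast(long, customer_id ~ split_name, value.var = "avg_delay")
print(result)
```

On the sample data this gives 8 for the overall 30-day split ((25/01/15 - 01/01/15) / 3 = 24 / 3), 4 for type A and 15 for type B, matching the expected output. Splitting by sign, or by a combination such as type and sign, would follow the same pattern by changing the by columns.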