r - Find adjacent rows matching a condition

Question

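I have a financial time series in R (currently an xts object, but I'm also looking into tibble).

How can I find the probability that 2 adjacent rows match a condition?

For example, I want to know the probability of 2 consecutive days above the mean/median. I know I can use lag to carry the previous day's value into the next row and compute the statistic that way, but that seems cumbersome and inflexible.

Is there a better way to do this?

Sample xts data: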

library(xts)
foo <- xts(x = c(1,1,5,1,5,5,1), seq(as.Date("2016-01-01"), length = 7, by = "days"))

What is the probability of 2 consecutive days above the median?
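For reference, the lag-based approach described in the question boils down to comparing each day's flag with the previous day's flag. A minimal base-R sketch on the same seven values (for the xts object, `as.numeric(coredata(foo))` gives this plain vector); it defines the probability as the share of the 6 adjacent day pairs in which both days are above the median, which is one possible definition:

```r
# Same values as the sample xts object `foo`, as a plain numeric vector
x <- c(1, 1, 5, 1, 5, 5, 1)

above <- x > median(x)              # TRUE where the day is above the median
today <- above[-1]                  # flags for days 2..n
yesterday <- above[-length(above)]  # flags for days 1..(n-1): the "lagged" values

both_above <- today & yesterday     # one entry per adjacent pair of days
prob <- mean(both_above)            # share of pairs where both days are above
prob
# [1] 0.1666667
```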

score 1 · Accepted Answer

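Here is a pure xts solution (using TTR for the rolling functions).

How do you define the median? There are several ways.

In online time-series use, such as computing a moving average, you can compute the median over a fixed lookback window (as below) or from the origin up to the present (an anchored-window calculation). Either way, values after the current time step never enter the median calculation, which avoids look-ahead bias: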

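library(xts)
library(TTR)

x <- rep(c(1,1,5,1,5,5,1, 5, 5, 5), 10)
y <- xts(x = x, seq(as.Date("2016-01-01"), length = length(x), by = "days"), dimnames = list(NULL, "x"))

# Avoid look ahead bias in an online time series application by computing the median over a rolling fixed time window:
nMedLookback <- 5
# Percent rank of each value within the last nMedLookback observations
# (despite the column name, "med" holds a rank between 0 and 1, not a median)
y$med <- runPercentRank(y[, "x"], n = nMedLookback)
# A percent rank above 0.5 is treated as "above the rolling median"
y$isAboveMed <- y$med > 0.5

# Rolling count of above-median days over the last nSum days
nSum <- 2
y$runSum2 <- runSum(y$isAboveMed, n = nSum)

# Drop the NA warm-up rows, then count the days that complete a run of nSum above-median days
z <- na.omit(y)
prob <- sum(z[,"runSum2"] >= nSum) / NROW(z)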
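The code above uses a fixed lookback window. For the anchored (from-origin) variant, one option is an expanding median: on each day, take the median of all observations up to and including that day. Below is a minimal base-R sketch on the same repeated data. It is quadratic in the series length, so it is only meant for small examples. TTR's `runPercentRank()` also documents a `cumulative` argument for from-inception calculations, which is worth checking in `?runPercentRank` before relying on it.

```r
x <- rep(c(1, 1, 5, 1, 5, 5, 1, 5, 5, 5), 10)

# Anchored (expanding) window: median of x[1], ..., x[i] for each day i,
# so no value after day i enters the calculation (no look-ahead bias)
anchored_med <- vapply(seq_along(x), function(i) median(x[seq_len(i)]), numeric(1))

above <- x > anchored_med
# TRUE where both day i-1 and day i are above their own anchored medians
both_above <- above[-1] & above[-length(above)]
prob <- mean(both_above)  # share of adjacent day pairs
prob
```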

Computing the median over the entire dataset instead is a straightforward modification.
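With the median taken over the whole dataset, the flags can be computed once, and the run length can be generalized beyond two days. A minimal base-R sketch follows. `prob_consecutive()` is a hypothetical helper name, not part of xts or TTR. Here the probability is defined as the share of all windows of `k` consecutive days in which every day meets the condition. For an xts column, `as.numeric(coredata(...))` gives the plain vector.

```r
# Hypothetical helper: share of length-k windows in which every value of `cond` is TRUE
prob_consecutive <- function(cond, k = 2) {
  stopifnot(k >= 1, length(cond) >= k)
  # embed() returns one row per window of k consecutive values
  windows <- embed(as.integer(cond), k)
  mean(rowSums(windows) == k)
}

x <- c(1, 1, 5, 1, 5, 5, 1)
above <- x > median(x)  # median over the whole dataset

prob_consecutive(above, k = 2)  # 1 of 6 windows
# [1] 0.1666667
prob_consecutive(above, k = 3)  # 0 of 5 windows
# [1] 0
```

Changing `k` or the condition (for example `x > mean(x)`) requires no extra lag columns.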

score 1 · Accepted Answer

You can create a new column flagging the values above the median, then keep only the rows where consecutive values are both higher.

library(tibble)
foo <- tibble(x = c(1,1,5,1,5,5,1), date = seq(as.Date("2016-01-01"), length.out = 7, by = "days"))

Step 1

Create a column flagging the values above the median:

foo$higher_than_median <- foo$x > median(foo$x)

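Step 2

Use diff to compare each value in that column with the previous one, keeping a row only when the two are both higher or both lower. The first row has no previous row, so it is set to FALSE: c(FALSE, diff(foo$higher_than_median) == 0)

Then add the condition that both must be higher: foo$higher_than_median == TRUE

Full expression:

foo$both_higher <- c(FALSE, diff(foo$higher_than_median) == 0) & foo$higher_than_median == TRUE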

Step 3

To get the probability, take the mean of foo$both_higher:

mean(foo$both_higher)
[1] 0.1428571
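The same steps can be written as one dplyr pipeline, using `dplyr::lag()` for the previous day's flag. dplyr is an assumption here: the answer itself works on the tibble with base R only. Note that `mean(foo$both_higher)` divides by all 7 rows, while the series contains only 6 adjacent day pairs. The sketch reports both quantities so the chosen definition is explicit.

```r
library(tibble)
library(dplyr)

foo <- tibble(x = c(1, 1, 5, 1, 5, 5, 1),
              date = seq(as.Date("2016-01-01"), length.out = 7, by = "days"))

res <- foo %>%
  mutate(higher_than_median = x > median(x),
         # previous day's flag; the first row has no previous day, so use FALSE
         both_higher = higher_than_median &
           dplyr::lag(higher_than_median, default = FALSE)) %>%
  summarise(p_per_row  = mean(both_higher),             # 1 / 7
            p_per_pair = sum(both_higher) / (n() - 1))  # 1 / 6
res
```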
