python - 如何解析列表列表并一起分析元素以查看它们随时间发生的次数？

Question

假设我有一台每秒发送 4 位的机器，我想查看某个位签名随时间发送的次数。

我得到一个列表的输入列表，其中包含随时间变化的位消息。

对于我的输出，我想要一个字典列表，每个位对，包含唯一位对作为键和它作为值出现的时间。

编辑新示例：

例如，以下数据集将是该数据的表示。水平轴是位位置，垂直轴是随时间变化的样本。因此，对于以下示例，我总共有 4 个位和 6 个样本。

a = [
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 0, 1, 0]])

对于这个数据集，我试图计算某个位串出现的次数，这个长度应该能够变化，但是对于这个例子，假设我一次做 2 位。

因此，第一个样本 [0,0,1,1] 将被拆分为 [00,01,11]，第二个样本为 [01,11,11]，第三个样本为 [11,11,11] 和很快。生成这样的列表：

y = [
    [00,01,11],
    [01,11,11],
    [11,11,11],
    [11,11,11],
    [00,00,00],
    [10,01,10]]

由此，我希望能够计算每个唯一签名并生成一个字典，其中的键对应于签名，值对应于计数。

字典会这样

z = [
    {'00':2, '01':1, '11':2, '10':1}, 
    {'00':1, '01':2, '11':3},
    {'00':1, '11':4], '10':1}]

如果有一个已解析项目的列表，则查找计数很容易。但是，从原始数据到解析列表是我目前遇到的一些麻烦。我有一个实现，但它本质上是 3 个 for 循环，并且在大型数据集上运行速度非常慢。当然有更好，更蟒蛇的方式来解决这个问题？

我稍后在我的程序中使用 numpy 进行一些额外的计算，所以我不会反对在这里使用它。

更新：我一直在环顾其他事情并来到这里。也不确定这是否是最好的解决方案。

import numpy as np
a = np.array([
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [1, 1, 1, 1]])

my_list = a.astype(str).tolist()


# How many elements each
# list should have
n = 2

# using list comprehension
final = [([''.join(c[i:(i) + n]) for i in range((len(c) + n) // n)]) for c in my_list]

final = [['00', '01', '11'], ['01', '11', '11'], ['11', '11', '11']]

更新 2：

我已经运行了以下实现并测试了那里的速度，这就是我想出的。

在宽度为 2 的 4 位和 4 个样本的小示例上运行数据。

x = [
    [0,0,1,1],
    [0,1,1,1],
    [1,1,1,1]]

我的实现花了0.0003 秒
Kasrâmvd 的实现耗时0.0002 秒
Chris 的实现耗时0.0002 秒
Paul 的实现耗时0.0243 秒

但是，当针对 64 位和 23,497 个宽度为 2 的样本的实际数据集运行时。我得到了以下结果：

我的实现花了1.5302 秒
Kasrâmvd 的实现耗时0.3913 秒
Chris 的实现耗时2.0802 秒
Paul 的实现耗时0.0204 秒

score 1 · Accepted Answer

该解决方案不会将位配对，而是将它们作为元组提供（尽管这应该足够简单）。

编辑：根据需要形成位串。

from collections import Counter

x = [[0,0,1,1],
      [0,1,1,1],
      [1,1,1,1]]



y = [[''.join(map(str, ref[j:j+2])) for j in range(len(x[0])-1)] \
     for ref in x]

for bit in y:
    d = Counter(bit)
    print(d)

印刷

Counter({'00': 1, '01': 1, '11': 1})
Counter({'11': 2, '01': 1})
Counter({'11': 3})

编辑：要将窗口从 2 增加到 3，您可以将其添加到您的代码中：

window = 3
offset = window - 1

y = [[''.join(map(str, ref[j:j+window])) for j in range(len(x[0])-offset)] \
     for ref in x]

score 1 · Accepted Answer

如果您想进行几何或代数分析/解决方案，您可以执行以下操作：

In [108]: x = np.array([[0,0,1,1],
     ...:       [0,1,1,1],
     ...:       [1,1,1,1]])
     ...:       

In [109]: 

In [109]: pairs = np.dstack((x[:, :-1], x[:, 1:]))

In [110]: x, y, z = pairs.shape

In [111]: uniques
Out[111]: 
array([[0, 0],
       [0, 1],
       [1, 1]])

In [112]: uniques = np.unique(pairs.reshape(x*y, z), axis=0)

# None: 3d broadcasting is not recommended in any situation, please read doc for more details,
In [113]: R = (uniques[:,None][:,None,:] == pairs).all(3).sum(-1)

In [114]: R
Out[114]: 
array([[1, 0, 0],
       [1, 1, 0],
       [1, 2, 3]])

矩阵的列代表原始数组每一行R中对象中每个唯一对的计数。uniques

然后，您可以获得一个 Python 对象，如下所示：

In [116]: [{tuple(i): j for i,j in zip(uniques, i) if j} for i in R.T]
Out[116]: [{(0, 0): 1, (0, 1): 1, (1, 1): 1}, {(0, 1): 1, (1, 1): 2}, {(1, 1): 3}]

score 1 · Accepted Answer

这是一种使用卷积的方法。由于快速卷积依赖于 FFT，因此需要使用浮点数进行计算，所以我们有 52 位尾数，而 53 是我们可以处理的最大模式长度。

import itertools as it
import numpy as np
import scipy.signal as ss

MAX_BITS = np.finfo(float).nmant + 1

def sliding_window(data, width, return_keys=True, return_dict=True, prune_empty=True):
    n, m = data.shape
    if width > MAX_BITS:
        raise ValueError(f"max window width is {MAX_BITS}")
    patterns = ss.convolve(data, 1<<np.arange(width)[None], 'valid', 'auto').astype(int)
    patterns += np.arange(m-width+1)*(1<<width)
    cnts = np.bincount(patterns.ravel(), None, (m-width+1)*(1<<width)).reshape(m-width+1,-1)
    if return_keys or return_dict:
        keys = np.array([*map("".join, it.product(*width*("01",)))], 'S')
        if return_dict:
            dt = np.dtype([('key', f'S{width}'), ('value', int)])
            aux = np.empty(cnts.shape, dt)
            aux['value'] = cnts
            aux['key'] = keys
            if prune_empty:
                i,j = np.where(cnts)
                return [*map(dict, np.split(aux[i,j],
                                            i.searchsorted(np.arange(1,m-width+1))))]
            return [*map(dict, aux.tolist())]
        return keys, cnts
    return cnts

example = np.random.randint(0, 2, (10,10))
print(example)
print(sliding_window(example,3))

样品运行：

[[0 1 1 1 0 1 1 1 1 1]
 [0 0 1 0 1 0 0 1 0 1]
 [0 0 1 0 1 1 1 0 1 1]
 [1 1 1 1 1 0 0 0 1 0]
 [0 0 0 0 1 1 1 0 0 0]
 [1 1 0 0 0 1 0 0 1 1]
 [0 1 1 1 0 1 1 1 1 1]
 [0 1 0 0 0 1 1 0 0 1]
 [1 0 1 1 0 1 1 0 1 0]
 [0 0 1 1 0 1 0 1 0 0]]
[{b'000': 1, b'001': 3, b'010': 1, b'011': 2, b'101': 1, b'110': 1, b'111': 1}, {b'000': 1, b'010': 2, b'011': 2, b'100': 2, b'111': 3}, {b'000': 2, b'001': 1, b'101': 2, b'110': 4, b'111': 1}, {b'001': 2, b'010': 1, b'011': 2, b'101': 4, b'110': 1}, {b'010': 2, b'011': 4, b'100': 2, b'111': 2}, {b'000': 1, b'001': 1, b'100': 1, b'101': 1, b'110': 4, b'111': 2}, {b'001': 2, b'010': 2, b'100': 2, b'101': 2, b'111': 2}, {b'000': 1, b'001': 1, b'010': 2, b'011': 2, b'100': 1, b'101': 1, b'111': 2}]

python - 如何解析列表列表并一起分析元素以查看它们随时间发生的次数？

3 回答 3

Related

Reference