compression - 算法：最小编码，纠错，请帮忙？

Question

假设有一个全为零的 1024 位数组：

示例：[0,0,0,0,0,0,0,...]

然后我在完全随机的位置用一个覆盖 20 个零：

示例：[0,1,0,0,0,0,0,...]

假设我有一个完美的编码器，对这 20 个随机放置的位的位置进行编码所需的理论最小位数是多少？

我知道有通信理论方程可以告诉我这一点，但我想仔细检查我的计算。

更难的奖励问题：向我展示实现接近此最小限制的编码的算法的代码。

额外奖励：如果位翻转到字节级别而不是位级别怎么办？例如整个字节翻转。结果一样吗？

score 5 · Accepted Answer

上限（log2（1024 选择 20））= 139 位

（在 Wolfram Alpha 上计算）

其他答案说 143 位遗漏了我们知道正好有20 个。这是一个具体的编码来展示使用该知识的一种方法：使用算术编码，连续发送 1024 个“0”或“1”符号中的每一个。第一个符号以 20/1024 的概率为“1”加权；但是后面的每个符号的权重都不同。如果第一个符号是“0”，则在下一个符号上使用 20/1023；但如果是“1”，请使用 19/1023。以同样的方式继续到最后。只要我们告诉它正确的概率，算术编码会做所有艰苦的工作以适应大约 139 位。

关于“奖金”：原始问题中没有纠错。您可以在假设没有错误的情况下首先找到最佳编码的基础上添加纠错代码（这通常是解决问题的好方法）。这样你不会失去任何编码效率，尽管我认为你可能会失去稳健性——例如，如果你得到的错误多于你的 ECC 可以纠正的错误，那么消息会以完全垃圾的形式出现，还是会更优雅地降级？

score 2 · Accepted Answer

如果您要使用解码器也有字典的基于字典的编码，则没有绝对最小值。然而，对于基于频率的编码，您需要计算熵：

E = -(P(0) * log_2(P(0)) + P(1) * log_2(P(1)))
E = -(1004/1024 * log_2(1004/1024) + 20/1024 * log_2(20/1024))
E = 0.1388005

因此，输入的每一位平均需要 0.1388005 位的输出。总共：

0.1388005 * 1024 = 142.1317 bits.

这意味着理论上，使用最佳算法，您可以使用 143 位编码具有 1004 个零和 20 个（或相反）的任何字符串。

score 1 · Accepted Answer

If you treated a string of 200 bits as an array of twenty 10 bit numbers, each listing the position of one of the one-bits, you'd be saving 824 bits.

But I don't think this is the minimum. For example, if you treat each of the numbers as relative to the previous item instead of an absolute position, some analysis may show that on average you'd only need, say, 8 bits to encode the distance to the next one-bit. So add a bit to the front: when 0, then 200 bits follow with absolute positions. When 1, then 160 bits follow with relative positions. This should yield a lower average number of bits to encode the full value.

Generalizing, this is simply data compression. There are probably many compression algorithms that could reduce the average number of bits required to encode your "twenty one-bits in 1024" to a very small number. Computing a suitable binary tree, storing a representation of it, then storing the bits required to traverse the tree would likely yield a very efficient algorithm (this is in fact the basis of modern data compression).

compression - 算法：最小编码，纠错，请帮忙？

3 回答 3

Related

Reference