c - How to convert between a bitset and a byte array in C

Question

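Given an array, unsigned char q[32]="1100111...",

how can I generate a 4-byte bitset, unsigned char p[4], such that the bits of this bitset equal the values in the array? For example, the first byte is p[0] = "q[0] ... q[7]", the second byte is p[1] = "q[8] ... q[15]", and so on.

Also, how do I do the opposite, i.e. given the bitset, generate the array?

My own attempt at the first part: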

unsigned char p[4]={0};
for (int j=0; j<N; j++) 
{
    if (q[j] == '1')
    {
        p [j / 8] |= 1 << (7-(j % 8)); 
    }            
}

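Is the above correct? Are there any conditions I should check? Is there a better way?

Edit - 1

I would like to know whether the above is an efficient approach, because the array size may be 4096 or even more.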

score 3 · Accepted Answer

First, use strtoul to get a 32-bit value. Then convert the byte order to big-endian with htonl. Finally, store the result in your array:

#include <arpa/inet.h>
#include <stdlib.h>

/* ... */
unsigned char q[32] = "1100111...";
unsigned char result[4] = {0};
*(unsigned long*)result = htonl(strtoul((const char*)q, NULL, 2));

There are other ways, too.
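One such alternative, sketched here as an assumption rather than as part of the original answer, is to skip htonl and write the bytes out with shifts. Shifts operate on the value, not on its memory layout, so the result does not depend on the platform's byte order, and no pointer cast is needed. The helper name BinaryStringToBytes32 is hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not from the answer above): parses up to 32 '0'/'1'
 * characters from a NUL-terminated string and stores the value in out[0..3],
 * most significant byte first. */
static void BinaryStringToBytes32(const char *s, uint8_t out[4])
{
    uint32_t v = (uint32_t)strtoul(s, NULL, 2);

    out[0] = (uint8_t)(v >> 24);   /* bits 31..24 */
    out[1] = (uint8_t)(v >> 16);   /* bits 23..16 */
    out[2] = (uint8_t)(v >> 8);    /* bits 15..8  */
    out[3] = (uint8_t)v;           /* bits 7..0   */
}
```

As with the strtoul version above, the string is treated as a number, so an input shorter than 32 characters ends up right-aligned in the output.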

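But I don't have `<arpa/inet.h>`!

Then you need to know what byte order your platform uses. If it is big-endian, htonl does nothing and can be omitted. If it is little-endian, then htonl is just: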

unsigned long htonl(unsigned long x)
{
    /* Swap adjacent bytes, then swap adjacent 16-bit halves. */
    x = ((x & 0xFF00FF00) >> 8) | ((x & 0x00FF00FF) << 8);
    x = ((x & 0xFFFF0000) >> 16) | ((x & 0x0000FFFF) << 16);
    return x;
}

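If you're lucky, your optimizer may see what you're doing and turn it into efficient code. If not, at least it can all be done in registers and in O(log N) steps.

If you don't know what byte order your platform uses, then you need to detect it: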

typedef union {
    char c[sizeof(int) / sizeof(char)];
    int i;
} OrderTest;

unsigned long htonl(unsigned long x)
{
    OrderTest test;
    test.i = 1;
    if(!test.c[0])
        return x;

    x = ((x & 0xFF00FF00) >> 8) | ((x & 0x00FF00FF) << 8);
    x = ((x & 0xFFFF0000) >> 16) | ((x & 0x0000FFFF) << 16);
    return x;
}

Maybe `long` is 8 bytes!

Well, the OP implied a 4-byte input with their array size, but an 8-byte long is doable:

#include <limits.h>

#define kCharsPerLong (sizeof(long) / sizeof(char))
unsigned char q[8 * kCharsPerLong] = "1100111...";
unsigned char result[kCharsPerLong] = {0};
*(unsigned long*)result = htonl(strtoul((const char*)q, NULL, 2));

unsigned long htonl(unsigned long x)
{
    /* sizeof cannot be evaluated by the preprocessor, so test ULONG_MAX instead. */
#if ULONG_MAX == 0xFFFFFFFFUL
    x = ((x & 0xFF00FF00UL) >> 8) | ((x & 0x00FF00FFUL) << 8);
    x = ((x & 0xFFFF0000UL) >> 16) | ((x & 0x0000FFFFUL) << 16);
#elif ULONG_MAX == 0xFFFFFFFFFFFFFFFFUL
    x = ((x & 0xFF00FF00FF00FF00UL) >> 8) | ((x & 0x00FF00FF00FF00FFUL) << 8);
    x = ((x & 0xFFFF0000FFFF0000UL) >> 16) | ((x & 0x0000FFFF0000FFFFUL) << 16);
    x = ((x & 0xFFFFFFFF00000000UL) >> 32) | ((x & 0x00000000FFFFFFFFUL) << 32);
#else
#error Unsupported word size.
#endif
    return x;
}

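As for a char that isn't 8 bits (DSPs like to do this), you're on your own. (This is why it was a big deal when the SHARC series of DSPs got 8-bit bytes; it made porting existing code much easier because, let's face it, C does a terrible job of supporting portability.)

What about arbitrary-length buffers? No funny pointer typecasts, please.

The main thing that can be improved in the OP's version is the inside of the loop. Instead of treating the output byte as a fixed data register, treat it as a shift register, where each successive bit is shifted in at the right (LSB) end. This saves you all those divisions and mods (which hopefully get optimized into bit shifts anyway).

For sanity, I'm ditching unsigned char in favor of uint8_t.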

#include <stddef.h>
#include <stdint.h>

unsigned StringToBits(const char* inChars, uint8_t* outBytes, size_t numBytes,
    size_t* bytesRead)
/* Converts the string of '1' and '0' characters in `inChars` to a buffer of
 * bytes in `outBytes`. `numBytes` is the number of available bytes in the
 * `outBytes` buffer. On exit, if `bytesRead` is not NULL, the value it points
 * to is set to the number of bytes read (rounding up to the nearest full
 * byte). If a multiple of 8 bits is not read, the last byte written will be
 * padded with 0 bits to reach a multiple of 8 bits. This function returns the
 * number of padding bits that were added. For example, an input of 11 bits
 * will result in `bytesRead` being set to 2 and the function will return 5. This
 * means that if a nonzero value is returned, then a partial byte was read,
 * which may be an error.
 */
{   size_t bytes = 0;
    unsigned bits = 0;
    uint8_t x = 0;

    while(bytes < numBytes)
    {   /* Parse a character. */
        switch(*inChars++)
        {   case '0': x <<= 1; ++bits; break;
            case '1': x = (x << 1) | 1; ++bits; break;
            default: numBytes = 0;
        }

        /* See if we filled a byte. */
        if(bits == 8)
        {   outBytes[bytes++] = x;
            x = 0;
            bits = 0;
        }
    }

    /* Padding, if needed. */
    if(bits)
    {   bits = 8 - bits;
        outBytes[bytes++] = x << bits;
    }

    /* Finish up. */
    if(bytesRead)
        *bytesRead = bytes;
    return bits;
}

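It is your responsibility to make sure inChars is null-terminated. The function returns at the first character it sees that is not '0' or '1', or when it runs out of output buffer. Some example usage: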

unsigned char q[32] = "1100111...";
uint8_t buf[4];
size_t bytesRead = 5;
if(StringToBits((const char*)q, buf, 4, &bytesRead) || bytesRead != 4)
{
    /* Partial read; handle error here. */
}

This just reads 4 bytes, and traps the error if it can't.

unsigned char q[4096] = "1100111...";
uint8_t buf[512];
StringToBits((const char*)q, buf, 512, NULL);

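This just converts what it can and pads the final partial byte, if any, with 0 bits. Bytes of buf beyond that point are not written.

This function could be done better if C had the ability to break out of more than one level of loop or switch. As it stands, I would have to add a flag value to get the same effect, which is clutter, or I would have to add a goto, which I simply refuse to do.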
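The question also asked for the reverse direction, which the function above does not cover. The following is a minimal sketch in the same shift-register spirit, not part of the original answer: each byte is shifted left so that its next bit always sits in the MSB position. The name BitsToString and its parameter order are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical inverse of StringToBits: writes numBytes * 8 characters of
 * '0'/'1' to outChars, followed by a terminating NUL. The caller must
 * provide room for numBytes * 8 + 1 characters. */
static void BitsToString(const uint8_t *inBytes, size_t numBytes, char *outChars)
{
    for (size_t i = 0; i < numBytes; ++i)
    {
        uint8_t x = inBytes[i];
        for (int b = 0; b < 8; ++b)
        {
            *outChars++ = (char)('0' + ((x >> 7) & 1)); /* emit the current MSB */
            x = (uint8_t)(x << 1);                      /* move the next bit up */
        }
    }
    *outChars = '\0';
}
```

For a 4-byte bitset this needs a 33-character output buffer, one more than the q[32] in the question, because of the terminator.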

score 2 · Accepted Answer

I don't think that will quite work. You are comparing each "bit" to 1 when it really should be '1'. You can also get rid of the if:

unsigned char p[4]={0};
for (int j=0; j<32; j++) 
{
    p[j / 8] |= (q[j] == '1') << (7-(j % 8));
}

Going the other way is simple too. Just mask for each "bit" that you set earlier.

unsigned char q[32]={0};
for (int j=0; j<32; j++) {
  /* Parenthesize the mask test: + binds tighter than &, and the result must be 0 or 1. */
  q[j] = ((p[j / 8] & (1 << (7-(j % 8)))) != 0) + '0';
}

You'll notice the creative use of (boolean) + '0' to convert between 1/0 and '1'/'0'.

score 1 · Accepted Answer

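Judging by your example, you don't seem to be aiming for readability, and after a (late) refresh my solution looks very similar to Chriszuma's, differing only in how it relies on operator precedence and in the addition of !! to force a 0 or 1.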

enum { N = 32 }; /* N must be a multiple of 8; a constant expression, so the arrays below can be initialized in C */
unsigned char q[N+1] = "11011101001001101001111110000111";
unsigned char p[N/8] = {0};
unsigned char r[N+1] = {0}; //reversed

for(size_t i = 0; i < N; ++i)
    p[i / 8] |= (q[i] == '1') << 7 - i % 8;

for(size_t i = 0; i < N; ++i)
    r[i] = '0' + !!(p[i / 8] & 1 << 7 - i % 8);

printf("%x %x %x %x\n", p[0], p[1], p[2], p[3]);
printf("%s\n%s\n", q,r);

score 1 · Accepted Answer

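If you are after extreme efficiency, try the following techniques:

Replace the if with a subtraction of '0' (it seems you can assume your input symbols can only be 0 or 1). Also process the input from lower indices to higher ones.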

for (int c = 0; c < N; c += 8)
{
    int y = 0;
    for (int b = 0; b < 8; ++b)
        y = y * 2 + q[c + b] - '0';
    p[c / 8] = y;
}

Replace array indexing with auto-incrementing pointers:

const unsigned char* qptr = q;
unsigned char* pptr = p;
for (int c = 0; c < N; c += 8)
{
    int y = 0;
    for (int b = 0; b < 8; ++b)
        y = y * 2 + *qptr++ - '0';
    *pptr++ = y;
}

Unroll the inner loop:

const unsigned char* qptr = q;
unsigned char* pptr = p;
for (int c = 0; c < N; c += 8)
{
    *pptr++ =
        qptr[0] - '0' << 7 |
        qptr[1] - '0' << 6 |
        qptr[2] - '0' << 5 |
        qptr[3] - '0' << 4 |
        qptr[4] - '0' << 3 |
        qptr[5] - '0' << 2 |
        qptr[6] - '0' << 1 |
        qptr[7] - '0' << 0;
    qptr += 8;
}

Process multiple input characters at once (using bit-twiddling hacks or MMX instructions) - this has great speedup potential!
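As an illustration of the bit-twiddling route, here is a sketch that packs 8 characters with a single multiplication. It is one possible reading of the suggestion, not code from the answer, and it assumes ASCII input containing only '0' and '1' (so the low bit of each character is the bit value). The names Pack8 and PackBitsSwar are hypothetical. No speedup is claimed; measure on your own target.

```c
#include <stddef.h>
#include <stdint.h>

/* Packs 8 ASCII '0'/'1' characters into one byte; q[0] becomes the MSB. */
static uint8_t Pack8(const unsigned char *q)
{
    uint64_t x = 0;

    /* Gather the 8 characters into one word, q[0] in the lowest byte.
     * Building the word with shifts keeps this independent of byte order. */
    for (int i = 0; i < 8; ++i)
        x |= (uint64_t)q[i] << (8 * i);

    /* Keep only the low bit of each character: '0' -> 0, '1' -> 1. */
    x &= 0x0101010101010101ULL;

    /* The multiplier moves bit 8*i to bit 63-i without any two partial
     * products colliding, so the top byte holds the packed result. */
    return (uint8_t)((x * 0x8040201008040201ULL) >> 56);
}

/* Packs n characters (n must be a multiple of 8) from q into p. */
static void PackBitsSwar(const unsigned char *q, size_t n, unsigned char *p)
{
    for (size_t c = 0; c < n; c += 8)
        p[c / 8] = Pack8(q + c);
}
```

Unlike the subtraction-based loops above, masking with the low bit silently maps any other input character to 0 or 1, so validate the input separately if that matters.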
