12

我有一长串介于 0 和 67600 之间的数字。现在我想使用一个长度为 67600 个元素的数组来存储它们。如果数字在集合中,则元素设置为 1,如果数字不在集合中,则元素设置为 0。IE。每次我只需要 1 位信息来存储数字的存在。C/C++ 中是否有任何 hack 可以帮助我实现这一目标?

4

5 回答 5

15

In C++ you can use std::vector<bool> if the size is dynamic (it's a special case of std::vector, see this) otherwise there is std::bitset (prefer std::bitset if possible.) There is also boost::dynamic_bitset if you need to set/change the size at runtime. You can find info on it here, it is pretty cool!

In C (and C++) you can manually implement this with bitwise operators. A good summary of common operations is here. One thing I want to mention is its a good idea to use unsigned integers when you are doing bit operations. << and >> are undefined when shifting negative integers. You will need to allocate arrays of some integral type like uint32_t. If you want to store N bits, it will take N/32 of these uint32_ts. Bit i is stored in the i % 32'th bit of the i / 32'th uint32_t. You may want to use a differently sized integral type depending on your architecture and other constraints. Note: prefer using an existing implementation (e.g. as described in the first paragraph for C++, search Google for C solutions) over rolling your own (unless you specifically want to, in which case I suggest learning more about binary/bit manipulation from elsewhere before tackling this.) This kind of thing has been done to death and there are "good" solutions.

There are a number of tricks that will maybe only consume one bit: e.g. arrays of bitfields (applicable in C as well), but whether less space gets used is up to compiler. See this link.

Please note that whatever you do, you will almost surely never be able to use exactly N bits to store N bits of information - your computer very likely can't allocate less than 8 bits: if you want 7 bits you'll have to waste 1 bit, and if you want 9 you will have to take 16 bits and waste 7 of them. Even if your computer (CPU + RAM etc.) could "operate" on single bits, if you're running in an OS with malloc/new it would not be sane for your allocator to track data to such a small precision due to overhead. That last qualification was pretty silly - you won't find an architecture in use that allows you to operate on less than 8 bits at a time I imagine :)

于 2013-03-18T18:18:51.100 回答
11

You should use std::bitset.

std::bitset functions like an array of bool (actually like std::array, since it copies by value), but only uses 1 bit of storage for each element.

Another option is vector<bool>, which I don't recommend because:

  • It uses slower pointer indirection and heap memory to enable resizing, which you don't need.
  • That type is often maligned by standards-purists because it claims to be a standard container, but fails to adhere to the definition of a standard container*.

*For example, a standard-conforming function could expect &container.front() to produce a pointer to the first element of any container type, which fails with std::vector<bool>. Perhaps a nitpick for your usage case, but still worth knowing about.

于 2013-03-18T18:19:49.040 回答
6

There is in fact! std::vector<bool> has a specialization for this: http://en.cppreference.com/w/cpp/container/vector_bool

See the doc, it stores it as efficiently as possible.

Edit: as somebody else said, std::bitset is also available: http://en.cppreference.com/w/cpp/utility/bitset

于 2013-03-18T18:19:15.087 回答
4

如果你想用 C 语言编写它,有一个长度为 67601 位(67601/8 = 8451)的 char 数组,然后为每个值打开/关闭相应的位。

于 2013-03-18T18:23:06.590 回答
1

其他人给出了正确的想法。这是我自己bitsarr的位或“数组”的实现。无符号字符是一个字节,因此它本质上是一个无符号字符数组,将信息存储在各个位中。除了一位值之外,我还添加了存储两位或四位值的选项,因为它们都除以 8(一个字节的大小),如果你想存储大量从 0 开始的整数,这将很有用-3 或 0-15。

设置和获取时,数学是在函数中完成的,所以你可以给它一个索引,就好像它是一个普通的数组一样——它知道去哪里找。

此外,用户有责任不要传递一个太大的值来设置,否则会搞砸其他值。可以对其进行修改,使溢出循环回到 0,但这只会使它更加复杂,所以我决定相信自己。

#include<stdio.h>
#include <stdlib.h>
#define BYTE 8

typedef enum {ONE=1, TWO=2, FOUR=4} numbits;

typedef struct bitsarr{
    unsigned char* buckets;
    numbits n;
} bitsarr;


bitsarr new_bitsarr(int size, numbits n)
{
    int b = sizeof(unsigned char)*BYTE;
    int numbuckets = (size*n + b - 1)/b;
    bitsarr ret;  
    ret.buckets = malloc(sizeof(ret.buckets)*numbuckets);
    ret.n = n;
    return ret;
}
void bitsarr_delete(bitsarr xp)
{
    free(xp.buckets);
}

void bitsarr_set(bitsarr *xp, int index, int value)
{
    int buckdex, innerdex;
    buckdex = index/(BYTE/xp->n);
    innerdex = index%(BYTE/xp->n);
    xp->buckets[buckdex] = (value << innerdex*xp->n) | ((~(((1 << xp->n) - 1) << innerdex*xp->n)) & xp->buckets[buckdex]);

    //longer version

    /*unsigned int width, width_in_place, zeros, old, newbits, new;
    width = (1 << xp->n) - 1; 
    width_in_place = width << innerdex*xp->n;
    zeros = ~width_in_place;
    old = xp->buckets[buckdex];
    old = old & zeros;
    newbits = value << innerdex*xp->n;
    new = newbits | old;
    xp->buckets[buckdex] = new; */

}

int bitsarr_get(bitsarr *xp, int index)
{
    int buckdex, innerdex;
    buckdex = index/(BYTE/xp->n);
    innerdex = index%(BYTE/xp->n);
    return ((((1 << xp->n) - 1) << innerdex*xp->n) & (xp->buckets[buckdex])) >> innerdex*xp->n;

    //longer version

    /*unsigned int width = (1 << xp->n) - 1; 
    unsigned int width_in_place = width << innerdex*xp->n;
    unsigned int val = xp->buckets[buckdex];
    unsigned int retshifted = width_in_place & val;
    unsigned int ret = retshifted >> innerdex*xp->n;
    return ret; */
}

int main()
{
    bitsarr x = new_bitsarr(100, FOUR);
    for(int i = 0; i<16; i++)
        bitsarr_set(&x, i, i);
    for(int i = 0; i<16; i++)
        printf("%d\n", bitsarr_get(&x, i));
    for(int i = 0; i<16; i++)
        bitsarr_set(&x, i, 15-i);
    for(int i = 0; i<16; i++)
        printf("%d\n", bitsarr_get(&x, i));
    bitsarr_delete(x);
}
于 2014-07-04T22:35:44.610 回答