我有一长串介于 0 和 67600 之间的数字。现在我想使用一个长度为 67600 个元素的数组来存储它们。如果数字在集合中,则元素设置为 1,如果数字不在集合中,则元素设置为 0。IE。每次我只需要 1 位信息来存储数字的存在。C/C++ 中是否有任何 hack 可以帮助我实现这一目标?
5 回答
In C++ you can use std::vector<bool>
if the size is dynamic (it's a special case of std::vector
, see this) otherwise there is std::bitset
(prefer std::bitset
if possible.) There is also boost::dynamic_bitset
if you need to set/change the size at runtime. You can find info on it here, it is pretty cool!
In C (and C++) you can manually implement this with bitwise operators. A good summary of common operations is here. One thing I want to mention is its a good idea to use unsigned integers when you are doing bit operations. <<
and >>
are undefined when shifting negative integers. You will need to allocate arrays of some integral type like uint32_t
. If you want to store N
bits, it will take N/32
of these uint32_t
s. Bit i
is stored in the i % 32
'th bit of the i / 32
'th uint32_t
. You may want to use a differently sized integral type depending on your architecture and other constraints. Note: prefer using an existing implementation (e.g. as described in the first paragraph for C++, search Google for C solutions) over rolling your own (unless you specifically want to, in which case I suggest learning more about binary/bit manipulation from elsewhere before tackling this.) This kind of thing has been done to death and there are "good" solutions.
There are a number of tricks that will maybe only consume one bit: e.g. arrays of bitfields (applicable in C as well), but whether less space gets used is up to compiler. See this link.
Please note that whatever you do, you will almost surely never be able to use exactly N bits to store N bits of information - your computer very likely can't allocate less than 8 bits: if you want 7 bits you'll have to waste 1 bit, and if you want 9 you will have to take 16 bits and waste 7 of them. Even if your computer (CPU + RAM etc.) could "operate" on single bits, if you're running in an OS with malloc
/new
it would not be sane for your allocator to track data to such a small precision due to overhead. That last qualification was pretty silly - you won't find an architecture in use that allows you to operate on less than 8 bits at a time I imagine :)
You should use std::bitset
.
std::bitset
functions like an array of bool
(actually like std::array
, since it copies by value), but only uses 1 bit of storage for each element.
Another option is vector<bool>
, which I don't recommend because:
- It uses slower pointer indirection and heap memory to enable resizing, which you don't need.
- That type is often maligned by standards-purists because it claims to be a standard container, but fails to adhere to the definition of a standard container*.
*For example, a standard-conforming function could expect &container.front()
to produce a pointer to the first element of any container type, which fails with std::vector<bool>
. Perhaps a nitpick for your usage case, but still worth knowing about.
There is in fact! std::vector<bool>
has a specialization for this: http://en.cppreference.com/w/cpp/container/vector_bool
See the doc, it stores it as efficiently as possible.
Edit: as somebody else said, std::bitset
is also available: http://en.cppreference.com/w/cpp/utility/bitset
如果你想用 C 语言编写它,有一个长度为 67601 位(67601/8 = 8451)的 char 数组,然后为每个值打开/关闭相应的位。
其他人给出了正确的想法。这是我自己bitsarr
的位或“数组”的实现。无符号字符是一个字节,因此它本质上是一个无符号字符数组,将信息存储在各个位中。除了一位值之外,我还添加了存储两位或四位值的选项,因为它们都除以 8(一个字节的大小),如果你想存储大量从 0 开始的整数,这将很有用-3 或 0-15。
设置和获取时,数学是在函数中完成的,所以你可以给它一个索引,就好像它是一个普通的数组一样——它知道去哪里找。
此外,用户有责任不要传递一个太大的值来设置,否则会搞砸其他值。可以对其进行修改,使溢出循环回到 0,但这只会使它更加复杂,所以我决定相信自己。
#include<stdio.h>
#include <stdlib.h>
#define BYTE 8
typedef enum {ONE=1, TWO=2, FOUR=4} numbits;
typedef struct bitsarr{
unsigned char* buckets;
numbits n;
} bitsarr;
bitsarr new_bitsarr(int size, numbits n)
{
int b = sizeof(unsigned char)*BYTE;
int numbuckets = (size*n + b - 1)/b;
bitsarr ret;
ret.buckets = malloc(sizeof(ret.buckets)*numbuckets);
ret.n = n;
return ret;
}
void bitsarr_delete(bitsarr xp)
{
free(xp.buckets);
}
void bitsarr_set(bitsarr *xp, int index, int value)
{
int buckdex, innerdex;
buckdex = index/(BYTE/xp->n);
innerdex = index%(BYTE/xp->n);
xp->buckets[buckdex] = (value << innerdex*xp->n) | ((~(((1 << xp->n) - 1) << innerdex*xp->n)) & xp->buckets[buckdex]);
//longer version
/*unsigned int width, width_in_place, zeros, old, newbits, new;
width = (1 << xp->n) - 1;
width_in_place = width << innerdex*xp->n;
zeros = ~width_in_place;
old = xp->buckets[buckdex];
old = old & zeros;
newbits = value << innerdex*xp->n;
new = newbits | old;
xp->buckets[buckdex] = new; */
}
int bitsarr_get(bitsarr *xp, int index)
{
int buckdex, innerdex;
buckdex = index/(BYTE/xp->n);
innerdex = index%(BYTE/xp->n);
return ((((1 << xp->n) - 1) << innerdex*xp->n) & (xp->buckets[buckdex])) >> innerdex*xp->n;
//longer version
/*unsigned int width = (1 << xp->n) - 1;
unsigned int width_in_place = width << innerdex*xp->n;
unsigned int val = xp->buckets[buckdex];
unsigned int retshifted = width_in_place & val;
unsigned int ret = retshifted >> innerdex*xp->n;
return ret; */
}
int main()
{
bitsarr x = new_bitsarr(100, FOUR);
for(int i = 0; i<16; i++)
bitsarr_set(&x, i, i);
for(int i = 0; i<16; i++)
printf("%d\n", bitsarr_get(&x, i));
for(int i = 0; i<16; i++)
bitsarr_set(&x, i, 15-i);
for(int i = 0; i<16; i++)
printf("%d\n", bitsarr_get(&x, i));
bitsarr_delete(x);
}