
A system I'm building needs to convert non-negative Ruby integers into the shortest possible UTF-8 strings (this should say octet strings; see the Edit below). The only requirement on the strings is that their lexicographic order be identical to the natural order of the integers.

What's the best Ruby way to do this?

We can assume the integers are 32 bits and the sign bit is 0. This is successful:

(i >> 24).chr + ((i >> 16) & 0xff).chr + ((i >> 8) & 0xff).chr + (i & 0xff).chr

But it appears to be 1) garbage-intensive and 2) ugly. I've also looked at pack solutions, but these don't seem portable because of byte order.
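For comparison, here is a minimal sketch assuming the `N` directive of `Array#pack`, which Ruby documents as a 32-bit unsigned integer in network (big-endian) byte order regardless of the host platform. The helper names `key_by_shifts` and `key_by_pack` are made up for illustration and are not part of any library.

```ruby
# Hypothetical helpers, for illustration only.

# The shift-and-chr expression from above, forced to a binary string.
def key_by_shifts(i)
  ((i >> 24).chr + ((i >> 16) & 0xff).chr + ((i >> 8) & 0xff).chr + (i & 0xff).chr).b
end

# 'N' = 32-bit unsigned integer, network (big-endian) byte order.
def key_by_pack(i)
  [i].pack('N')
end

# Both helpers should agree byte for byte on a few sample values.
[0, 255, 256, 0x12345678, 0x7fffffff].each do |i|
  same = key_by_shifts(i).bytes == key_by_pack(i).bytes
  puts format('0x%08x %s', i, same ? 'same bytes' : 'DIFFERENT')
end
```

This always yields four octets, so it gives up the "shortest possible" goal in exchange for a single allocation-light call.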

FWIW, the application is Redis hash field names. Building keys may be a performance bottleneck, but probably not. This question is mostly about the "Ruby way".

Edit

Above I should have said "shortest possible string of octets" rather than UTF-8, since this is what Redis actually stores for field keys. @Mark Reed's excellent suggestion to try true UTF-8 packing seems to work. The redis gem I am using seems to convert extended codes to octet sequences for Redis properly. For example,

REDIS.hset('hash', [0x12345678].pack('U'), 'foo')

works fine. But then

REDIS.hkeys('hash')

returns

"\xFC\x92\x8D\x85\x99\xB8"

I need to verify the lexicographic order of these strings is correct, but it looks good so far.
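One way to spot-check the ordering is sketched below. It only samples values around the points where the encoded length changes, so it is a sanity check rather than a proof. The helper name `utf8_key` is hypothetical, and the sketch assumes `pack('U')` accepts values up to 0x7fffffff, as the `hset` example above suggests.

```ruby
# Hypothetical helper: encode a non-negative integer below 2**31 with pack('U').
def utf8_key(i)
  [i].pack('U').b # .b returns a binary copy, so comparisons are plainly byte-wise
end

# Sample values straddling each point where the encoded length changes.
samples = [0, 0x7f, 0x80, 0x7ff, 0x800, 0xffff, 0x10000,
           0x1fffff, 0x200000, 0x3ffffff, 0x4000000, 0x12345678, 0x7fffffff]

# Sorting by the encoded bytes should give back the numeric order.
by_bytes = samples.shuffle.sort_by { |i| utf8_key(i).bytes }
puts by_bytes == samples.sort ? 'order preserved' : 'order broken'
```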

End edit


2 Answers


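If it has to be valid UTF-8, then you won't get much improvement over simply encoding the code point as a UTF-8 character; one of the features of UTF-8 is that encoded characters sort in the correct numeric order, and it uses only the minimum number of bytes required under the format's rules.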

[i].pack('U')

Note that UTF-8 is byte-oriented, so there are no endianness issues.

If you don't actually mean UTF-8, then please clarify what you do mean.
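The minimum-byte-count point can be illustrated with a short sketch. The `limits` table is an assumption written out by hand from the original UTF-8 layout of up to six bytes; it is not something Ruby exposes.

```ruby
# Largest value that fits in each encoded length under the original
# six-byte UTF-8 layout (hand-written table, for illustration).
limits = { 1 => 0x7f, 2 => 0x7ff, 3 => 0xffff, 4 => 0x1fffff,
           5 => 0x3ffffff, 6 => 0x7fffffff }

limits.each do |bytes, max|
  actual = [max].pack('U').bytesize
  puts format('0x%08x -> %d byte(s), expected %d', max, actual, bytes)
end
```

Values above 0x10FFFF fall outside what the current UTF-8 standard permits, so for those the result is best treated as a plain byte sequence rather than valid text. Values above 0x3ffffff also take six bytes, which is more than a fixed four-byte encoding would use.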

answered 2012-11-26T17:09:40.837

You want to be able to convert to any base and use that output to select your characters. See this answer: https://stackoverflow.com/a/2895806/131227

answered 2012-11-26T17:02:44.577