
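I have some image data stored as bytea in a column of a PostgreSQL table. I also have metadata for interpreting the data; the relevant parts are the image dimensions and the class. Classes include int16 and uint16. I can't find any information on correctly interpreting signed/unsigned integers in R.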

I'm using RPostgreSQL to pull the data into R, and I want to view the images in R.

MWE:

# fakeDataQuery <- dbGetQuery(conn, 
#     'select byteArray, ImageSize, ImageClass from table where id = 1')

# Example 1 (no negative numbers)
# the actual byte array shown in octal sequences in pgadmin (1.22.2) Query Output is: 
# "\001\000\002\000\003\000\004\000\005\000\006\000\007\000\010\000\011\000"

# but RPostgreSQL returns the hex-encoded version:
byteArray <- "\\x010002000300040005000600070008000900"
ImageSize <- c(3, 3, 1)
ImageClass <- 'int16'

# expected result 
> array(c(1,2,3,4,5,6,7,8,9), dim=c(3,3,1))
#   , , 1
#
#        [,1] [,2] [,3]
#[1,]    1    4    7
#[2,]    2    5    8
#[3,]    3    6    9

# Example 2 (with negative numbers)
byteArray <- "\\xffff00000100020003000400050006000700080009000a00"
ImageSize <- c(3, 4, 1)
ImageClass <- 'int16'
# expectedResult 
> array(c(-1,0,1,2,3,4,5,6,7,8,9,10), dim=c(3,4,1))
#, , 1
#
#     [,1] [,2] [,3] [,4]
#[1,]   -1    2    5    8
#[2,]    0    3    6    9
#[3,]    1    4    7   10

What I've tried:

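The bytea data from PostgreSQL is one long string encoded as "hex", which you can tell from the leading \\x prefix (I believe the extra \ is there to escape the existing one?): https://www.postgresql.org/docs/9.1/static/datatype-binary.html (see Section 8.4.1, "bytea Hex Format").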

Decode the "hex" back to the original type ("int16", based on ImageClass).

According to the same URL, the hex encoding uses "2 hexadecimal digits per byte", so I need to split the encoded byteArray into substrings of the appropriate length; see this link.

# remove the \\x hex encoding indicator(s) added by PostgreSQL
byteArray <- gsub("\\x", "", x = byteArray, fixed = TRUE)

l <- 2  # hex digits per byte (substring length)
byteArray <- strsplit(trimws(gsub(pattern = paste0("(.{",l,"})"), 
                                  replacement = "\\1 ", 
                                  x = byteArray)), 
                      " ")[[1]]

# for some reason the bytes appear in the opposite order from what I expect
# Ex: 1 is stored as '0100' rather than '0001'
# so swap the two bytes of each value (int16 specific)
byteArray <- paste0(byteArray[c(FALSE, TRUE)], byteArray[c(TRUE, FALSE)])

# strtoi() converts a vector of hex strings to integers, given base 16
byteArray <- strtoi(byteArray, 16L)

# now make it into an n x m x s array,
# e.g., 512 x 512 x (# slices)
V = array(byteArray, dim = ImageSize)

This solution has two problems:

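  1. It doesn't work for signed types, so negative integer values are interpreted as unsigned values (e.g., 'ffff' is -1 as int16 but 65535 as uint16, and strtoi() always returns 65535).
  2. It is currently coded only for int16 and would need extra code to work with other types (e.g., int32, int64).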
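
The first problem can be reproduced in isolation. The following is only a minimal sketch in base R; the variable names `pair` and `swapped` are made up for illustration:

```r
# One int16 value of -1 as it appears in the hex string (low byte first).
pair <- c("ff", "ff")

# Swap the two bytes and parse as base 16, as in the attempt above.
swapped <- paste0(pair[2], pair[1])
value <- strtoi(swapped, 16L)
value  # 65535: strtoi() has no notion of a 16-bit sign bit

# A positive value survives the same treatment unchanged.
strtoi(paste0("00", "01"), 16L)  # 1
```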

Does anyone have a solution that works for signed types?


1 Answer


You can start with this conversion function, which uses a faster replacement for strsplit, and then call readBin on the result:

byteArray <- "\\xffff00000100020003000400050006000700080009000a00"

## Split a long string into a vector of character pairs
Rcpp::cppFunction( code = '
CharacterVector strsplit2(const std::string& hex) {
  unsigned int length = hex.length()/2;
  CharacterVector res(length);
  for (unsigned int i = 0; i < length; ++i) {
    res(i) = hex.substr(2*i, 2);
  }
  return res;
}')

## A function to convert one string to an array of raw
f <- function(x)  {
  ## Split a long string into a vector of character pairs
  x <- strsplit2(x)
  ## Remove the first element, "\\x"
  x <- x[-1]
  ## Complete the conversion
  as.raw(as.hexmode(x))
}

raw <- f(byteArray)
# int16
readBin(con = raw,
        what = "integer",
        n = length(raw) / 2,
        size = 2,
        signed = TRUE,
        endian = "little")
# -1  0  1  2  3  4  5  6  7  8  9 10

# uint16
readBin(con = raw,
        what = "integer",
        n = length(raw) / 2,
        size = 2,
        signed = FALSE,
        endian = "little")
# 65535     0     1     2     3     4     5     6     7     8     9    10

# int32
readBin(con = raw,
        what = "integer",
        n = length(raw) / 4,
        size = 4,
        signed = TRUE,
        endian = "little")
# 65535 131073 262147 393221 524295 655369
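
The `endian = "little"` argument in these calls is what accounts for the byte order that looked reversed in the question: the low-order byte of each value comes first. A small sketch (the names `two_bytes`, `little`, and `big` are illustrative only) shows the same two bytes read both ways:

```r
# The two bytes that encode the int16 value 1 in the examples above.
two_bytes <- as.raw(c(0x01, 0x00))

# Little-endian: first byte is the low-order byte.
little <- readBin(two_bytes, what = "integer", n = 1, size = 2,
                  endian = "little")

# Big-endian: first byte is the high-order byte.
big <- readBin(two_bytes, what = "integer", n = 1, size = 2,
               endian = "big")

little  # 1
big     # 256
```

With readBin handling the byte order, the manual swap of byte pairs from the question is not needed.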

However, this does not work for uint32 and (u)int64, because R uses int32 internally. R can, however, use numerics to store integers below 2^52 exactly, so we can use this:

# uint32
byteArray <- "\\xffffffff0100020003000400050006000700080009000a00"
raw <- f(byteArray)
int32 <- readBin(con = raw,
                 what = "integer",
                 n = length(raw) / 4,
                 size = 4,
                 signed = TRUE,
                 endian = "little")

ifelse(int32 < 0, int32 + 2^32, int32)
# 4294967295     131073     262147     393221     524295     655369
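
The pieces above could be combined into a single helper that dispatches on the ImageClass metadata and reshapes the result with ImageSize. This is only a sketch under stated assumptions: `hex_to_raw` and `decode_image` are hypothetical names, the hex splitting is done in base R instead of Rcpp so the block runs without a compiler, and the data is assumed to be little-endian as in the examples.

```r
# Hypothetical helper: strip the leading \x and turn each pair of hex
# digits into one raw byte (base R stand-in for strsplit2()/f()).
hex_to_raw <- function(x) {
  hex <- sub("^\\\\x", "", x)
  n <- nchar(hex)
  stopifnot(n %% 2 == 0)
  if (n == 0) return(raw(0))
  starts <- seq(1, n, by = 2)
  as.raw(strtoi(substring(hex, starts, starts + 1), 16L))
}

# Hypothetical helper: decode a bytea hex string into an array.
decode_image <- function(byteArray, ImageSize, ImageClass) {
  bytes <- hex_to_raw(byteArray)
  size <- switch(ImageClass,
                 int16 = 2, uint16 = 2, int32 = 4, uint32 = 4,
                 stop("unsupported class: ", ImageClass))
  # readBin() honours signed = FALSE only for sizes 1 and 2,
  # so 4-byte values are always read as signed here.
  values <- readBin(bytes, what = "integer", n = length(bytes) / size,
                    size = size, signed = ImageClass != "uint16",
                    endian = "little")
  if (ImageClass == "uint32") {
    # Shift negative int32 values into the unsigned range (as doubles).
    values <- ifelse(values < 0, values + 2^32, values)
  }
  array(values, dim = ImageSize)
}

ex1 <- decode_image("\\x010002000300040005000600070008000900",
                    c(3, 3, 1), "int16")
ex2 <- decode_image("\\xffff00000100020003000400050006000700080009000a00",
                    c(3, 4, 1), "int16")
ex3 <- decode_image("\\xffffffff", 1, "uint32")
```

Here `ex1` and `ex2` correspond to the two expected results in the question. One caveat: the 4-byte pattern 0x80000000 is R's NA_integer_, so a uint32 value of exactly 2^31 would come back as NA with this approach.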

For gzip-compressed data:

# gzip
byteArray <- "\\x1f8b080000000000000005c1870100200800209a56faffbd41d30dd3b285e37a52f9d033018818000000"
con <- gzcon(rawConnection(f(byteArray)))
readBin(con = con,
        what = "integer",
        n = length(raw) / 2,
        size = 2,
        signed = TRUE,
        endian = "little")
close(con = con)

Since this is a real connection, we have to make sure to close it.
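
One way to make sure the connection is always closed, even if readBin fails, is to wrap the read in a function and register the close with on.exit. The sketch below is an illustration only: `read_gz_int16` is a hypothetical name, and the gzip bytes are produced locally through a temporary file so that the example is self-contained.

```r
# Hypothetical helper: read n little-endian int16 values from gzip bytes.
read_gz_int16 <- function(bytes, n) {
  con <- gzcon(rawConnection(bytes))
  on.exit(close(con))  # runs on normal exit and on error
  readBin(con, what = "integer", n = n, size = 2,
          signed = TRUE, endian = "little")
}

# Build some gzip-compressed test bytes via a temporary file.
tmp <- tempfile(fileext = ".gz")
gz <- gzfile(tmp, "wb")
writeBin(c(-1L, 0L, 1L, 2L), gz, size = 2, endian = "little")
close(gz)
gz_bytes <- readBin(tmp, what = "raw", n = file.size(tmp))
unlink(tmp)

result <- read_gz_int16(gz_bytes, n = 4)
result  # -1 0 1 2
```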

answered 2018-06-23T09:52:12.577