
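I have an 8x16 bit matrix stored as UINT8 matrix[16].

I want to transpose it and store the result as UINT16 matrix2[8].

This is a time-critical part of my code, so I need it to run as fast as possible. Is there a clever way to do this on a MIPS processor?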


2 Answers


Perhaps something like this:

  lbu $10, matrix
  lbu $11, matrix+1
  lbu $12, matrix+2
  lbu $13, matrix+3
  lbu $14, matrix+4
  lbu $15, matrix+5
  lbu $16, matrix+6
  lbu $17, matrix+7
  lbu $18, matrix+8
  lbu $19, matrix+9
  lbu $20, matrix+10
  lbu $21, matrix+11
  lbu $22, matrix+12
  lbu $23, matrix+13
  lbu $24, matrix+14
  lbu $25, matrix+15

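  addiu $2, $0, 8          # $2 = row counter (bit index + 1)
  addiu $9, $0, 256        # $9 = bit mask, becomes 0x80 on the first pass
loop:
  addiu $2, $2, -1         # next bit index: 7, 6, ..., 0
  srl $9, $9, 1            # mask selecting that bit
  addu $27, $0, $0         # $27 = accumulator for one output row
                           # ($26/$27 are $k0/$k1, reserved for the kernel by convention)

  and $26, $10, $9         # isolate the selected bit of matrix[0]
  srlv $26, $26, $2        # move it down to bit 0
  or $27, $27, $26         # put it into the accumulator

  and $26, $11, $9         # same for matrix[1], and so on for each byte
  srlv $26, $26, $2
  sll $27, $27, 1          # make room for the new bit
  or $27, $27, $26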

  and $26, $12, $9
  srlv $26, $26, $2
  sll $27, $27, 1
  or $27, $27, $26

  and $26, $13, $9
  srlv $26, $26, $2
  sll $27, $27, 1
  or $27, $27, $26

  and $26, $14, $9
  srlv $26, $26, $2
  sll $27, $27, 1
  or $27, $27, $26

  and $26, $15, $9
  srlv $26, $26, $2
  sll $27, $27, 1
  or $27, $27, $26

  and $26, $16, $9
  srlv $26, $26, $2
  sll $27, $27, 1
  or $27, $27, $26

  and $26, $17, $9
  srlv $26, $26, $2
  sll $27, $27, 1
  or $27, $27, $26

  and $26, $18, $9
  srlv $26, $26, $2
  sll $27, $27, 1
  or $27, $27, $26

  and $26, $19, $9
  srlv $26, $26, $2
  sll $27, $27, 1
  or $27, $27, $26

  and $26, $20, $9
  srlv $26, $26, $2
  sll $27, $27, 1
  or $27, $27, $26

  and $26, $21, $9
  srlv $26, $26, $2
  sll $27, $27, 1
  or $27, $27, $26

  and $26, $22, $9
  srlv $26, $26, $2
  sll $27, $27, 1
  or $27, $27, $26

  and $26, $23, $9
  srlv $26, $26, $2
  sll $27, $27, 1
  or $27, $27, $26

  and $26, $24, $9
  srlv $26, $26, $2
  sll $27, $27, 1
  or $27, $27, $26

  and $26, $25, $9
  srlv $26, $26, $2
  sll $27, $27, 1
  or $27, $27, $26

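  sll $3, $2, 1            # byte offset of the output row = 2 * bit index
  sh $27, transposed($3)   # store the finished row
  bgtz  $2, loop           # loop while rows remain; exits after row 0 is stored
  nop                      # branch delay slot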


.data 0x2000
matrix:  
.byte 0x80
.byte 0x80
.byte 0x40
.byte 0x40
.byte 0x20
.byte 0x20
.byte 0x10
.byte 0x10
.byte 0x08
.byte 0x08
.byte 0x04
.byte 0x04
.byte 0x02
.byte 0x02
.byte 0x01
.byte 0x01

.data 0x3000
transposed:
.half 0
.half 0
.half 0
.half 0
.half 0
.half 0
.half 0
.half 0

It loads all 16 bytes of the input matrix into registers, then runs the loop 8 times, once for each row of the transposed matrix.
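To check the assembly in a simulator, the same computation can be modelled in C. This is a minimal sketch rather than a replacement for the code above: `transpose_ref` is a made-up name, and the bit order (matrix[0] lands in bit 15 of each output halfword, matrix[15] in bit 0) is what the shift sequence in the loop produces.

```c
#include <stdint.h>

/* Reference model of the assembly loop: output row r collects bit r of every
   input byte. matrix[0] is shifted in first, so it ends up in bit 15;
   matrix[15] ends up in bit 0. The name transpose_ref is hypothetical. */
void transpose_ref(const uint8_t matrix[16], uint16_t transposed[8])
{
    for (int row = 7; row >= 0; row--) {        /* same order as the asm: 7 down to 0 */
        uint16_t acc = 0;
        for (int col = 0; col < 16; col++) {
            /* shift the accumulator left and append bit `row` of matrix[col] */
            acc = (uint16_t)((acc << 1) | ((matrix[col] >> row) & 1u));
        }
        transposed[row] = acc;
    }
}
```

With the sample data above this gives transposed[7] = 0xC000, transposed[6] = 0x3000, and so on down to transposed[0] = 0x0003.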

Answered 2011-10-26T19:35:38.080

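I don't think the MIPS instruction set has any special instruction that helps with this, so you might as well code it in C. If you have access to the processor RTL, you could create a user-defined instruction...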
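If you do go the C route, one option is the shift-and-mask 8x8 bit-matrix transpose described in Hacker's Delight, applied once to each half of the input. The following is a minimal sketch, not a tuned implementation. The function names are made up for illustration. The output bit order (matrix[0] in bit 15 of each result) is an assumption, since the question does not specify one. On a 32-bit MIPS core the `uint64_t` operations compile to pairs of 32-bit instructions, so it would need to be measured against a plain loop.

```c
#include <stdint.h>

/* Transpose an 8x8 bit matrix held in a 64-bit word, where byte k is row k
   and bit j of that byte is column j. Three rounds of masked block swaps. */
static uint64_t transpose8x8(uint64_t x)
{
    uint64_t t;
    t = (x ^ (x >> 7))  & 0x00AA00AA00AA00AAULL;  x ^= t ^ (t << 7);   /* swap inside 2x2 blocks */
    t = (x ^ (x >> 14)) & 0x0000CCCC0000CCCCULL;  x ^= t ^ (t << 14);  /* swap 2x2 blocks inside 4x4 */
    t = (x ^ (x >> 28)) & 0x00000000F0F0F0F0ULL;  x ^= t ^ (t << 28);  /* swap the two 4x4 blocks */
    return x;
}

/* Hypothetical helper: pack eight bytes so that p[0] becomes the most
   significant byte and p[7] the least significant. */
static uint64_t pack_be(const uint8_t *p)
{
    uint64_t x = 0;
    for (int i = 0; i < 8; i++)
        x = (x << 8) | p[i];
    return x;
}

/* Assumed layout: bit (15 - col) of matrix2[row] is bit `row` of matrix[col]. */
void transpose_8x16_swar(const uint8_t matrix[16], uint16_t matrix2[8])
{
    uint64_t hi = transpose8x8(pack_be(matrix));      /* columns 0..7  -> bits 15..8 */
    uint64_t lo = transpose8x8(pack_be(matrix + 8));  /* columns 8..15 -> bits 7..0  */
    for (int row = 0; row < 8; row++) {
        matrix2[row] = (uint16_t)((((hi >> (8 * row)) & 0xFF) << 8) |
                                   ((lo >> (8 * row)) & 0xFF));
    }
}
```

The big-endian packing is what puts matrix[0] in the top bit of each result; packing the bytes the other way round would mirror every output row.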

Answered 2011-10-28T18:04:00.643