java - Encoding/decoding a string with a special character to a byte array

Question

I had to encode a 3-character string (always letters) into a 2-element byte[] array. This was done to save space and for performance.

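Now the requirement has changed a bit. The string will have a variable length: either length 3 (as above), or length 4 with 1 special character at the beginning. The special character is fixed, i.e. if we choose @ it will always be @, and it will always be at the beginning. So we know for sure that if the string has length 3 it contains only letters, and if it has length 4 the first character is always '@' followed by 3 letters.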

So I could use

charsAsNumbers[0] = (byte) (locationChars[0] - '@');

instead of

charsAsNumbers[0] = (byte) (chars[0] - 'A');

Can I still encode 3 or 4 characters into a 2-byte array and decode them back? If so, how?
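A quick check of what the '@'-based offset in the question does. Since '@' (0x40) comes right before 'A' (0x41) in ASCII, subtracting '@' maps '@' to 0 and 'A'..'Z' to 1..26. The sketch (the class name AtOffset is illustrative) also shows why four such symbols cannot simply be laid side by side in 16 bits.

```java
// Sketch: per-character codes when subtracting '@' (as in the question) instead of 'A'.
final class AtOffset {
    /** '@' (0x40) sits just before 'A' (0x41) in ASCII, so '@' -> 0 and 'A'..'Z' -> 1..26. */
    static int offset(char c) {
        return c - '@';
    }

    public static void main(String[] args) {
        System.out.println(offset('@') + " " + offset('A') + " " + offset('Z')); // 0 1 26
        // 27 symbols fit in 5 bits each, but 4 symbols x 5 bits = 20 bits > 16.
        // Even dense base-27 packing needs 27^4 codes, far more than 2^16.
        System.out.println(27 * 27 * 27 * 27 + " > " + (1 << 16)); // 531441 > 65536
    }
}
```

The way out, as the answers point out in different ways, is that a fixed prefix carries only one bit of information (present or absent), not a full symbol.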

score 2 · Accepted Answer

~~Not a direct answer, but this~~ is how I would do the encoding:

   public static byte[] encode(String s) {
      int code = s.charAt(0) - 'A' + (32 * (s.charAt(1) - 'A' + 32 * (s.charAt(2) - 'A')));
      byte[] encoded = { (byte) ((code >>> 8) & 255), (byte) (code & 255) };
      return encoded;
   }

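The first line uses Horner's scheme to arithmetically combine 5 bits per character into a single integer. It will fail horribly if any of your input characters fall outside the range [A-`], i.e. the 32 code points from 'A' (0x41) to '`' (0x60).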

The second line assembles a 2-byte array from the leading (high) and trailing (low) bytes of the integer.
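To make the two lines concrete, here is a small sketch (class and method names are illustrative) that repeats the answer's Horner arithmetic, checks it against the equivalent shift/OR form, and splits the result into bytes. For "ABC" the value is 0 + 32 * (1 + 32 * 2) = 2080 = 0x0820; for "ZZZ" it is 26425 = 0x6739, so bit 15 is never used.

```java
// Sketch: the first answer's Horner-style packing, checked against an explicit shift/OR form.
final class HornerDemo {
    /** Same arithmetic as the answer's encode(): 5 bits per letter, first letter lowest. */
    static int horner(String s) {
        return s.charAt(0) - 'A' + 32 * (s.charAt(1) - 'A' + 32 * (s.charAt(2) - 'A'));
    }

    /** Equivalent bit layout: char 0 in bits 0-4, char 1 in bits 5-9, char 2 in bits 10-14. */
    static int shifted(String s) {
        return (s.charAt(0) - 'A') | ((s.charAt(1) - 'A') << 5) | ((s.charAt(2) - 'A') << 10);
    }

    /** Leading (high) byte first, trailing (low) byte second, as in the answer. */
    static byte[] toBytes(int code) {
        return new byte[] { (byte) ((code >>> 8) & 255), (byte) (code & 255) };
    }

    public static void main(String[] args) {
        int code = horner("ABC");
        byte[] b = toBytes(code);
        System.out.printf("%d %d %02X%02X%n", code, shifted("ABC"), b[0], b[1]); // 2080 2080 0820
    }
}
```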

Decoding can be done in a similar way, with the steps reversed.
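One way to write the reversed steps for the first encode() above, as a sketch (FirstAnswerCodec is a hypothetical name): rebuild the unsigned 16-bit value from the two bytes, then peel off 5 bits per letter, lowest bits first, because the first letter went into the lowest bits.

```java
// Sketch of a decoder for the first encode() above (3 letters, 5 bits each, first letter lowest).
final class FirstAnswerCodec {
    static byte[] encode(String s) {
        int code = s.charAt(0) - 'A' + (32 * (s.charAt(1) - 'A' + 32 * (s.charAt(2) - 'A')));
        return new byte[] { (byte) ((code >>> 8) & 255), (byte) (code & 255) };
    }

    static String decode(byte[] b) {
        int code = ((b[0] & 0xFF) << 8) | (b[1] & 0xFF); // & 0xFF undoes Java's signed bytes
        char[] chars = new char[3];
        for (int i = 0; i < 3; i++) {                     // lowest 5 bits hold the first letter
            chars[i] = (char) ('A' + (code & 31));
            code >>>= 5;
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        System.out.println(decode(encode("JAV"))); // JAV
    }
}
```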

Update with code (putting my foot where my mouth is, or something like that):

public class TequilaGuy {

   public static final char SPECIAL_CHAR = '@';

   public static byte[] encode(String s) {
      int special = (s.length() == 4) ? 1 : 0;
      int code = s.charAt(2 + special) - 'A' + (32 * (s.charAt(1 + special) - 'A' + 32 * (s.charAt(0 + special) - 'A' + 32 * special)));
      byte[] encoded = { (byte) ((code >>> 8) & 255), (byte) (code & 255) };
      return encoded;
   }

   public static String decode(byte[] b) {
      int code = 256 * ((b[0] < 0) ? (b[0] + 256) : b[0]) + ((b[1] < 0) ? (b[1] + 256) : b[1]);
      int special = (code >= 0x8000) ? 1 : 0;
      char[] chrs = { SPECIAL_CHAR, '\0', '\0', '\0' };
      for (int ptr=3; ptr>0; ptr--) {
         chrs[ptr] = (char) ('A' + (code & 31));
         code >>>= 5;
      }
      return (special == 1) ? String.valueOf(chrs) : String.valueOf(chrs, 1, 3);
   }

   public static void testEncode() {
      for (int spcl=0; spcl<2; spcl++) {
         for (char c1='A'; c1<='Z'; c1++) {
            for (char c2='A'; c2<='Z'; c2++) {
               for (char c3='A'; c3<='Z'; c3++) {
                  String s = ((spcl == 0) ? "" : String.valueOf(SPECIAL_CHAR)) + c1 + c2 + c3;
                  byte[] cod = encode(s);
                  String dec = decode(cod);
                  System.out.format("%4s : %02X%02X : %s\n", s, cod[0], cod[1], dec);
               }
            }
         }
      }
   }

   public static void main(String[] args) {
      testEncode();
   }

}

score 1 · Accepted Answer

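With your alphabet you only use 15 of the 16 available bits of the output (3 letters at 5 bits each). Since the special character is fixed, you can simply set the MSB (most significant bit) when the string has length 4.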
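A minimal sketch of the spare-MSB idea, assuming valid input (3 letters, or '@' plus 3 letters) and reusing the 5-bit layout of the first answer, so 3-letter strings keep exactly the same codes as before. MsbFlagCodec and PREFIX are illustrative names.

```java
// Sketch: 3 letters in bits 0-14, bit 15 flags the fixed '@' prefix.
final class MsbFlagCodec {
    static final char PREFIX = '@'; // assumption: the fixed special character is '@'

    static byte[] encode(String s) {
        boolean prefixed = s.length() == 4;
        String letters = prefixed ? s.substring(1) : s;   // the prefix itself carries no data
        int code = (letters.charAt(0) - 'A')
                | ((letters.charAt(1) - 'A') << 5)
                | ((letters.charAt(2) - 'A') << 10);      // at most 0x6739, so bit 15 is free
        if (prefixed) {
            code |= 0x8000;                               // set the most significant bit
        }
        return new byte[] { (byte) (code >>> 8), (byte) code };
    }

    static String decode(byte[] b) {
        int code = ((b[0] & 0xFF) << 8) | (b[1] & 0xFF);
        StringBuilder sb = new StringBuilder(4);
        if ((code & 0x8000) != 0) {
            sb.append(PREFIX);
        }
        for (int i = 0; i < 3; i++) {
            sb.append((char) ('A' + ((code >>> (5 * i)) & 31)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode(encode("@ABC"))); // @ABC
    }
}
```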

Another option is to use a translation table. Just create a string containing all valid characters:

String valid = "@ABCDEFGHIJKLMNOPQRSTUVWXYZ";

The index of a character in this string is its code in the output. Now create two arrays:

byte[] encode = new byte[256];              // char -> index (ASCII range only)
char[] decode = new char[valid.length()];   // index -> char
for (int i=0; i<valid.length(); i++) {
    char c = valid.charAt(i);
    encode[c] = (byte) i;   // int -> byte needs an explicit cast
    decode[i] = c;
}

Now you can look up values in either direction in these arrays, and you can add any characters you like, in any order.
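A sketch of how the two lookup arrays could drive the packing, assuming at most 32 table entries so each index fits in 5 bits (TableCodec is an illustrative name). With "@ABC...Z" as the table, '@' gets index 0 and can appear in any of the three positions; this shows the lookups only and does not by itself fit a fourth symbol into 16 bits.

```java
// Sketch: the answer's lookup tables used to pack three symbols into 15 bits.
final class TableCodec {
    static final String VALID = "@ABCDEFGHIJKLMNOPQRSTUVWXYZ"; // assumption: <= 32 symbols
    static final byte[] ENCODE = new byte[256];                // char -> index; chars >= 256 throw
    static final char[] DECODE = new char[VALID.length()];     // index -> char

    static {
        for (int i = 0; i < VALID.length(); i++) {
            char c = VALID.charAt(i);
            ENCODE[c] = (byte) i;
            DECODE[i] = c;
        }
    }

    static byte[] encode(String s) {
        int code = 0;
        for (int i = 0; i < 3; i++) {
            // symbol i goes into bits 5i..5i+4; chars missing from VALID silently map to index 0
            code |= ENCODE[s.charAt(i)] << (5 * i);
        }
        return new byte[] { (byte) (code >>> 8), (byte) code };
    }

    static String decode(byte[] b) {
        int code = ((b[0] & 0xFF) << 8) | (b[1] & 0xFF);
        char[] out = new char[3];
        for (int i = 0; i < 3; i++) {
            out[i] = DECODE[(code >>> (5 * i)) & 31];
        }
        return new String(out);
    }

    public static void main(String[] args) {
        System.out.println(decode(encode("@AZ"))); // @AZ
    }
}
```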

score 1 · Accepted Answer

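Yes, it is possible to encode an extra bit of information while keeping your previous encoding for 3-character values. But since your original encoding does not leave nice, clean runs of free numbers in the output set, the mapping for the additional strings introduced by the extra character can't help being somewhat discontinuous.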

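So I think it would be hard to come up with mapping functions that handle these discontinuities without being clumsy and slow. I concluded that a table-based mapping is the only sensible solution.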

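I was too lazy to re-engineer your mapping code, so I folded it into my table initialization code; this also removes a lot of opportunities for translation errors :) Your encode() method is what I call OldConverter.encode().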

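I've run a small test program to verify that NewConverter.encode() produces the same values as OldConverter.encode() and can also encode strings with a leading 4th character. NewConverter.encode() doesn't care what that character is; it goes by string length. For decode(), the prefix character that comes back is the "@" hard-coded in createb2s(). I also checked that the byte-array values of the prefixed strings don't duplicate any values of the non-prefixed strings, and finally that the encoded prefixed strings really do convert back into the same prefixed strings.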

package tequilaguy;


public class NewConverter {

   private static final String[] b2s = new String[0x10000];
   private static final int[] s2b = new int[0x10000];
   static { 
      createb2s();
      creates2b();
   }

   /**
    * Create the "byte to string" conversion table.
    */
   private static void createb2s() {
      // Fill 17576 elements of the array with b -> s equivalents.
      // index is the combined byte value of the old encode fn; 
      // value is the String (3 chars). 
      for (char a='A'; a<='Z'; a++) {
         for (char b='A'; b<='Z'; b++) {
            for (char c='A'; c<='Z'; c++) {
               String str = new String(new char[] { a, b, c});
               byte[] enc = OldConverter.encode(str);
               int index = ((enc[0] & 0xFF) << 8) | (enc[1] & 0xFF);
               b2s[index] = str;
               // int value = 676 * a + 26 * b + c - ((676 + 26 + 1) * 'A'); // 45695;
               // System.out.format("%s : %02X%02X = %04x / %04x %n", str, enc[0], enc[1], index, value);
            }
         }
      }
      // Fill 17576 elements of the array with b -> @s equivalents.
      // index is the next free (= still null) array index;
      // value = the String (@ + 3 chars)
      int freep = 0;
      for (char a='A'; a<='Z'; a++) {
         for (char b='A'; b<='Z'; b++) {
            for (char c='A'; c<='Z'; c++) {
               String str = "@" + new String(new char[] { a, b, c});
               while (b2s[freep] != null) freep++;
               b2s[freep] = str;
               // int value = 676 * a + 26 * b + c - ((676 + 26 + 1) * 'A') + (26 * 26 * 26);
               // System.out.format("%s : %02X%02X = %04x / %04x %n", str, 0, 0, freep, value);
            }
         }
      }
   }

   /**
    * Create the "string to byte" conversion table.
    * Done by inverting the "byte to string" table.
    */
   private static void creates2b() {
      for (int b=0; b<0x10000; b++) {
         String s = b2s[b];
         if (s != null) {
            int sval;
            if (s.length() == 3) {
               sval = 676 * s.charAt(0) + 26 * s.charAt(1) + s.charAt(2) - ((676 + 26 + 1) * 'A');
            } else {
               sval = 676 * s.charAt(1) + 26 * s.charAt(2) + s.charAt(3) - ((676 + 26 + 1) * 'A') + (26 * 26 * 26);
            }
            s2b[sval] = b;
         }
      }
   }

   public static byte[] encode(String str) {
      int sval;
      if (str.length() == 3) {
         sval = 676 * str.charAt(0) + 26 * str.charAt(1) + str.charAt(2) - ((676 + 26 + 1) * 'A');
      } else {
         sval = 676 * str.charAt(1) + 26 * str.charAt(2) + str.charAt(3) - ((676 + 26 + 1) * 'A') + (26 * 26 * 26);
      }
      int bval = s2b[sval];
      return new byte[] { (byte) (bval >> 8), (byte) (bval & 0xFF) };
   }

   public static String decode(byte[] b) {
      int bval = ((b[0] & 0xFF) << 8) | (b[1] & 0xFF);
      return b2s[bval];
   }

}
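NewConverter (and the harness further down) calls OldConverter.encode(), which the answer does not show; it stands for the asker's original 3-letter encoder. As a hypothetical stand-in, here is the 5-bit packing from the first answer. Any injective mapping of the 17,576 three-letter strings into 16 bits would do, since NewConverter only needs the old codes to be distinct and to leave at least 17,576 free slots. To compile alongside NewConverter, add `package tequilaguy;` at the top.

```java
// Hypothetical stand-in for the asker's OldConverter (not shown in the answer).
// Assumption: the old scheme packs 3 letters at 5 bits each, first letter lowest.
final class OldConverter {
    static byte[] encode(String s) {
        int code = s.charAt(0) - 'A' + 32 * (s.charAt(1) - 'A' + 32 * (s.charAt(2) - 'A'));
        return new byte[] { (byte) ((code >>> 8) & 255), (byte) (code & 255) };
    }
}
```

With this class in place, NewConverter.encode("ABC") should return the same two bytes as OldConverter.encode("ABC") (0x08, 0x20), because createb2s() stores every 3-letter string at its old code and only fills the remaining null slots with the "@" strings.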

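I left some complicated constant expressions in the code, especially the powers-of-26 stuff; otherwise the code would look very cryptic. You can leave them as they are without losing any performance, because the compiler folds them up like Kleenex.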
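For readers who find the repeated expression hard to parse: 676 * a + 26 * b + c - (676 + 26 + 1) * 'A' is just (a - 'A') * 26^2 + (b - 'A') * 26 + (c - 'A'), a dense base-26 index, with 26^3 added for the prefixed strings. A sketch of the same index written once with named constants (StringIndex, BASE and PLAIN_COUNT are illustrative names; this layout has not been benchmarked against the original):

```java
// Sketch: NewConverter's repeated sval expression, written once with named constants.
final class StringIndex {
    static final int BASE = 26;                        // letters 'A'..'Z'
    static final int PLAIN_COUNT = BASE * BASE * BASE; // 17576 three-letter strings

    /** Dense index: 0..17575 for "AAA".."ZZZ", 17576..35151 for "@AAA".."@ZZZ". */
    static int sval(String s) {
        int off = s.length() == 4 ? 1 : 0;             // skip the fixed prefix character
        int v = 0;
        for (int i = off; i < off + 3; i++) {
            v = v * BASE + (s.charAt(i) - 'A');        // Horner's scheme in base 26
        }
        return v + off * PLAIN_COUNT;
    }

    public static void main(String[] args) {
        System.out.println(sval("ABC") + " " + sval("@ABC")); // 28 17604
    }
}
```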

Update:

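With the horrors of Christmas approaching, I'll be on the road for a while. I hope you find this answer and code in time and put them to good use. In support of that effort, I'll throw in my little test program. It doesn't check the contents directly; instead it prints the conversion results in all the ways that matter and lets you check them by eye and by hand. I fiddled with my code (small tweaks once I had the basic idea) until everything there looked fine. You may want to test more mechanically and exhaustively.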

package tequilaguy;

public class ConverterHarness {

//   private static void runOldEncoder() {
//      for (char a='A'; a<='Z'; a++) {
//         for (char b='A'; b<='Z'; b++) {
//            for (char c='A'; c<='Z'; c++) {
//               String str = new String(new char[] { a, b, c});
//               byte[] enc = OldConverter.encode(str);
//               System.out.format("%s : %02X%02X%n", str, enc[0], enc[1]);
//            }
//         }
//      }
//   }

   private static void testNewConverter() {
      for (char a='A'; a<='Z'; a++) {
         for (char b='A'; b<='Z'; b++) {
            for (char c='A'; c<='Z'; c++) {
               String str = new String(new char[] { a, b, c});
               byte[] oldEnc = OldConverter.encode(str);
               byte[] newEnc = NewConverter.encode(str);
               byte[] newEnc2 = NewConverter.encode("@" + str);
               System.out.format("%s : %02X%02X %02X%02X %02X%02X %s %s %n", 
                     str, oldEnc[0], oldEnc[1], newEnc[0], newEnc[1], newEnc2[0], newEnc2[1],
                     NewConverter.decode(newEnc), NewConverter.decode(newEnc2));
            }
         }
      }
   }
   public static void main(String[] args) {
      testNewConverter();
   }

}
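A sketch of the more mechanical, exhaustive test suggested above, assuming only that an encoder maps String to byte[] and a decoder maps byte[] back to String. It walks all 35,152 valid inputs, checks every round trip, and reports any two inputs that share a 16-bit code. The demo codec inside (msbEncode/msbDecode, hypothetical names) uses the spare-MSB idea from another answer; the second call shows the check catching a codec that drops the prefix without flagging it.

```java
import java.util.function.Function;

// Sketch: exhaustive round-trip and collision check instead of eyeballing printed output.
final class ExhaustiveCheck {
    static boolean check(Function<String, byte[]> enc, Function<byte[], String> dec) {
        boolean[] seen = new boolean[0x10000];
        for (int prefixed = 0; prefixed < 2; prefixed++) {
            for (char a = 'A'; a <= 'Z'; a++) {
                for (char b = 'A'; b <= 'Z'; b++) {
                    for (char c = 'A'; c <= 'Z'; c++) {
                        String s = (prefixed == 1 ? "@" : "") + a + b + c;
                        byte[] code = enc.apply(s);
                        if (code.length != 2 || !s.equals(dec.apply(code))) {
                            return false;                       // bad length or failed round trip
                        }
                        int key = ((code[0] & 0xFF) << 8) | (code[1] & 0xFF);
                        if (seen[key]) {
                            return false;                       // two inputs share a code
                        }
                        seen[key] = true;
                    }
                }
            }
        }
        return true;
    }

    // Demo codec (assumption): 5 bits per letter, bit 15 set for the '@' prefix.
    static byte[] msbEncode(String s) {
        int off = s.length() - 3;                               // 1 if prefixed, else 0
        int code = off << 15;
        for (int i = 0; i < 3; i++) {
            code |= (s.charAt(off + i) - 'A') << (5 * i);
        }
        return new byte[] { (byte) (code >>> 8), (byte) code };
    }

    static String msbDecode(byte[] b) {
        int code = ((b[0] & 0xFF) << 8) | (b[1] & 0xFF);
        StringBuilder sb = new StringBuilder(4);
        if ((code & 0x8000) != 0) {
            sb.append('@');
        }
        for (int i = 0; i < 3; i++) {
            sb.append((char) ('A' + ((code >>> (5 * i)) & 31)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(check(ExhaustiveCheck::msbEncode, ExhaustiveCheck::msbDecode)); // true
        System.out.println(check(s -> msbEncode(s.substring(s.length() - 3)),
                                 ExhaustiveCheck::msbDecode));                        // false
    }
}
```

NewConverter.encode/decode and TequilaGuy.encode/decode have matching static signatures, so they could be passed the same way, e.g. check(NewConverter::encode, NewConverter::decode).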

score 0 · Accepted Answer

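You'd find this a lot easier if you just used the java.nio.charset.CharsetEncoder class to convert your characters to bytes. It even works for characters outside ASCII. Even String.getBytes would achieve the same basic effect with a lot less code.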
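A sketch of the String.getBytes route (GetBytesDemo is an illustrative name). It is indeed very little code and round-trips cleanly, but note that US-ASCII is a single-byte charset, so the output is 3 or 4 bytes long rather than the 2 bytes the question asks for.

```java
import java.nio.charset.StandardCharsets;

// Sketch: String.getBytes / new String with an explicit charset, one byte per ASCII char.
final class GetBytesDemo {
    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    static String decode(byte[] b) {
        return new String(b, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(encode("ABC").length);   // 3
        System.out.println(encode("@ABC").length);  // 4
        System.out.println(decode(encode("@ABC"))); // @ABC
    }
}
```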

score 0 · Accepted Answer

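If the "special character" is fixed, and you always know that a 4-character string starts with this special character, then the char itself provides no useful information.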

If the string is 3 characters long, do what you did before; if it is 4 characters long, run the old algorithm on the substring starting at the 2nd character.

Am I thinking too simply, or are you thinking too hard?
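A sketch of this suggestion, with a hypothetical oldEncode() standing in for the asker's original encoder (the 5-bit layout from the first answer). Written exactly as described, "ABC" and "@ABC" produce the same two bytes, so a decoder can only restore the prefix if the original length is known from somewhere else; combining it with the spare MSB from another answer removes that ambiguity.

```java
// Sketch: strip the fixed prefix and reuse the old 3-letter encoder unchanged.
final class SubstringEncoder {
    // Hypothetical stand-in for the asker's old encoder (assumption: 5 bits per letter).
    static byte[] oldEncode(String s) {
        int code = s.charAt(0) - 'A' + 32 * (s.charAt(1) - 'A' + 32 * (s.charAt(2) - 'A'));
        return new byte[] { (byte) ((code >>> 8) & 255), (byte) (code & 255) };
    }

    static byte[] encode(String s) {
        // 4 characters: the first one is always the fixed special char, so skip it
        return oldEncode(s.length() == 4 ? s.substring(1) : s);
    }
}
```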
