
I'd like to find a way to get the symbol names of non-printable characters in C# (for example "SOH" for start of heading and "BS" for backspace). Any ideas?

Edit: I don't need to visualize the byte value of a non-printable character, but its code name, as listed at https://web.itu.edu.tr/sgunduz/courses/mikroisl/ascii.html

For example, "NUL" for 0x00, "SOH" for 0x01, and so on.


2 Answers


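You are probably looking for a kind of string dump that makes control characters visible. You can do this with a regular expression that matches control characters, \p{Cc}: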

using System;
using System.Text.RegularExpressions;

...

string source = "BEL \u0007 then CR + LF  \r\n SOH \u0001 \0\0";

// To get control characters visible, we match them and
// replace with their codes
string result = Regex.Replace(
  source, @"\p{Cc}", 
  m => $"[Control: 0x{(int)m.Value[0]:x4}]");

// Let's have a look:

// Initial string 
Console.WriteLine(source);
Console.WriteLine();
// Control symbols visualized
Console.WriteLine(result);

Outcome:

BEL   then CR + LF  
 SOH  

BEL [Control: 0x0007] then CR + LF  [Control: 0x000d][Control: 0x000a] SOH [Control: 0x0001] [Control: 0x0000][Control: 0x0000]

Edit: if you want to visualize the characters in a different way, edit the lambda

m => $"[Control: 0x{(int)m.Value[0]:x4}]"

For example:

    static string[] knownCodes = new string[] {
      "NULL", "SOH", "STX", "ETX", "EOT", "ENQ",
      "ACK",  "BEL", "BS", "HT", "LF", "VT",
      "FF", "CR", "SO", "SI", "DLE", "DC1", "DC2",
      "DC3", "DC4", "NAK", "SYN", "ETB", "CAN",
      "EM", "SUB", "ESC", "FS", "GS", "RS", "US",
    };

    private static string StringDump(string source) {
      if (null == source)
        return source;

      return Regex.Replace(
        source, 
        @"\p{Cc}", 
        m => {
          int code = (int)(m.Value[0]);

          return code < knownCodes.Length 
            ? $"[{knownCodes[code]}]" 
            : $"[Control 0x{code:x4}]";  
        });
    }

Demo:

Console.WriteLine(StringDump(source));

Outcome:

BEL [BEL] then CR + LF  [CR][LF] SOH [SOH] [NULL][NULL]
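
If only the name of a single character is needed, a plain table lookup without a regular expression may be enough. The following is a minimal sketch, not part of the answer above: the class name ControlNames and the method NameOf are hypothetical, the table uses "NUL" (as in the table linked from the question) rather than "NULL", and DEL (0x7F) is handled as an extra case.

```csharp
using System;

static class ControlNames
{
    // Abbreviations of the C0 control characters, indexed by code 0x00..0x1F.
    static readonly string[] Names =
    {
        "NUL", "SOH", "STX", "ETX", "EOT", "ENQ", "ACK", "BEL",
        "BS",  "HT",  "LF",  "VT",  "FF",  "CR",  "SO",  "SI",
        "DLE", "DC1", "DC2", "DC3", "DC4", "NAK", "SYN", "ETB",
        "CAN", "EM",  "SUB", "ESC", "FS",  "GS",  "RS",  "US",
    };

    // Hypothetical helper: returns the abbreviation of a control character,
    // or null when the character has no entry (printable characters,
    // and the C1 controls 0x80..0x9F, which are not covered here).
    public static string NameOf(char c)
    {
        int code = c;
        if (code < Names.Length)
            return Names[code];
        return code == 0x7F ? "DEL" : null;
    }
}
```

With this sketch, ControlNames.NameOf('\u0001') returns "SOH" and ControlNames.NameOf('\b') returns "BS"; for a printable character such as 'A' it returns null.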
Answered 2021-11-19T09:23:54.323

For example, in Visual Studio, just display the SOH character (U+0001), then encode it like this:

var bytes = Encoding.UTF8.GetBytes("☺");

Now you can do whatever you like with it. For backspace, use U+232B.
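
A related option, if a single visible glyph per control character is wanted, is the Unicode Control Pictures block: U+2400 to U+241F are the symbols for the control characters 0x00 to 0x1F in the same order (U+2401 is SYMBOL FOR START OF HEADING, U+2408 is SYMBOL FOR BACKSPACE), and U+2421 is SYMBOL FOR DELETE. U+232B, by contrast, is the keyboard symbol ERASE TO THE LEFT. The sketch below assumes that mapping is what is wanted; the names ControlPictures and ToPictures are hypothetical.

```csharp
using System;
using System.Text;

static class ControlPictures
{
    // Hypothetical helper: replaces each C0 control character (0x00..0x1F)
    // and DEL (0x7F) with its Unicode Control Pictures symbol.
    // All other characters are copied unchanged.
    public static string ToPictures(string source)
    {
        if (source == null)
            return null;

        var sb = new StringBuilder(source.Length);
        foreach (char c in source)
        {
            if (c < 0x20)
                sb.Append((char)(0x2400 + c)); // U+2400..U+241F
            else if (c == 0x7F)
                sb.Append('\u2421');           // SYMBOL FOR DELETE
            else
                sb.Append(c);
        }
        return sb.ToString();
    }
}
```

For example, ControlPictures.ToPictures("A\u0001\b") returns "A\u2401\u2408". Whether the glyphs actually show up depends on the font, and on a console it may be necessary to set Console.OutputEncoding = Encoding.UTF8 first.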

Answered 2021-11-19T09:11:22.720